This project sets up an end-to-end data pipeline on Google Cloud Platform that extracts, loads, and transforms stock data for the MAANG companies (Meta, Amazon, Apple, Netflix, Google) and forecasts the volume-weighted average stock price. The pipeline fetches data from the Polygon.io API with a Cloud Function, publishes it to Pub/Sub, stores it in Cloud Storage, and transforms and loads it into BigQuery tables. In the final stage, an Auto ARIMA model on Vertex AI produces the forecast, which is uploaded back to BigQuery.
- get_stocks_api.py: A script that fetches stock data for the MAANG companies in JSON format from the Polygon.io API (see the first sketch after this list).
- main.py: A script that publishes the JSON data to a Pub/Sub topic (second sketch below).
- preprocess_messages.py: A script that transforms the data from Cloud Storage into a DataFrame with a defined schema.
- main.py: A script responsible for uploading the preprocessed data into the daily and historical BigQuery tables (third sketch below, combined with the preprocessing step).
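For illustration, here is a minimal sketch of what `get_stocks_api.py` might look like, assuming the Polygon.io daily aggregates (bars) endpoint and a `POLYGON_API_KEY` environment variable; the actual script may use different endpoints or parameters:

```python
import os
from datetime import date, timedelta

import requests

# Hypothetical configuration; the real script may differ.
TICKERS = ["META", "AMZN", "AAPL", "NFLX", "GOOGL"]
API_KEY = os.environ["POLYGON_API_KEY"]

def get_stocks_data(day: str) -> dict:
    """Fetch one day of aggregate bars for each MAANG ticker."""
    results = {}
    for ticker in TICKERS:
        url = (
            f"https://api.polygon.io/v2/aggs/ticker/{ticker}"
            f"/range/1/day/{day}/{day}"
        )
        resp = requests.get(url, params={"apiKey": API_KEY})
        resp.raise_for_status()
        # Each aggregate carries open/high/low/close, volume (v),
        # and the volume-weighted average price (vw).
        results[ticker] = resp.json().get("results", [])
    return results

if __name__ == "__main__":
    yesterday = (date.today() - timedelta(days=1)).isoformat()
    print(get_stocks_data(yesterday))
```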
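And a sketch of the publishing step in the first function's `main.py`, assuming the google-cloud-pubsub client library; the project ID is a placeholder:

```python
import json

from google.cloud import pubsub_v1

# Placeholder identifiers; substitute the real project ID.
PROJECT_ID = "my-gcp-project"
TOPIC_ID = "stocks-data"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

def publish_stocks(data: dict) -> str:
    """Publish the fetched JSON payload to the stocks-data topic."""
    future = publisher.publish(topic_path, data=json.dumps(data).encode("utf-8"))
    return future.result()  # Blocks until the server returns the message ID.
```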
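For the second function, a combined sketch of `preprocess_messages.py` and its `main.py`, assuming a pandas DataFrame and the google-cloud-bigquery client; the schema and column names here are illustrative guesses based on Polygon.io's aggregate fields, not the project's actual schema:

```python
import pandas as pd
from google.cloud import bigquery

# Illustrative schema; the real one is defined in preprocess_messages.py.
SCHEMA = [
    bigquery.SchemaField("ticker", "STRING"),
    bigquery.SchemaField("date", "DATE"),
    bigquery.SchemaField("close", "FLOAT"),
    bigquery.SchemaField("volume", "FLOAT"),
    bigquery.SchemaField("vw_price", "FLOAT"),  # volume-weighted average price
]

def load_to_bigquery(df: pd.DataFrame, table_id: str) -> None:
    """Append the preprocessed DataFrame to a BigQuery table."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        schema=SCHEMA,
        # Appending suits the historical table; the daily table could
        # instead use WRITE_TRUNCATE to hold only the latest dump.
        write_disposition="WRITE_APPEND",
    )
    client.load_table_from_dataframe(df, table_id, job_config=job_config).result()

# e.g. load_to_bigquery(df, "my-gcp-project.stocks.historical")
```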
A Pub/Sub topic ("stocks-function-trigger") that triggers Cloud Function 1.
A Pub/Sub topic ("stocks-data") that accepts the JSON messages published by Cloud Function 1.
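If the topics ever need to be recreated, a minimal sketch using the google-cloud-pubsub admin API (the project ID is a placeholder):

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
for topic_id in ("stocks-function-trigger", "stocks-data"):
    topic_path = publisher.topic_path("my-gcp-project", topic_id)
    publisher.create_topic(name=topic_path)  # Creates the topic if absent.
```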
Responsible for sending a message to the Pub/Sub topic "stocks-function-trigger" every midnight.
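A sketch of how this midnight trigger could be defined, assuming it is a Cloud Scheduler job created with the google-cloud-scheduler client (project, location, and job names are placeholders):

```python
from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = "projects/my-gcp-project/locations/us-central1"

job = scheduler_v1.Job(
    name=f"{parent}/jobs/stocks-midnight-trigger",
    schedule="0 0 * * *",  # every midnight, in cron syntax
    time_zone="Etc/UTC",
    pubsub_target=scheduler_v1.PubsubTarget(
        topic_name="projects/my-gcp-project/topics/stocks-function-trigger",
        data=b"run",  # payload is arbitrary; the message itself is the trigger
    ),
)
client.create_job(parent=parent, job=job)
```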
A Cloud Storage bucket that stores all data passed from the Pub/Sub topic "stocks-data".
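A plausible sketch of the handler that lands messages in this bucket, assuming a first-generation Pub/Sub-triggered Cloud Function and a hypothetical bucket name:

```python
import base64
from datetime import datetime, timezone

from google.cloud import storage

BUCKET_NAME = "stocks-data-bucket"  # placeholder

def store_message(event, context):
    """Background Cloud Function: write each Pub/Sub message to GCS as JSON."""
    # Pub/Sub delivers the payload base64-encoded in event["data"].
    payload = base64.b64decode(event["data"]).decode("utf-8")
    blob_name = datetime.now(timezone.utc).strftime("stocks/%Y-%m-%d.json")
    bucket = storage.Client().bucket(BUCKET_NAME)
    bucket.blob(blob_name).upload_from_string(payload, content_type="application/json")
```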
A Jupyter notebook that trains an Auto ARIMA model on the historical data (two years of data plus the daily dump from the API) and uploads the forecast to the corresponding BigQuery table (see the sketch below).
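A minimal sketch of the forecasting step, assuming the pmdarima implementation of Auto ARIMA; the table and column names are illustrative, and the notebook's actual per-ticker handling may differ:

```python
import pandas as pd
import pmdarima as pm
from google.cloud import bigquery

client = bigquery.Client()

# Illustrative query for a single ticker; the real table may differ.
df = client.query(
    "SELECT date, vw_price FROM `my-gcp-project.stocks.historical` "
    "WHERE ticker = 'GOOGL' ORDER BY date"
).to_dataframe()

# Fit Auto ARIMA on the volume-weighted average price series.
model = pm.auto_arima(df["vw_price"], seasonal=False, suppress_warnings=True)

# Forecast the next 30 business days and load the result into BigQuery.
horizon = 30
forecast = pd.DataFrame({
    "date": pd.bdate_range(df["date"].iloc[-1], periods=horizon + 1)[1:].date,
    "vw_price_forecast": model.predict(n_periods=horizon),
})
client.load_table_from_dataframe(
    forecast, "my-gcp-project.stocks.forecast"
).result()
```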
A data warehouse that stores the daily, historical, and forecast tables.
