This project sets up an end-to-end data pipeline on Google Cloud Platform that extracts, loads, and transforms stock data for the MAANG companies (Meta, Amazon, Apple, Netflix, Google) and forecasts the volume-weighted average stock price. The pipeline fetches data from the Polygon.io API with a Cloud Function, publishes it to Pub/Sub, stores it in Cloud Storage, and transforms and loads it into BigQuery tables. In the final stage, an Auto ARIMA model on Vertex AI produces the forecast, which is uploaded back to BigQuery.
- get_stocks_api.py: A script that fetches stock data for the MAANG companies in JSON format from the Polygon.io API (see the first sketch after this list).
- main.py: A script that publishes the JSON data to a Pub/Sub topic (second sketch below).
- preprocess_messages.py: A script that transforms the data from Cloud Storage into a DataFrame with a defined schema.
- main.py: A script responsible for uploading the preprocessed data into the daily and historical BigQuery tables (third sketch below, combined with the preprocessing step).
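For illustration, here is a minimal sketch of what `get_stocks_api.py` might look like, assuming the Polygon.io daily aggregates (bars) endpoint and a `POLYGON_API_KEY` environment variable; the actual script may use different endpoints or parameters:

```python
import os
from datetime import date, timedelta

import requests

# Hypothetical configuration; the real script may differ.
TICKERS = ["META", "AMZN", "AAPL", "NFLX", "GOOGL"]
API_KEY = os.environ["POLYGON_API_KEY"]

def get_stocks_data(day: str) -> dict:
    """Fetch one day of aggregate bars for each MAANG ticker."""
    results = {}
    for ticker in TICKERS:
        url = (
            f"https://api.polygon.io/v2/aggs/ticker/{ticker}"
            f"/range/1/day/{day}/{day}"
        )
        resp = requests.get(url, params={"apiKey": API_KEY})
        resp.raise_for_status()
        # Each aggregate carries open/high/low/close, volume (v),
        # and the volume-weighted average price (vw).
        results[ticker] = resp.json().get("results", [])
    return results

if __name__ == "__main__":
    yesterday = (date.today() - timedelta(days=1)).isoformat()
    print(get_stocks_data(yesterday))
```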
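And a sketch of the publishing step in the first function's `main.py`, assuming the google-cloud-pubsub client library; the project ID is a placeholder:

```python
import json

from google.cloud import pubsub_v1

# Placeholder identifiers; substitute the real project ID.
PROJECT_ID = "my-gcp-project"
TOPIC_ID = "stocks-data"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

def publish_stocks(data: dict) -> str:
    """Publish the fetched JSON payload to the stocks-data topic."""
    future = publisher.publish(topic_path, data=json.dumps(data).encode("utf-8"))
    return future.result()  # Blocks until the server returns the message ID.
```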
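For the second function, a combined sketch of `preprocess_messages.py` and its `main.py`, assuming a pandas DataFrame and the google-cloud-bigquery client; the schema and column names here are illustrative guesses based on Polygon.io's aggregate fields, not the project's actual schema:

```python
import pandas as pd
from google.cloud import bigquery

# Illustrative schema; the real one is defined in preprocess_messages.py.
SCHEMA = [
    bigquery.SchemaField("ticker", "STRING"),
    bigquery.SchemaField("date", "DATE"),
    bigquery.SchemaField("close", "FLOAT"),
    bigquery.SchemaField("volume", "FLOAT"),
    bigquery.SchemaField("vw_price", "FLOAT"),  # volume-weighted average price
]

def load_to_bigquery(df: pd.DataFrame, table_id: str) -> None:
    """Append the preprocessed DataFrame to a BigQuery table."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        schema=SCHEMA,
        # Appending suits the historical table; the daily table could
        # instead use WRITE_TRUNCATE to hold only the latest dump.
        write_disposition="WRITE_APPEND",
    )
    client.load_table_from_dataframe(df, table_id, job_config=job_config).result()

# e.g. load_to_bigquery(df, "my-gcp-project.stocks.historical")
```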
A Pub/Sub topic ("stocks-function-trigger") that triggers Cloud Function 1.
A Pub/Sub topic ("stocks-data") that accepts the JSON messages published by Cloud Function 1.
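If the topics ever need to be recreated, a minimal sketch using the google-cloud-pubsub admin API (the project ID is a placeholder):

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
for topic_id in ("stocks-function-trigger", "stocks-data"):
    topic_path = publisher.topic_path("my-gcp-project", topic_id)
    publisher.create_topic(name=topic_path)  # Creates the topic if absent.
```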
Responsible for sending a message to the Pub/Sub topic "stocks-function-trigger" every midnight.
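A sketch of how this midnight trigger could be defined, assuming it is a Cloud Scheduler job created with the google-cloud-scheduler client (project, location, and job names are placeholders):

```python
from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = "projects/my-gcp-project/locations/us-central1"

job = scheduler_v1.Job(
    name=f"{parent}/jobs/stocks-midnight-trigger",
    schedule="0 0 * * *",  # every midnight, in cron syntax
    time_zone="Etc/UTC",
    pubsub_target=scheduler_v1.PubsubTarget(
        topic_name="projects/my-gcp-project/topics/stocks-function-trigger",
        data=b"run",  # payload is arbitrary; the message itself is the trigger
    ),
)
client.create_job(parent=parent, job=job)
```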
A Cloud Storage bucket that stores all data passed from the Pub/Sub topic "stocks-data".
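A plausible sketch of the handler that lands messages in this bucket, assuming a first-generation Pub/Sub-triggered Cloud Function and a hypothetical bucket name:

```python
import base64
from datetime import datetime, timezone

from google.cloud import storage

BUCKET_NAME = "stocks-data-bucket"  # placeholder

def store_message(event, context):
    """Background Cloud Function: write each Pub/Sub message to GCS as JSON."""
    # Pub/Sub delivers the payload base64-encoded in event["data"].
    payload = base64.b64decode(event["data"]).decode("utf-8")
    blob_name = datetime.now(timezone.utc).strftime("stocks/%Y-%m-%d.json")
    bucket = storage.Client().bucket(BUCKET_NAME)
    bucket.blob(blob_name).upload_from_string(payload, content_type="application/json")
```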
A Jupyter notebook that trains an Auto ARIMA model on the historical data (two years of data plus the daily dump from the API) and uploads the forecast to the corresponding BigQuery table (see the sketch below).
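A minimal sketch of the forecasting step, assuming the pmdarima implementation of Auto ARIMA; the table and column names are illustrative, and the notebook's actual per-ticker handling may differ:

```python
import pandas as pd
import pmdarima as pm
from google.cloud import bigquery

client = bigquery.Client()

# Illustrative query for a single ticker; the real table may differ.
df = client.query(
    "SELECT date, vw_price FROM `my-gcp-project.stocks.historical` "
    "WHERE ticker = 'GOOGL' ORDER BY date"
).to_dataframe()

# Fit Auto ARIMA on the volume-weighted average price series.
model = pm.auto_arima(df["vw_price"], seasonal=False, suppress_warnings=True)

# Forecast the next 30 business days and load the result into BigQuery.
horizon = 30
forecast = pd.DataFrame({
    "date": pd.bdate_range(df["date"].iloc[-1], periods=horizon + 1)[1:].date,
    "vw_price_forecast": model.predict(n_periods=horizon),
})
client.load_table_from_dataframe(
    forecast, "my-gcp-project.stocks.forecast"
).result()
```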
A data warehouse that stores the daily, historical, and forecast tables.
