AMD stock analysis using AWS Kinesis Data Analytics Studio
This dataset contains historical stock prices for AMD from 2021 to 2022 in CSV format. It includes the following columns:
- Date: The date of the recorded stock price.
- Open: The opening price of the stock on the given date.
- High: The highest price the stock reached during the trading day.
- Low: The lowest price the stock reached during the trading day.
- Close: The closing price of the stock on the given date.
- Adj Close: The adjusted closing price, which takes into account factors like dividends and stock splits.
- Volume: The trading volume (number of shares traded) for the stock on the given date.
- Ignore: A column for internal use or data handling, which can be disregarded for analysis.
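As a rough illustration of how these columns might be read in Python, the sketch below parses CSV text into typed rows and drops the Ignore column. It assumes the header names match the list above exactly and that dates are in ISO format (YYYY-MM-DD). The `PriceRow` and `parse_prices` names are hypothetical, and the sample row holds placeholder values, not real AMD prices.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class PriceRow:
    day: date
    open: float
    high: float
    low: float
    close: float
    adj_close: float
    volume: int


def parse_prices(text: str) -> List[PriceRow]:
    """Parse CSV text with the documented columns; the 'Ignore' column is skipped."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append(PriceRow(
            day=date.fromisoformat(rec["Date"]),  # assumes ISO dates
            open=float(rec["Open"]),
            high=float(rec["High"]),
            low=float(rec["Low"]),
            close=float(rec["Close"]),
            adj_close=float(rec["Adj Close"]),
            volume=int(float(rec["Volume"])),  # tolerates "1000" and "1000.0"
        ))
    return rows


# Placeholder values for illustration only, not real AMD prices.
SAMPLE = """Date,Open,High,Low,Close,Adj Close,Volume,Ignore
2021-01-04,10.0,11.0,9.5,10.5,10.5,1000,0
"""

rows = parse_prices(SAMPLE)
```

For the real file, the same function could be called on the contents of AMDprices2021-2022.csv, provided the assumptions about header names and date format hold.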
To illustrate the usage of this dataset, consider the following query:
SELECT Date, Open, Close
FROM stock_data
WHERE Date BETWEEN '2021-01-01' AND '2021-06-30'
This query retrieves the 'Date', 'Open', and 'Close' columns from the 'stock_data' table for records dated within the first half of 2021.
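To try the query locally without any AWS resources, one option is an in-memory SQLite table. This is only a sketch: SQLite stands in for the streaming table used in the notebook, the dates are stored as ISO text so that BETWEEN compares them correctly, and the three rows are placeholder values rather than real prices.

```python
import sqlite3

# In-memory database standing in for the 'stock_data' table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_data (Date TEXT, Open REAL, Close REAL)")

# Placeholder rows for illustration only, not real AMD prices.
conn.executemany(
    "INSERT INTO stock_data VALUES (?, ?, ?)",
    [
        ("2021-01-04", 10.0, 10.5),
        ("2021-06-30", 12.0, 11.5),
        ("2021-07-01", 11.0, 11.2),  # outside the range, should be filtered out
    ],
)

QUERY = """
SELECT Date, Open, Close
FROM stock_data
WHERE Date BETWEEN '2021-01-01' AND '2021-06-30'
"""

# BETWEEN is inclusive, so the 2021-06-30 row is returned as well.
result = conn.execute(QUERY).fetchall()
```

Other SQL engines may treat `Date` as a reserved word or compare dates differently, so the query may need adjusting outside SQLite.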
This guide presents a workflow for analyzing streaming data with AWS Glue, AWS Lambda, and Apache Flink. The workflow has the following components:
- Data Source: Use the dataset AMDprices2021-2022.csv as your streaming data source.
- Data Ingestion with Lambda: Use the script lambda_function.py to ingest data from the dataset into an Amazon Kinesis Data Stream.
- Analysis with Flink: Use the Jupyter notebook flink_notebook.ipynb, along with the accompanying notebook history file notebook.zpln, for real-time analysis with Apache Flink.
- Queries and Results: Apply the SQL queries in query1CMGR.sql, which calculates the Compounded Monthly Growth Rate, and query2EMA.sql, which calculates the Exponential Moving Average. The results, including visualizations, are presented in query_result.pdf.
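The ingestion step can be sketched roughly as follows. This is not the content of lambda_function.py, which is not reproduced here. It assumes each CSV row is sent as one JSON record, that a single partition key is acceptable, and that boto3 is available in the runtime. The names `build_records` and `send_records`, the "AMD" partition key, and the stream name are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List


def build_records(csv_text: str, partition_key: str = "AMD") -> List[Dict]:
    """Turn each CSV row into a Kinesis record: JSON payload plus partition key."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row.pop("Ignore", None)  # the Ignore column is not needed downstream
        records.append({
            "Data": json.dumps(row).encode("utf-8"),
            "PartitionKey": partition_key,  # hypothetical key
        })
    return records


def send_records(records: List[Dict], stream_name: str, batch_size: int = 500) -> None:
    """Send records in batches; PutRecords accepts at most 500 records per call."""
    import boto3  # imported here so build_records works without AWS access

    client = boto3.client("kinesis")
    for i in range(0, len(records), batch_size):
        client.put_records(
            StreamName=stream_name,  # hypothetical stream name supplied by the caller
            Records=records[i:i + batch_size],
        )


# Placeholder row for illustration only, not real AMD prices.
records = build_records("Date,Open,Close,Ignore\n2021-01-04,10.0,10.5,0\n")
```

The sketch does not inspect the PutRecords response, so a production version would likely want to check `FailedRecordCount` and retry any rejected records.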
Before embarking on the workflow, ensure you have the following prerequisites:
- An active AWS Account with access to AWS services.
- The dataset AMDprices2021-2022.csv containing your streaming data.
- Familiarity with AWS Lambda for data ingestion.
- Understanding of AWS Glue for ETL operations.
- Basic knowledge of Apache Flink for real-time analysis.
- Knowledge of Jupyter notebooks for interactive analysis.
- Data Source: Your streaming data originates from the AMDprices2021-2022.csv dataset.
- Lambda Data Ingestion: Use the lambda_function.py script to send data from the dataset to an Amazon Kinesis Data Stream.
- Interactive Analysis: Use the Jupyter notebook flink_notebook.ipynb to analyze the streaming data interactively with Apache Flink. The notebook history is captured in notebook.zpln.
- SQL Queries and Results: Apply the SQL queries stored in query1CMGR.sql and query2EMA.sql. View the query results, including visualizations, in the query_result.pdf document.
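For readers unfamiliar with the two metrics, the sketch below shows their textbook definitions in Python. It is not a translation of query1CMGR.sql or query2EMA.sql, whose exact logic may differ. It assumes CMGR is computed as (last / first)^(1 / months) - 1 and that the EMA uses the common smoothing factor 2 / (span + 1), seeded with the first value. The function names are hypothetical.

```python
from typing import List


def cmgr(first_value: float, last_value: float, months: int) -> float:
    """Compounded monthly growth rate between two values `months` apart."""
    if months <= 0 or first_value <= 0:
        raise ValueError("months and first_value must be positive")
    return (last_value / first_value) ** (1.0 / months) - 1.0


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with alpha = 2 / (span + 1) (assumed convention)."""
    if span <= 0:
        raise ValueError("span must be positive")
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    for v in values:
        if not out:
            out.append(v)  # seed the average with the first observation
        else:
            out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


# Placeholder inputs for illustration only, not real AMD prices.
growth = cmgr(100.0, 121.0, 2)       # about 0.10, i.e. 10% per month
smoothed = ema([1.0, 2.0, 3.0], 3)   # alpha = 0.5
```

With a span of 3 the smoothing factor is 0.5, so each output is the midpoint between the new value and the previous average.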
This workflow shows how AWS Glue, Lambda, and Apache Flink can be combined to process, analyze, and visualize streaming data.
For the code, notebooks, and further resources, refer to the documentation and files included in this repository. Feel free to adapt the workflow to your own streaming data analysis requirements.