Spotify-Data-Pipeline-with-AWS

This project extracts, transforms, and loads Spotify playlist data into AWS using a serverless architecture built on AWS Lambda, Amazon S3, AWS Glue, and Amazon Athena. It showcases a full ETL pipeline written in Python with Spotipy (a Python client for the Spotify Web API).

Architecture Diagram

See ETL.png in the repository.

Overview

Source: Spotify API
Destination: Amazon S3, AWS Glue Catalog, Amazon Athena
Trigger: Amazon CloudWatch Events rule (daily schedule)
Processing: AWS Lambda (Python functions)
Data Output: CSVs for Albums, Artists, and Songs

ETL

1. Extract

CloudWatch Event Rule: triggers the extraction Lambda function daily.
Lambda Function (Extraction): spotify_api_data_extract.py
- Authenticates with the Spotify API using Spotipy.
- Downloads playlist data and stores it as JSON in an S3 bucket under raw_data/to_processed/.
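
The extraction step can be sketched roughly as follows. This is a minimal illustration, not the contents of spotify_api_data_extract.py: the environment variable names (SPOTIFY_CLIENT_ID, SPOTIFY_CLIENT_SECRET, RAW_BUCKET, PLAYLIST_ID), the file-naming scheme, and the helper names are all assumptions. The Spotify and S3 clients are passed in as arguments so the core function can be exercised without network access.

```python
import json
import os
from datetime import datetime, timezone

RAW_PREFIX = "raw_data/to_processed/"  # prefix named in this README


def build_raw_key(now):
    """Return a timestamped S3 key under the raw prefix (naming scheme is an assumption)."""
    return f"{RAW_PREFIX}spotify_raw_{now:%Y%m%dT%H%M%S}.json"


def extract_playlist(spotify, s3, bucket, playlist_id, now=None):
    """Fetch playlist tracks and store the raw response as JSON in S3.

    `spotify` should behave like spotipy.Spotify and `s3` like
    boto3.client("s3"); both are injected so stand-ins can be used in tests.
    """
    now = now or datetime.now(timezone.utc)
    data = spotify.playlist_tracks(playlist_id)  # first page of playlist items
    key = build_raw_key(now)
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(data).encode("utf-8"))
    return key


def lambda_handler(event, context):
    """Hypothetical Lambda entry point; third-party imports are kept local."""
    import boto3
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(
        client_id=os.environ["SPOTIFY_CLIENT_ID"],          # assumed variable name
        client_secret=os.environ["SPOTIFY_CLIENT_SECRET"],  # assumed variable name
    )
    spotify = spotipy.Spotify(auth_manager=auth)
    key = extract_playlist(
        spotify,
        boto3.client("s3"),
        bucket=os.environ["RAW_BUCKET"],        # assumed variable name
        playlist_id=os.environ["PLAYLIST_ID"],  # assumed variable name
    )
    return {"stored": key}
```

Spotify returns playlist items in pages, so a playlist longer than one page would need additional requests (for example via Spotipy's `next()` helper); the sketch stores only the first page.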

2. Transform

Trigger: an S3 object-creation event invokes a second Lambda function.
Lambda Function (Transformation): spotify_data_transformation.py
- Reads raw JSON files from S3.
- Extracts the album, artist, and song data and transforms it into structured records.
- Stores transformed data as CSV files in:
  - transformed_data/album_data/
  - transformed_data/artist_data/
  - transformed_data/songs_data/
- Moves processed raw data to raw_data/processed/.
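
A rough sketch of the transformation logic is shown below. It is not the code in spotify_data_transformation.py: the column names and helper functions are assumptions chosen for illustration, and only the JSON field names (items, track, album, artists, added_at) come from the Spotify Web API playlist-items response. The `artist_name` column on songs is included only so the output lines up with the example Athena query in this README, and it takes the first listed artist.

```python
import csv
import io

RAW_PREFIX = "raw_data/to_processed/"
DONE_PREFIX = "raw_data/processed/"


def _tracks(data):
    """Yield (playlist item, track) pairs, skipping items with no track object."""
    for item in data.get("items", []):
        track = item.get("track")
        if track:
            yield item, track


def extract_albums(data):
    """Return one row per distinct album (column names are assumptions)."""
    albums = {}
    for _, track in _tracks(data):
        album = track["album"]
        albums[album["id"]] = {
            "album_id": album["id"],
            "album_name": album["name"],
            "release_date": album["release_date"],
            "total_tracks": album["total_tracks"],
        }
    return list(albums.values())


def extract_artists(data):
    """Return one row per distinct artist."""
    artists = {}
    for _, track in _tracks(data):
        for artist in track["artists"]:
            artists[artist["id"]] = {"artist_id": artist["id"], "artist_name": artist["name"]}
    return list(artists.values())


def extract_songs(data):
    """Return one row per distinct song, keyed by track id."""
    songs = {}
    for item, track in _tracks(data):
        songs[track["id"]] = {
            "song_id": track["id"],
            "song_name": track["name"],
            "duration_ms": track["duration_ms"],
            "popularity": track["popularity"],
            "song_added": item["added_at"],
            "album_id": track["album"]["id"],
            "artist_name": track["artists"][0]["name"],  # first listed artist only
        }
    return list(songs.values())


def to_csv(rows):
    """Serialize a list of dict rows to CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def processed_key(raw_key):
    """Map a key under raw_data/to_processed/ to its raw_data/processed/ location."""
    return raw_key.replace(RAW_PREFIX, DONE_PREFIX, 1)
```

Inside the Lambda function, the bucket and key of the new object are available in the S3 event under `event["Records"][i]["s3"]`. S3 has no native move operation, so moving a raw file typically means calling `copy_object` to the key returned by `processed_key` and then `delete_object` on the original.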

3. Load

AWS Glue Crawler:
- Crawls the transformed_data/ folders.
- Infers schema and updates the Glue Data Catalog.
AWS Glue Data Catalog:
- Creates structured tables from the transformed CSVs:
  - album_data
  - artist_data
  - songs_data
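
If the crawler is run on demand rather than on its own schedule, starting it and waiting for completion could look something like the sketch below. The crawler name and polling intervals are hypothetical, and `glue` is expected to behave like `boto3.client("glue")`; the README does not say how the crawler is actually triggered.

```python
import time


def run_crawler(glue, crawler_name, poll_seconds=15, max_polls=40, sleep=time.sleep):
    """Start a Glue crawler and poll until it returns to the READY state.

    Returns True if the crawler finished within the polling budget,
    False otherwise. `sleep` is injectable so tests do not have to wait.
    """
    glue.start_crawler(Name=crawler_name)
    for _ in range(max_polls):
        state = glue.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":  # crawler states are READY, RUNNING, or STOPPING
            return True
        sleep(poll_seconds)
    return False
```

For example, `run_crawler(boto3.client("glue"), "spotify_transformed_crawler")` would block until the catalog tables have been refreshed, where the crawler name is a placeholder.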

Amazon Athena:
- Runs SQL queries on the structured datasets for analytics.

Example:

SELECT artist_name, COUNT(*) AS song_count
FROM songs_data
GROUP BY artist_name
ORDER BY song_count DESC
LIMIT 10;
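
A query like this can also be submitted programmatically. The following is a minimal sketch, assuming an Athena client such as `boto3.client("athena")` is passed in; the database name and S3 output location are placeholders that the README does not specify.

```python
import time


def run_query(athena, sql, database, output_location,
              poll_seconds=2, max_polls=60, sleep=time.sleep):
    """Run a SQL query on Athena and return the result rows as dictionaries."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish in time")

    # For SELECT queries the first row holds the column names.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    if not rows:
        return []
    table = [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
    header, *body = table
    return [dict(zip(header, values)) for values in body]
```

All values come back as strings, so numeric columns such as `song_count` need converting by the caller. `get_query_results` returns results in pages; a single call is enough for the `LIMIT 10` query above, but larger result sets would need pagination.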
