End-to-end mini data platform with SQL analytics, FastAPI, AWS Lambda, Terraform, and CI/CD.

Shyqwq5/Data-engineering-mini-platform

Data Engineering Mini Platform (SQL + FastAPI + AWS + Terraform)

An end-to-end mini project demonstrating core data engineering and cloud deployment skills:

  • SQL analytics on a relational toy database
  • A fully tested FastAPI service
  • An AWS Lambda pipeline that fetches quotes and writes JSON to S3
  • Infrastructure-as-Code with Terraform
  • CI/CD via GitHub Actions (tests + Terraform deployment on main)

Originally forked from a course repository and extended independently. All implementations, refactors, tests, and CI/CD work in this repository are my own.


Project Overview

This repository contains three parts:

1) 1-data — SQL Challenges

SQL queries to answer business-style questions such as:

  • user purchase counts (including users with zero purchases)
  • top products by total sales value
  • top spender in a specific month (with handling of edge-case records)
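As an illustration of the first question, the sketch below counts purchases per user while keeping users who have none. It is only a sketch: the `users` and `purchases` tables, their columns, and the sample rows are hypothetical, and it uses an in-memory SQLite database so it runs without a server, whereas the project itself targets psql.

```python
import sqlite3

# Hypothetical schema and data; the repository's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO purchases VALUES (1, 1, 9.5), (2, 1, 3.0), (3, 2, 7.25);
""")

# LEFT JOIN keeps every user. COUNT(p.id) skips the NULLs produced for
# users without purchases, so those users are reported with a count of 0.
PURCHASE_COUNTS = """
    SELECT u.name, COUNT(p.id) AS purchase_count
    FROM users AS u
    LEFT JOIN purchases AS p ON p.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.id;
"""

purchase_counts = conn.execute(PURCHASE_COUNTS).fetchall()
print(purchase_counts)  # [('Ada', 2), ('Grace', 1), ('Linus', 0)]
```

Counting `p.id` rather than `*` is the detail that matters here: `COUNT(*)` would report 1 for a user with no purchases, because the outer join still yields one row for them.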

2) 2-cloud — AWS Lambda + S3 (Terraform)

A scheduled AWS Lambda function that:

  • fetches 3 random quotes from an external API
  • writes the results as JSON into an S3 bucket
  • runs on a schedule (CloudWatch Event Rule / EventBridge)

Infrastructure is provisioned with Terraform (S3 buckets, IAM roles/policies, Lambda, schedule rule).
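The fetch-and-store flow described above could look roughly like the following. This is a minimal sketch, not the repository's actual handler: the function names, the `QUOTES_API_URL` and `BUCKET_NAME` environment variables, the object key layout, and the payload shape are all assumptions. The S3 client is passed in so the logic can be exercised without AWS access.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone


def fetch_quote_from_api():
    """Fetch one quote as JSON. QUOTES_API_URL is a hypothetical setting."""
    with urllib.request.urlopen(os.environ["QUOTES_API_URL"], timeout=10) as resp:
        return json.load(resp)


def run_pipeline(fetch_quote, s3_client, bucket, count=3, now=None):
    """Fetch `count` quotes and write them to S3 as one JSON object."""
    now = now or datetime.now(timezone.utc)
    quotes = [fetch_quote() for _ in range(count)]
    # Assumed key layout: one timestamped file per scheduled run.
    key = now.strftime("quotes/%Y-%m-%dT%H-%M-%SZ.json")
    body = json.dumps({"fetched_at": now.isoformat(), "quotes": quotes})
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body.encode("utf-8"),
        ContentType="application/json",
    )
    return key


def lambda_handler(event, context):
    """Entry point for the schedule rule (wiring shown is an assumption)."""
    import boto3  # bundled with the AWS Lambda Python runtime

    key = run_pipeline(
        fetch_quote_from_api, boto3.client("s3"), os.environ["BUCKET_NAME"]
    )
    return {"statusCode": 200, "body": json.dumps({"key": key})}
```

Keeping `run_pipeline` free of direct `boto3` calls means a test can hand it a stub object with a `put_object` method and a fixed timestamp, then assert on the key and body it receives.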

3) 3-server — FastAPI Service (TDD)

A simple REST API served locally:

  • GET /healthcheck returns HTTP 200 with a JSON message
  • GET /doughnuts/info returns doughnut data from a local JSON file and supports optional filtering:
    • max_calories (int)
    • allow_nuts (bool)

If no records match the filters, the API returns 200 with an empty doughnuts array.
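The filtering behaviour can be sketched as a plain function that a route handler would call with its query parameters. Several details here are assumptions rather than the repository's code: the `calories` and `contains_nuts` field names, the sample data, treating `max_calories` as an inclusive limit, and reading `allow_nuts=False` as "exclude doughnuts that contain nuts" while `True` or an omitted value applies no nut filter.

```python
from typing import Optional


def filter_doughnuts(
    doughnuts: list,
    max_calories: Optional[int] = None,
    allow_nuts: Optional[bool] = None,
) -> dict:
    """Apply the optional filters and wrap the result in the response shape."""
    selected = []
    for doughnut in doughnuts:
        # Assumed: max_calories is an inclusive upper bound.
        if max_calories is not None and doughnut["calories"] > max_calories:
            continue
        # Assumed: only allow_nuts=False removes anything.
        if allow_nuts is False and doughnut["contains_nuts"]:
            continue
        selected.append(doughnut)
    # An empty list is a valid result, matching the 200-with-empty-array rule.
    return {"doughnuts": selected}


# Hypothetical sample records.
SAMPLE = [
    {"type": "glazed", "calories": 190, "contains_nuts": False},
    {"type": "pecan", "calories": 320, "contains_nuts": True},
    {"type": "jam", "calories": 250, "contains_nuts": False},
]

print(filter_doughnuts(SAMPLE, max_calories=250, allow_nuts=False))
print(filter_doughnuts(SAMPLE, max_calories=100))  # {'doughnuts': []}
```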


Tech Stack

  • Python, pytest
  • FastAPI
  • SQL (psql)
  • AWS (Lambda, S3, IAM, EventBridge/CloudWatch rule)
  • Terraform
  • GitHub Actions

CI/CD

GitHub Actions workflow:

  • On pull requests: install dependencies and run tests for 2-cloud and 3-server
  • On push to main: run Terraform init/plan/apply to deploy infrastructure

AWS credentials are provided via GitHub repository secrets.


How to Run Locally

3-server (FastAPI)

cd 3-server
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pytest
uvicorn api.main:app --reload

Configuration (GitHub Actions)

To enable automated Terraform deployments via GitHub Actions, the following repository secrets must be configured:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

These credentials are used only at runtime by GitHub Actions and are never committed to the repository.

Note: For security reasons, secrets are not available to workflows triggered from forked pull requests.
