An enterprise-grade Data Engineering project transforming raw NYC Taxi data into actionable insights via a modern stack: Airflow, dbt, PostgreSQL, FastAPI, Power BI, and Slack.
- Project Overview
- Architecture & Data Modeling
- Business Intelligence (Dashboards)
- Orchestration (Airflow)
- Data Products (API)
- Observability & Alerting
- Performance & Optimization
- Installation
- Contact
This project simulates a real-world data platform for a Taxi company. It ingests high-volume trip data, cleanses it, models it into a Star Schema, and serves it to different stakeholders (Executives, Operations, Finance) via Dashboards and APIs.
Key Features:
- ELT Pipeline: Ingestion of raw CSVs into Bronze/Silver/Gold layers using dbt and Postgres.
- Data Quality: Automated testing and "Revenue at Risk" calculation to detect anomalies (negative fares, time travel).
- Microservice API: A standalone FastAPI container serving Gold data to external apps.
- Observability: Slack alerting for data quality breaches.
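The anomaly classes mentioned above (negative fares, "time travel") can be sketched as a small classifier. This is an illustrative Python sketch only; in the project itself these rules live as dbt tests, and the function name and labels here are assumptions.

```python
from datetime import datetime

def trip_anomalies(pickup: datetime, dropoff: datetime, fare_amount: float):
    """Return the anomaly labels that apply to a single trip record."""
    issues = []
    if fare_amount < 0:
        issues.append("negative_fare")  # a fare should never be below zero
    if dropoff < pickup:
        issues.append("time_travel")    # dropoff recorded before pickup
    return issues
```

Records flagged this way feed the "Revenue at Risk" calculation instead of being silently dropped.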
The project follows the Medallion Architecture (Bronze -> Silver -> Gold).
I transformed the data into a rigorous dimensional model optimized for BI performance.
To handle millions of rows efficiently in Power BI, dedicated Data Mart aggregate views were created with dbt.
A dedicated "_Key Measures" table was created in Power BI to group all measures written in DAX.

The final product is a comprehensive Power BI Report (.pbit) containing 4 specialized views.
Focus: Year-over-Year growth, Total Revenue, and High-level trends.

Focus: Filled map, Borough-to-Borough flow, and RPM (Revenue Per Minute) optimization.

Focus: Payment methods adoption (Cash vs Card), Tipping behavior, and Fare buckets.

Focus: Pipeline health, Invalid records tracking, and Revenue at Risk ($).

Feature Highlight: Tooltips let users hover over data points for granular details. They are currently available only on the line chart of the first dashboard.
The entire pipeline is orchestrated via Astro CLI (Airflow).
Handles the end-to-end flow: dbt run (Bronze/Silver/Gold), dbt test, and data freshness checks.
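The end-to-end flow above can be sketched as the ordered list of dbt commands the DAG wraps, one task per command. A minimal sketch: the project directory path and layer selector names are assumptions, not the project's actual configuration.

```python
def pipeline_commands(project_dir="/usr/local/airflow/dbt"):
    """Ordered shell commands the daily DAG executes, one task per command."""
    layers = ["bronze", "silver", "gold"]
    cmds = [f"dbt run --select {layer} --project-dir {project_dir}" for layer in layers]
    cmds.append(f"dbt test --project-dir {project_dir}")              # data quality tests
    cmds.append(f"dbt source freshness --project-dir {project_dir}")  # freshness checks
    return cmds
```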

A separate DAG manages static data, keeping the daily run fast.
Beyond dashboards, this project exposes a REST API for application developers. The API runs in an isolated Docker container but communicates with the same Data Warehouse.
- Endpoint: /metrics/daily (supports date filtering)
- Architecture: Dockerized FastAPI service networked with Postgres.
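A minimal sketch of the date filtering behind /metrics/daily, as a parameterized query builder the endpoint handler could call. The gold.daily_metrics table and its column names are assumptions for illustration, not the project's actual schema.

```python
def daily_metrics_query(start_date=None, end_date=None):
    """Build a parameterized SQL query for the /metrics/daily endpoint.

    Dates are ISO strings (e.g. "2024-01-01"); filters are optional.
    """
    sql = "SELECT trip_date, trip_count, total_revenue FROM gold.daily_metrics"
    clauses, params = [], []
    if start_date:
        clauses.append("trip_date >= %s")
        params.append(start_date)
    if end_date:
        clauses.append("trip_date <= %s")
        params.append(end_date)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY trip_date", params
```

Keeping the query parameterized (placeholders plus a params list) avoids SQL injection from user-supplied dates.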
I implemented a Reverse ETL logic to proactively notify the team when Data Quality degrades. If the Revenue at Risk exceeds a threshold (e.g., $10k), a Slack alert is triggered automatically.
Alerting DAG
Slack Alert Message
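The alert logic can be sketched as a threshold check that posts to a Slack incoming webhook. The $10k threshold follows the example above; the message wording and webhook-based delivery are assumptions (the project could equally use Airflow's Slack provider).

```python
import json
from urllib import request

RISK_THRESHOLD_USD = 10_000  # example threshold from the text

def alert_if_at_risk(revenue_at_risk, webhook_url, post=None):
    """Send a Slack alert when Revenue at Risk exceeds the threshold.

    `post` is injectable for testing; by default it POSTs JSON via urllib.
    Returns the payload sent, or None if no alert fired.
    """
    if revenue_at_risk <= RISK_THRESHOLD_USD:
        return None
    payload = {"text": (f":rotating_light: Revenue at Risk is ${revenue_at_risk:,.0f} "
                        f"(threshold ${RISK_THRESHOLD_USD:,.0f}). Check the quality dashboard.")}
    if post is None:
        post = lambda url, data: request.urlopen(
            request.Request(url, data=data, headers={"Content-Type": "application/json"}))
    post(webhook_url, json.dumps(payload).encode())
    return payload
```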

I optimized the pipeline architecture by decoupling static data processing from the daily workflow.
Initially, the DAG was monolithic, rebuilding all Dimensions and Facts on every run.
Strategy: I extracted static dimensions into a separate DAG (static_dimensions_dag) that runs only on-demand, leaving the main pipeline to process only new incoming trip data.
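The split between the two DAGs can be expressed with dbt node selection. A sketch under stated assumptions: the tag name static_dim is illustrative, not the project's actual tag.

```python
def dag_dbt_commands():
    """dbt selection split between the on-demand static DAG and the daily pipeline."""
    return {
        # runs only on demand: static dimensions such as zones or payment types
        "static_dimensions_dag": "dbt run --select tag:static_dim",
        # daily run skips static dimensions and rebuilds only trip-driven models
        "daily_pipeline_dag": "dbt run --exclude tag:static_dim",
    }
```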
- Docker & Docker Compose
- Astro CLI
- Power BI Desktop (to view .pbit files)
- Clone the repository
  git clone https://github.com/ChahiriAbderrahmane/modern-data-stack-nyc-taxi.git
- Start the Data Platform (Airflow + Postgres)
  astro dev start
- Start the API Microservice
  docker compose -f docker-compose-api.yml up --build
- Access the Interfaces
  - Airflow: http://localhost:8080
  - FastAPI Docs: http://localhost:8000/docs
  - Power BI: Open assets/nyc_project_dashboard.pbit







