
Data Pipelines with Airflow

This project teaches Apache Airflow fundamentals by building an end-to-end data pipeline that stages, transforms, and validates data from Amazon S3 in an Amazon Redshift data warehouse. You'll create custom operators to stage data, load the warehouse tables, and run data quality checks.

Project Components

  • DAG Template: Provides the task skeletons and imports; you configure the task dependencies.
  • Operators: Four custom operators for staging, fact/dimension loads, and data quality.
  • Helpers: SQL transformations for ETL operations.

Getting Started

1. Start Airflow

```bash
docker-compose up -d
```

Visit http://localhost:8080 (username/password: airflow).

2. Configure Connections

  • Go to Admin → Connections in the Airflow UI.
  • Add the aws_credentials and redshift connections; the custom operators look them up by these conn IDs (see the sketch below).
  • Ensure your Redshift cluster is running.
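A minimal sketch of how the operators can resolve these two connections. The conn IDs must match what you created above; the import paths assume Airflow 2.x with the amazon and postgres providers installed.

```python
# Minimal sketch, assuming Airflow 2.x with the amazon and postgres providers.
from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
from airflow.providers.postgres.hooks.postgres import PostgresHook

# "aws_credentials" and "redshift" must match the conn IDs created in the UI.
aws_hook = AwsBaseHook(aws_conn_id="aws_credentials", client_type="s3")
credentials = aws_hook.get_credentials()   # access key / secret key used by the COPY command

redshift = PostgresHook(postgres_conn_id="redshift")
redshift.run("SELECT 1;")                  # quick connectivity check against the cluster
```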

3. DAG Execution

  • Add default_args with no past dependencies, 3 retries at 5-minute intervals, catchup turned off, and no email on retry (sketched below).
  • Set the task dependencies to match the DAG flow.
  • Run the DAG and verify that every task succeeds.
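A sketch of what these settings can look like in code. The DAG id, owner, start date, and schedule below are placeholders rather than this repo's actual values, and catchup is passed to the DAG itself rather than through default_args.

```python
# Sketch only: names and dates are placeholders, not the repo's actual values.
# Assumes a recent Airflow 2.x image (e.g. the docker-compose quick start).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.empty import EmptyOperator  # DummyOperator on Airflow < 2.3

default_args = {
    "owner": "udacity",                    # placeholder owner
    "depends_on_past": False,              # no past dependencies
    "retries": 3,                          # 3 retries ...
    "retry_delay": timedelta(minutes=5),   # ... every 5 minutes
    "email_on_retry": False,               # no email
}

with DAG(
    "example_pipeline",                    # placeholder DAG id
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,                         # catchup off (a DAG argument, not a default_arg)
) as dag:
    begin = EmptyOperator(task_id="Begin_execution")
    end = EmptyOperator(task_id="Stop_execution")

    # Dependencies are declared with the bitshift operators, e.g.
    # begin >> staging_tasks >> load_fact >> load_dimensions >> quality_checks >> end
    begin >> end
```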

Operator Implementation

StageToRedshiftOperator: Loads JSON files from S3 into Redshift using a templated COPY statement; supports backfills.
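Roughly, the operator renders and runs a COPY statement of this shape; the bucket, key prefix, and JSON path below are placeholders, not the repo's exact template.

```python
# Illustrative shape of the templated COPY; all values below are placeholders.
COPY_SQL = """
    COPY {table}
    FROM 's3://{bucket}/{s3_key}'
    ACCESS_KEY_ID '{access_key}'
    SECRET_ACCESS_KEY '{secret_key}'
    REGION '{region}'
    JSON '{json_path}'
"""

# Backfills work because the s3_key can be templated with the run's logical date,
# e.g. "log_data/{{ ds }}", so each DAG run copies only that day's files.
```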

LoadFactOperator & LoadDimensionOperator: Transform and load data using SQL helpers; support append-only and truncate-insert modes.
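A minimal sketch of the two modes, assuming a PostgresHook pointed at the redshift connection; the function and parameter names here are illustrative, not the repo's exact interface.

```python
# Illustrative only; the real operators wrap this logic in Airflow operator classes.
from airflow.providers.postgres.hooks.postgres import PostgresHook

def load_table(table: str, select_sql: str, truncate: bool = False) -> None:
    """Append rows from a helper SELECT, optionally truncating the table first."""
    redshift = PostgresHook(postgres_conn_id="redshift")
    if truncate:                                        # truncate-insert mode (typical for dimensions)
        redshift.run(f"TRUNCATE TABLE {table}")
    redshift.run(f"INSERT INTO {table} {select_sql}")   # append-only insert from the SQL helper
```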

DataQualityOperator: Runs SQL test cases against the loaded tables and raises an exception when a check fails.
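As a sketch, a quality check can pair a test query with an expected result and fail the task when they do not match; the check format below is an assumption, not necessarily the one used in this repo.

```python
# Illustrative check loop; the operator class in this repo may structure checks differently.
from airflow.providers.postgres.hooks.postgres import PostgresHook

def run_quality_checks(checks: list[dict]) -> None:
    redshift = PostgresHook(postgres_conn_id="redshift")
    for check in checks:
        records = redshift.get_records(check["sql"])
        actual = records[0][0]
        if actual != check["expected"]:
            raise ValueError(
                f"Data quality check failed: {check['sql']} returned {actual}, "
                f"expected {check['expected']}"
            )

# Example usage with a placeholder table name:
# run_quality_checks([{"sql": "SELECT COUNT(*) FROM users WHERE userid IS NULL", "expected": 0}])
```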
