🚀 Comprehensive Data Warehousing & Analytics Solution
This project demonstrates a complete end-to-end data warehouse implementation — from raw data ingestion to actionable business insights. It follows industry best practices in Data Engineering, ETL, Data-Cleansing, Data Modeling, and Analytics.
- Project Overview
- Project Images
- Architecture
- Folder Structure
- Layers and Data Flow
- ETL Process
- Quality Checks
- Stored Procedures
- Gold Layer Views
- Usage
- Business Insights
- Future Enhancements
- License
Purpose: Build a modern data warehouse using Medallion Architecture for CRM & ERP systems.
Highlights:
- 🗄 Data Architecture – Bronze, Silver, and Gold layers.
- ⚙ ETL Pipelines – Extract, Transform, Load from multiple source systems.
- 📐 Data Modeling – Build fact and dimension tables for analytical queries.
- 📊 Analytics & Reporting – SQL-based insights for business decisions.
This repository contains scripts for Bronze, Silver, and Gold layers, along with quality checks and ETL loading procedures.
- 📂 Datasets: CSV files from CRM and ERP sources.
- 🗄 SQL Server Express: Database hosting for the warehouse.
- 🖥 SSMS: SQL Server Management Studio for querying and management.
- 🛠 Git & GitHub: Version control and collaboration.
- 📐 Draw.io: ER diagrams and data flow visualization.
- 🗒 Notion: Project planning, templates, and documentation.
Bronze Layer (Raw Data)
|
v
Silver Layer (Cleansed & Enriched)
|
v
Gold Layer (Dimensions + Fact Tables / Star Schema)
|
v
Analytics & Reporting
C:.
├───datasets
│ ├───source_crm
│ └───source_erp
├───docs
├───draw_io_diagrams
├───ETL_applied
├───project_planning_notion
├───quality_tests
├───sql_scripts
│ ├───bronze
│ ├───silver
│ └───gold
└───understading_data
- datasets/source_crm & source_erp: Raw CSV/Excel files from CRM and ERP systems
- docs: Documentation files, project briefs
- draw_io_diagrams: ER diagrams and data flow diagrams
- ETL_applied: Scripts or notebooks demonstrating ETL transformations
- project_planning_notion: Project planning and timelines
- quality_tests: SQL scripts and reports for data quality checks
- sql_scripts: All SQL scripts for Bronze, Silver, and Gold layers
- understading_data: Notes, observations, or data profiling results
-
Purpose: Raw ingestion from CRM and ERP systems
-
Tables:
crm_cust_info– Customer detailscrm_prd_info– Product informationcrm_sales_details– Sales transactionserp_cust_az12– ERP customer mastererp_loc_a101– Location mastererp_px_cat_g1v2– Product category master
-
Data Characteristics: May contain duplicates, nulls, inconsistent formats
-
Purpose: Cleansed and standardized dataset
-
Actions Performed:
- Remove duplicates and null primary keys
- Trim unwanted spaces
- Standardize marital status and gender
- Handle invalid dates and negative values
- Map ERP codes to descriptive values
-
Tables: Mirrors Bronze tables with transformed data
-
Purpose: Analytics-ready star schema
-
Views:
dim_customers– Enriched customer dimensiondim_products– Product dimension with category mappingfact_sales– Sales fact table linking customers and products
-
Features:
- Surrogate keys for analytics
- Filter out historical or invalid records
- Consolidated facts ready for reporting
-
Extraction: Read raw data from Bronze tables
-
Transformation:
- Deduplicate using
ROW_NUMBER() - Normalize string fields (trim spaces, standardize codes)
- Handle nulls, missing values, invalid dates
- Map codes to human-readable values
- Deduplicate using
-
Load: Populate Silver tables using stored procedures
-
Gold Layer: Generate analytical views using joins and transformations
- Bronze Layer: Validate primary keys, detect spaces, check date ranges, standardize codes
- Silver Layer: Confirm uniqueness, validate calculations, ensure field consistency
- Gold Layer: Confirm dimensional integrity, validate fact tables, filter historical/invalid data
-
Truncates Silver tables before loading
-
Loads data from Bronze with transformations:
crm_cust_info→ Cleanses and normalizes customer datacrm_prd_info→ Maps product lines, calculates end datescrm_sales_details→ Recalculates sales, derives missing priceserp_cust_az12→ Standardizes gender, filters future birthdateserp_loc_a101→ Maps country codeserp_px_cat_g1v2→ Loads product categories
-
Includes timing logs for monitoring ETL performance
-
Handles errors using
TRY...CATCH
| View Name | Description |
|---|---|
dim_customers |
Consolidated customer data from CRM + ERP + location |
dim_products |
Product dimension with category and line mapping |
fact_sales |
Sales fact table linking products and customers |
- Create Bronze tables and load raw data
- Run Silver ETL:
EXEC silver.load_silver;- Query Gold views for analytics:
SELECT * FROM gold.dim_customers;
SELECT * FROM gold.dim_products;
SELECT * FROM gold.fact_sales;- Run quality checks periodically
- 👥 Customer Behavior: Purchasing patterns & segmentation
- 📦 Product Performance: Bestsellers, low performers, and trends
- 📈 Sales Trends: Time-based revenue and growth insights
- Automate ETL scheduling using SQL Agent
- Implement incremental loads for Silver and Gold layers
- Add additional dimensions (Time, Location)
- Create Gold layer physical tables for faster BI performance
- Add data validation reports to monitor KPIs
🛡 Licensed under the MIT License — you are free to use, modify, and share with proper attribution.
Author: theBappy
Date: 2025-12-01
Project: Data Warehouse with MSSQL Server
