
End-to-end retail data engineering pipeline using Azure Databricks, Delta Lake, and Power BI for analytics.


MadhurDwivedi/RevStack


🚀 RevStack

Retail Sales Analytics using Azure Data Lake, Databricks & Power BI

CapstoneProjectReport

📌 Project Overview

RevStack is an end-to-end Retail Data Engineering Capstone Project designed to build a scalable analytics pipeline for sales, inventory, and customer insights.

The project uses Azure Data Lake Gen2, Databricks (Delta Lake & PySpark), SQL, and Power BI to transform raw retail data into business-ready analytics, with historical tracking and near real-time inventory alerts.


🎯 Problem Statement

  • To design a modern data platform that enables:
    • 📈 Sales trend analysis across regions and product categories
    • 📦 Inventory monitoring and stock-out detection
    • 🕒 Historical tracking of customer and product changes (SCD Type 2)
    • 🏆 Identification of top-performing customers and products

🧱 Architecture Overview

Medallion Architecture (Bronze → Silver → Gold)

      Raw CSV Data (ADLS Bronze)
              ↓
      Cleaned & Typed Delta Tables (Silver)
              ↓
      Business Logic + SCD2 + Aggregations (Gold)
              ↓
      Power BI Dashboards

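This layered design supports scalability, data quality, governance, and analytics readiness: each layer has a single responsibility, and downstream layers only consume curated Delta tables.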
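The layer boundaries above can be sketched in plain Python. This is not the project's PySpark code. It assumes a hypothetical sales extract with `order_id`, `region` and `amount` columns and only illustrates what changes at each hop: Bronze keeps raw strings, Silver enforces types and drops rows that fail casting, and Gold aggregates for reporting.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw extract; column names are illustrative, not the project's schema.
RAW_SALES_CSV = """order_id,region,amount
1,North,120.50
2,South,80.00
3,North,not_a_number
4,South,19.50
"""


def bronze_read(raw_text):
    """Bronze: land the raw file as-is; every value is still a string."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def silver_clean(rows):
    """Silver: enforce types and drop rows that fail casting."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({
                "order_id": int(row["order_id"]),
                "region": row["region"].strip(),
                "amount": float(row["amount"]),
            })
        except (ValueError, TypeError, AttributeError):
            continue  # a real pipeline might quarantine these rows instead
    return cleaned


def gold_sales_by_region(rows):
    """Gold: business-level aggregate ready for a BI tool."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


gold = gold_sales_by_region(silver_clean(bronze_read(RAW_SALES_CSV)))
print(gold)  # {'North': 120.5, 'South': 99.5}
```

In Databricks, each of these steps would more typically be a notebook or job that reads the previous layer and writes a Delta table (for example with `df.write.format("delta").saveAsTable(...)`), so each layer can be queried and audited independently.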


🗂️ Data Sources (Bronze Layer)

  • Raw CSV files stored in Azure Data Lake Gen2:
    • Customers
    • Products
    • Inventory
    • Sales
    • Regions
  • 🔗 Dataset Link: Dataset

🧹 Data Cleansing – Silver Layer

  • Transformations Performed:
    • Schema enforcement (data type casting)
    • Null handling & duplicate removal
    • Date formatting
    • Conversion to Delta Tables
  • Silver Tables:
    • dim_customer
    • dim_product
    • dim_inventory
    • dim_region
    • fact_sales
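The Silver transformations listed above (null handling, duplicate removal, date formatting) can be sketched as follows. This is a plain-Python illustration, not the project's notebook code: the customer fields, the `"Unknown"` fill value and the accepted date formats are all assumptions, and in particular treating `03/01/2024` as day/month/year is a guess that a real pipeline would need to confirm against the source data.

```python
from datetime import datetime

# Hypothetical raw customer rows; field names and date formats are assumptions.
raw_customers = [
    {"customer_id": "C1", "name": "Asha", "signup_date": "03/01/2024"},
    {"customer_id": "C1", "name": "Asha", "signup_date": "03/01/2024"},  # exact duplicate
    {"customer_id": "C2", "name": None, "signup_date": "2024-02-15"},    # missing name
    {"customer_id": None, "name": "Ravi", "signup_date": "2024-02-20"},  # missing key
]

# Assumed formats seen in the raw files, tried in order.
INPUT_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Normalise a date string to datetime.date, or None if no format matches."""
    for fmt in INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except (TypeError, ValueError):
            continue
    return None


def clean_customers(rows):
    """Drop keyless rows, fill missing names, normalise dates, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row.get("customer_id"):
            continue  # rows without a business key cannot be joined later
        record = (
            row["customer_id"],
            row.get("name") or "Unknown",       # hypothetical fill value
            parse_date(row.get("signup_date")),
        )
        if record in seen:
            continue  # exact duplicate after normalisation
        seen.add(record)
        cleaned.append(
            {"customer_id": record[0], "name": record[1], "signup_date": record[2]}
        )
    return cleaned


for customer in clean_customers(raw_customers):
    print(customer)
```

In PySpark, the same steps would likely map onto `DataFrame.dropDuplicates`, `DataFrame.na.fill` and `pyspark.sql.functions.to_date`, followed by a write to a Delta table.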

🏗️ Data Modeling – Gold Layer

  • Gold Tables Created:
    • dim_customer_scd2 – Customer historical tracking
    • dim_product_scd2 – Product change history
    • fact_sales – Transactional sales data
    • inventory_alerts – Low-stock monitoring

🔁 Slowly Changing Dimension (SCD Type 2)

  • Tracks historical changes in customer and product attributes
  • Maintains:
    • start_date
    • end_date
    • is_current
  • Complemented by Delta Lake Time Travel, which lets earlier versions of the tables be queried for versioned data analysis
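A minimal sketch of the SCD Type 2 logic described above, written in plain Python rather than the Delta Lake `MERGE INTO` statement a Databricks implementation would typically use. It assumes a single hypothetical tracked attribute, `city`, and represents an open-ended version with `end_date=None`; some projects use a sentinel such as `9999-12-31` instead. When a tracked attribute changes, the current version is closed (`end_date` set, `is_current` false) and a new current version is opened; new keys get their first version; history is never overwritten.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: str
    city: str                 # hypothetical tracked attribute
    start_date: date
    end_date: Optional[date]  # None while the version is current
    is_current: bool


def apply_scd2(dimension: List[CustomerVersion],
               incoming: Dict[str, str],
               load_date: date) -> List[CustomerVersion]:
    """Apply one batch of source rows (customer_id -> latest city) to the dimension."""
    result: List[CustomerVersion] = []
    current: Dict[str, CustomerVersion] = {}
    for row in dimension:
        if row.is_current:
            current[row.customer_id] = row
        else:
            result.append(row)  # historical versions are never modified

    for cid, row in current.items():
        new_city = incoming.get(cid)
        if new_city is not None and new_city != row.city:
            # Attribute changed: close the old version and open a new one.
            result.append(replace(row, end_date=load_date, is_current=False))
            result.append(CustomerVersion(cid, new_city, load_date, None, True))
        else:
            result.append(row)  # unchanged, or absent from this batch

    for cid, city in incoming.items():
        if cid not in current:
            # Brand-new customer: the first version starts at this load.
            result.append(CustomerVersion(cid, city, load_date, None, True))
    return result


dim = [CustomerVersion("C1", "Pune", date(2024, 1, 1), None, True)]
dim = apply_scd2(dim, {"C1": "Delhi", "C2": "Mumbai"}, date(2024, 6, 1))
for version in dim:
    print(version)
```

Filtering on `is_current` gives the active records shown in the dashboard, while the closed versions provide the change history.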

🚨 Inventory Alerts Logic

  • Business Rule: IF StockQuantity < MinThreshold → Raise Alert
  • Alerts are stored as a Delta table in the Gold layer
  • Enables near real-time stock-out visibility
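The alert rule can be expressed as a simple filter. The sketch below is plain Python and assumes hypothetical field names; the `shortfall` column is an illustrative addition, not something the project is confirmed to store. In PySpark the equivalent would likely be a `filter(col("StockQuantity") < col("MinThreshold"))` on the inventory table before writing `inventory_alerts`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InventoryRow:
    product_id: str
    stock_quantity: int
    min_threshold: int


def inventory_alerts(rows: List[InventoryRow]) -> List[Dict[str, object]]:
    """Apply the business rule: alert when StockQuantity < MinThreshold."""
    return [
        {
            "product_id": r.product_id,
            "stock_quantity": r.stock_quantity,
            "min_threshold": r.min_threshold,
            "shortfall": r.min_threshold - r.stock_quantity,  # hypothetical extra column
        }
        for r in rows
        if r.stock_quantity < r.min_threshold  # strict '<', as stated in the rule
    ]


inventory = [
    InventoryRow("P1", 5, 10),   # below threshold -> alert
    InventoryRow("P2", 10, 10),  # exactly at threshold -> no alert
    InventoryRow("P3", 0, 3),    # out of stock -> alert
]
for alert in inventory_alerts(inventory):
    print(alert)
```

Note that, because the rule uses a strict `<`, a product sitting exactly at its threshold does not raise an alert.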

📊 Power BI Dashboard

  • Key Dashboards & Insights:

    📈 Sales Trends

    • Monthly sales by Region & Category
    • Filters: Product, Region, Quarter

    📦 Inventory Levels

    • StockQuantity vs Product
    • Highlighted low-stock alerts

    🕒 Historical Customers (SCD2)

    • Active & historical customer records
    • Change tracking over time

    🏆 Top Performers

    • Top customers by lifetime sales
    • Best-selling products

    🔁 All visuals are fully interactive and filter-driven
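The two main Gold-layer aggregates behind these dashboards (monthly sales by region and category, and top customers by lifetime sales) can be sketched as below. This is a plain-Python illustration with hypothetical rows that assume `fact_sales` has already been joined with product and region attributes; in the project these would presumably be Spark SQL or PySpark `groupBy` aggregations over Delta tables.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical fact_sales rows joined with product and region attributes.
sales = [
    {"customer_id": "C1", "region": "North", "category": "Grocery",
     "order_date": date(2024, 1, 5), "amount": 100.0},
    {"customer_id": "C2", "region": "North", "category": "Grocery",
     "order_date": date(2024, 1, 20), "amount": 50.0},
    {"customer_id": "C1", "region": "South", "category": "Apparel",
     "order_date": date(2024, 2, 2), "amount": 75.0},
    {"customer_id": "C3", "region": "North", "category": "Grocery",
     "order_date": date(2024, 2, 9), "amount": 30.0},
]


def monthly_sales(rows):
    """Total sales per (year-month, region, category)."""
    totals = defaultdict(float)
    for r in rows:
        month = r["order_date"].strftime("%Y-%m")
        totals[(month, r["region"], r["category"])] += r["amount"]
    return dict(totals)


def top_customers(rows, n=2):
    """Customers ranked by lifetime sales, highest first."""
    lifetime = Counter()
    for r in rows:
        lifetime[r["customer_id"]] += r["amount"]
    return lifetime.most_common(n)


print(monthly_sales(sales))
print(top_customers(sales))  # [('C1', 175.0), ('C2', 50.0)]
```

Pre-aggregating in the Gold layer keeps the Power BI model small, while the filters (Product, Region, Quarter) can still slice the underlying detail rows if they are loaded alongside.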


👉 Power BI Report (View Only):

For a detailed report and interactive analysis, refer to the RevStack


🛠️ Tech Stack

| Component | Technology |
|---|---|
| Storage | Azure Data Lake Gen2 |
| Processing | Azure Databricks |
| Framework | PySpark, Delta Lake |
| Data Modeling | Medallion Architecture |
| Governance | Unity Catalog |
| Analytics | SQL |
| Visualization | Power BI |

✅ Key Outcomes

  • ✔ End-to-end retail data pipeline implemented
  • ✔ SCD Type 2 historical tracking enabled
  • ✔ Near real-time inventory alerting logic
  • ✔ Curated Gold layer for BI reporting
  • ✔ Interactive, filter-driven Power BI dashboard

🧠 Conclusion

RevStack demonstrates a real-world data engineering architecture for retail analytics, combining scalable pipelines, historical tracking, and actionable business insights. It provides a foundation for future enhancements such as ML-based forecasting or automated alert notifications.
