RevStack is an end-to-end Retail Data Engineering Capstone Project designed to build a scalable analytics pipeline for sales, inventory, and customer insights.
The project uses Azure Data Lake Gen2, Databricks (Delta Lake & PySpark), SQL, and Power BI to transform raw retail data into business-ready analytics with historical tracking and near real-time low-stock alerts.
- To design a modern data platform that enables:
- 📈 Sales trend analysis across regions and product categories
- 📦 Inventory monitoring and stock-out detection
- 🕒 Historical tracking of customer and product changes (SCD Type 2)
- 🏆 Identification of top-performing customers and products
Raw CSV Data (ADLS Bronze)
↓
Cleaned & Typed Delta Tables (Silver)
↓
Business Logic + SCD2 + Aggregations (Gold)
↓
Power BI Dashboards
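This layered (Bronze, Silver, Gold) architecture keeps raw ingestion, cleaning, and business logic in separate stages, which supports scalability, data quality, governance, and analytics readiness.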
- Raw CSV files stored in Azure Data Lake Gen2:
- Customers
- Products
- Inventory
- Sales
- Regions
- Transformations Performed:
- Schema enforcement (data type casting)
- Null handling & duplicate removal
- Date formatting
- Conversion to Delta Tables
- Silver Tables:
- dim_customer
- dim_product
- dim_inventory
- dim_region
- fact_sales
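
The Silver-layer steps listed above (type casting, null handling, duplicate removal, date formatting) can be sketched in plain Python. This is only an illustration of the logic, not the project's PySpark code: the column names, the `DD/MM/YYYY` input format, and the choice to default a missing quantity to 0 are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw sales extract; column names and date format are assumptions.
RAW_SALES_CSV = """SaleID,CustomerID,ProductID,Quantity,SaleDate
1,C001,P10,2,15/01/2024
1,C001,P10,2,15/01/2024
2,C002,P11,,16/01/2024
3,,P12,1,17/01/2024
"""


def clean_sales(raw_csv: str) -> list[dict]:
    """Cast types, handle nulls, normalise dates, and drop duplicate sales."""
    cleaned, seen = [], set()
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Null handling: a sale without a customer key cannot be joined, so drop it.
        if not row["CustomerID"]:
            continue
        record = {
            "SaleID": int(row["SaleID"]),            # schema enforcement
            "CustomerID": row["CustomerID"],
            "ProductID": row["ProductID"],
            "Quantity": int(row["Quantity"] or 0),   # assumed default for missing values
            # Date formatting: DD/MM/YYYY text becomes an ISO 8601 string.
            "SaleDate": datetime.strptime(row["SaleDate"], "%d/%m/%Y").date().isoformat(),
        }
        # Duplicate removal on the (assumed) business key.
        if record["SaleID"] in seen:
            continue
        seen.add(record["SaleID"])
        cleaned.append(record)
    return cleaned


silver_sales = clean_sales(RAW_SALES_CSV)
for sale in silver_sales:
    print(sale)
```

In the actual pipeline the same rules would be applied to DataFrames and the result written as a Delta table; the sketch only shows the order and intent of the checks.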
- Gold Tables Created:
- dim_customer_scd2 – Customer historical tracking
- dim_product_scd2 – Product change history
- fact_sales – Transactional sales data
- inventory_alerts – Low-stock monitoring
- Tracks historical changes in customer and product attributes
- Maintains:
- start_date
- end_date
- is_current
- Supports Delta Time Travel for versioned data analysis
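
To make the `start_date` / `end_date` / `is_current` bookkeeping concrete, here is a minimal pure-Python sketch of an SCD Type 2 update. It is not the project's implementation (on Databricks this is commonly done with a Delta `MERGE`); the function name `apply_scd2`, the `CustomerID` and `City` columns, and the use of `None` as an open-ended `end_date` are assumptions for illustration.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, as_of):
    """Return a new list of dimension rows with SCD Type 2 changes applied.

    dimension: existing rows, each with start_date, end_date, is_current
    incoming:  latest snapshot rows (business columns only)
    key:       name of the business key column
    tracked:   columns whose change should open a new version
    as_of:     effective date of the incoming snapshot
    """
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current = {row[key]: row for row in result if row["is_current"]}

    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[col] == new[col] for col in tracked):
            continue  # nothing changed, keep the current version as-is
        if old is not None:
            # Close the previous version instead of overwriting it.
            old["end_date"] = as_of
            old["is_current"] = False
        version = {**new, "start_date": as_of, "end_date": None, "is_current": True}
        result.append(version)
        current[new[key]] = version
    return result


existing = [
    {"CustomerID": "C001", "City": "Pune",
     "start_date": date(2023, 1, 1), "end_date": None, "is_current": True},
]
snapshot = [
    {"CustomerID": "C001", "City": "Mumbai"},  # changed attribute
    {"CustomerID": "C002", "City": "Delhi"},   # brand-new customer
]

dim_customer_scd2 = apply_scd2(existing, snapshot, "CustomerID", ["City"], date(2024, 6, 1))
for row in dim_customer_scd2:
    print(row)
```

After the update, `C001` has two rows (the closed Pune version and the current Mumbai version), which is what lets the dashboards show both active and historical records.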
- Business Rule:
- IF StockQuantity < MinThreshold → Raise Alert
- Alerts stored as Delta table in Gold layer
- Enables near real-time stock-out visibility
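
The business rule above reduces to a single filter. The sketch below applies it to a few hypothetical inventory rows; the `ProductID` values and the extra `Shortfall` field are assumptions added for illustration, while `StockQuantity` and `MinThreshold` come from the rule itself.

```python
def find_low_stock(inventory):
    """Return one alert per product whose stock is strictly below its threshold."""
    return [
        {
            "ProductID": item["ProductID"],
            "StockQuantity": item["StockQuantity"],
            "MinThreshold": item["MinThreshold"],
            # Hypothetical helper field: how many units are missing.
            "Shortfall": item["MinThreshold"] - item["StockQuantity"],
        }
        for item in inventory
        if item["StockQuantity"] < item["MinThreshold"]
    ]


# Hypothetical inventory snapshot.
inventory = [
    {"ProductID": "P10", "StockQuantity": 4, "MinThreshold": 10},
    {"ProductID": "P11", "StockQuantity": 10, "MinThreshold": 10},  # at threshold: no alert
    {"ProductID": "P12", "StockQuantity": 25, "MinThreshold": 5},
]

inventory_alerts = find_low_stock(inventory)
print(inventory_alerts)
```

Because the comparison is strict (`<`), a product sitting exactly at its threshold does not raise an alert; switching to `<=` would be a one-character change if earlier warnings are preferred.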
Key Dashboards & Insights:
- Monthly sales by Region & Category
- Filters: Product, Region, Quarter
- StockQuantity vs Product
- Highlighted low-stock alerts
- Active & historical customer records
- Change tracking over time
- Top customers by lifetime sales
- Best-selling products
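
The aggregations behind two of these visuals (monthly sales by region and category, and top customers by lifetime sales) could be expressed in SQL roughly as follows. The sketch runs against an in-memory SQLite table purely so it is self-contained; the `fact_sales` column names and sample rows are assumptions, and the real queries would target the Gold Delta tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales ("
    "CustomerID TEXT, Region TEXT, Category TEXT, SaleDate TEXT, Amount REAL)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [
        ("C001", "North", "Grocery", "2024-01-15", 120.0),
        ("C002", "North", "Grocery", "2024-01-20", 80.0),
        ("C001", "South", "Apparel", "2024-02-03", 200.0),
        ("C003", "North", "Grocery", "2024-02-10", 50.0),
    ],
)

# Monthly sales by region and category.
monthly_sales = conn.execute(
    """
    SELECT strftime('%Y-%m', SaleDate) AS month, Region, Category,
           SUM(Amount) AS total_sales
    FROM fact_sales
    GROUP BY month, Region, Category
    ORDER BY month, Region, Category
    """
).fetchall()

# Top customers by lifetime sales.
top_customers = conn.execute(
    """
    SELECT CustomerID, SUM(Amount) AS lifetime_sales
    FROM fact_sales
    GROUP BY CustomerID
    ORDER BY lifetime_sales DESC
    LIMIT 2
    """
).fetchall()

print(monthly_sales)
print(top_customers)
```

Pre-computing aggregates like these in the Gold layer keeps the Power BI model small, at the cost of fixing the grain (month, region, category) ahead of time.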
For a detailed report and interactive analysis, refer to the RevStack
| Component | Technology |
|---|---|
| Storage | Azure Data Lake Gen2 |
| Processing | Azure Databricks |
| Framework | PySpark, Delta Lake |
| Data Modeling | Medallion Architecture |
| Governance | Unity Catalog |
| Analytics | SQL |
| Visualization | Power BI |
- ✔ End-to-end retail data pipeline implemented
- ✔ SCD Type 2 historical tracking enabled
- ✔ Near real-time inventory alerting logic
- ✔ Curated Gold layer for BI reporting
- ✔ Enterprise-grade Power BI dashboard
RevStack demonstrates a real-world data engineering architecture for retail analytics, combining scalable pipelines, historical tracking, and actionable business insights. It is ready for future enhancements such as ML forecasting or automated alert notifications.
