databricks_hackathon

RideOps Cancellation Command Center - A complete data pipeline and ML analytics solution for ride-sharing operations.

🎯 Project Overview

This project provides:

  • 🔄 Bronze → Silver → Gold data pipeline (Delta Lake)
  • 🤖 ML Model for cancellation prediction
  • 📊 Lakeview Dashboard for real-time monitoring
  • 🚀 100% Programmatic Deployment (no UI required)

Getting started

  1. Install UV: https://docs.astral.sh/uv/getting-started/installation/

  2. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

  3. Authenticate to your Databricks workspace, if you have not done so already:

    $ databricks configure
    
  4. To deploy a development copy of this project, type:

    $ databricks bundle deploy --target dev
    

    (Note that "dev" is the default target, so the --target parameter is optional here.)

    This deploys everything that's defined for this project. For example, the default template would deploy a job called [dev yourname] databricks_hackathon_job to your workspace. You can find that job by opening your workspace and clicking on Workflows.

  5. Similarly, to deploy a production copy, type:

    $ databricks bundle deploy --target prod
    

    Note that the default job from the template has a schedule that runs every day (defined in resources/databricks_hackathon.job.yml). The schedule is paused when deploying in development mode (see https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

  6. To run a job or pipeline, use the "run" command:

    $ databricks bundle run
    
  7. Optionally, install the Databricks extension for Visual Studio Code for local development from https://docs.databricks.com/dev-tools/vscode-ext.html. It can configure your virtual environment and set up Databricks Connect for running unit tests locally. If you are not using these tools, consult your development environment's documentation and/or the Databricks Connect documentation for setting up your environment manually (https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html).

  8. For documentation on the Databricks asset bundles format used for this project, and for CI/CD configuration, see https://docs.databricks.com/dev-tools/bundles/index.html.

📊 Lakeview Dashboard Deployment

Deploy the Cancellation Command Center dashboard programmatically:

Quick Deploy (30 seconds)

# Install SDK
pip install databricks-sdk

# Set credentials
export DATABRICKS_HOST="https://adb-2580806725893634.14.azuredatabricks.net"
export DATABRICKS_TOKEN="your_token_here"

# Deploy dashboard
python deploy_lakeview_dashboard.py
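
Before the script talks to the workspace, it needs the two environment variables above. A minimal sketch of a credential sanity check a deployment script like this might perform (a hypothetical helper, not taken from deploy_lakeview_dashboard.py itself):

```python
import os


def load_databricks_credentials():
    """Read and sanity-check the credentials the deploy script expects.

    Hypothetical helper for illustration; the actual script may read
    configuration differently (e.g. via the Databricks SDK's own config chain).
    """
    host = os.environ.get("DATABRICKS_HOST", "")
    token = os.environ.get("DATABRICKS_TOKEN", "")
    if not host.startswith("https://"):
        raise ValueError("DATABRICKS_HOST must be a full https:// workspace URL")
    if not token:
        raise ValueError("DATABRICKS_TOKEN is not set")
    # Strip a trailing slash so API paths can be appended cleanly.
    return host.rstrip("/"), token
```

Failing fast on missing or malformed credentials gives a clearer error than a mid-deploy HTTP 401.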

What You Get

  • 10 interactive widgets across 2 pages
  • Real-time metrics: Total bookings, cancellation rate, revenue at risk
  • ML insights: High-risk zones, peak hours analysis
  • Visualizations: Heatmaps, trends, performance tables
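
The headline metrics above reduce to simple aggregations over booking records. A minimal sketch in plain Python (field names like status and fare are assumptions for illustration, not the project's actual schema):

```python
def cancellation_metrics(bookings):
    """Compute total bookings, cancellation rate, and revenue at risk.

    Illustrative only: in the dashboard these values come from SQL over
    the Gold tables, not from in-memory Python.
    """
    total = len(bookings)
    cancelled = [b for b in bookings if b["status"] == "cancelled"]
    return {
        "total_bookings": total,
        "cancellation_rate": len(cancelled) / total if total else 0.0,
        # Fares of cancelled rides: revenue the business failed to capture.
        "revenue_at_risk": sum(b["fare"] for b in cancelled),
    }
```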

Documentation

  • 📖 Quick Start: See QUICK_DEPLOY.md
  • 📚 Full Guide: See LAKEVIEW_DEPLOYMENT_GUIDE.md
  • Setup Summary: See DASHBOARD_SETUP_COMPLETE.md

Alternative Deployment Methods

  1. Python Script (recommended) - Auto-detects warehouse
  2. Databricks Bundle - Infrastructure as code
  3. REST API - Custom automation
  4. Terraform - Multi-cloud IaC

See LAKEVIEW_DEPLOYMENT_GUIDE.md for detailed instructions.
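
For method 3, the request shape can be sketched with the standard library alone. The endpoint and payload fields below are assumptions based on the public Lakeview REST API (/api/2.0/lakeview/dashboards); verify them against your workspace's API documentation before use:

```python
import json
import urllib.request


def build_create_dashboard_request(host, token, display_name,
                                   dashboard_json, warehouse_id):
    """Build (but do not send) a POST request to create a Lakeview dashboard.

    Payload shape is an assumption: the API expects the dashboard
    definition as a JSON string under serialized_dashboard.
    """
    body = json.dumps({
        "display_name": display_name,
        "serialized_dashboard": json.dumps(dashboard_json),
        "warehouse_id": warehouse_id,
    }).encode()
    return urllib.request.Request(
        url=f"{host}/api/2.0/lakeview/dashboards",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a single `urllib.request.urlopen(req)` call, which keeps custom automation dependency-free.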

📁 Project Structure

databricks_hackathon/
├── notebooks/                  # Data pipeline notebooks
│   ├── 00_setup.py            # Initial setup
│   ├── 01_bronze_ingestion.py # Raw data ingestion
│   ├── 02_silver_transformation.py # Data cleaning
│   ├── 03_aggregate_gold.py   # Feature engineering
│   └── 04_train_ml_model.py   # ML model training
├── resources/                  # Dashboard & configs
│   ├── lakeview_dashboard.lvdash.json  # Dashboard definition
│   ├── databricks_hackathon.dashboard.yml # Bundle config
│   ├── lakeview_dashboard.sql # SQL-based setup
│   └── *.yml                  # Job/pipeline configs
├── deploy_lakeview_dashboard.py  # Deployment script
└── docs/
    ├── QUICK_DEPLOY.md
    ├── LAKEVIEW_DEPLOYMENT_GUIDE.md
    └── DASHBOARD_SETUP_COMPLETE.md

🔄 Data Pipeline

  1. Bronze: Raw CSV ingestion
  2. Silver: Data cleaning, normalization, feature engineering
  3. Gold: Aggregated metrics for analytics
  4. ML: Cancellation prediction model
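
The three layers above can be sketched at a much reduced scale in plain Python (the real pipeline runs on Delta Lake via the notebooks listed earlier; the column names here are illustrative assumptions):

```python
import csv
import io


def bronze_ingest(csv_text):
    """Bronze: raw rows kept as-is, no typing or validation."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def silver_clean(bronze_rows):
    """Silver: drop malformed rows, cast types, normalize strings."""
    silver = []
    for r in bronze_rows:
        try:
            silver.append({
                "zone": r["zone"].strip().lower(),
                "status": r["status"].strip().lower(),
                "fare": float(r["fare"]),
            })
        except (KeyError, ValueError):
            continue  # malformed rows stop at Silver, not downstream
    return silver


def gold_aggregate(silver_rows):
    """Gold: per-zone booking counts and cancellation rates."""
    out = {}
    for r in silver_rows:
        z = out.setdefault(r["zone"], {"bookings": 0, "cancelled": 0})
        z["bookings"] += 1
        z["cancelled"] += 1 if r["status"] == "cancelled" else 0
    for z in out.values():
        z["cancellation_rate"] = z["cancelled"] / z["bookings"]
    return out
```

The point of the layering is that each stage has one job: Bronze preserves the raw feed for replay, Silver enforces the schema once, and Gold serves pre-aggregated metrics to the dashboard and ML model.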

🎯 Key Features

  • ✅ Medallion architecture (Bronze/Silver/Gold)
  • ✅ Real-time dashboard with 10 widgets
  • ✅ ML model for cancellation prediction
  • ✅ Programmatic deployment (no UI)
  • ✅ Databricks Asset Bundle integration
  • ✅ Complete documentation
