databricks_hackathon

RideOps Cancellation Command Center - A complete data pipeline and ML analytics solution for ride-sharing operations.

🎯 Project Overview

This project provides:

  • 🔄 Bronze → Silver → Gold data pipeline (Delta Lake)
  • 🤖 ML Model for cancellation prediction
  • 📊 Lakeview Dashboard for real-time monitoring
  • 🚀 100% Programmatic Deployment (no UI required)

Getting started

  1. Install UV: https://docs.astral.sh/uv/getting-started/installation/

  2. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

  3. Authenticate to your Databricks workspace, if you have not done so already:

    $ databricks configure
    
  4. To deploy a development copy of this project, type:

    $ databricks bundle deploy --target dev
    

    (Note that "dev" is the default target, so the --target parameter is optional here.)

    This deploys everything that's defined for this project. For example, the default template would deploy a job called [dev yourname] databricks_hackathon_job to your workspace. You can find that job by opening your workspace and clicking on Workflows.

  5. Similarly, to deploy a production copy, type:

    $ databricks bundle deploy --target prod
    

    Note that the default job from the template has a schedule that runs every day (defined in resources/databricks_hackathon.job.yml). The schedule is paused when deploying in development mode (see https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

  6. To run a job or pipeline, use the "run" command:

    $ databricks bundle run
    
  7. Optionally, install the Databricks extension for Visual Studio Code for local development from https://docs.databricks.com/dev-tools/vscode-ext.html. It can configure your virtual environment and set up Databricks Connect for running unit tests locally. If you are not using these tools, consult your development environment's documentation and/or the Databricks Connect documentation for setting up your environment manually (https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html).

  8. For documentation on the Databricks asset bundles format used for this project, and for CI/CD configuration, see https://docs.databricks.com/dev-tools/bundles/index.html.

📊 Lakeview Dashboard Deployment

Deploy the Cancellation Command Center dashboard programmatically:

Quick Deploy (30 seconds)

# Install SDK
pip install databricks-sdk

# Set credentials
export DATABRICKS_HOST="https://adb-2580806725893634.14.azuredatabricks.net"
export DATABRICKS_TOKEN="your_token_here"

# Deploy dashboard
python deploy_lakeview_dashboard.py
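
Before the script talks to the workspace, it needs the two environment variables above. A minimal sketch of a credential sanity check a deployment script like this might perform (a hypothetical helper, not taken from deploy_lakeview_dashboard.py itself):

```python
import os


def load_databricks_credentials():
    """Read and sanity-check the credentials the deploy script expects.

    Hypothetical helper for illustration; the actual script may read
    configuration differently (e.g. via the Databricks SDK's own config chain).
    """
    host = os.environ.get("DATABRICKS_HOST", "")
    token = os.environ.get("DATABRICKS_TOKEN", "")
    if not host.startswith("https://"):
        raise ValueError("DATABRICKS_HOST must be a full https:// workspace URL")
    if not token:
        raise ValueError("DATABRICKS_TOKEN is not set")
    # Strip a trailing slash so API paths can be appended cleanly.
    return host.rstrip("/"), token
```

Failing fast on missing or malformed credentials gives a clearer error than a mid-deploy HTTP 401.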

What You Get

  • 10 interactive widgets across 2 pages
  • Real-time metrics: Total bookings, cancellation rate, revenue at risk
  • ML insights: High-risk zones, peak hours analysis
  • Visualizations: Heatmaps, trends, performance tables
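
The headline metrics above reduce to simple aggregations over booking records. A minimal sketch in plain Python (field names like status and fare are assumptions for illustration, not the project's actual schema):

```python
def cancellation_metrics(bookings):
    """Compute total bookings, cancellation rate, and revenue at risk.

    Illustrative only: in the dashboard these values come from SQL over
    the Gold tables, not from in-memory Python.
    """
    total = len(bookings)
    cancelled = [b for b in bookings if b["status"] == "cancelled"]
    return {
        "total_bookings": total,
        "cancellation_rate": len(cancelled) / total if total else 0.0,
        # Fares of cancelled rides: revenue the business failed to capture.
        "revenue_at_risk": sum(b["fare"] for b in cancelled),
    }
```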

Documentation

  • 📖 Quick Start: See QUICK_DEPLOY.md
  • 📚 Full Guide: See LAKEVIEW_DEPLOYMENT_GUIDE.md
  • Setup Summary: See DASHBOARD_SETUP_COMPLETE.md

Alternative Deployment Methods

  1. Python Script (recommended) - Auto-detects warehouse
  2. Databricks Bundle - Infrastructure as code
  3. REST API - Custom automation
  4. Terraform - Multi-cloud IaC

See LAKEVIEW_DEPLOYMENT_GUIDE.md for detailed instructions.
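
For method 3, the request shape can be sketched with the standard library alone. The endpoint and payload fields below are assumptions based on the public Lakeview REST API (/api/2.0/lakeview/dashboards); verify them against your workspace's API documentation before use:

```python
import json
import urllib.request


def build_create_dashboard_request(host, token, display_name,
                                   dashboard_json, warehouse_id):
    """Build (but do not send) a POST request to create a Lakeview dashboard.

    Payload shape is an assumption: the API expects the dashboard
    definition as a JSON string under serialized_dashboard.
    """
    body = json.dumps({
        "display_name": display_name,
        "serialized_dashboard": json.dumps(dashboard_json),
        "warehouse_id": warehouse_id,
    }).encode()
    return urllib.request.Request(
        url=f"{host}/api/2.0/lakeview/dashboards",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a single `urllib.request.urlopen(req)` call, which keeps custom automation dependency-free.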

📁 Project Structure

databricks_hackathon/
├── notebooks/                  # Data pipeline notebooks
│   ├── 00_setup.py            # Initial setup
│   ├── 01_bronze_ingestion.py # Raw data ingestion
│   ├── 02_silver_transformation.py # Data cleaning
│   ├── 03_aggregate_gold.py   # Feature engineering
│   └── 04_train_ml_model.py   # ML model training
├── resources/                  # Dashboard & configs
│   ├── lakeview_dashboard.lvdash.json  # Dashboard definition
│   ├── databricks_hackathon.dashboard.yml # Bundle config
│   ├── lakeview_dashboard.sql # SQL-based setup
│   └── *.yml                  # Job/pipeline configs
├── deploy_lakeview_dashboard.py  # Deployment script
└── docs/
    ├── QUICK_DEPLOY.md
    ├── LAKEVIEW_DEPLOYMENT_GUIDE.md
    └── DASHBOARD_SETUP_COMPLETE.md

🔄 Data Pipeline

  1. Bronze: Raw CSV ingestion
  2. Silver: Data cleaning, normalization, feature engineering
  3. Gold: Aggregated metrics for analytics
  4. ML: Cancellation prediction model
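
The three layers above can be sketched at a much reduced scale in plain Python (the real pipeline runs on Delta Lake via the notebooks listed earlier; the column names here are illustrative assumptions):

```python
import csv
import io


def bronze_ingest(csv_text):
    """Bronze: raw rows kept as-is, no typing or validation."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def silver_clean(bronze_rows):
    """Silver: drop malformed rows, cast types, normalize strings."""
    silver = []
    for r in bronze_rows:
        try:
            silver.append({
                "zone": r["zone"].strip().lower(),
                "status": r["status"].strip().lower(),
                "fare": float(r["fare"]),
            })
        except (KeyError, ValueError):
            continue  # malformed rows stop at Silver, not downstream
    return silver


def gold_aggregate(silver_rows):
    """Gold: per-zone booking counts and cancellation rates."""
    out = {}
    for r in silver_rows:
        z = out.setdefault(r["zone"], {"bookings": 0, "cancelled": 0})
        z["bookings"] += 1
        z["cancelled"] += 1 if r["status"] == "cancelled" else 0
    for z in out.values():
        z["cancellation_rate"] = z["cancelled"] / z["bookings"]
    return out
```

The point of the layering is that each stage has one job: Bronze preserves the raw feed for replay, Silver enforces the schema once, and Gold serves pre-aggregated metrics to the dashboard and ML model.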

🎯 Key Features

  • ✅ Medallion architecture (Bronze/Silver/Gold)
  • ✅ Real-time dashboard with 10 widgets
  • ✅ ML model for cancellation prediction
  • ✅ Programmatic deployment (no UI)
  • ✅ Databricks Asset Bundle integration
  • ✅ Complete documentation
