Bridging the gap between Earth System Sciences, Industrial AI, and Scalable Engineering.
I am a Scientific AI Researcher and Data Architect with a hybrid profile: academic rigor in Environmental Systems Modeling (M.Tech) combined with 5+ years of industry experience building
Digital Twins, GenAI architectures, and Big Data pipelines.
My goal is to apply Scientific Machine Learning and Cloud Engineering to complex problems in physical domains.
Domain: Computer Vision, Data Privacy, GDPR Compliance
Tech: YOLO11, ByteTrack, Kafka, Docker
A "Zero-Trust" hybrid anonymization pipeline. Unlike blurring applied as a post-processing step, this system anonymizes video streams before storage or transmission, ensuring strict GDPR compliance while preserving zone occupancy analytics.
View Full Repository & Documentation
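
To illustrate how zone occupancy analytics can survive anonymization, here is a minimal sketch; it is not the repository's actual code. It assumes detections arrive as bounding boxes with tracker IDs (the kind of output a detector plus tracker such as YOLO11 with ByteTrack produces) and that zones are axis-aligned rectangles. The names `Detection`, `Zone`, and `count_occupancy` are hypothetical. Only per-zone counts leave the function, so no pixel data needs to be stored for this metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One tracked person in a frame (hypothetical structure)."""
    track_id: int
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass(frozen=True)
class Zone:
    """Axis-aligned rectangular zone in pixel coordinates (an assumption)."""
    name: str
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def count_occupancy(detections, zones):
    """Return {zone name: number of detections standing inside that zone}."""
    counts = {zone.name: 0 for zone in zones}
    for det in detections:
        # The bottom-center of the box approximates where the person stands.
        foot_x = (det.x1 + det.x2) / 2
        foot_y = det.y2
        for zone in zones:
            if zone.contains(foot_x, foot_y):
                counts[zone.name] += 1
    return counts
```

A downstream consumer (for example a Kafka topic, if that is how results are published) would then receive only the `counts` dictionary per frame.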
Domain: Generative AI, Knowledge Retrieval, NLP
Tech: LLMs, LangChain, ChromaDB, Azure OpenAI
A production-grade Text-to-SQL system. It uses Retrieval-Augmented Generation (RAG) to inject relevant database schema context into the LLM prompt, reducing hallucinations and letting users query complex relational databases in natural language.
View Full Repository & Documentation
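
The schema-injection idea behind the Text-to-SQL project can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the table definitions in `SCHEMA_DOCS` are invented, and simple token overlap stands in for the embedding-based retrieval that a vector store such as ChromaDB would perform. The functions `retrieve_schemas` and `build_prompt` are hypothetical names.

```python
import re

# Hypothetical schema snippets; a real system would read these from the database.
SCHEMA_DOCS = {
    "customers": "CREATE TABLE customers (id INT, name TEXT, city TEXT)",
    "orders": "CREATE TABLE orders (id INT, customer_id INT, total REAL, created_at DATE)",
    "products": "CREATE TABLE products (id INT, title TEXT, price REAL)",
}


def tokenize(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_schemas(question, schema_docs, top_k=2):
    """Rank tables by token overlap with the question (stand-in for vector search)."""
    q_tokens = tokenize(question)
    scored = []
    for table, ddl in schema_docs.items():
        overlap = len(q_tokens & tokenize(ddl))
        if overlap > 0:
            scored.append((overlap, table))
    # Highest overlap first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table for _, table in scored[:top_k]]


def build_prompt(question, schema_docs, top_k=2):
    """Assemble an LLM prompt containing only the retrieved table definitions."""
    tables = retrieve_schemas(question, schema_docs, top_k)
    context = "\n".join(schema_docs[t] for t in tables)
    return (
        "Use only the tables below to write one SQL query.\n"
        f"Schema:\n{context}\n"
        f"Question: {question}\nSQL:"
    )
```

Restricting the prompt to the retrieved tables keeps it short and gives the model fewer opportunities to reference tables or columns that do not exist.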
Domain: Applied AI, OCR, Automation
Tech: Llama Vision, FastAPI, React, Docker
An AI-powered data entry automation system. It uses multimodal LLMs to extract structured data from physical ID cards and business cards with high accuracy, and provides a React frontend for human-in-the-loop validation.
View Full Repository & Documentation
Domain: Data Visualization, Security, Web Engineering
Tech: Apache Superset, FastAPI, Docker, Row-Level Security
A secure embedded analytics architecture. It implements dynamic Row-Level Security (RLS) via Superset's guest token API to enforce strict multi-tenant data segregation, so that users see only the data relevant to their role within a single shared dashboard.
View Full Repository & Documentation
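
As a rough sketch of the RLS mechanism described above: a backend service (FastAPI in this project's stack) can request a guest token from Superset by POSTing a JSON body to the `/api/v1/security/guest_token/` endpoint, and the `rls` entries in that body are applied as filters for the embedded session. The helper below only builds that body. The function name, the `tenant_id` column, and the integer-only validation are assumptions for illustration, not the repository's code.

```python
def build_guest_token_payload(username, tenant_id, dashboard_id):
    """Build the JSON body for a guest token request with a per-tenant RLS clause."""
    # Accept only integers so the clause cannot carry injected SQL.
    if isinstance(tenant_id, bool) or not isinstance(tenant_id, int):
        raise ValueError("tenant_id must be an integer")
    return {
        "user": {"username": username},
        "resources": [{"type": "dashboard", "id": dashboard_id}],
        # `tenant_id` is a hypothetical column on the underlying datasets.
        "rls": [{"clause": f"tenant_id = {tenant_id}"}],
    }
```

Because the clause is generated server-side from the authenticated user's tenant, the browser never chooses its own filter; it only receives the resulting token.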
Domain: Cloud Data Engineering, Big Data
Tech: Azure Synapse, Cosmos DB, Serverless SQL, Synapse Link
A real-time analytics architecture built on Synapse Link. It optimizes queries over massive NYC Taxi datasets using OPENROWSET and data pruning techniques, bridging the gap between operational NoSQL data and analytical SQL pools.
View Full Repository & Documentation
Domain: Big Data, ETL, Lakehouse Architecture
Tech: Azure Databricks, PySpark, Delta Lake, ADF
A scalable Lakehouse pipeline designed for flexibility. It features parameterized notebooks for schema enforcement and automated incremental loading using Delta Lake's upsert (MERGE) capabilities, orchestrated via Azure Data Factory.
View Full Repository & Documentation
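
The upsert behavior behind incremental loading can be illustrated in plain Python. This is a sketch of the semantics only: in Delta Lake the same operation is expressed as a MERGE against the target table, whereas the hypothetical `upsert` helper below works on in-memory lists of dictionaries and assumes each row has a unique `id` key.

```python
def upsert(target, updates, key="id"):
    """Merge `updates` into `target` by key: update matching rows, insert new ones."""
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        # Matched keys are updated field by field; unmatched keys are inserted.
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]
```

Running the same batch twice yields the same result, which is the property that makes upsert-based incremental loads safe to retry.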
Domain: Environmental Science, Statistical Modeling
Tech: Python (Pandas), SARIMA, Statistical Smoothing
A domain-specific environmental study. It validates high-frequency air pollution data against public meteorological records, using statistical smoothing to model seasonal trends and pollutant variations in New Delhi.
View Full Repository & Documentation
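
The study does not state which smoothing method it uses, so the following is only one plausible example: a centered moving average, written with the standard library rather than Pandas so that it runs without dependencies. The function name and the choice to shrink the window at the edges are assumptions.

```python
def centered_moving_average(values, window):
    """Smooth a series with a centered moving average; edges use a shrunken window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

Applied to high-frequency pollutant readings, a wider window suppresses short spikes and makes the slower seasonal pattern easier to compare against meteorological records.
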
| Credential | Issuer | Verification |
|---|---|---|
| Qualified GATE Exam 2021 (AIR 596) | Indian Institute of Technology (IIT) | View Scorecard |
| Microsoft Certified: Fabric Data Engineer (DP-700) | Microsoft | Verify |
| Microsoft Certified: Azure Data Engineer (DP-203) | Microsoft | Verify |
| Microsoft Certified: Power BI Data Analyst (PL-300) | Microsoft | Verify |
| Fundamentals of GIS | UC Davis (Coursera) | Verify |
| AI For Everyone | DeepLearning.AI (Andrew Ng) | Verify |
| Python for Time Series Data Analysis | Udemy | Verify |
| Azure Databricks & Spark for Data Engineers | Udemy | Verify |
| Databases and SQL for Data Science | IBM | Verify |


