Umang Umang-projects

🚀 Aspiring ML Systems & Efficient AI | Bridging Theory and Production AI.

Pinned

Triton-Inference-Kernels Public

Custom OpenAI Triton kernels for high-performance model inference. Accelerates models on NVIDIA GPUs by combining Triton's productivity with CUDA-level performance.

Python
gpu-systems-playgrund Public

GPU systems playground with CUDA kernel experiments and performance profiling.

Cuda 2
Veritas-AI-Tracking-Misinformation-with-Autonomous-Agents Public

Veritas AI: An autonomous agent crew that scrapes prediction markets to build a RAG-powered chatbot for tracking misinformation and public belief in real time.

Python 1
AI-Action-Item-Extractor-Meeting-Dialogue-to-JSON Public

🤖 AI Action Item Extractor 📝 — transforms meeting dialogues 🔄 into structured JSON tasks 📋; fine‑tunes and compares Mistral‑7B & Phi‑4 using QLoRA ⚡ for top‑tier performance and real‑world applicab…

Python 1
Cuda-Attention-Optimization-journey Public

How a 3x kernel speedup resulted in a tiny 6% overall gain, and the profiler that revealed why.

Python 1
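
The gap between a 3x kernel speedup and a 6% overall gain in the Cuda-Attention-Optimization-journey description is what Amdahl's law predicts when the optimized kernel accounts for only a small share of total runtime. The sketch below is not taken from the repository; it assumes the "6% overall gain" means a 1.06x end-to-end speedup, and the function names are hypothetical.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when `fraction` of the original
    runtime is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def implied_fraction(overall: float, local_speedup: float) -> float:
    """Invert Amdahl's law: the share of original runtime the optimized
    part must have had to produce the observed `overall` speedup."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / local_speedup)


# Assumed reading of the description: 3x kernel speedup, 1.06x end to end.
p = implied_fraction(overall=1.06, local_speedup=3.0)
print(f"Implied kernel share of runtime: {p:.1%}")  # prints 8.5%

# Even an infinitely fast kernel could not beat this bound.
print(f"Upper bound on overall speedup: {1.0 / (1.0 - p):.3f}x")
```

Under that assumption, the kernel made up roughly 8.5% of the original runtime, so the rest of the pipeline caps the achievable gain no matter how fast the kernel becomes.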
Hy-LoRA-A-Hybrid-SVD-LoRA-Strategy-for-Efficient-LLM-Adaptation Public

Achieve >60% LLM compression with near-baseline perplexity using a novel "Compress-then-Adapt" strategy.

Python 1