Pinned
-
Triton-Inference-Kernels
Public. Custom OpenAI Triton kernels for high-performance model inference. Accelerates models on NVIDIA GPUs by combining Triton's productivity with CUDA-level performance.
Python
-
gpu-systems-playgrund
Public. GPU systems playground with CUDA kernel experiments and performance profiling.
Cuda 2
-
Veritas-AI-Tracking-Misinformation-with-Autonomous-Agents
Public. Veritas AI: an autonomous agent crew that scrapes prediction markets to build a RAG-powered chatbot for tracking misinformation and public belief in real time.
Python 1
-
AI-Action-Item-Extractor-Meeting-Dialogue-to-JSON
Public. 🤖 AI Action Item Extractor 📝: transforms meeting dialogues 🔄 into structured JSON tasks 📋; fine-tunes and compares Mistral-7B & Phi-4 using QLoRA ⚡ for top-tier performance and real-world applicab…
Python 1
-
Cuda-Attention-Optimization-journey
Public. How a 3x kernel speedup resulted in a tiny 6% overall gain, and the profiler that revealed why.
Python 1
-
Hy-LoRA-A-Hybrid-SVD-LoRA-Strategy-for-Efficient-LLM-Adaptation
Public. Achieve >60% LLM compression with near-baseline perplexity using a novel "Compress-then-Adapt" strategy.
Python 1