    Repositories list

    • [HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache.
      C++
      77710 Updated Dec 18, 2025
    • [ACL 2024] A novel QAT framework with self-distillation to enhance ultra-low-bit LLMs.
      Python
      17132100 Updated May 16, 2024