- Donβt Pass@k: A Bayesian Framework for Large Language Model Evaluation
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
- Quantize What Counts: More for Keys, Less for Values
- π Pronouns: Vi/Vim



