MLA-ViT introduces Multi-Head Latent Attention (MLA) to Vision Transformers, reducing time and memory usage during both training and inference while maintaining comparable accuracy.
A code deep-dive on one of the key innovations from DeepSeek: Multi-Head Latent Attention (MLA).
A compact, single-GPU optimized version of DeepSeek-V3, trained on FineWebEDU for research and experimentation.
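For readers new to the topic, the sketch below illustrates the core idea behind Multi-Head Latent Attention that the projects above build on: keys and values are down-projected into a small shared latent, and only that latent needs to be cached at inference time, which is where the memory savings come from. This is a simplified, hypothetical PyTorch sketch (the `SimpleMLA` module and its `latent_dim` parameter are illustrative names, not taken from any repository listed here), and it omits details such as DeepSeek's decoupled rotary-embedding keys and query compression.

```python
# Minimal illustrative sketch of Multi-Head Latent Attention (MLA).
# Hypothetical names; simplified relative to DeepSeek-V2/V3 (no RoPE).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMLA(nn.Module):
    def __init__(self, dim: int, num_heads: int, latent_dim: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads

        self.q_proj = nn.Linear(dim, dim)
        # Down-project hidden states into a small shared latent; at inference
        # only this latent would be cached, shrinking the KV cache.
        self.kv_down = nn.Linear(dim, latent_dim)
        # Up-project the latent back to per-head keys and values.
        self.k_up = nn.Linear(latent_dim, dim)
        self.v_up = nn.Linear(latent_dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q = self.q_proj(x)
        latent = self.kv_down(x)   # (b, n, latent_dim): the tensor worth caching
        k = self.k_up(latent)
        v = self.v_up(latent)

        # Reshape to (b, heads, n, head_dim) and run standard attention.
        q, k, v = (t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.out_proj(out)

# Example with ViT-style patch tokens (dimensions are illustrative).
tokens = torch.randn(2, 197, 384)                      # (batch, patches + CLS, dim)
attn = SimpleMLA(dim=384, num_heads=6, latent_dim=96)  # latent_dim << dim
print(attn(tokens).shape)                              # torch.Size([2, 197, 384])
```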