Plans for block-wise FP8 quantization during training?

Hi TE team,

I'd like to know whether there are plans to implement block-wise quantization for FP8 training, similar to what is described in papers such as the DeepSeek-V3 technical report.

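Block-wise quantization could provide better numerical stability and accuracy than per-tensor quantization, especially when a tensor contains outliers: with a single per-tensor scale, one large value sets the scale for every element, whereas per-block scales confine its effect to its own block. This could be particularly valuable for large language models, where maintaining precision is crucial.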
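
To make the outlier point concrete, here is a minimal pure-Python sketch comparing one scale per tensor against one scale per block. It only simulates E4M3 rounding in software; it is not Transformer Engine code, and the helper names (`round_to_e4m3`, `fake_quantize`, `max_abs_error`) and the block size of 4 are hypothetical choices for illustration.

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def round_to_e4m3(v):
    """Round a float to the nearest E4M3-representable value (software simulation)."""
    if v == 0.0:
        return 0.0
    a = min(abs(v), E4M3_MAX)  # saturate instead of overflowing
    _, e = math.frexp(a)       # a = m * 2**e with 0.5 <= m < 1
    # 3 mantissa bits give a spacing of 2**(e - 4); below the smallest
    # normal value (2**-6) the spacing stays fixed at 2**-9 (subnormals).
    step = 2.0 ** (max(e, -5) - 4)
    return math.copysign(round(a / step) * step, v)


def fake_quantize(values, block_size=None):
    """Quantize and dequantize with one scale per block.

    block_size=None uses a single scale for the whole list (per-tensor scaling).
    """
    if not values:
        return []
    block_size = block_size or len(values)
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(x) for x in block)
        # Map the block's largest magnitude onto E4M3_MAX.
        scale = amax / E4M3_MAX if amax > 0 else 1.0
        out.extend(round_to_e4m3(x / scale) * scale for x in block)
    return out


def max_abs_error(reference, approx):
    return max(abs(r - a) for r, a in zip(reference, approx))


# Four small values followed by a block that contains one outlier (100.0).
data = [1e-4, 2e-4, -1.5e-4, 3e-4, 0.5, -0.25, 100.0, 0.75]

per_tensor = fake_quantize(data)               # one scale for all 8 values
per_block = fake_quantize(data, block_size=4)  # one scale per 4 values

print("per-tensor error on small block:", max_abs_error(data[:4], per_tensor[:4]))
print("per-block error on small block: ", max_abs_error(data[:4], per_block[:4]))
```

With the per-tensor scale, the outlier pushes the small values below the smallest E4M3 step, so they lose most of their precision. With per-block scales, the first block is scaled independently of the outlier and keeps a much smaller error.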

Some specific questions:
1. Is this feature currently on your roadmap?
2. If yes, what's the approximate timeline?
3. If no, are there technical challenges preventing this implementation?

Thank you for your time!

Plans for block-wise FP8 quantization during training? #1411