About the issue of training time

Hello author, thank you for your work on this paper. While reproducing it, I ran into a problem: training takes far too long. The paper says that BitDistiller completes the process in approximately 3 hours on a single A100-80G GPU, but in my experiments on Llama3 the results showed a training time of thousands of hours. I used two A100-80G GPUs with a batch size of 8. Could you help me understand where my setup goes wrong? I hope to receive your reply.
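As a rough sanity check on an estimate like this, the expected wall-clock time can be worked out from the time measured for one optimizer step. The following is only a minimal sketch: the function name `estimate_training_hours` and every number in the example call are hypothetical placeholders, not values from the paper or from this run, and it assumes the batch size of 8 is per GPU with no gradient accumulation.

```python
import math


def estimate_training_hours(num_samples, per_device_batch_size, num_gpus,
                            grad_accum_steps, num_epochs, seconds_per_step):
    """Estimate wall-clock hours from a measured time per optimizer step."""
    # Samples consumed by one optimizer step across all GPUs.
    effective_batch = per_device_batch_size * num_gpus * grad_accum_steps
    # A final partial batch still costs one step, hence the ceiling.
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    return total_steps * seconds_per_step / 3600.0


# Placeholder values for illustration only (not measured anywhere).
hours = estimate_training_hours(
    num_samples=10_000,        # hypothetical dataset size
    per_device_batch_size=8,   # assumed to be per GPU
    num_gpus=2,
    grad_accum_steps=1,        # assumed: no gradient accumulation
    num_epochs=1,
    seconds_per_step=2.0,      # hypothetical measured step time
)
print(f"estimated training time: {hours:.2f} h")
```

If the number computed this way is close to a few hours while the progress bar reports thousands, the gap more likely comes from the dataset size, the epoch count, or the step time than from the batch size alone.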



