Open
Description
Hello, and thank you for your work on this paper. While reproducing it, I ran into a problem: training takes far too long. The paper states that BitDistiller completes the process in approximately 3 hours on a single A100-80G GPU, but in my experiments on Llama3 the estimated training time was in the thousands of hours. I used two A100-80G GPUs with a batch size of 8. Could you help me understand what I might be doing wrong? I look forward to your reply.
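To help narrow down where the gap comes from, here is a minimal sketch of a back-of-envelope time estimate. It is not taken from the BitDistiller code. The function name `estimate_training_hours`, all of its parameters, and every number in the example call are hypothetical placeholders, not measurements from my run or values from the paper. It assumes plain data-parallel training in which each optimizer step consumes `per_device_batch_size * num_gpus * grad_accum_steps` samples.

```python
import math


def estimate_training_hours(num_samples, epochs, per_device_batch_size,
                            num_gpus, grad_accum_steps, seconds_per_step):
    """Return (total optimizer steps, estimated wall-clock hours).

    Hypothetical helper: assumes data-parallel training and a roughly
    constant time per optimizer step.
    """
    # Samples consumed by one optimizer step across all GPUs.
    effective_batch = per_device_batch_size * num_gpus * grad_accum_steps
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    hours = total_steps * seconds_per_step / 3600.0
    return total_steps, hours


# Placeholder values only; substitute the dataset size and the measured
# seconds per optimizer step from the actual training log.
steps, hours = estimate_training_hours(
    num_samples=16_000,
    epochs=1,
    per_device_batch_size=8,
    num_gpus=2,
    grad_accum_steps=1,
    seconds_per_step=9.0,
)
print(f"{steps} steps, about {hours:.1f} hours")
```

If the step count from a calculation like this is reasonable but the progress bar still reports thousands of hours, the time per step is the likely culprit; if the step count itself is huge, the dataset size or epoch setting may differ from what the paper used.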