
About the issue of training time #13


Description

@Toughq

Hello author, thank you for your contribution with this paper. While reproducing it, I ran into a problem with the training time being far too long. The paper says that BitDistiller completes the process in approximately 3 hours on a single A100-80G GPU, but in my experiments on Llama3 the reported time ran into thousands of hours. I used two A100-80G GPUs with a batch size of 8. I would like to know where my problem lies and hope to receive your reply.

(Screenshot attachment 1732804561729.png did not finish uploading.)
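To show how I am reading the projected time, here is a rough sanity check that converts the observed per-step latency into total training hours. This is only a back-of-the-envelope sketch; the dataset size, epoch count, and seconds per step in the example are hypothetical placeholders, not values from the paper or from my logs.

```python
# Back-of-the-envelope estimate of total training time from step latency.
# Every number used in the example call below is a placeholder for illustration.

def estimate_training_hours(
    num_samples: int,     # size of the training/distillation set
    num_epochs: int,      # number of training epochs
    per_gpu_batch: int,   # per-device batch size
    grad_accum: int,      # gradient accumulation steps
    num_gpus: int,        # data-parallel GPUs
    sec_per_step: float,  # measured wall-clock seconds per optimizer step
) -> float:
    effective_batch = per_gpu_batch * grad_accum * num_gpus
    steps_per_epoch = -(-num_samples // effective_batch)  # ceiling division
    total_steps = steps_per_epoch * num_epochs
    return total_steps * sec_per_step / 3600.0

# Hypothetical example: 50,000 samples, 4 epochs, batch size 8 on 2 GPUs,
# no gradient accumulation, 5 seconds per optimizer step.
print(f"{estimate_training_hours(50_000, 4, 8, 1, 2, 5.0):.1f} h")  # ~17.4 h
```

If the per-step time shown by the progress bar is much larger than a few seconds, the thousands-of-hours estimate follows directly from this arithmetic, so any hint about the expected step time or effective batch size in your setup would help.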
