Hi, may I know why the hyperparameters of the training command in Llama-X (this repo) and Alpaca are different? E.g., the batch size is 128 vs. 512 (64*8), and the warmup is 0.03 (a ratio) vs. 2 (steps).
Which set of hyperparameters should we adopt?
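For reference, my understanding is that both training scripts build on Hugging Face `TrainingArguments`, where the effective batch size is the per-device batch size × gradient accumulation steps × number of GPUs, and warmup can be given either as a fraction of total steps (`warmup_ratio`) or as an absolute step count (`warmup_steps`). The sketch below is only an illustration of that difference, not the exact configuration from either repo:

```python
from transformers import TrainingArguments

# Illustrative values only, not the actual commands from Llama-X or Alpaca.
# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus,
# e.g. 64 * 1 * 8 = 512, while 16 * 8 on a single GPU gives 128.

# Warmup expressed as a ratio of total training steps.
args_ratio = TrainingArguments(
    output_dir="out_ratio",
    per_device_train_batch_size=64,
    warmup_ratio=0.03,  # warm up over the first 3% of steps
)

# Warmup expressed as a fixed number of optimizer steps.
args_steps = TrainingArguments(
    output_dir="out_steps",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,
    warmup_steps=2,  # warm up over exactly 2 steps
)
```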
Another question: what is the Llama-i (7B) in the Llama-X Evaluation section? Its GSM8K result is 18.8%, while my own Llama-X model (trained with the hyperparameters in this repo) only reaches about 10%. I am not sure why the gap is so large. Would you mind sharing your GSM8K evaluation script for Llama-X? Thank you.
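For comparison, below is a minimal sketch of a typical GSM8K exact-match scorer. It assumes the reference answers end with the dataset's `#### <number>` marker and falls back to the last number in a generation; the `extract_answer` and `gsm8k_accuracy` helpers are hypothetical and not from this repo, and differences in answer extraction or prompting alone can shift the measured score:

```python
import re
from typing import List, Optional


def extract_answer(text: str) -> Optional[str]:
    """Pull the final numeric answer out of a completion.

    GSM8K reference answers end with '#### <number>'; generated text may not
    follow that convention, so fall back to the last number that appears.
    """
    match = re.search(r"####\s*([-+]?[\d,]*\.?\d+)", text)
    if match is not None:
        answer = match.group(1)
    else:
        numbers = re.findall(r"[-+]?[\d,]*\.?\d+", text)
        if not numbers:
            return None
        answer = numbers[-1]
    return answer.replace(",", "")


def gsm8k_accuracy(predictions: List[str], references: List[str]) -> float:
    """Exact-match accuracy on the extracted final answers."""
    correct = sum(
        extract_answer(pred) == extract_answer(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)
```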