GPT-OSS-20B Pretraining #862
address review from previous PR
Credit co-authors for prior squash
Co-authored-by: ZixianWangAMD <zixiwang@amd.com>
Co-authored-by: Michal Marcinkiewicz <michalm@nvidia.com>
Co-authored-by: Lukasz Pierscieniewski <l.pierscieniewski@gmail.com>
disable async save and save intermediate checkpoint
…arget log perplexity to be 3.3 for consistency purposes
This reverts commit fed1bb4.
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
@mmarcinkiewicz can you please review this?
It seems the datadir needs to be writable (presumably to store the index). Can we put the index in a different directory so the datadir stays read-only?
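One way the request above could be handled is sketched below. This is a minimal illustration, not code from this PR: the helper name `resolve_index_dir`, its `index_dir` argument, and the temp-directory fallback are all hypothetical. It assumes the index files can live anywhere writable.

```python
import os
import tempfile
from pathlib import Path


def resolve_index_dir(data_dir, index_dir=None):
    """Pick a writable directory for dataset index files.

    Hypothetical helper: the name, arguments, and fallback policy are
    assumptions for illustration and do not come from the PR.
    """
    if index_dir is not None:
        # An explicit index directory always wins, so data_dir can stay read-only.
        target = Path(index_dir)
    elif os.access(data_dir, os.W_OK):
        # Keep the old behaviour when the data directory is writable.
        target = Path(data_dir)
    else:
        # Read-only data directory: fall back to a directory under the system temp dir.
        target = Path(tempfile.gettempdir()) / "dataset_index_cache"
    target.mkdir(parents=True, exist_ok=True)
    return target
```

With something like this, the data directory can be mounted read-only as long as an index directory is passed explicitly or the temp-dir fallback is acceptable.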
Add option to run with SLURM
pbaumstarck
left a comment
Looking good overall, and I got the code running. One more minor comment: we don't keep any binary whl files in the repo, so it'd be ideal if we could retrieve and install that dynamically.
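A rough sketch of what dynamic retrieval could look like, assuming the wheel is hosted at some reachable URL. The helper names and the example URL are hypothetical and not taken from this PR. The only real behaviour relied on is that `pip install` accepts a direct URL to a wheel.

```python
import subprocess
import sys


def pip_install_command(wheel_url):
    """Build the pip command that installs a wheel straight from a URL.

    Hypothetical helper; the URL passed in is an assumption supplied by
    the caller, not a location defined by the PR.
    """
    # Use the current interpreter's pip so the wheel lands in the active environment.
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", wheel_url]


def install_wheel(wheel_url):
    """Run the install and raise CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(wheel_url), check=True)


# Example (placeholder URL, not a real artifact):
# install_wheel("https://example.com/wheels/some_package-1.0-py3-none-any.whl")
```

Building the command separately from running it keeps the network step easy to skip or mock in setup scripts.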
Now I'm looking into it - where is the mlperf logging being done? I see
Updated README.md to clarify evaluation metrics and training parameters.
ShriyaRishab
left a comment
Approved in the task force meeting
This PR provides the reference code for GPT-OSS-20B using the Primus framework that can be run on both AMD and NVIDIA hardware.