@youth123 commented Jan 26, 2026

PR Category
Train

PR Types
New Features

PR Description

  • Support loading and saving checkpoints in the NeMo zarr format
  • Support training with packed sequences
  • Fix wandb finalization failing to find the latest_checkpointed_iteration file
  • Fix LoRA not supporting layernorm weight loading or the NeMo zarr format

The checkpoint file format is as follows:
load zarr format:
-context
-weights
-module.decoder.xxx._extra_state
-module.decoder.xxx.weight
-optimizer.state.fp32_param.xxx.weight
-optimizer.state.fp32_param.xxx.weight.sync
common.pt
metadata.json
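
As a rough illustration of the listing above, the entries can be grouped by name prefix into model weights (`module.`) and optimizer state (`optimizer.`). This is a minimal sketch and not the code in this PR; the helper name `classify_entries` is hypothetical, and it only looks at entry names, not at zarr contents.

```python
def classify_entries(names):
    """Group checkpoint entry names by prefix (hypothetical helper)."""
    groups = {"model": [], "optimizer": [], "other": []}
    for name in sorted(names):
        if name.startswith("module."):
            # Model tensors and their _extra_state entries
            groups["model"].append(name)
        elif name.startswith("optimizer."):
            # Optimizer state such as fp32_param copies
            groups["optimizer"].append(name)
        else:
            # Everything else, e.g. common.pt or the context directory
            groups["other"].append(name)
    return groups
```

Anything that matches neither prefix lands in `other`, so unexpected files remain visible instead of being silently dropped.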

save zarr format:
-iter_xxx
-module.decoder.xxx._extra_state
-module.decoder.xxx.weight
-optimizer.state.fp32_param.xxx.weight
-optimizer.state.fp32_param.xxx.weight.sync
common.pt
metadata.json
latest_checkpointed_iteration.txt
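
For the save layout, a reader of the checkpoint directory might resolve the newest `iter_xxx` directory from `latest_checkpointed_iteration.txt`. The sketch below is illustrative only: it assumes the tracker file holds a plain integer iteration and that `xxx` is that number, possibly zero-padded. Neither assumption is confirmed by this PR, and `find_latest_checkpoint` is a hypothetical name.

```python
from pathlib import Path

def find_latest_checkpoint(save_dir):
    """Return the iter_* directory named by the tracker file, or None."""
    root = Path(save_dir)
    tracker = root / "latest_checkpointed_iteration.txt"
    if not tracker.is_file():
        return None  # nothing has been saved yet
    text = tracker.read_text().strip()
    if not text.isdigit():
        return None  # assumption: only integer iterations are handled here
    iteration = int(text)
    for child in sorted(root.iterdir()):
        if child.is_dir() and child.name.startswith("iter_"):
            suffix = child.name[len("iter_"):]
            # Compare numerically so zero-padding does not matter
            if suffix.isdigit() and int(suffix) == iteration:
                return child
    return None
```

Returning `None` rather than raising lets a caller, for example a logging finalizer, skip work cleanly when the tracker file is missing.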

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


liji does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.