add nemo_bridge #1050
Conversation
```python
# Load the HF model from config
config_load = args.hf_config_path
config = safe_load_config_with_retry(config_load, trust_remote_code=False)
bridge = AutoBridge.from_hf_config(config)
```
Will this save-ckpt step allocate extra GPU memory when initializing an HF model?
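(For context: if the bridge only needs the architecture, an HF model can be instantiated on PyTorch's meta device so that no parameter memory, GPU or CPU, is allocated. A minimal sketch using standard transformers APIs; whether `AutoBridge.from_hf_config` does this internally is exactly what this question asks.)

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Build only the model skeleton: on the meta device, parameters carry
# shape/dtype metadata but no real storage, so no extra GPU memory is used.
config = AutoConfig.from_pretrained("Qwen/Qwen3-0.6B")  # example model from this PR
with torch.device("meta"):
    hf_model = AutoModelForCausalLM.from_config(config)

# All parameters live on the meta device (no data allocated).
assert all(p.device.type == "meta" for p in hf_model.parameters())
```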
```python
bridge.load_hf_weights(ddp_model)
# no optimizer state to restore
iteration = 0
num_floating_point_operations_so_far = 0
```
Please add a print_rank_0 call here.
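A minimal sketch of the suggested change, assuming `print_rank_0` is Megatron-LM's rank-0 logging helper (the exact import path in FlagScale's tree may differ):

```python
from megatron.training import print_rank_0  # assumed import path

bridge.load_hf_weights(ddp_model)
print_rank_0("Loaded HF weights via nemo_bridge; no optimizer state restored, "
             "resetting iteration counters to 0.")
iteration = 0
num_floating_point_operations_so_far = 0
```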
```python
# use megatron bridge
from megatron.nemo_bridge.models import AutoBridge

bridge = AutoBridge.from_hf_pretrained(load_dir)
bridge.load_hf_weights(ddp_model)
```
Can nemo-bridge’s load_hf_weights handle a ddp_model directly, where ddp_model is wrapped by DistributedDataParallel?
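If it cannot, a common workaround is to unwrap the DDP container first. A hedged sketch; `unwrap_ddp` is a hypothetical helper, and Megatron's own DDP wrapper would need similar handling since it also exposes the wrapped module via `.module`:

```python
import torch

def unwrap_ddp(model):
    """Return the underlying module if model is wrapped by DistributedDataParallel."""
    if isinstance(model, torch.nn.parallel.DistributedDataParallel):
        return model.module
    return model

# Usage with the hunk above:
bridge.load_hf_weights(unwrap_ddp(ddp_model))
```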
```
@@ -0,0 +1,8 @@
# Copyright (c) 2025, BAAI. All rights reserved.
```
NeMo Megatron-Bridge supports installation via pip (ref: https://pypi.org/project/megatron-bridge/). Please remove the vendored source code.
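For illustration, a sketch of depending on the published package instead of vendoring it; the hub ID is just an example, and `from_hf_pretrained` mirrors the call already used in this PR:

```python
# pip install megatron-bridge
from megatron.bridge import AutoBridge

# Load a bridge directly from a Hugging Face checkpoint directory or hub ID.
bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-0.6B")
```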
```
@@ -0,0 +1,8 @@
# Copyright (c) 2025, BAAI. All rights reserved.
```
Rename `flagscale/train/megatron/nemo_bridge` to `flagscale/train/megatron/bridge` so that it matches the import pattern `from megatron.bridge`.
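Illustratively, the rename would make the vendored import line up with upstream (assumed layout):

```python
# Before (this PR's vendored layout):
from megatron.nemo_bridge.models import AutoBridge
# After the proposed rename, matching upstream Megatron-Bridge:
from megatron.bridge.models import AutoBridge
```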
tengqm left a comment:
When copy-pasting source code from other repos, we are obliged to preserve their copyright notice as well; we cannot claim copyright over this code. The original code has the following copyright header, which must be preserved:
```
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```

```
@@ -0,0 +1,110 @@
# Copyright (c) 2025, BAAI. All rights reserved.
#
# Copied from: https://github.com/NVIDIA-NeMo/Megatron-Bridge
```
If Megatron-Bridge has a copyright claim, we are supposed to paste their copyright statements here.
Reconstructs Nemo-Bridge on top of the restructured FlagScale version. FlagScale now supports part of nemo-bridge's functionality, enabling the framework to load and save checkpoints in HF format during training. This version also adds a new feature: the number of iterations between HF weight saves can be set via save_hf_interval. Accuracy has been verified to be correct for Deepseek V3 16_a3B, Qwen3-32B, and Qwen3-0.6B.
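A hedged sketch of how the interval-gated HF save could look inside the training loop; the option name save_hf_interval comes from this description, while `maybe_save_hf` and `save_hf_pretrained` are illustrative (the actual FlagScale/bridge API may differ):

```python
def maybe_save_hf(args, iteration, bridge, model):
    """Save HF-format weights every `args.save_hf_interval` iterations (sketch)."""
    if args.save_hf_interval and iteration % args.save_hf_interval == 0:
        # Hypothetical export call: write the Megatron model back out in HF format.
        bridge.save_hf_pretrained(model, f"{args.save}/hf_iter_{iteration:07d}")
```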