Support mbridge distillation for any_model #904

danielkorzekwa wants to merge 6 commits into dkorzekwa/any_model from …
Conversation
- Add distill_anymodel.py: Knowledge distillation script for AnyModel checkpoints
- Add import_anymodel_to_mbridge.py: Import script to convert HF AnyModel to MBridge format
- Update base.py: Simplify HeterogeneousBridgeMixin for AnyModel support
- Add __init__.py: Module initialization
- Add llama.py: Llama bridge implementation
- Add qwen3.py: Qwen3 bridge implementation
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
```python
        OmegaConf conversion tries to access per_block_parameters which may not
        be initialized when loading from YAML. Return empty list as fallback.
        """
        if name == "per_block_parameters":
```
Should this also check whether that attribute is set, and only return `[]` when it is not, otherwise returning whatever is set?
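For illustration, a minimal sketch of that guard, assuming the value (when present) is stored on the object under the same name; the standalone helper and its name are hypothetical, not the actual HeterogeneousBridgeMixin code:

```python
from typing import Any, List


def _per_block_parameters_or_default(cfg: Any) -> List:
    """Return per_block_parameters if it was set; otherwise an empty list.

    Only falls back to [] when the attribute was never initialized,
    e.g. right after loading the config from YAML.
    """
    value = getattr(cfg, "per_block_parameters", None)
    return value if value is not None else []
```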
Do you think at some point we should just upstream this into the Megatron-Bridge repo, since it's a standard Megatron feature with nothing to do with model optimization?
````markdown
export WORKSPACE=/path/to/your/project
```

1. **Clone Megatron-Bridge:**
````
Megatron-Bridge is already cloned in the container at `/opt/Megatron-Bridge`. Why don't we just do the following inside the container: `cd /opt/Megatron-Bridge && git checkout 960a718cb8989676b258e107d538642717e22e39`?
````markdown
git submodule update
```

3. **Start Docker container with mounts:**
````
If these same steps work with 26.02, can we use that container instead?
```bash
docker run --gpus all -it --rm \
  -v $WORKSPACE:/workspace \
  -v $WORKSPACE/Megatron-Bridge/3rdparty/Megatron-LM:/opt/megatron-lm \
```
FYI, for 26.02, Megatron-LM is at `/opt/Megatron-Bridge/3rdparty/Megatron-LM`.
```python
        ),
        dataset=GPTDatasetConfig(
            random_seed=1234,
            blend=[[data_path], [1.0]],
```
For larger real datasets we will likely have more than one data path (the dataset is split into multiple shards). We can let the user pass them all as a command-line argument instead of hard-coding a one-element list. See my distill.py for reference.
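For illustration, a minimal sketch of that approach; the `--data-paths` flag name and equal weighting are assumptions, and the actual convention should follow the referenced distill.py:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--data-paths",
    nargs="+",
    required=True,
    help="One or more dataset shard prefixes, space separated.",
)
args = parser.parse_args()

# blend takes parallel lists of paths and weights; weight the shards equally here.
weights = [1.0 / len(args.data_paths)] * len(args.data_paths)
blend = [args.data_paths, weights]
```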
```python
        checkpoint=CheckpointConfig(load=student_ckpt, save=output_dir),
        logger=LoggerConfig(),
        tokenizer=TokenizerConfig(tokenizer_type="HuggingFaceTokenizer", tokenizer_model=None),
        validation=ValidationConfig(eval_interval=500, eval_iters=100),
```
I see now why your script does not work with 26.02: in 26.02 a separate ValidationConfig had not yet been introduced, and its arguments are instead passed to TrainingConfig.
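One way to tolerate both layouts, sketched below, is to feature-detect ValidationConfig instead of pinning a release; the import path is an assumption based on the classes used in this diff and may need adjusting:

```python
# Feature-detect whether this Megatron-Bridge build has a separate ValidationConfig.
# The module path below is assumed, not verified against 26.02.
try:
    from megatron.bridge.training.config import ValidationConfig  # assumed path

    HAS_VALIDATION_CONFIG = True
except ImportError:
    # 26.02-style layout: eval_interval / eval_iters are passed to TrainingConfig instead.
    HAS_VALIDATION_CONFIG = False
```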
```python
    use_fp16: bool,
) -> ConfigContainer:
    """Create base ConfigContainer with defaults."""
    return ConfigContainer(
```
You are missing `mixed_precision="bf16_mixed"` here, which would make training faster.
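For reference, a sketch of where that argument would go; only the new line is shown, the enclosing function is the one in the hunk above, and the remaining arguments stay unchanged:

```python
    return ConfigContainer(
        mixed_precision="bf16_mixed",  # reviewer's suggestion: BF16 mixed precision for faster training
        # ... existing logger / tokenizer / validation / ddp arguments unchanged ...
    )
```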
```python
        logger=LoggerConfig(),
        tokenizer=TokenizerConfig(tokenizer_type="HuggingFaceTokenizer", tokenizer_model=None),
        validation=ValidationConfig(eval_interval=500, eval_iters=100),
        ddp=DistributedDataParallelConfig(grad_reduce_in_fp32=True),
```
This is what mbridge uses for its training scripts; I think we should use the same settings here as well:
```python
ddp=DistributedDataParallelConfig(
    check_for_nan_in_grad=True,
    grad_reduce_in_fp32=True,
    overlap_grad_reduce=True,
    overlap_param_gather=True,
    average_in_collective=True,
    use_distributed_optimizer=True,
),
```
```python
    )


def merge_checkpoint_configs(
```
Why do we need this? Shouldn't the optimizer state automatically load from the checkpoint when resuming training?
…ze() on DistillationProvider.provide()
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
What does this PR do?