
Conversation

@pstjohn (Collaborator) commented Jan 27, 2026

Currently blocked by huggingface/transformers#43510

Summary by CodeRabbit

  • New Features

    • Added checkpoint loading validation test.
    • Enhanced distributed testing infrastructure with improved port configuration.
  • Improvements

    • Removed upper-bound version constraints on the transformers library for broader compatibility.
    • Improved weight initialization and checkpoint state management for TransformerEngine-optimized models.

copy-pr-bot bot commented Jan 27, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


coderabbitai bot (Contributor) commented Jan 27, 2026

📝 Walkthrough

This pull request relaxes transformers version constraints across multiple projects, introduces weight initialization and state management improvements for TransformerEngine-backed models, updates weight tying declarations from tuples to typed class variable mappings, and adds TCP port fixture infrastructure to distributed tests for rendezvous configuration.
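
To make the weight-tying change concrete, here is a minimal sketch of the before/after declaration. The class name comes from the test changes described below; the specific tied-key mapping and its direction are assumptions based on this summary, not the PR's verbatim code:

```python
from typing import ClassVar

import torch.nn as nn


class NVEsmForMaskedLM(nn.Module):  # base class simplified for illustration
    # Before (flat tuple of tied parameter names):
    # _tied_weights_keys = ("lm_head.decoder.weight",)

    # After (typed class-variable mapping; assumed here to map the tied
    # destination key to its source key):
    _tied_weights_keys: ClassVar[dict[str, str]] = {
        "lm_head.decoder.weight": "esm.embeddings.word_embeddings.weight",
    }
```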

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Transformers dependency relaxation**<br>`.devcontainer/recipes/requirements.txt`, `bionemo-recipes/models/amplify/pyproject.toml`, `bionemo-recipes/models/esm2/pyproject.toml`, `bionemo-recipes/models/llama3/requirements.txt`, `bionemo-recipes/recipes/esm2_accelerate_te/requirements.txt`, `bionemo-recipes/recipes/esm2_native_te/requirements.txt`, `bionemo-recipes/recipes/esm2_peft_te/requirements.txt`, `bionemo-recipes/recipes/geneformer_native_te_mfsdp_fp8/requirements.txt`, `bionemo-recipes/recipes/llama3_native_te/requirements.txt` | Removes the upper-bound constraint on the transformers dependency, changing `transformers<5.0` to `transformers` across all dependency files. |
| **ESM2 weight initialization and state management**<br>`bionemo-recipes/models/esm2/src/esm/modeling_esm_te.py` | Introduces an `_init_weights()` method that skips TransformerEngine modules, adds a `state_dict()` override to filter TE-specific `_extra_state` entries, updates the `get_init_context()` signature to accept a `dtype` parameter, changes `_tied_weights_keys` from a tuple to a `ClassVar[dict[str, str]]` mapping, and removes the explicit `init_weights()` call from the initialization sequence (see the sketch after this table). |
| **ESM2 model conversion**<br>`bionemo-recipes/models/esm2/src/esm/convert.py` | Replaces `tie_weights()` with `post_init()` as the final post-processing step in the ESM TE-to-HF model conversion. |
| **ESM2 test infrastructure**<br>`bionemo-recipes/models/esm2/tests/conftest.py`, `bionemo-recipes/models/esm2/tests/test_convert.py` | Adds an `unused_tcp_port` fixture for acquiring ephemeral ports; introduces a new test for checkpoint loading verification. |
| **ESM2 distributed tests**<br>`bionemo-recipes/models/esm2/tests/test_distributed_fp8.py`, `bionemo-recipes/models/esm2/tests/test_distributed_strategies.py` | Updates test signatures to accept the `unused_tcp_port` parameter, adds rendezvous backend/endpoint configuration to torchrun commands, refactors FP8 extra-state collection from `state_dict` keys to direct module inspection, adds cross-rank FP8 extra-state synchronization validation, and includes seed initialization for reproducibility. |
| **Llama3 weight initialization and state management**<br>`bionemo-recipes/models/llama3/modeling_llama_te.py`, `bionemo-recipes/recipes/llama3_native_te/modeling_llama_te.py` | Adds a `state_dict()` override to filter TE-specific `_extra_state` keys, implements a `get_init_context()` classmethod for empty-context initialization, updates `_tied_weights_keys` from a tuple to a `ClassVar[dict[str, str]]` mapping, and adjusts typing imports to include `ClassVar`. |
| **Llama3 test infrastructure**<br>`bionemo-recipes/models/llama3/tests/conftest.py`, `bionemo-recipes/models/llama3/tests/test_cp_bshd.py`, `bionemo-recipes/models/llama3/tests/test_cp_thd.py`, `bionemo-recipes/models/llama3/tests/test_meta_device_init.py` | Adds the `unused_tcp_port` fixture, updates test signatures to accept the fixture parameter, configures rendezvous backend/endpoint in torchrun commands, and fixes model loading to use a local `NVEsmForMaskedLM` import instead of remote `AutoModel` loading. |
| **Recipe example files**<br>`bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`, `bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`, `bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py` | Introduces the same weight-initialization, state-management, and weight-tying declaration patterns as the main model files, including `_init_weights()`, `state_dict()` overrides, updated `get_init_context()` signatures, and `_tied_weights_keys` `ClassVar` dict mappings. |
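
As referenced in the ESM2 row above, the TE-aware initialization and checkpoint-filtering pattern might look roughly like this. The class name, TE module types, and the `get_init_context()` contract are assumptions assembled from the walkthrough, not the PR's actual code:

```python
import torch
import transformer_engine.pytorch as te
from transformers import PreTrainedModel


class NVEsmPreTrainedModel(PreTrainedModel):  # name assumed for illustration
    def _init_weights(self, module):
        # TransformerEngine layers handle their own initialization; applying
        # the HF defaults here would clobber TE-managed parameters.
        if isinstance(module, (te.Linear, te.LayerNormLinear, te.LayerNormMLP)):
            return
        super()._init_weights(module)

    def state_dict(self, *args, **kwargs):
        # TE modules serialize FP8 scaling metadata into "_extra_state" blobs
        # that plain-PyTorch consumers cannot interpret; filter them out.
        state = super().state_dict(*args, **kwargs)
        return {k: v for k, v in state.items() if not k.endswith("_extra_state")}

    @classmethod
    def get_init_context(cls, dtype: torch.dtype):
        # Assumed contract per the walkthrough: return the context managers to
        # enter during init; an empty list opts out of meta-device init.
        return []
```

The cross-rank FP8 validation mentioned for the distributed tests would then inspect modules directly (e.g. calling `get_extra_state()` on each TE layer) rather than scraping `state_dict` keys, since the override above removes those keys from the checkpoint.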

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hopping through transformers, we loosen the reins,
TransformerEngine now flows through our veins,
Weight tying maps grow, state dicts now clean,
Port-finding sockets ensure tests convene,
With dtype in context, our models take flight—
A refactored foundation, now sparkling bright!

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The pull request description is incomplete. It contains only a single line referencing a blocked upstream PR, lacking any of the required template sections such as Description, Usage, Type of changes, CI Pipeline Configuration, or Pre-submit Checklist. | Complete the pull request description using the provided template. Include a detailed description of changes, usage examples, type of changes, CI configuration labels, and pre-submit checklist items. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title "Update for transformers 5.0" clearly and concisely describes the main objective of the pull request: updating the codebase for transformers v5.0 compatibility. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 84.00%, which is above the required threshold of 80.00%. |



@pstjohn force-pushed the pstjohn/bio-128-update-recipes-to-transformers-v5 branch from 6b22f73 to d9c65e7 on February 2, 2026 15:06
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
@pstjohn force-pushed the pstjohn/bio-128-update-recipes-to-transformers-v5 branch from 20b3b6f to af11872 on February 2, 2026 15:43
@pstjohn marked this pull request as ready for review on February 2, 2026 15:44
@pstjohn (Collaborator, Author) commented Feb 2, 2026

@coderabbitai review

coderabbitai bot (Contributor) commented Feb 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
bionemo-recipes/models/esm2/src/esm/convert.py (1)

122-135: ⚠️ Potential issue | 🟡 Minor

Remove redundant post_init() call; apply_transforms() already calls tie_weights().

Testing confirms the current code is safe—weight values are preserved correctly through the post_init() call (test_convert.py validates roundtrip conversion with strict tolerances). However, apply_transforms() already invokes tie_weights() before returning, making the subsequent post_init() call redundant. Also note that convert_esm_hf_to_te() does not call post_init(), creating an inconsistency.

Suggested fix

```diff
-    output_model.post_init()
```
bionemo-recipes/models/esm2/tests/conftest.py (1)

16-43: ⚠️ Potential issue | 🟡 Minor

Add a second blank line after the import block.

Ruff/isort will flag a single blank line before the module-level TRITON env block.

🧹 Suggested fix

```diff
-from esm.convert import convert_esm_hf_to_te
-
-# Fix Triton UTF-8 decoding issue by setting CUDA library path
+from esm.convert import convert_esm_hf_to_te
+
+
+# Fix Triton UTF-8 decoding issue by setting CUDA library path
```

As per coding guidelines, follow the isort import-sorting configuration, which requires two blank lines after the import block.
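
For context, the `unused_tcp_port` fixture this conftest adds (per the walkthrough) is commonly implemented along these lines; this is an assumed shape, not the PR's verbatim code:

```python
import socket

import pytest


@pytest.fixture
def unused_tcp_port() -> int:
    """Ask the OS for a free ephemeral port, then release it for the test to use."""
    # Binding to port 0 makes the kernel pick an unused port; closing the
    # socket frees it (a small reuse race is acceptable for tests).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```

The distributed tests would then thread this port into torchrun's rendezvous settings (e.g. `--rdzv-backend=c10d --rdzv-endpoint=localhost:<port>`), so concurrently running test processes do not collide on a fixed default port.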

🧹 Nitpick comments (3)
bionemo-recipes/models/amplify/pyproject.toml (1)

21-21: Confirm whether AMPLIFY should stay capped at Transformers <5.0.

If AMPLIFY is now v5-compatible, aligning this constraint (e.g., transformers>=5.0,<6) would prevent users from staying on v4. If not, the TODO is fine but this will keep the package incompatible with v5 installs.

💡 Optional alignment with v5 support

```diff
-    "transformers<5.0",                  # TODO(BIO-143): update AMPLIFY to support Transformers v5
+    "transformers>=5.0,<6",              # TODO(BIO-143): update AMPLIFY to support Transformers v5
```
bionemo-recipes/recipes/geneformer_native_te_mfsdp_fp8/requirements.txt (1)

9-9: Add an explicit v5 lower bound to avoid silently running v4.

An unconstrained transformers can be satisfied by an already-installed v4, which defeats the v5 migration and risks runtime mismatches. Consider transformers>=5.0,<6 to lock the supported major.

♻️ Suggested constraint

```diff
-transformers
+transformers>=5.0,<6
```
bionemo-recipes/recipes/esm2_native_te/requirements.txt (1)

11-11: Consider bounding transformers to the v5 major line.

Leaving this unbounded can allow v6+ upgrades with breaking changes. A constrained range still permits v5 updates while preventing accidental major bumps.

🔧 Suggested constraint

```diff
-transformers
+transformers>=5.0,<6.0
```
