
Conversation

@pstjohn (Collaborator) commented Jan 27, 2026

Currently blocked by huggingface/transformers#43510

Summary by CodeRabbit

  • New Features

    • Added checkpoint loading validation test.
    • Enhanced distributed testing infrastructure with improved port configuration.
  • Improvements

    • Removed upper-bound version constraints on the transformers library for broader compatibility.
    • Improved weight initialization and checkpoint state management for TransformerEngine-optimized models.

copy-pr-bot bot commented Jan 27, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


coderabbitai bot (Contributor) commented Jan 27, 2026

📝 Walkthrough

This pull request relaxes transformers version constraints across multiple projects, introduces weight initialization and state management improvements for TransformerEngine-backed models, updates weight tying declarations from tuples to typed class variable mappings, and adds TCP port fixture infrastructure to distributed tests for rendezvous configuration.
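
To make the weight-tying change concrete, here is a minimal sketch of the before/after declaration. The class name comes from the test changes described below; the specific tied-key mapping and its direction are assumptions based on this summary, not the PR's verbatim code:

```python
from typing import ClassVar

import torch.nn as nn


class NVEsmForMaskedLM(nn.Module):  # base class simplified for illustration
    # Before (flat tuple of tied parameter names):
    # _tied_weights_keys = ("lm_head.decoder.weight",)

    # After (typed class-variable mapping; assumed here to map the tied
    # destination key to its source key):
    _tied_weights_keys: ClassVar[dict[str, str]] = {
        "lm_head.decoder.weight": "esm.embeddings.word_embeddings.weight",
    }
```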

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Transformers dependency relaxation**<br>`.devcontainer/recipes/requirements.txt`, `bionemo-recipes/models/amplify/pyproject.toml`, `bionemo-recipes/models/esm2/pyproject.toml`, `bionemo-recipes/models/llama3/requirements.txt`, `bionemo-recipes/recipes/esm2_accelerate_te/requirements.txt`, `bionemo-recipes/recipes/esm2_native_te/requirements.txt`, `bionemo-recipes/recipes/esm2_peft_te/requirements.txt`, `bionemo-recipes/recipes/geneformer_native_te_mfsdp_fp8/requirements.txt`, `bionemo-recipes/recipes/llama3_native_te/requirements.txt` | Removes the upper-bound constraint on the transformers dependency, changing `transformers<5.0` to `transformers` across all dependency files. |
| **ESM2 weight initialization and state management**<br>`bionemo-recipes/models/esm2/src/esm/modeling_esm_te.py` | Introduces an `_init_weights()` method that skips TransformerEngine modules, adds a `state_dict()` override to filter TE-specific `_extra_state` entries, updates the `get_init_context()` signature to accept a `dtype` parameter, changes `_tied_weights_keys` from a tuple to a `ClassVar[dict[str, str]]` mapping, and removes the explicit `init_weights()` call from the initialization sequence (see the sketch after this table). |
| **ESM2 model conversion**<br>`bionemo-recipes/models/esm2/src/esm/convert.py` | Replaces `tie_weights()` with `post_init()` as the final post-processing step in the ESM TE-to-HF model conversion. |
| **ESM2 test infrastructure**<br>`bionemo-recipes/models/esm2/tests/conftest.py`, `bionemo-recipes/models/esm2/tests/test_convert.py` | Adds an `unused_tcp_port` fixture for acquiring ephemeral ports; introduces a new test for checkpoint loading verification. |
| **ESM2 distributed tests**<br>`bionemo-recipes/models/esm2/tests/test_distributed_fp8.py`, `bionemo-recipes/models/esm2/tests/test_distributed_strategies.py` | Updates test signatures to accept the `unused_tcp_port` parameter, adds rendezvous backend/endpoint configuration to torchrun commands, refactors FP8 extra-state collection from `state_dict` keys to direct module inspection, adds cross-rank FP8 extra-state synchronization validation, and includes seed initialization for reproducibility. |
| **Llama3 weight initialization and state management**<br>`bionemo-recipes/models/llama3/modeling_llama_te.py`, `bionemo-recipes/recipes/llama3_native_te/modeling_llama_te.py` | Adds a `state_dict()` override to filter TE-specific `_extra_state` keys, implements a `get_init_context()` classmethod for empty-context initialization, updates `_tied_weights_keys` from a tuple to a `ClassVar[dict[str, str]]` mapping, and adjusts typing imports to include `ClassVar`. |
| **Llama3 test infrastructure**<br>`bionemo-recipes/models/llama3/tests/conftest.py`, `bionemo-recipes/models/llama3/tests/test_cp_bshd.py`, `bionemo-recipes/models/llama3/tests/test_cp_thd.py`, `bionemo-recipes/models/llama3/tests/test_meta_device_init.py` | Adds the `unused_tcp_port` fixture, updates test signatures to accept the fixture parameter, configures rendezvous backend/endpoint in torchrun commands, and fixes model loading to use a local `NVEsmForMaskedLM` import instead of remote `AutoModel` loading. |
| **Recipe example files**<br>`bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`, `bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`, `bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py` | Introduces the same weight-initialization, state-management, and weight-tying declaration patterns as the main model files, including `_init_weights()`, `state_dict()` overrides, updated `get_init_context()` signatures, and `_tied_weights_keys` `ClassVar` dict mappings. |
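
As referenced in the ESM2 row above, the TE-aware initialization and checkpoint-filtering pattern might look roughly like this. The class name, TE module types, and the `get_init_context()` contract are assumptions assembled from the walkthrough, not the PR's actual code:

```python
import torch
import transformer_engine.pytorch as te
from transformers import PreTrainedModel


class NVEsmPreTrainedModel(PreTrainedModel):  # name assumed for illustration
    def _init_weights(self, module):
        # TransformerEngine layers handle their own initialization; applying
        # the HF defaults here would clobber TE-managed parameters.
        if isinstance(module, (te.Linear, te.LayerNormLinear, te.LayerNormMLP)):
            return
        super()._init_weights(module)

    def state_dict(self, *args, **kwargs):
        # TE modules serialize FP8 scaling metadata into "_extra_state" blobs
        # that plain-PyTorch consumers cannot interpret; filter them out.
        state = super().state_dict(*args, **kwargs)
        return {k: v for k, v in state.items() if not k.endswith("_extra_state")}

    @classmethod
    def get_init_context(cls, dtype: torch.dtype):
        # Assumed contract per the walkthrough: return the context managers to
        # enter during init; an empty list opts out of meta-device init.
        return []
```

The cross-rank FP8 validation mentioned for the distributed tests would then inspect modules directly (e.g. calling `get_extra_state()` on each TE layer) rather than scraping `state_dict` keys, since the override above removes those keys from the checkpoint.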

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hopping through transformers, we loosen the reins,
TransformerEngine now flows through our veins,
Weight tying maps grow, state dicts now clean,
Port-finding sockets ensure tests convene,
With dtype in context, our models take flight—
A refactored foundation, now sparkling bright!

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The pull request description is incomplete. It contains only a single line referencing a blocked upstream PR, lacking any of the required template sections such as Description, Usage, Type of changes, CI Pipeline Configuration, or Pre-submit Checklist. | Complete the pull request description using the provided template. Include a detailed description of changes, usage examples, type of changes, CI configuration labels, and pre-submit checklist items. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title "Update for transformers 5.0" clearly and concisely describes the main objective of the pull request: updating the codebase for transformers v5.0 compatibility. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 84.00%, which is above the required threshold of 80.00%. |



@pstjohn force-pushed the pstjohn/bio-128-update-recipes-to-transformers-v5 branch from 6b22f73 to d9c65e7 on February 2, 2026 15:06
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
@pstjohn force-pushed the pstjohn/bio-128-update-recipes-to-transformers-v5 branch from 20b3b6f to af11872 on February 2, 2026 15:43
@pstjohn marked this pull request as ready for review on February 2, 2026 15:44
@pstjohn (Collaborator, Author) commented Feb 2, 2026

@coderabbitai review

coderabbitai bot (Contributor) commented Feb 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
bionemo-recipes/models/esm2/src/esm/convert.py (1)

122-135: ⚠️ Potential issue | 🟡 Minor

Remove redundant post_init() call; apply_transforms() already calls tie_weights().

Testing confirms the current code is safe—weight values are preserved correctly through the post_init() call (test_convert.py validates roundtrip conversion with strict tolerances). However, apply_transforms() already invokes tie_weights() before returning, making the subsequent post_init() call redundant. Also note that convert_esm_hf_to_te() does not call post_init(), creating an inconsistency.

Suggested fix

```diff
-    output_model.post_init()
```
bionemo-recipes/models/esm2/tests/conftest.py (1)

16-43: ⚠️ Potential issue | 🟡 Minor

Add a second blank line after the import block.

Ruff/isort will flag a single blank line before the module-level TRITON env block.

🧹 Suggested fix

```diff
-from esm.convert import convert_esm_hf_to_te
-
-# Fix Triton UTF-8 decoding issue by setting CUDA library path
+from esm.convert import convert_esm_hf_to_te
+
+
+# Fix Triton UTF-8 decoding issue by setting CUDA library path
```

As per coding guidelines, follow the isort import-sorting configuration, which requires two blank lines after the import block.
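
For context, the `unused_tcp_port` fixture this conftest adds (per the walkthrough) is commonly implemented along these lines; this is an assumed shape, not the PR's verbatim code:

```python
import socket

import pytest


@pytest.fixture
def unused_tcp_port() -> int:
    """Ask the OS for a free ephemeral port, then release it for the test to use."""
    # Binding to port 0 makes the kernel pick an unused port; closing the
    # socket frees it (a small reuse race is acceptable for tests).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```

The distributed tests would then thread this port into torchrun's rendezvous settings (e.g. `--rdzv-backend=c10d --rdzv-endpoint=localhost:<port>`), so concurrently running test processes do not collide on a fixed default port.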

🧹 Nitpick comments (3)
bionemo-recipes/models/amplify/pyproject.toml (1)

21-21: Confirm whether AMPLIFY should stay capped at Transformers <5.0.

If AMPLIFY is now v5-compatible, aligning this constraint (e.g., transformers>=5.0,<6) would prevent users from staying on v4. If not, the TODO is fine but this will keep the package incompatible with v5 installs.

💡 Optional alignment with v5 support

```diff
-    "transformers<5.0",                  # TODO(BIO-143): update AMPLIFY to support Transformers v5
+    "transformers>=5.0,<6",              # TODO(BIO-143): update AMPLIFY to support Transformers v5
```
bionemo-recipes/recipes/geneformer_native_te_mfsdp_fp8/requirements.txt (1)

9-9: Add an explicit v5 lower bound to avoid silently running v4.

An unconstrained transformers can be satisfied by an already-installed v4, which defeats the v5 migration and risks runtime mismatches. Consider transformers>=5.0,<6 to lock the supported major.

♻️ Suggested constraint

```diff
-transformers
+transformers>=5.0,<6
```
bionemo-recipes/recipes/esm2_native_te/requirements.txt (1)

11-11: Consider bounding transformers to the v5 major line.

Leaving this unbounded can allow v6+ upgrades with breaking changes. A constrained range still permits v5 updates while preventing accidental major bumps.

🔧 Suggested constraint

```diff
-transformers
+transformers>=5.0,<6.0
```
