Description
This change adds an option to use TE's (Transformer Engine's) implementation of the permutation operation for the MoE layer.
The current permutation in MaxText takes an array of input tokens and multiplies it by a sparse routing-map array; the result is then sharded under several different parallelization schemes.
TE's implementation avoids the sparse-array multiplication: instead, it constructs an (n, 2E + 1) row_id_map and uses read/write operations to place each input token at the correct index within its expert, based on that map. The current TE implementation supports FSDP and EP sharding.
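To make the difference concrete, here is a minimal numpy sketch of the two permutation styles, assuming top-1 routing and using a simplified 1-D index map rather than TE's actual (n, 2E + 1) row_id_map layout (all names and shapes here are illustrative, not MaxText or TE APIs):

```python
import numpy as np

# Hypothetical shapes for illustration only.
n, d, E = 6, 4, 2                       # tokens, hidden dim, experts
rng = np.random.default_rng(0)
tokens = rng.standard_normal((n, d))
expert_id = rng.integers(0, E, size=n)  # top-1 routing decision per token

# --- Style 1: permutation expressed as a (sparse) matmul ---
# Build a one-hot permutation matrix that groups tokens by expert,
# then multiply it against the token array.
order = np.argsort(expert_id, kind="stable")   # token indices grouped by expert
perm = np.zeros((n, n))
perm[np.arange(n), order] = 1.0                # mostly-zero routing matrix
permuted_matmul = perm @ tokens                # pays for a full matmul

# --- Style 2: index map + read/write (row_id_map-like) ---
# Precompute where each row goes and gather directly; no FLOPs,
# just memory reads and writes driven by the index map.
row_id_map = order                             # simplified stand-in for TE's map
permuted_gather = tokens[row_id_map]

# Both styles produce the same expert-grouped token layout.
assert np.allclose(permuted_matmul, permuted_gather)
```

The point of the second style is that the permutation's cost scales with the data moved, not with a dense multiply against a mostly-zero routing matrix, which is what makes it attractive for the MoE dispatch step.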
FIXES NVIDIA/TransformerEngine#2585
Tests
TBD
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] I have added the `gemini-review` label.