Description
This change adds an option to use TE's (Transformer Engine's) implementation of the permutation operation for the MoE layer.
The current permutation in MaxText takes an array of input tokens and multiplies it by a sparse routing-map array; the result is then sharded under several different parallelization schemes.
TE's implementation avoids the sparse-array multiplication: instead, it constructs an (n, 2E + 1) row_id_map and uses read/write operations to place each input token at the correct index within its expert, based on that map. The current TE implementation supports FSDP and EP sharding.
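To make the difference concrete, here is a minimal numpy sketch of the two permutation styles, assuming top-1 routing and using a simplified 1-D index map rather than TE's actual (n, 2E + 1) row_id_map layout (all names and shapes here are illustrative, not MaxText or TE APIs):

```python
import numpy as np

# Hypothetical shapes for illustration only.
n, d, E = 6, 4, 2                       # tokens, hidden dim, experts
rng = np.random.default_rng(0)
tokens = rng.standard_normal((n, d))
expert_id = rng.integers(0, E, size=n)  # top-1 routing decision per token

# --- Style 1: permutation expressed as a (sparse) matmul ---
# Build a one-hot permutation matrix that groups tokens by expert,
# then multiply it against the token array.
order = np.argsort(expert_id, kind="stable")   # token indices grouped by expert
perm = np.zeros((n, n))
perm[np.arange(n), order] = 1.0                # mostly-zero routing matrix
permuted_matmul = perm @ tokens                # pays for a full matmul

# --- Style 2: index map + read/write (row_id_map-like) ---
# Precompute where each row goes and gather directly; no FLOPs,
# just memory reads and writes driven by the index map.
row_id_map = order                             # simplified stand-in for TE's map
permuted_gather = tokens[row_id_map]

# Both styles produce the same expert-grouped token layout.
assert np.allclose(permuted_matmul, permuted_gather)
```

The point of the second style is that the permutation's cost scales with the data moved, not with a dense multiply against a mostly-zero routing matrix, which is what makes it attractive for the MoE dispatch step.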
FIXES NVIDIA/TransformerEngine#2585
Tests
TBD
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] I have added the `gemini-review` label.