
fix + tests dense & MoE TP all reduce (decoder only) #43722

Open
3outeille wants to merge 137 commits into main from fix-moe-ep

Conversation

Member

@3outeille 3outeille commented Feb 3, 2026

Let's make sure it works for decoder-only models first (we skip VLM + encoder-decoder for now).

Forward, backward, and generation (with convert mapping triggering) are tested by comparing TP against a non-TP baseline:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import os
from torch.distributed.elastic.multiprocessing.errors import record

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
# model_id = "Qwen/Qwen1.5-MoE-A2.7B-Chat"
# model_id = "Qwen/Qwen3-30B-A3B-Instruct-2507"

rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
device = torch.device(f"cuda:{rank}")
# Need to be initialized explicitly to use the `barrier` before loading
torch.distributed.init_process_group(backend="nccl", rank=rank, world_size=world_size, device_id=device)

@record
def main():

    # tp_plan="auto" applies the model's predefined tensor-parallel plan across the ranks
    model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, tp_plan="auto")
    # model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="auto")  # non-TP baseline
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    messages = [
        {"role": "user", "content": "What do you think about life?"},
    ]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
    input_size = inputs.input_ids.shape[-1]
    output = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    text = tokenizer.batch_decode(output[:, input_size:])[0]
    print(text)

main()

torch.distributed.destroy_process_group()
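To reproduce, save the snippet as a script and launch it with e.g. `torchrun --nproc_per_node=2 script.py` (the filename is arbitrary): `torchrun` sets the `RANK` and `WORLD_SIZE` environment variables the script reads, and the `@record` decorator surfaces worker tracebacks through the elastic error handler.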
  • ./run_dense_tests.sh results_dense
  • ./run_moe_tests.sh results_moe

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

…ed GPU management

- Updated `run_dense_tests.sh` and `run_moe_tests.sh` to support parallel execution of tests using available GPU pairs.
- Changed variable names for clarity, replacing `NUM_GPUS` with `GPUS_PER_TEST`.
- Enhanced output messages to reflect the number of parallel test slots and GPU usage.
- Implemented logic to handle skipped tests and updated result reporting to include skipped counts.
- Removed `TensorParallelTesterMixin` from `CausalLMModelTest` and integrated it into `ModelTesterMixin` for better structure in test classes.
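
For illustration only, a minimal Python sketch (hypothetical, not the actual run_dense_tests.sh / run_moe_tests.sh logic) of the GPU-pair slotting idea those scripts implement:

import os
import subprocess

# Hypothetical sketch: carve the visible GPUs into 2-GPU slots
# (GPUS_PER_TEST = 2) and run one test per slot in parallel.
GPUS_PER_TEST = 2
gpus = list(range(int(os.environ.get("NUM_VISIBLE_GPUS", "8"))))  # assumed env var
slots = [gpus[i:i + GPUS_PER_TEST] for i in range(0, len(gpus), GPUS_PER_TEST)]
tests = ["tests/models/mixtral", "tests/models/qwen2_moe"]  # placeholder targets

procs = []
for slot, test in zip(slots, tests):  # zip stops at the shorter list
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": ",".join(map(str, slot))}
    procs.append(subprocess.Popen(["python", "-m", "pytest", test, "-k", "test_tp_"], env=env))
exit_codes = [p.wait() for p in procs]  # nonzero exit codes mark failures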
Member

@Cyrilvallez Cyrilvallez left a comment


Just a few very early thoughts!

@3outeille 3outeille changed the base branch from main to fix-ep February 4, 2026 13:38
@3outeille 3outeille changed the title from "EP all reduce" to "tests EP all reduce" Feb 4, 2026
ArthurZucker and others added 11 commits February 4, 2026 13:44
- Modified `run_dense_tests.sh` and `run_moe_tests.sh` to change the pytest keyword from "test_tensor_parallel" to "test_tp_" for improved test targeting.
- Cleaned up comments and removed unused code in `test_tensor_parallel_mixin.py` to streamline the testing process and enhance readability.
@3outeille 3outeille changed the title from "tests EP all reduce" to "tests EP all reduce (decoder only)" Feb 4, 2026
@github-actions
Contributor

💔 This comment contains run-slow, but unknown error occurred and the workflow run aborted!


op_name = _format_op_name(op)

tb_str = "".join(traceback.format_exception(type(e), e, e.__traceback__))
Collaborator


let's keep this one please!
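
For context, the quoted line builds the full traceback string of a caught exception; a minimal standalone example:

import traceback

try:
    raise ValueError("tensor parallel op failed")  # stand-in error
except ValueError as e:
    # Same three-argument form as the snippet above; on Python >= 3.10,
    # traceback.format_exception(e) is an equivalent shorthand.
    tb_str = "".join(traceback.format_exception(type(e), e, e.__traceback__))
    print(tb_str)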

Comment on lines 970 to 972
load_config: Any,
tp_plan: dict[str, str] | None,
dtype_plan: dict | None = None,
Collaborator


not sure we want to revert this

Comment on lines 1158 to 1162
shard_index = (
len(mapping.collected_tensors.get(source_pattern, []))
if isinstance(mapping, WeightConverter) and isinstance(mapping.operations[0], MergeModulelist)
else None
)
Collaborator


this is important for "EP" sharding no?
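
As a toy illustration (hypothetical shapes and names, not the library's code) of why the count of already-collected tensors serves as the shard index when merging a ModuleList of experts:

import torch

# Each checkpoint tensor experts.{i}.w1 must land in row i of the merged
# [num_experts, out, in] tensor, so the loader tracks how many tensors it
# has already collected per source pattern and uses that count as the index.
num_experts, out_f, in_f = 4, 8, 8
collected: dict[str, list[torch.Tensor]] = {}
for i in range(num_experts):
    pattern = "experts.*.w1"
    collected.setdefault(pattern, []).append(torch.full((out_f, in_f), float(i)))
    shard_index = len(collected[pattern]) - 1  # index of the tensor just added
merged = torch.stack(collected[pattern])  # shape [4, 8, 8]
assert merged[2, 0, 0].item() == 2.0  # expert 2 landed in slot 2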

Comment on lines 292 to 295
if is_torch_greater_or_equal("2.3.0"):
str_to_torch_dtype["U16"] = torch.uint16
str_to_torch_dtype["U32"] = torch.uint32
str_to_torch_dtype["U64"] = torch.uint64
Collaborator


we don't support 2.3, only >= 2.4
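
Following that, the version guard could be dropped outright; a sketch assuming torch >= 2.4 is the enforced minimum:

# torch.uint16/32/64 were added in torch 2.3, and the minimum supported
# version is 2.4, so the mapping can be unconditional.
str_to_torch_dtype["U16"] = torch.uint16
str_to_torch_dtype["U32"] = torch.uint32
str_to_torch_dtype["U64"] = torch.uint64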

Collaborator


there is a lot to revert here still (cleanup)

@3outeille
Member Author

run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma2, gemma3, gemma3n, glm4_moe, glm4_moe_lite, glm_moe_dsa, gpt_oss

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/apertus", "models/deepseek_v2", "models/deepseek_v3", "models/dots1", "models/ernie4_5_moe", "models/exaone4", "models/exaone_moe", "models/flex_olmo", "models/gemma2", "models/gemma3", "models/gemma3n", "models/glm4_moe", "models/glm4_moe_lite", "models/glm_moe_dsa", "models/gpt_oss"]
quantizations: []

@3outeille
Member Author

run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma2, gemma3, gemma3n, glm4_moe, glm4_moe_lite, glm_moe_dsa, gpt_oss

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context  Commit    Description
RUN      542e74c6  merge commit
PR       3cde5991  branch commit
main     c8f112d4  base commit

⚠️ No test being reported (jobs are skipped or cancelled)!

@github-actions
Contributor

💔 This comment contains run-slow, but unknown error occurred and the workflow run aborted!

@3outeille
Member Author

run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/apertus", "models/deepseek_v2", "models/deepseek_v3", "models/dots1", "models/ernie4_5_moe", "models/exaone4", "models/exaone_moe", "models/flex_olmo", "models/gemma3"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context  Commit    Description
RUN      efb11cbd  merge commit
PR       550b1428  branch commit
main     609e3d58  base commit

✅ No failing test specific to this PR 🎉 👏 !

@huggingface huggingface deleted a comment from github-actions bot Feb 13, 2026
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma3
