fix + tests dense & MoE TP all reduce (decoder only) #43722
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…ed GPU management
- Updated `run_dense_tests.sh` and `run_moe_tests.sh` to support parallel execution of tests using available GPU pairs.
- Changed variable names for clarity, replacing `NUM_GPUS` with `GPUS_PER_TEST`.
- Enhanced output messages to reflect the number of parallel test slots and GPU usage.
- Implemented logic to handle skipped tests and updated result reporting to include skipped counts.
- Removed `TensorParallelTesterMixin` from `CausalLMModelTest` and integrated it into `ModelTesterMixin` for better structure in test classes.
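As a rough illustration of the GPU-pair scheduling described above (the real runners are shell scripts; the GPU counts, test targets, and helper names below are hypothetical):

```python
# Rough Python sketch of GPU-pair scheduling; illustrative only.
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

GPUS_PER_TEST = 2  # each TP test needs a pair of GPUs
TOTAL_GPUS = 8     # hypothetical machine
SLOTS = [list(range(i, i + GPUS_PER_TEST)) for i in range(0, TOTAL_GPUS, GPUS_PER_TEST)]
TESTS = ["tests/models/gemma2", "tests/models/gpt_oss"]  # hypothetical targets

free_slots: Queue = Queue()
for slot in SLOTS:
    free_slots.put(slot)

def run_test(test: str) -> tuple[str, str]:
    slot = free_slots.get()  # block until a GPU pair is free
    try:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(map(str, slot)))
        proc = subprocess.run(["pytest", "-k", "test_tp_", test], env=env)
        # pytest exit code 5 means "no tests collected", i.e. the model skipped its TP tests
        return test, {0: "passed", 5: "skipped"}.get(proc.returncode, "failed")
    finally:
        free_slots.put(slot)  # release the pair for the next queued test

with ThreadPoolExecutor(max_workers=len(SLOTS)) as pool:
    results = list(pool.map(run_test, TESTS))

skipped = sum(status == "skipped" for _, status in results)
print(results, f"({skipped} skipped)")
```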
Cyrilvallez left a comment
Just a few very early thoughts!
…lecting for mergeModuleList
- Modified `run_dense_tests.sh` and `run_moe_tests.sh` to change the pytest keyword from "test_tensor_parallel" to "test_tp_" for improved test targeting.
- Cleaned up comments and removed unused code in `test_tensor_parallel_mixin.py` to streamline the testing process and enhance readability.
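For context on why the keyword change matters: pytest's `-k` filter matches on test names, so only methods whose names contain the `test_tp_` prefix get collected. A hypothetical mixin illustrating the convention (method names and bodies are placeholders, not the real mixin):

```python
# Hypothetical illustration of the naming convention behind `pytest -k "test_tp_"`.
class TensorParallelTesterMixin:
    def test_tp_forward(self):
        ...  # would compare a TP forward pass against the single-device baseline

    def test_tp_generate(self):
        ...  # would compare greedy generation under TP vs. non-TP

    def test_tensor_parallel_old_name(self):
        ...  # NOT selected by -k "test_tp_": the prefix this commit moves away from
```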
op_name = _format_op_name(op)

tb_str = "".join(traceback.format_exception(type(e), e, e.__traceback__))
let's keep this one please!
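For readers skimming the thread, the point of that line is that `traceback.format_exception` keeps the full stack as a string rather than just the exception message, which helps when a failure is reported from deep inside a conversion op. A minimal, self-contained illustration (not the PR's code):

```python
import traceback

def risky_op():
    return 1 / 0

try:
    risky_op()
except Exception as e:
    # Full traceback as a string, including the frame inside risky_op();
    # str(e) alone would only say "division by zero".
    tb_str = "".join(traceback.format_exception(type(e), e, e.__traceback__))
    print(tb_str)
```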
load_config: Any,
tp_plan: dict[str, str] | None,
dtype_plan: dict | None = None,
not sure we want to revert this
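To make the signature concrete: `tp_plan` maps parameter-name patterns to partitioning styles and `dtype_plan` maps patterns to dtypes. The entries below are illustrative only, not taken from this PR (they follow the usual `"colwise"`/`"rowwise"` convention):

```python
import torch

# Partitioning styles per parameter pattern ("colwise" splits the output dim
# across ranks, "rowwise" splits the input dim and needs an all-reduce).
tp_plan: dict[str, str] | None = {
    "model.layers.*.self_attn.q_proj": "colwise",
    "model.layers.*.self_attn.o_proj": "rowwise",
    "model.layers.*.mlp.gate_proj": "colwise",
    "model.layers.*.mlp.down_proj": "rowwise",
}

# Optional per-pattern dtype overrides, e.g. keeping a router in fp32.
dtype_plan: dict | None = {
    "model.layers.*.mlp.gate": torch.float32,
}
```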
shard_index = (
    len(mapping.collected_tensors.get(source_pattern, []))
    if isinstance(mapping, WeightConverter) and isinstance(mapping.operations[0], MergeModulelist)
    else None
)
this is important for "EP" sharding no?
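Roughly what the shard index buys you: when `MergeModulelist` stacks per-expert tensors that arrive one by one, the running count of tensors already collected for a pattern identifies which expert (and hence which EP shard) the incoming tensor belongs to. A simplified, stand-alone sketch with made-up names, not the converter's actual code:

```python
import torch

# Simplified stand-in for the converter's bookkeeping; WeightConverter /
# MergeModulelist themselves are not reproduced here.
collected_tensors: dict[str, list[torch.Tensor]] = {}

def collect(source_pattern: str, tensor: torch.Tensor) -> int:
    # The shard index is simply "how many tensors have already arrived"
    # for this pattern, e.g. expert 0, 1, 2, ... of an MoE layer.
    shard_index = len(collected_tensors.get(source_pattern, []))
    collected_tensors.setdefault(source_pattern, []).append(tensor)
    return shard_index

# Experts arrive one at a time and are later stacked into a single tensor.
for expert_id in range(4):
    idx = collect("model.layers.0.mlp.experts.*.down_proj", torch.randn(8, 8))
    assert idx == expert_id  # under expert parallelism, idx picks the EP shard

merged = torch.stack(collected_tensors["model.layers.0.mlp.experts.*.down_proj"])
print(merged.shape)  # torch.Size([4, 8, 8])
```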
src/transformers/modeling_utils.py (Outdated)
if is_torch_greater_or_equal("2.3.0"):
    str_to_torch_dtype["U16"] = torch.uint16
    str_to_torch_dtype["U32"] = torch.uint32
    str_to_torch_dtype["U64"] = torch.uint64
we don't support 2.3, only >= 2.4
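For reference, a version-gated mapping along the lines of the quoted hunk, but checking 2.4 as suggested; this is a local sketch, not the module-level dict in `modeling_utils.py`:

```python
import torch
from transformers.utils import is_torch_greater_or_equal

# Local sketch of the mapping; the real dict has many more entries.
str_to_torch_dtype = {"F32": torch.float32, "F16": torch.float16, "BF16": torch.bfloat16}

# Per the review, transformers only supports torch >= 2.4, so the guard for the
# unsigned 16/32/64-bit dtypes can check 2.4 rather than 2.3.
if is_torch_greater_or_equal("2.4"):
    str_to_torch_dtype["U16"] = torch.uint16
    str_to_torch_dtype["U32"] = torch.uint32
    str_to_torch_dtype["U64"] = torch.uint64
```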
there is a lot to revert here still (cleanup)
run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma2, gemma3, gemma3n, glm4_moe, glm4_moe_lite, glm_moe_dsa, gpt_oss

This comment contains models: ["models/apertus", "models/deepseek_v2", "models/deepseek_v3", "models/dots1", "models/ernie4_5_moe", "models/exaone4", "models/exaone_moe", "models/flex_olmo", "models/gemma2", "models/gemma3", "models/gemma3n", "models/glm4_moe", "models/glm4_moe_lite", "models/glm_moe_dsa", "models/gpt_oss"]
run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma2, gemma3, gemma3n, glm4_moe, glm4_moe_lite, glm_moe_dsa, gpt_oss
force-pushed from b492073 to c97dd50
run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma3

This comment contains models: ["models/apertus", "models/deepseek_v2", "models/deepseek_v3", "models/dots1", "models/ernie4_5_moe", "models/exaone4", "models/exaone_moe", "models/flex_olmo", "models/gemma3"]

[For maintainers] Suggested jobs to run (before merge): run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma3
Let's make sure it works for decoder-only models first (we skip VLM + encoder-decoder for now).
Introduction, forward, backward, and generation (with convert mapping triggering) tests against a TP vs. non-TP baseline.
./run_dense_tests.sh results_dense
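A rough sketch of what a TP-vs-baseline check along these lines could look like, assuming two GPUs and a `torchrun` launch; the model id and tolerances are placeholders and this is not the actual mixin code:

```python
# Launch with e.g.: torchrun --nproc-per-node 2 tp_smoke_test.py
import torch
from transformers import AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-0.5B"  # placeholder: any small decoder-only checkpoint
inputs = torch.arange(16).unsqueeze(0)  # dummy token ids

# Non-TP baseline, loaded on CPU in fp32 (use torch_dtype= on older versions).
baseline = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float32)
with torch.no_grad():
    ref_logits = baseline(inputs).logits

# Tensor-parallel copy: tp_plan="auto" shards weights across the torchrun ranks.
tp_model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float32, tp_plan="auto")
with torch.no_grad():
    tp_logits = tp_model(inputs.to(tp_model.device)).logits

# Placeholder tolerances; a real test would also cover backward and generate().
torch.testing.assert_close(tp_logits.cpu(), ref_logits, rtol=1e-4, atol=1e-4)
print("TP forward matches the non-TP baseline")
```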