Open
Description
Hello,
I want to run an experiment that runs a benchmark across different environments, which led me to the mlcommons training repo.
I went to the recommendation_v2 benchmark, cloned the repo, ran docker build, and got this error: can't find Rust compiler
Then I went to the small llm pretraining benchmark, cloned the repo, built the docker.h200 image, ran source config_H200_1x8x1_8b.sh && python pretrain_llama31.py, and got:
Traceback (most recent call last):
  File "/workspace/code/pretrain_llama31.py", line 23, in <module>
    from nemo.collections import llm
  File "/workspace/NeMo/nemo/collections/llm/__init__.py", line 20, in <module>
    from nemo.collections.llm import peft
  File "/workspace/NeMo/nemo/collections/llm/peft/__init__.py", line 15, in <module>
    from nemo.collections.llm.peft.api import gpt_lora, merge_lora
  File "/workspace/NeMo/nemo/collections/llm/peft/api.py", line 20, in <module>
    from megatron.core import dist_checkpointing
  File "/workspace/Megatron-LM/megatron/core/__init__.py", line 5, in <module>
    from megatron.core.distributed import DistributedDataParallel
  File "/workspace/Megatron-LM/megatron/core/distributed/__init__.py", line 7, in <module>
    from .finalize_model_grads import finalize_model_grads
  File "/workspace/Megatron-LM/megatron/core/distributed/finalize_model_grads.py", line 16, in <module>
    from ..transformer.moe.moe_utils import get_updated_expert_bias
  File "/workspace/Megatron-LM/megatron/core/transformer/moe/moe_utils.py", line 12, in <module>
    from megatron.core.extensions.transformer_engine import (
  File "/workspace/Megatron-LM/megatron/core/extensions/transformer_engine.py", line 95, in <module>
    class TELinear(te.pytorch.Linear):
                   ^^^^^^^^^^
AttributeError: module 'transformer_engine' has no attribute 'pytorch'
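To narrow this down, a small check like the one below could be run inside the container. It is only a diagnostic sketch, not a fix: it imports transformer_engine and transformer_engine.pytorch explicitly and reports the error instead of raising, on the assumption that importing the submodule directly may surface the underlying failure (for example a missing dependency or shared library) that the AttributeError hides. The helper names probe_module and installed_version are hypothetical, and the distribution name "transformer_engine" passed to importlib.metadata is an assumption about how the package was installed in the image.

```python
import importlib
import importlib.metadata


def probe_module(name):
    """Try to import `name`; return (ok, detail) instead of raising."""
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # e.g. ImportError, or OSError from a missing shared library
        return False, f"{type(exc).__name__}: {exc}"
    # On success, report where the module was loaded from.
    return True, getattr(module, "__file__", None) or "<no __file__>"


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Importing the submodule explicitly may show why `te.pytorch` is missing.
    for name in ("transformer_engine", "transformer_engine.pytorch"):
        ok, detail = probe_module(name)
        print(f"{name}: {'OK' if ok else 'FAILED'} -> {detail}")
    # Assumed distribution name; adjust if the image installs it under another name.
    print("transformer_engine version:", installed_version("transformer_engine"))
```

If the second import fails, the printed exception type and message are probably more useful to attach here than the AttributeError above.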
I have also tried to reproduce these environments locally and in Apptainer containers, but without success.
*There are other errors with Apptainer and with venvs as well, but I am listing only the simplest Docker path here.
Thanks,