image build error #863

@aaadipop

Hello,

I want to run an experiment that runs a benchmark across different environments, and I ended up at the mlcommons training repo.
I went to the recommendation_v2 benchmark, cloned the repo, ran docker build, and got this error: can't find Rust compiler
Then I went to the small LLM pretraining benchmark, cloned the repo, built the docker.h200 image, ran source config_H200_1x8x1_8b.sh && python pretrain_llama31.py, and got:

Traceback (most recent call last):
  File "/workspace/code/pretrain_llama31.py", line 23, in <module>
    from nemo.collections import llm
  File "/workspace/NeMo/nemo/collections/llm/__init__.py", line 20, in <module>
    from nemo.collections.llm import peft
  File "/workspace/NeMo/nemo/collections/llm/peft/__init__.py", line 15, in <module>
    from nemo.collections.llm.peft.api import gpt_lora, merge_lora
  File "/workspace/NeMo/nemo/collections/llm/peft/api.py", line 20, in <module>
    from megatron.core import dist_checkpointing
  File "/workspace/Megatron-LM/megatron/core/__init__.py", line 5, in <module>
    from megatron.core.distributed import DistributedDataParallel
  File "/workspace/Megatron-LM/megatron/core/distributed/__init__.py", line 7, in <module>
    from .finalize_model_grads import finalize_model_grads
  File "/workspace/Megatron-LM/megatron/core/distributed/finalize_model_grads.py", line 16, in <module>
    from ..transformer.moe.moe_utils import get_updated_expert_bias
  File "/workspace/Megatron-LM/megatron/core/transformer/moe/moe_utils.py", line 12, in <module>
    from megatron.core.extensions.transformer_engine import (
  File "/workspace/Megatron-LM/megatron/core/extensions/transformer_engine.py", line 95, in <module>
    class TELinear(te.pytorch.Linear):
                   ^^^^^^^^^^
AttributeError: module 'transformer_engine' has no attribute 'pytorch'
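
A minimal diagnostic sketch (not part of the original report) that could be run inside the container to narrow down the error above. It only checks whether the top-level package is installed and whether its submodule imports on its own, and it surfaces the underlying import error if one exists. The helper name `probe` is hypothetical, and this is a way to gather more information, not a fix:

```python
import importlib
import importlib.util


def probe(package: str, submodule: str) -> str:
    """Return 'missing', 'no-submodule (...)', or 'ok' for package.submodule."""
    # find_spec returns None when the top-level package cannot be found.
    if importlib.util.find_spec(package) is None:
        return "missing"
    try:
        # Import the submodule explicitly instead of relying on attribute
        # access, so the real import failure (if any) is reported.
        importlib.import_module(f"{package}.{submodule}")
    except Exception as exc:
        return f"no-submodule ({type(exc).__name__}: {exc})"
    return "ok"


if __name__ == "__main__":
    # The package and submodule names are taken from the traceback above.
    print("transformer_engine.pytorch:", probe("transformer_engine", "pytorch"))
```

If the result is "no-submodule", the exception text in parentheses is likely more informative than the AttributeError, because it comes from importing the submodule directly.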

I have tried to reproduce these environments locally and in Apptainer containers, but without success.
*There are also other errors with Apptainer and with venvs, but I list only the simplest Docker path here.

Thanks,
