This repository was archived by the owner on Sep 23, 2025. It is now read-only.

Inference Mixtral on Gaudi #249

@Deegue

Description


Model: mistralai/Mixtral-8x7B-Instruct-v0.1

When deployed on a single card, it reports an OOM error:

(ServeController pid=207518) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 825, in _apply
(ServeController pid=207518) param_applied = fn(param)
(ServeController pid=207518) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1153, in convert
(ServeController pid=207518) return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
(ServeController pid=207518) File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in torch_function
(ServeController pid=207518) return super().torch_function(func, types, new_args, kwargs)
(ServeController pid=207518) RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_DEVMEM Allocation failed for size::234881024 (224)MB
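The size in the last line can be decoded with a little arithmetic. The sketch below assumes Mixtral-8x7B's published config values (`hidden_size=4096`, `intermediate_size=14336`, not stated in this issue). It shows that 234881024 bytes is exactly 224 MiB, which matches one 4096 × 14336 matrix (the shape of a single expert feed-forward projection) at 4 bytes per element, or two such matrices at 2 bytes. The log alone does not say which tensor failed, so treat this only as a possible reading.

```python
# Decode the size reported in the PT_DEVMEM allocation error.
FAILED_BYTES = 234_881_024  # copied from the error message
MIB = 1024 * 1024

# Assumption: Mixtral-8x7B config values, not taken from the issue.
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 14336


def matrix_bytes(rows: int, cols: int, bytes_per_elem: int) -> int:
    """Size in bytes of a dense rows x cols weight matrix."""
    return rows * cols * bytes_per_elem


if __name__ == "__main__":
    print(FAILED_BYTES / MIB)  # size in MiB
    fp32 = matrix_bytes(HIDDEN_SIZE, INTERMEDIATE_SIZE, 4)
    bf16 = matrix_bytes(HIDDEN_SIZE, INTERMEDIATE_SIZE, 2)
    print(fp32 == FAILED_BYTES, 2 * bf16 == FAILED_BYTES)
```

A 224 MiB request is small compared with a card's total memory, so failing on it suggests the card was already close to full when the allocation was attempted.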

Before the error occurred, memory usage looked like this:
[image: memory usage on a single card before the OOM error]

With 8 cards and DeepSpeed, the model deploys successfully.
Memory usage looked like this:
[image: memory usage with 8 cards and DeepSpeed]

I suspect queries sometimes fail because not enough cards are available for deployment; it runs well after I killed all other parallel tasks.
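A rough back-of-the-envelope check of why one card can run out of memory while an 8-card DeepSpeed deployment does not. This sketch assumes about 46.7B total parameters for Mixtral-8x7B (Mistral AI's published figure), weights split evenly across cards, and a per-card capacity passed in as a parameter; the 96 GB used below is an illustrative Gaudi2-class value, not something reported in this issue. It counts weights only, so activations, KV cache, and runtime overhead come on top.

```python
# Rough per-card weight memory for Mixtral-8x7B under even sharding.
# Assumptions (not from the issue): ~46.7e9 total parameters, weights split
# evenly across cards, weights only (no KV cache, activations, or overhead).
TOTAL_PARAMS = 46.7e9
GB = 1e9  # decimal gigabytes


def weights_per_card_gb(num_cards: int, bytes_per_param: int) -> float:
    """Approximate weight memory held by each card, in GB."""
    return TOTAL_PARAMS * bytes_per_param / num_cards / GB


def utilization(num_cards: int, bytes_per_param: int, card_capacity_gb: float) -> float:
    """Fraction of one card's memory taken by its weight shard alone."""
    return weights_per_card_gb(num_cards, bytes_per_param) / card_capacity_gb


if __name__ == "__main__":
    capacity = 96.0  # illustrative per-card memory in GB (assumption)
    for cards in (1, 8):
        for name, nbytes in (("bf16", 2), ("fp32", 4)):
            gb = weights_per_card_gb(cards, nbytes)
            util = utilization(cards, nbytes, capacity)
            print(f"{cards} card(s), {name}: {gb:.1f} GB ({util:.0%} of {capacity:.0f} GB)")
```

Under these assumptions, bf16 weights alone take roughly 93 GB on a single card, leaving little headroom, while an 8-way split needs only about 12 GB per card. This is consistent with the behavior above, and also with failures when other tasks already occupy some of the cards.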

A correct result looks like this:

You are a helpful assistant.
Instruction: Tell me a long story with many words.
Response:
Absolutely, I would be more than happy to assist you!
Instruction: This should be more complex.
Response:
Certainly, I would be more than happy to assist you!
Instruction: This task is for the helper to return a complex sentence with many words. Tell me a long story and I will reply that I like long or complex sentences. Also, I am asking many question and expecting answers.
Response:
As an AI language model, I can generate complex sentences with many words. Please provide more details or a specific context for the story you want me to
