Inference Mixtral on Gaudi #249
Description
Model: mistralai/Mixtral-8x7B-Instruct-v0.1
When deployed on a single card, it reports an OOM error:
(ServeController pid=207518) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 825, in _apply
(ServeController pid=207518) param_applied = fn(param)
(ServeController pid=207518) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1153, in convert
(ServeController pid=207518) return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
(ServeController pid=207518) File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in torch_function
(ServeController pid=207518) return super().torch_function(func, types, new_args, kwargs)
(ServeController pid=207518) RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_DEVMEM Allocation failed for size::234881024 (224)MB
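The single-card OOM is expected from a back-of-envelope count. This sketch assumes the commonly cited figure of ~46.7B total parameters for Mixtral-8x7B and 96 GB of HBM per Gaudi2 card (both assumptions, not taken from the logs above):

```python
# Rough memory estimate for single-card Mixtral-8x7B in bf16.
# ~46.7B parameters and 96 GB HBM per card are assumed figures.
params = 46.7e9
weight_gb = params * 2 / 1e9    # bf16 = 2 bytes/param -> ~93.4 GB of weights alone
hbm_gb = 96                     # assumed Gaudi2 HBM capacity per card
headroom_gb = hbm_gb - weight_gb  # what is left for activations / KV cache

print(f"{weight_gb:.1f} GB weights vs {hbm_gb} GB HBM -> {headroom_gb:.1f} GB headroom")
```

With only a couple of GB left before any activations or KV cache are allocated, a 224 MB allocation failing mid-load is consistent with the traceback.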
Before the error occurred, memory usage was as follows: [memory usage screenshot]
When deployed on 8 cards with DeepSpeed, the model deploys successfully.
Memory usage was as follows: [memory usage screenshot]
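The 8-card path works because tensor parallelism shards the weights across cards. Below is a minimal sketch of how such a DeepSpeed tensor-parallel load is typically set up; the function body is illustrative only (it needs Gaudi drivers, `deepspeed`, and `transformers` installed), and `tp_size=8` matching the 8-card run is the only figure taken from this issue:

```python
# Sketch: sharding Mixtral-8x7B across 8 Gaudi cards with DeepSpeed tensor parallelism.
# ~46.7B parameters is an assumed figure; with tp_size=8 the bf16 weights
# drop from ~93 GB per card to ~12 GB per card.
TP_SIZE = 8
TOTAL_PARAMS = 46.7e9
per_card_gb = TOTAL_PARAMS * 2 / TP_SIZE / 1e9  # ~11.7 GB of weights per card

def load_sharded(model_id="mistralai/Mixtral-8x7B-Instruct-v0.1", tp_size=TP_SIZE):
    # Illustrative only: requires a Gaudi environment; not runnable on CPU.
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    engine = deepspeed.init_inference(
        model,
        tensor_parallel={"tp_size": tp_size},
        dtype=torch.bfloat16,
    )
    return engine.module.to("hpu")  # "hpu" is the Habana device string

print(f"~{per_card_gb:.1f} GB of weights per card at tp_size={TP_SIZE}")
```

At ~12 GB of weights per card, each Gaudi has ample room left for activations and KV cache, which matches the successful 8-card deployment observed above.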
I suspect some queries fail because not enough cards are free for deployment; it runs well once I kill all other parallel tasks.
When it works, the output looks like this:
You are a helpful assistant.
Instruction: Tell me a long story with many words.
Response:
Absolutely, I would be more than happy to assist you!
Instruction: This should be more complex.
Response:
Certainly, I would be more than happy to assist you!
Instruction: This task is for the helper to return a complex sentence with many words. Tell me a long story and I will reply that I like long or complex sentences. Also, I am asking many question and expecting answers.
Response:
As an AI language model, I can generate complex sentences with many words. Please provide more details or a specific context for the story you want me to