This repository was archived by the owner on Sep 23, 2025. It is now read-only.

OpenAI API does not allow temperature=0.0 for llama-2-7b-chat-hf #139


Description

@yutianchen666

When running the llama-2-7b-chat-hf model through the OpenAI API for gsm8k (a mathematical-reasoning benchmark), temperature=0.0 needs to be set.

But I get an unexpected error like the following:

lm_eval --model local-chat-completions --model_args model=llama-2-7b-chat-hf,base_url=http://localhost:8000/v1 --task gsm8k
2024-03-12:16:09:56,344 INFO [main.py:225] Verbosity set to INFO
2024-03-12:16:09:56,344 INFO [init.py:373] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-03-12:16:10:01,070 INFO [main.py:311] Selected Tasks: ['gsm8k']
2024-03-12:16:10:01,070 INFO [main.py:312] Loading selected tasks...
2024-03-12:16:10:01,075 INFO [evaluator.py:129] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-03-12:16:10:01,419 INFO [evaluator.py:190] get_task_dict has been updated to accept an optional argument, task_managerRead more here:https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md#external-library-usage
2024-03-12:16:10:17,655 INFO [task.py:395] Building contexts for gsm8k on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:06<00:00, 192.73it/s]
2024-03-12:16:10:24,524 INFO [evaluator.py:357] Running generate_until requests
0%| 2024-03-12:16:11:08,170 INFO [_client.py:1026] HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2024-03-12:16:11:08,171 INFO [_base_client.py:952] Retrying request to /chat/completions in 0.788895 seconds
2024-03-12:16:11:09,010 INFO [_client.py:1026] HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2024-03-12:16:11:09,011 INFO [_base_client.py:952] Retrying request to /chat/completions in 1.621023 seconds
2024-03-12:16:11:10,683 INFO [_client.py:1026] HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
Traceback (most recent call last):
File "/home/yutianchen/Project/lm-evaluation-harness/lm_eval/models/utils.py", line 333, in wrapper
return func(*args, **kwargs)
File "/home/yutianchen/Project/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 75, in completion
return client.chat.completions.create(**kwargs)
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_utils/_utils.py", line 303, in wrapper
return func(*args, **kwargs)
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 598, in create
return self._post(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 1088, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 853, in request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 916, in _request
return self._retry_request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 958, in _retry_request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 916, in _request
return self._retry_request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 958, in _retry_request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 930, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'generated_text': None, 'num_input_tokens': None, 'num_input_tokens_batch': None, 'num_generated_tokens': None, 'num_generated_tokens_batch': None, 'preprocessing_time': None, 'generation_time': None, 'timestamp': 1710259870.67887, 'finish_reason': None, 'error': {'object': 'error', 'message': 'Internal Server Error', 'internal_message': 'Internal Server Error', 'type': 'InternalServerError', 'param': {}, 'code': 500}}

The error is similar when testing temperature=0.0 with llm-on-ray's query_openai_sdk.py:

python examples/inference/api_server_openai/query_openai_sdk.py --model_name llama-2-7b-chat-hf --temperature 0.0
Traceback (most recent call last):
File "/home/yutianchen/Project/latest_lib/llm-on-ray/examples/inference/api_server_openai/query_openai_sdk.py", line 98, in
for i in chunk_chat():
File "/home/yutianchen/Project/latest_lib/llm-on-ray/examples/inference/api_server_openai/query_openai_sdk.py", line 75, in chunk_chat
output = client.chat.completions.create(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 663, in create
return self._post(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 1200, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 889, in request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 965, in _request
return self._retry_request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 1013, in _retry_request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 965, in _request
return self._retry_request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 1013, in _retry_request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 980, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'generated_text': None, 'num_input_tokens': None, 'num_input_tokens_batch': None, 'num_generated_tokens': None, 'num_generated_tokens_batch': None, 'preprocessing_time': None, 'generation_time': None, 'timestamp': 1710260245.4014304, 'finish_reason': None, 'error': {'object': 'error', 'message': 'Internal Server Error', 'internal_message': 'Internal Server Error', 'type': 'InternalServerError', 'param': {}, 'code': 500}}
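For reference, the failure can be reproduced without lm-evaluation-harness or the example script. A minimal sketch using only the openai Python SDK against the local endpoint from the logs above (the api_key value and prompt are illustrative, not part of the original report):

from openai import OpenAI

# Point the client at the local llm-on-ray OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Requesting temperature=0.0 (greedy decoding) triggers the same 500 error.
response = client.chat.completions.create(
    model="llama-2-7b-chat-hf",
    messages=[{"role": "user", "content": "What is 12 * 7?"}],
    temperature=0.0,
)
print(response.choices[0].message.content)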

Both the LLaMA maintainers (https://github.com/facebookresearch/llama/issues/687) and the Transformers maintainers (https://github.com/huggingface/transformers/pull/25722) officially suggest to "set do_sample = False in case temperature = 0".
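For context, this is roughly what the linked recommendation means on the Transformers side. A minimal sketch (the model name and prompt are illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

inputs = tokenizer("What is 12 * 7?", return_tensors="pt")

# Greedy decoding: disable sampling instead of passing temperature=0.0,
# which Transformers rejects during generation.
outputs = model.generate(
    **inputs,
    do_sample=False,
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))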

But the OpenAI API's
client.chat.completions.create(**kwargs)
does not support a do_sample parameter, and there is no suitable argument to work around the temperature=0.0 problem.
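One possible fix would be on the serving side: translate temperature=0.0 from the OpenAI-style request into greedy decoding before building the Transformers generation parameters. A minimal sketch, assuming a hypothetical helper and request shape (this is not llm-on-ray's actual code):

def build_generate_kwargs(request_params: dict) -> dict:
    """Map OpenAI-style sampling params to Transformers generate() kwargs."""
    params = dict(request_params)
    temperature = params.pop("temperature", 1.0)
    if temperature is not None and temperature <= 0.0:
        # Greedy decoding: drop temperature entirely and disable sampling.
        params["do_sample"] = False
    else:
        params["do_sample"] = True
        params["temperature"] = temperature
    return params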
