Why is the answer_score directly set to -2 when format_correct is False in verl/utils/reward_score/kk.py?

In the reward calculation in kk.py, answer_score is set to -2 whenever format_correct is False. However, in that case the LLM's response may still be correct, and the answer may still be enclosed in \<answer>\</answer> tags.
I think that when the answer text can be extracted from the answer tags, the original parsing method should still be applied. When the answer tags are incomplete, we could scan the LLM's full output from back to front to extract the answer, since the answer the LLM gives at the end is usually its final version. At the very least, when both the \<answer> and \</answer> tags are present, the answer should be extracted normally instead of answer_score being set directly to -2. Is my idea correct?
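
To make the proposal concrete, here is a rough sketch of the lenient extraction I have in mind. `extract_answer_lenient` is a hypothetical helper, not a function from kk.py, and it assumes the response is a plain string containing literal answer tags. It only covers extraction; how the extracted text is then scored would stay with the existing parsing logic.

```python
from typing import Optional

OPEN_TAG = "<answer>"
CLOSE_TAG = "</answer>"


def extract_answer_lenient(response: str) -> Optional[str]:
    """Hypothetical helper (not from kk.py): extract the answer text
    even when the overall response format is not fully correct."""
    # Case 1: a complete tag pair exists. Scan from the back so that the
    # last closed answer block wins, since it is usually the final version.
    end = response.rfind(CLOSE_TAG)
    if end != -1:
        start = response.rfind(OPEN_TAG, 0, end)
        if start != -1:
            return response[start + len(OPEN_TAG):end].strip()

    # Case 2: the tags are incomplete (opening tag without a usable close).
    # Take everything after the last opening tag.
    start = response.rfind(OPEN_TAG)
    if start != -1:
        tail = response[start + len(OPEN_TAG):].strip()
        return tail or None

    # Case 3: no answer tag at all, so nothing can be extracted.
    return None
```

With something like this, the -2 value would only be needed when the function returns None, and a format penalty could be applied separately from the answer score.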

Why is the answer_score directly set to -2 when format_correct is False in verl/utils/reward_score/kk.py? #71

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Why is the answer_score directly set to -2 when format_correct is False in verl/utils/reward_score/kk.py? #71

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions