Why is the answer_score directly set to -2 when format_correct is False in verl/utils/reward_score/kk.py? #71

@Zhenyu-Bo

In kk.py, the reward calculation sets answer_score to -2 whenever format_correct is False. However, the LLM's response may still be correct in that case, and the answer may even be enclosed in <answer></answer> tags.
I think that once the answer text has been extracted from the answer tags, the original method can still be used to parse the answer. When the answer tags are incomplete, we could scan the LLM's full output from back to front to extract the answer, because the answer the LLM gives at the end is usually its final version. At the very least, when both the <answer> and </answer> tags exist, the answer should be extracted normally instead of answer_score being set directly to -2. Is my idea correct?
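
A rough sketch of the extraction I have in mind is below. It is only an illustration under my own assumptions: `extract_answer_text` is a hypothetical helper, not a function from kk.py, and I am assuming the tags appear literally as `<answer>` and `</answer>` in the response string. It prefers the last complete tag pair, and if no complete pair exists it searches backwards for the last opening tag.

```python
import re
from typing import Optional

# Non-greedy match so that each <answer>...</answer> pair is captured separately.
ANSWER_BLOCK = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer_text(response: str) -> Optional[str]:
    """Hypothetical helper: return the answer text, or None if no tag is found."""
    # Case 1: at least one complete <answer>...</answer> pair exists.
    # Take the last one, since the final answer is usually the intended one.
    blocks = ANSWER_BLOCK.findall(response)
    if blocks:
        return blocks[-1].strip()

    # Case 2: no complete pair. Search from the end for the last opening tag
    # and treat everything after it as the answer text.
    start = response.rfind("<answer>")
    if start != -1:
        tail = response[start + len("<answer>"):].strip()
        return tail or None

    # Case 3: no answer tag at all, so nothing can be extracted.
    return None
```

With something like this, the returned text could be passed to the existing answer parsing, and answer_score would only fall back to -2 when the helper returns None.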
