In the reward calculation in kk.py, the code sets answer_score to -2 whenever format_correct is False. However, in that case the LLM's response may still be correct, and the answer may still be enclosed in <answer></answer> tags.
I think that when the answer text can be extracted from the answer tags, we can keep the original method for parsing the answer. When the answer tags are incomplete, we can scan the LLM's full output from back to front to extract the answer, because the answer the LLM gives at the end is usually its final version. At the very least, when both the <answer> and </answer> tags exist, the answer should be extracted normally instead of directly setting answer_score to -2. Is my idea correct?
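
To make the proposal concrete, here is a minimal sketch of the extraction step I have in mind. It is not the code in kk.py: the function name `extract_answer_with_fallback` and the exact fallback rules are my own assumptions, and the result would still be passed to the existing answer parser.

```python
import re
from typing import Optional

# Matches a complete <answer>...</answer> pair, including multi-line answers.
ANSWER_BLOCK = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer_with_fallback(response: str) -> Optional[str]:
    """Hypothetical helper: return the answer text, or None if nothing is found."""
    # Case 1: at least one complete tag pair exists. Use the last one,
    # since the final answer usually appears at the end of the output.
    matches = ANSWER_BLOCK.findall(response)
    if matches:
        return matches[-1].strip()

    # Case 2: only an opening tag exists. Scan from the back for the last
    # <answer> and take everything after it.
    open_idx = response.rfind("<answer>")
    if open_idx != -1:
        tail = response[open_idx + len("<answer>"):].strip()
        return tail or None

    # Case 3: only a closing tag exists. Take the last non-empty line
    # before the last </answer> (an assumed heuristic).
    close_idx = response.rfind("</answer>")
    if close_idx != -1:
        lines = [ln.strip() for ln in response[:close_idx].splitlines() if ln.strip()]
        return lines[-1] if lines else None

    # No answer tags at all.
    return None
```

With something like this, answer_score would be set to -2 only when the function returns None. Whether a format penalty should still apply in the fallback cases is a separate question that this sketch does not address.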