GitHub - mingyin0312/RL4LLM: RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct

This is my attempt at using RL to improve LLM reasoning. The GRPO training is a modified version of this nice code, and the evaluation follows this.
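
For readers new to GRPO, here is a minimal sketch of its core idea: several completions are sampled for the same prompt, each gets a reward, and every reward is normalized against its own group. This is not taken from grpo_qwen.py. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the other samples for the same prompt.

    Hypothetical helper, not the repository's code.
    rewards: one scalar reward per sampled completion.
    eps: small constant (assumed value) that avoids division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one GSM8K-style question,
# rewarded 1.0 for a correct final answer and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages. If every completion in a group gets the same reward, all advantages are zero and that group gives no learning signal.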

README.md
eval.py
grpo_qwen.py
gsm8k_eval_results_GRPO.json
gsm8k_eval_results_Original.json
gsm8k_eval_results_SFT.json
sft_qwen.py
