This is my attempt at using RL to improve LLM reasoning. The GRPO training code is a modified version of this nice code, and the evaluation follows this.
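For readers unfamiliar with GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, so no learned value function is needed. The snippet below is a minimal sketch of that group-relative advantage step only, not the training code in this repository. The function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps). The population standard deviation is an
    assumption; some implementations use the sample standard deviation.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every completion got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, rewarded 1.0 if the answer was correct.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat the group average get a positive advantage and the rest get a negative one. When all rewards in a group are equal, every advantage is zero, so that prompt contributes no policy-gradient signal.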
mingyin0312/RL4LLM
About
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct