RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct

mingyin0312/RL4LLM

This is my attempt at using RL to improve LLM reasoning. The GRPO training is a modified version of this nice code, and the evaluation follows this.
