A simple GRPO trainer script: essentially a simplified rewrite of TRL's `GRPOTrainer`. The idea is to trade some things that work out of the box in TRL for extensibility.
- works for VLMs and LLMs
- no Accelerate, only torch distributed
- supports FSDP and PEFT
- no reward weighting
- rewards are always scaled
- BPO-style loss
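For context on "rewards are always scaled": in GRPO, advantages are computed group-relative, i.e. each reward is centered on its group's mean and (when scaling is on, as it always is here) divided by the group's standard deviation. A minimal pure-Python sketch of that computation, for illustration only (the repo's actual implementation is not shown here):

```python
# Sketch (not this repo's code): group-relative advantages with scaling always on.
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Center each reward on the group mean and divide by the group std.

    eps guards against division by zero when all rewards in a group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every completion in a group gets the same reward, all advantages collapse to zero and that group contributes no gradient signal.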
If you want to use it, have a look at `config.py` and update it according to your needs: for instance, swap the data collator or change config values. The defaults should now roughly match TRL's.
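As a rough illustration of the kind of edit meant here, a hypothetical config plus a custom data collator might look like the following. The field and function names are made up for this sketch; check `config.py` for the real ones:

```python
# Hypothetical sketch; the actual field names in this repo's config.py may differ.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    model_name: str = "Qwen/Qwen2.5-VL-7B-Instruct"
    use_fsdp: bool = False
    num_generations: int = 8      # completions sampled per prompt
    learning_rate: float = 1e-6


def my_collator(batch: list[dict]) -> dict:
    """Replace with whatever your dataset needs (e.g. image + text fields for a VLM)."""
    return {"prompts": [ex["prompt"] for ex in batch]}
```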
Then:

```sh
uv sync
uv pip install flash-attn --no-build-isolation
```

Note: for the following commands, set `CUDA_VISIBLE_DEVICES` for the vLLM server and the trainer script separately, similar to TRL's vLLM instructions. Also set the `--nproc_per_node` flag.
```sh
VLLM_USE_V1=0 CUDA_VISIBLE_DEVICES=0,1... uv run vllm_server.py --model "Qwen/Qwen2.5-VL-7B-Instruct"
```

```sh
CUDA_VISIBLE_DEVICES=4,5... uv run torchrun --nproc_per_node=4 train.py
```

Optionally, you can override config values with flags, e.g.:
```sh
CUDA_VISIBLE_DEVICES=4,5.. uv run torchrun --nproc_per_node=4 train.py --use_fsdp
```

TODO:
- impl two-sided clipping: https://github.com/huggingface/trl/commit/05bc43e960396581e458195b8388efe6b82cae1f
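The linked TRL commit adds a delta cap on the importance ratio (dual-clip PPO style): on top of the usual epsilon clipping, the raw ratio is capped at `delta`, which only changes the objective when the advantage is negative and the ratio blows up. A per-token sketch of the idea in plain Python (the real implementation operates on token tensors):

```python
# Sketch of two-sided GRPO clipping, per token. The delta cap bounds the
# importance ratio; it only bites for negative advantages, where an
# unclipped large ratio would otherwise produce an unbounded update.
def grpo_token_loss(ratio: float, adv: float,
                    eps_low: float = 0.2, eps_high: float = 0.2,
                    delta: float = 2.0) -> float:
    clipped = max(1.0 - eps_low, min(ratio, 1.0 + eps_high))  # standard clip
    capped = min(ratio, delta)                                # two-sided cap
    # PPO-style pessimistic objective, negated to get a loss.
    return -min(capped * adv, clipped * adv)
```

With `ratio=5.0`, `adv=-1.0`, the uncapped loss would be 5.0; the delta cap limits it to 2.0 while leaving the positive-advantage case untouched.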