This project fine-tunes a Qwen2.5-0.5B-Instruct model on the PubMedQA dataset using a manually supervised GRPO (Group Relative Policy Optimization) pipeline, with a Qwen 7B model acting as the subagent.
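GRPO's core idea — scoring each sampled completion against its own sampling group instead of a learned value baseline — can be sketched in a few lines. This is a simplified illustration, not the project's actual implementation:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward within its group: A_i = (r_i - mean) / (std + eps).

    In GRPO, several completions are sampled per prompt; each completion's
    advantage is its reward relative to the group, so no value network is needed.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one PubMedQA question, scored 1.0 if correct
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # correct answers get positive advantage, wrong ones negative
```

These advantages then weight the policy-gradient update for each completion's tokens.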
**Option A: Run with Docker**

Step 1: Go into the project folder:

```bash
cd supervisor
```

Step 2: Build the Docker image:

```bash
docker build -t grpo-pubmedqa:latest .
```

Step 3: Run the container with GPU support:

```bash
docker run -it --gpus all \
  -v ${PWD}:/workspace \
  -e WANDB_API_KEY=your_key_here \
  -e WANDB_PROJECT=GRPO-Qwen-PubMedQA-Manual \
  grpo-pubmedqa:latest
```
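The `-e` flags above inject the W&B credentials into the container's environment. Inside the training process they can be read like this — a minimal sketch; the actual variable handling in `main.py` may differ:

```python
import os

# W&B picks these up automatically, but reading them explicitly lets the
# script warn early instead of failing mid-training.
api_key = os.environ.get("WANDB_API_KEY")  # set via -e WANDB_API_KEY=...
project = os.environ.get("WANDB_PROJECT", "GRPO-Qwen-PubMedQA-Manual")

if api_key is None:
    print("WANDB_API_KEY not set; W&B logging will be disabled.")
print(f"Logging to W&B project: {project}")
```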
(You can pass a model name to replace the subagent model.)
(If you want to change the model you train, change it in `main.py`.)

**Option B: Run locally**

Step 1: Install dependencies:

```bash
pip install -r requirements.txt
```

Step 2: Set up Weights & Biases (W&B) for experiment tracking:
```bash
export WANDB_API_KEY=your_key_here
export WANDB_PROJECT=GRPO-Qwen-PubMedQA-Manual
```

Step 3: Run the training script:

```bash
python main.py
```

Notes:

- Ensure you have a working GPU + CUDA setup.
- Weights & Biases is optional but recommended for tracking metrics and losses.
- You can modify the default model name or dataset path directly inside `main.py` if needed.
- The model and tokenizer will be automatically downloaded from Hugging Face on first run.
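One convenient way to change the model names without editing `main.py` each time is a small CLI flag. This is a hypothetical sketch — the flags `--model` and `--subagent-model` are illustrative and not part of the current script:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI; defaults mirror the models mentioned in this README.
    p = argparse.ArgumentParser(description="GRPO PubMedQA training")
    p.add_argument("--model", default="Qwen/Qwen2.5-0.5B-Instruct",
                   help="Hugging Face ID of the model to fine-tune")
    p.add_argument("--subagent-model", default="Qwen/Qwen2.5-7B-Instruct",
                   help="Hugging Face ID of the subagent model")
    return p.parse_args(argv)

args = parse_args(["--subagent-model", "Qwen/Qwen2.5-7B-Instruct"])
print(args.model, args.subagent_model)
```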
On Windows, set the W&B environment variables with `setx` instead:

```cmd
setx WANDB_API_KEY "your_key_here"
setx WANDB_PROJECT "GRPO-Qwen-PubMedQA-Manual"
```

That’s it! 🎯
You’re ready to train and evaluate your GRPO-based PubMedQA supervisor model.