Candy26i/supervisor_remote


GRPO-PubMedQA Manual Training

This project fine-tunes a Qwen2.5-0.5B-Instruct model on the PubMedQA dataset using a manually supervised GRPO (Group Relative Policy Optimization) pipeline, with a Qwen-7B model acting as the subagent.
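To illustrate the "group relative" part of GRPO: for each question, several completions are sampled and scored, and each completion's advantage is its reward normalized against the group's mean and standard deviation. A minimal sketch (the actual training loop in main.py may differ):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std,
    the core advantage estimate in GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: binary correctness rewards for 4 completions
# sampled for one PubMedQA question.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that beat the group average get positive advantages and are reinforced; below-average ones get negative advantages, with no separate value network needed.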


How to Run

Option 1 — Using Docker

Step 1: Go into the project folder:

cd supervisor

Step 2: Build the Docker image:

docker build -t grpo-pubmedqa:latest .

Step 3: Run the container with GPU support:

docker run -it --gpus all \
  -v ${PWD}:/workspace \
  -e WANDB_API_KEY=your_key_here \
  -e WANDB_PROJECT=GRPO-Qwen-PubMedQA-Manual \
  grpo-pubmedqa:latest
You can provide a model name to replace the default subagent model. To change the model being trained, edit it in main.py.

Option 2 — Run Locally (Without Docker)

Step 1: Install dependencies:

pip install -r requirements.txt

Step 2: Set up Weights & Biases (W&B) for experiment tracking:

export WANDB_API_KEY=your_key_here
export WANDB_PROJECT=GRPO-Qwen-PubMedQA-Manual

Step 3: Run the training script:

python main.py

Notes

  • Ensure you have a working GPU + CUDA setup.
  • Weights & Biases is optional but recommended for tracking metrics and losses.
  • You can modify the default model name or dataset path directly inside main.py if needed.
  • The model and tokenizer will be automatically downloaded from Hugging Face on first run.
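Changing the trained model or dataset typically means editing lines like the following in main.py. This is a hypothetical excerpt; the real variable names in the script may differ:

```python
# Hypothetical configuration excerpt -- check main.py for the actual names.
MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"    # model being fine-tuned with GRPO
SUBAGENT_MODEL = "Qwen/Qwen2.5-7B-Instruct"  # larger subagent model
DATASET_NAME = "qiaojin/PubMedQA"            # PubMedQA dataset on Hugging Face
```

Both model identifiers resolve against the Hugging Face Hub, so any compatible instruct model ID can be substituted.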

Example Environment Variables (Windows PowerShell)

setx WANDB_API_KEY "your_key_here"
setx WANDB_PROJECT "GRPO-Qwen-PubMedQA-Manual"

Note that setx persists variables for future sessions only; open a new terminal before running the training script.

That’s it! 🎯
You’re ready to train and evaluate your GRPO-based PubMedQA supervisor model.
