Note: This repository is a fork of the original TRL (Transformer Reinforcement Learning) library with custom modifications and experimental workflows.
This repository brings together practical training, evaluation, and inference workflows around preference optimization (DPO) and group-relative policy optimization (GRPO) using the TRL library. It includes:
- Offline/online DPO training utilities and scripts with custom modifications
- GRPO training scripts and analysis
- Inference helpers and evaluation harnesses
- Mobile-oriented deployment experiments
- A modified copy of the TRL project in the trl-cloud folder with experimental features
This repository is based on the HuggingFace TRL library with the following key modifications:
- `trl-cloud/examples/scripts/dpo_online.py` - Enhanced online DPO training script with custom dataset limiting and experimental features
- Additional custom training and evaluation scripts in root directories:
  - Root-level training scripts (`grpo_train.py`, `run_inference.py`)
  - Enhanced evaluation and comparison utilities
  - Mobile deployment experiments
  - Training analysis and visualization tools
- Create the repository on GitHub or use GitHub CLI:

  ```bash
  gh repo create <your-repo-name> --source . --private --push --remote origin
  ```

- Set repository metadata:
  - Add a description, topics (e.g., `trl`, `dpo`, `grpo`, `rlhf`, `huggingface`), and a project URL.
- Protect branches and set required status checks as your CI evolves.
- dpo_online/ — Online DPO training and inference utilities (e.g., online-DPO.py, inference.py)
- dpo_testing/ — Offline DPO test scripts, evaluation reports, and comparisons
- grpo_train.py — Standalone GRPO training entry point at the repo root
- run_inference.py — Simple high-level inference entry point at the repo root
- analysis/ — Plots and text outputs showing training progress and dataset impacts
- trl-cloud/ — A full TRL project layout (library, examples, tests, and scripts) with your modifications
- moblie_compatible/ — Mobile-focused training/deployment experiments and demo
Tip: The trl-cloud subfolder contains an extensive project structure (examples, tests, scripts, and more). You can use it as a reference or run its utilities directly if desired.
- Python 3.10+ is recommended
- Create and activate a virtual environment:
  - Windows (PowerShell):

    ```powershell
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    ```

  - Linux/macOS (bash):

    ```bash
    python -m venv .venv
    source .venv/bin/activate
    ```

Install core dependencies at the repository root:

```bash
pip install -r requirements.txt
```

Some subprojects (like `trl-cloud/`) also provide an additional `requirements.txt` for advanced scenarios.
- Run GRPO training (root entry point):

  ```bash
  python grpo_train.py
  ```

- Run quick inference (root entry point):

  ```bash
  python run_inference.py
  ```

- Online DPO training (example):

  ```bash
  cd dpo_online
  python online-DPO.py
  ```

- Inference with online DPO assets (example):

  ```bash
  cd dpo_online
  python inference.py
  ```

- Offline DPO testing and evaluation:

  ```bash
  cd dpo_testing
  ./test_offline_dpo.sh
  ```
Adjust model names, dataset paths, and hyperparameters in the scripts to match your environment and hardware.
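For orientation, here is a minimal sketch of where those knobs typically live when driving TRL's GRPO API directly. The model name, dataset, reward function, and config values below are illustrative placeholders, not what `grpo_train.py` actually uses.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; substitute your own.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 50 characters.
    return [-abs(50 - len(completion)) for completion in completions]

config = GRPOConfig(
    output_dir="grpo-output",
    per_device_train_batch_size=4,  # downscale if VRAM is tight
    num_generations=4,              # completions sampled per prompt
    max_completion_length=128,      # cap generated sequence length
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder model name
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```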
- `dpo_testing/` contains utilities to measure the quality of generated outputs and compare multiple models (`compare_models.py`, `evaluation_report.md`).
- `analysis/` includes training progress figures and result summaries to understand learning dynamics.
- `moblie_compatible/` contains mobile-focused training, deployment helpers, and simple demos, including packaging examples.
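As a rough illustration of the comparison workflow (this is not `compare_models.py`'s actual interface; the model names and prompt are placeholders), the core loop is generating from each model on the same prompt and inspecting the outputs side by side:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(model_name: str, prompt: str) -> str:
    """Greedy-decode a short completion from the given model."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Strip the prompt tokens so only the completion is decoded.
    completion = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(completion, skip_special_tokens=True)

prompt = "Summarize the benefits of preference optimization in two sentences."
for name in ["<base-model>", "<dpo-finetuned-model>"]:  # placeholder names
    print(f"=== {name} ===\n{generate(name, prompt)}\n")
```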
The `trl-cloud/` directory contains a complete TRL-style project with:

- `examples/`, `scripts/`, and `docs/` for hands-on experimentation
- `tests/` for verifying functionality
- `docker/` for GPU-enabled containerized environments

You can `cd trl-cloud` and explore its scripts (e.g., `commands/run_dpo.sh`, `commands/run_sft.sh`) or run the provided tests if you have the required dependencies and hardware.
- Use Weights & Biases or your preferred logger to track experiments if the scripts support it.
- Keep an eye on VRAM requirements; downscale batch sizes and sequence lengths as needed.
- Save intermediate checkpoints frequently, especially for long runs.
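As a hedged illustration of the last two tips: TRL's config classes inherit from `transformers.TrainingArguments`, so logging and checkpoint cadence are usually set there. The values below are illustrative, not this repository's defaults.

```python
from trl import DPOConfig

# Illustrative values only; tune to your run length and hardware.
config = DPOConfig(
    output_dir="dpo-output",
    report_to="wandb",    # log metrics to Weights & Biases (or "tensorboard", "none")
    logging_steps=10,     # record training metrics every 10 optimizer steps
    save_steps=500,       # write an intermediate checkpoint every 500 steps
    save_total_limit=3,   # keep only the 3 most recent checkpoints on disk
)
```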
- This project retains the original TRL license for the `trl-cloud/` component (see `trl-cloud/LICENSE`).
- Your modifications are provided under the same license unless otherwise stated.
This repository builds upon the great work in the TRL ecosystem and related open-source communities.