TRL Experiments: DPO, GRPO, Evaluation, and Inference

Note: This repository is a fork of the original TRL (Transformer Reinforcement Learning) library with custom modifications and experimental workflows.

This repository brings together practical training, evaluation, and inference workflows around Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) using the TRL library. It includes:

  • Offline/online DPO training utilities and scripts with custom modifications
  • GRPO training scripts and analysis
  • Inference helpers and evaluation harnesses
  • Mobile-oriented deployment experiments
  • A modified copy of the TRL project in the trl-cloud folder with experimental features

Fork Information

This repository is based on the Hugging Face TRL library with the following key modifications:

Modified Files

  • trl-cloud/examples/scripts/dpo_online.py - Enhanced online DPO training script with custom dataset limiting and experimental features
  • Additional custom training and evaluation scripts in root directories

Custom Additions

  • Root-level training scripts (grpo_train.py, run_inference.py)
  • Enhanced evaluation and comparison utilities
  • Mobile deployment experiments
  • Training analysis and visualization tools

Deploying Your Fork to GitHub

  • Create the repository on GitHub or use GitHub CLI:
    • gh repo create <your-repo-name> --source . --private --push --remote origin
  • Set repository metadata (an example command follows this list):
    • Add a description, topics (e.g., trl, dpo, grpo, rlhf, huggingface), and a project URL.
  • Protect branches and set required status checks as your CI evolves.
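
For example, the metadata step can be done with GitHub CLI (the repository name and description here are placeholders):

  • gh repo edit <your-user>/<your-repo-name> --description "Custom TRL experiments" --add-topic trl --add-topic dpo --add-topic grpo --add-topic rlhf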

Repository Layout

  • dpo_online/ — Online DPO training and inference utilities (e.g., online-DPO.py, inference.py)
  • dpo_testing/ — Offline DPO test scripts, evaluation reports, and comparisons
  • grpo_train.py — Standalone GRPO training entry point at the repo root
  • run_inference.py — Simple high-level inference entry point at the repo root
  • analysis/ — Plots and text outputs showing training progress and dataset impacts
  • trl-cloud/ — A full TRL project layout (library, examples, tests, and scripts) with this fork's modifications
  • moblie_compatible/ — Mobile-focused training/deployment experiments and demos

Tip: The trl-cloud subfolder contains an extensive project structure (examples, tests, scripts, and more). You can use it as a reference or run its utilities directly if desired.

Getting Started

1) Environment setup

  • Python 3.10+ is recommended
  • Create and activate a virtual environment:
    • Windows (PowerShell):
      • python -m venv .venv
      • .\.venv\Scripts\Activate.ps1
    • Linux/macOS (bash):
      • python -m venv .venv
      • source .venv/bin/activate

Install core dependencies at the repository root:

  • pip install -r requirements.txt

Some subprojects (like trl-cloud/) also provide an additional requirements.txt for advanced scenarios.
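
For example, to pick up the trl-cloud dependencies as well:

  • pip install -r trl-cloud/requirements.txt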

2) Quick Usage

  • Run GRPO training (root entry point):
    • python grpo_train.py
  • Run quick inference (root entry point):
    • python run_inference.py
  • Online DPO training (example):
    • cd dpo_online
    • python online-DPO.py
  • Inference with online DPO assets (example):
    • cd dpo_online
    • python inference.py
  • Offline DPO testing and evaluation:
    • cd dpo_testing
    • ./test_offline_dpo.sh

Adjust model names, dataset paths, and hyperparameters in the scripts to match your environment and hardware.
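
For orientation, here is a minimal GRPO training sketch in the style of the upstream TRL GRPOTrainer quick-start; the model name, dataset, and toy reward function are illustrative placeholders, not the exact contents of grpo_train.py:

    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Toy reward: prefer completions close to 50 characters (placeholder).
    def reward_len(completions, **kwargs):
        return [-abs(50 - len(c)) for c in completions]

    # Placeholder dataset and model; swap in your own.
    dataset = load_dataset("trl-lib/tldr", split="train")

    training_args = GRPOConfig(output_dir="./checkpoints/grpo-demo")
    trainer = GRPOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",
        reward_funcs=reward_len,
        args=training_args,
        train_dataset=dataset,
    )
    trainer.train()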

Evaluations and Analysis

  • dpo_testing/ contains utilities to measure the quality of generated outputs and compare multiple models (compare_models.py, evaluation_report.md); a rough sketch of this comparison style follows after this list.
  • analysis/ includes training progress figures and result summaries to understand learning dynamics.
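
As an illustration of that comparison style (this is not the actual compare_models.py; the model paths and prompts are placeholders), you can generate from two checkpoints on shared prompts and inspect the outputs side by side:

    from transformers import pipeline

    prompts = ["Explain DPO in one sentence.", "What does GRPO optimize?"]
    for name in ["./checkpoints/baseline", "./checkpoints/dpo"]:  # placeholder paths
        gen = pipeline("text-generation", model=name)
        for p in prompts:
            # Greedy decoding keeps the comparison deterministic.
            out = gen(p, max_new_tokens=64, do_sample=False)[0]["generated_text"]
            print(f"[{name}] {p}\n{out}\n")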

Mobile-Oriented Experiments

  • moblie_compatible/ contains mobile-focused training, deployment helpers, and simple demos, including packaging examples.

Working with trl-cloud/

The trl-cloud/ directory contains a complete TRL-style project with:

  • examples/, scripts/, and docs/ for hands-on experimentation
  • tests/ for verifying functionality
  • docker/ for GPU-enabled containerized environments

You can cd trl-cloud and explore its scripts (e.g., commands/run_dpo.sh, commands/run_sft.sh) or run the provided tests if you have the required dependencies and hardware.
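
For example, assuming the required dependencies are installed:

  • cd trl-cloud
  • pytest tests/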

Tips

  • Use Weights & Biases or your preferred logger to track experiments if the scripts support it.
  • Keep an eye on VRAM requirements; downscale batch sizes and sequence lengths as needed.
  • Save intermediate checkpoints frequently, especially for long runs; the sketch below shows the relevant settings.
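
For the logging and checkpointing tips above, the standard transformers TrainingArguments fields (which TRL configs inherit) are the usual knobs; the values here are placeholders:

    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="./checkpoints",
        save_steps=500,                   # checkpoint every 500 optimizer steps
        save_total_limit=3,               # keep only the 3 most recent checkpoints
        per_device_train_batch_size=2,    # downscale if you hit VRAM limits
        gradient_accumulation_steps=8,    # recover the effective batch size
        report_to="wandb",                # or "none" to disable tracking
    )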

License

  • This project retains the original TRL license for the trl-cloud/ component (see trl-cloud/LICENSE).
  • Modifications in this fork are provided under the same license unless otherwise stated.

Acknowledgements

This repository builds upon the great work in the TRL ecosystem and related open-source communities.
