TRL Experiments: DPO, GRPO, Evaluation, and Inference

Note: This repository is a fork of the original TRL (Transformer Reinforcement Learning) library with custom modifications and experimental workflows.

This repository brings together practical training, evaluation, and inference workflows around Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) using the TRL library. It includes:

  • Offline/online DPO training utilities and scripts with custom modifications
  • GRPO training scripts and analysis
  • Inference helpers and evaluation harnesses
  • Mobile-oriented deployment experiments
  • A modified copy of the TRL project in the trl-cloud folder with experimental features

Fork Information

This repository is based on the Hugging Face TRL library with the following key modifications:

Modified Files

  • trl-cloud/examples/scripts/dpo_online.py - Enhanced online DPO training script with custom dataset limiting and experimental features
  • Additional custom training and evaluation scripts in root directories

Custom Additions

  • Root-level training scripts (grpo_train.py, run_inference.py)
  • Enhanced evaluation and comparison utilities
  • Mobile deployment experiments
  • Training analysis and visualization tools

Deploying Your Fork to GitHub

  • Create the repository on GitHub or use GitHub CLI:
    • gh repo create <your-repo-name> --source . --private --push --remote origin
  • Set repository metadata (an example command follows this list):
    • Add a description, topics (e.g., trl, dpo, grpo, rlhf, huggingface), and a project URL.
  • Protect branches and set required status checks as your CI evolves.
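
For example, the metadata step can be done with GitHub CLI (the repository name and description here are placeholders):

  • gh repo edit <your-user>/<your-repo-name> --description "Custom TRL experiments" --add-topic trl --add-topic dpo --add-topic grpo --add-topic rlhf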

Repository Layout

  • dpo_online/ — Online DPO training and inference utilities (e.g., online-DPO.py, inference.py)
  • dpo_testing/ — Offline DPO test scripts, evaluation reports, and comparisons
  • grpo_train.py — Standalone GRPO training entry point at the repo root
  • run_inference.py — Simple high-level inference entry point at the repo root
  • analysis/ — Plots and text outputs showing training progress and dataset impacts
  • trl-cloud/ — A full TRL project layout (library, examples, tests, and scripts) with this fork's modifications
  • moblie_compatible/ — Mobile-focused training/deployment experiments and demos

Tip: The trl-cloud subfolder contains an extensive project structure (examples, tests, scripts, and more). You can use it as a reference or run its utilities directly if desired.

Getting Started

1) Environment setup

  • Python 3.10+ is recommended
  • Create and activate a virtual environment:
    • Windows (PowerShell):
      • python -m venv .venv
      • .\.venv\Scripts\Activate.ps1
    • Linux/macOS (bash):
      • python -m venv .venv
      • source .venv/bin/activate

Install core dependencies at the repository root:

  • pip install -r requirements.txt

Some subprojects (like trl-cloud/) also provide an additional requirements.txt for advanced scenarios.
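
For example, to pick up the trl-cloud dependencies as well:

  • pip install -r trl-cloud/requirements.txt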

2) Quick Usage

  • Run GRPO training (root entry point):
    • python grpo_train.py
  • Run quick inference (root entry point):
    • python run_inference.py
  • Online DPO training (example):
    • cd dpo_online
    • python online-DPO.py
  • Inference with online DPO assets (example):
    • cd dpo_online
    • python inference.py
  • Offline DPO testing and evaluation:
    • cd dpo_testing
    • ./test_offline_dpo.sh

Adjust model names, dataset paths, and hyperparameters in the scripts to match your environment and hardware.
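
For orientation, here is a minimal GRPO training sketch in the style of the upstream TRL GRPOTrainer quick-start; the model name, dataset, and toy reward function are illustrative placeholders, not the exact contents of grpo_train.py:

    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Toy reward: prefer completions close to 50 characters (placeholder).
    def reward_len(completions, **kwargs):
        return [-abs(50 - len(c)) for c in completions]

    # Placeholder dataset and model; swap in your own.
    dataset = load_dataset("trl-lib/tldr", split="train")

    training_args = GRPOConfig(output_dir="./checkpoints/grpo-demo")
    trainer = GRPOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",
        reward_funcs=reward_len,
        args=training_args,
        train_dataset=dataset,
    )
    trainer.train()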

Evaluations and Analysis

  • dpo_testing/ contains utilities to measure the quality of generated outputs and compare multiple models (compare_models.py, evaluation_report.md); a rough sketch of this comparison style follows after this list.
  • analysis/ includes training progress figures and result summaries to understand learning dynamics.
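
As an illustration of that comparison style (this is not the actual compare_models.py; the model paths and prompts are placeholders), you can generate from two checkpoints on shared prompts and inspect the outputs side by side:

    from transformers import pipeline

    prompts = ["Explain DPO in one sentence.", "What does GRPO optimize?"]
    for name in ["./checkpoints/baseline", "./checkpoints/dpo"]:  # placeholder paths
        gen = pipeline("text-generation", model=name)
        for p in prompts:
            # Greedy decoding keeps the comparison deterministic.
            out = gen(p, max_new_tokens=64, do_sample=False)[0]["generated_text"]
            print(f"[{name}] {p}\n{out}\n")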

Mobile-Oriented Experiments

  • moblie_compatible/ contains mobile-focused training, deployment helpers, and simple demos, including packaging examples.

Working with trl-cloud/

The trl-cloud/ directory contains a complete TRL-style project with:

  • examples/, scripts/, and docs/ for hands-on experimentation
  • tests/ for verifying functionality
  • docker/ for GPU-enabled containerized environments

You can cd trl-cloud and explore its scripts (e.g., commands/run_dpo.sh, commands/run_sft.sh) or run the provided tests if you have the required dependencies and hardware.
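
For example, assuming the required dependencies are installed:

  • cd trl-cloud
  • pytest tests/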

Tips

  • Use Weights & Biases or your preferred logger to track experiments if the scripts support it.
  • Keep an eye on VRAM requirements; downscale batch sizes and sequence lengths as needed.
  • Save intermediate checkpoints frequently, especially for long runs; the sketch below shows the relevant settings.
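
For the logging and checkpointing tips above, the standard transformers TrainingArguments fields (which TRL configs inherit) are the usual knobs; the values here are placeholders:

    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="./checkpoints",
        save_steps=500,                   # checkpoint every 500 optimizer steps
        save_total_limit=3,               # keep only the 3 most recent checkpoints
        per_device_train_batch_size=2,    # downscale if you hit VRAM limits
        gradient_accumulation_steps=8,    # recover the effective batch size
        report_to="wandb",                # or "none" to disable tracking
    )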

License

  • This project retains the original TRL license for the trl-cloud/ component (see trl-cloud/LICENSE).
  • Modifications in this fork are provided under the same license unless otherwise stated.

Acknowledgements

This repository builds upon the great work in the TRL ecosystem and related open-source communities.
