Generalizable End-to-End Tool-Use RL with Synthetic CodeGym

Weihua Du, Hailei Gong, Zhan Ling, Kang Liu, Lingfeng Shen, Xuesong Yao, Yufei Xu, Dingyuan Shi, Yiming Yang, Jiecao Chen
"Generalizable End-to-End Tool-Use RL with Synthetic CodeGym" (2025)

CodeGym is a framework for synthesizing environments for LLM agent reinforcement learning on multi-turn tool-use tasks. It automatically converts static coding problems into interactive environments in which agents learn to call tools to solve complex tasks under varied configurations.

Key Components

We are open-sourcing the following key parts of the project:

CodeGym environment synthesis pipeline: refer to gym/README.md for details.
Server for launching CodeGym environments aimed at large-scale reinforcement learning: refer to online_server/README.md for details.

A community reproduction of the synthetic dataset is available on Hugging Face.

Overview

CodeGym transforms traditional code problems into interactive environments where LLM agents can learn to:

Use tools and actions to solve problems step-by-step
Learn generalizable tool-use behaviors

Environment Synthesis Process

CodeGym environments are synthesized and then verified in two stages:

Gym Synthesis:

Extract reusable code logic and functions from programming solutions
Convert them into a library of documented tools and utilities
Generate environments in the OpenAI Gym format, with state, actions, transitions, and rewards
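
To make the target format concrete, here is a minimal sketch of what a Gym-style environment derived from a coding problem might look like. It is an illustration only, not output of the CodeGym pipeline: the class name MaxElementEnv, the tool names GetLength, GetElement, and Done, and the action dictionary layout are all assumptions chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class MaxElementEnv:
    """Hypothetical Gym-style environment built from a 'find the maximum' problem."""

    nums: list[int]                 # hidden state the agent must query through tools
    max_steps: int = 50             # step budget for one episode
    steps: int = field(default=0, init=False)
    finished: bool = field(default=False, init=False)

    def reset(self) -> str:
        """Start a new episode and return the task description."""
        self.steps = 0
        self.finished = False
        return "Find the maximum of a hidden list. Tools: GetLength, GetElement, Done."

    def step(self, action: dict[str, Any]) -> tuple[str, float, bool, dict]:
        """Apply one tool call and return (observation, reward, done, info)."""
        if self.finished:
            raise RuntimeError("episode is over; call reset()")
        self.steps += 1
        name = action.get("name")
        params = action.get("parameters", {})
        reward = 0.0
        if name == "GetLength":
            obs = str(len(self.nums))
        elif name == "GetElement":
            idx = params.get("index")
            if isinstance(idx, int) and 0 <= idx < len(self.nums):
                obs = str(self.nums[idx])
            else:
                obs = "error: index out of range"
        elif name == "Done":
            # Submitting an answer ends the episode; reward is given only here.
            self.finished = True
            reward = 1.0 if params.get("answer") == max(self.nums) else 0.0
            obs = "correct" if reward else "incorrect"
        else:
            obs = f"error: unknown action {name!r}"
        if self.steps >= self.max_steps:
            self.finished = True
        return obs, reward, self.finished, {"steps": self.steps}


if __name__ == "__main__":
    # A scripted "agent" that solves the task one tool call at a time.
    env = MaxElementEnv([3, 9, 2])
    print(env.reset())
    length = int(env.step({"name": "GetLength"})[0])
    best = max(
        int(env.step({"name": "GetElement", "parameters": {"index": i}})[0])
        for i in range(length)
    )
    print(env.step({"name": "Done", "parameters": {"answer": best}}))
```

The hidden list plays the role of state, each tool call is an action, step() implements the transition, and the reward is sparse: it is assigned only when the agent submits its final answer.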

Gym Verification:

Generate comprehensive unit tests spanning multiple difficulty levels
Validate environment correctness (no compilation errors, timeouts, or memory issues)
Verify solvability by generating solution functions that successfully use the provided tools
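
The verification steps above can be sketched as a small harness that runs a candidate solution function against an environment over several test cases and records failures. This is a simplified sketch under stated assumptions: verify_env, CountEvensEnv, and solve are hypothetical names, and the time check only measures elapsed time in-process. It cannot interrupt a hung solver or enforce memory limits, which would require process isolation.

```python
import time


class CountEvensEnv:
    """Hypothetical toy environment: count the even numbers in a hidden list."""

    def __init__(self, nums):
        self.nums = list(nums)
        self.pos = 0

    def reset(self):
        self.pos = 0
        return "Count the even numbers. Tools: Next, Done."

    def step(self, action):
        """Return (observation, reward, done, info) for one tool call."""
        if action["name"] == "Next":
            if self.pos >= len(self.nums):
                return "END", 0.0, False, {}
            value = self.nums[self.pos]
            self.pos += 1
            return str(value), 0.0, False, {}
        if action["name"] == "Done":
            expected = sum(1 for n in self.nums if n % 2 == 0)
            correct = action["parameters"]["answer"] == expected
            return ("correct" if correct else "incorrect"), float(correct), True, {}
        raise ValueError(f"unknown action: {action['name']!r}")


def solve(env):
    """Candidate solution that uses only the environment's tools."""
    count = 0
    while True:
        obs, _, _, _ = env.step({"name": "Next"})
        if obs == "END":
            break
        if int(obs) % 2 == 0:
            count += 1
    _, reward, _, _ = env.step({"name": "Done", "parameters": {"answer": count}})
    return reward


def verify_env(make_env, solver, test_cases, time_limit_s=1.0):
    """Run the solver on every test case and report passes, failures, and reasons."""
    report = {"passed": 0, "failed": 0, "errors": []}
    for case in test_cases:
        start = time.perf_counter()
        try:
            env = make_env(case)
            env.reset()
            reward = solver(env)
        except Exception as exc:  # environment or solver crashed
            report["failed"] += 1
            report["errors"].append(f"{case!r}: {type(exc).__name__}: {exc}")
            continue
        elapsed = time.perf_counter() - start
        if reward == 1.0 and elapsed <= time_limit_s:
            report["passed"] += 1
        else:
            reason = "wrong answer" if reward != 1.0 else "time limit exceeded"
            report["failed"] += 1
            report["errors"].append(f"{case!r}: {reason}")
    return report


if __name__ == "__main__":
    # Test cases ordered from trivial to larger inputs.
    cases = [[], [2], [1, 2, 3, 4], list(range(1000))]
    print(verify_env(CountEvensEnv, solve, cases))
```

An environment would be kept only if the solver passes every case, which shows both that the environment runs without errors and that the task is solvable with the tools provided.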

Examples

The example/ folder contains sample CodeGym environments to help you get started:

example/example_envs contains example CodeGym environments
example/training_instance.jsonl contains sample instances for RL training
example/raw_problems.jsonl contains raw coding problems for demonstrating the generation pipeline
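
The two .jsonl files can be inspected with a few lines of Python. The sketch below assumes only that each non-empty line is one JSON value, which is what the .jsonl extension conventionally means. It makes no assumption about field names, since the record schema is not described here; the helper name read_jsonl is hypothetical.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return records


if __name__ == "__main__":
    path = Path("example/training_instance.jsonl")
    if path.exists():
        records = read_jsonl(path)
        print(f"{len(records)} records")
        # Print the top-level keys of the first record to discover the schema.
        if records and isinstance(records[0], dict):
            print(sorted(records[0]))
```

Printing the keys of the first record is a quick way to see what each training instance or raw problem contains before wiring it into a training loop.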

Key Result

LLMs trained in CodeGym generalize better to out-of-distribution (OOD) tool-use and multi-turn benchmarks.

CodeGym Synthesis Pipeline

We release the pipeline for environment synthesis and verification. Please refer to gym/README.md for details.

Server for CodeGym Environments

We release a highly concurrent server for launching CodeGym environments aimed at large-scale reinforcement learning. Please refer to online_server/README.md for details.

License

This project and dataset are released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

Citation

If you find this work useful, please cite our paper:

@article{du2025generalizable,
  title={Generalizable End-to-End Tool-Use RL with Synthetic CodeGym},
  author={Du, Weihua and Gong, Hailei and Ling, Zhan and Liu, Kang and Shen, Lingfeng and Yao, Xuesong and Xu, Yufei and Shi, Dingyuan and Yang, Yiming and Chen, Jiecao},
  journal={arXiv preprint arXiv:2509.17325},
  year={2025}
}
