Weihua Du, Hailei Gong, Zhan Ling, Kang Liu, Lingfeng Shen, Xuesong Yao, Yufei Xu, Dingyuan Shi, Yiming Yang, Jiecao Chen
"Generalizable End-to-End Tool-Use RL with Synthetic CodeGym" (2025)
CodeGym is a synthetic environment generation framework for LLM agent reinforcement learning on multi-turn tool-use tasks. It automatically converts static code problems into interactive CodeGym environments where agents can learn to use tools to solve complex tasks in various configurations.
We are open-sourcing the following key parts of the project:
- CodeGym environment synthesis pipeline: refer to gym/README.md for details.
- Server for launching CodeGym environments aimed at large-scale reinforcement learning: refer to online_server/README.md for details.
A community reproduction of the synthetic dataset is available at HuggingFace.
CodeGym transforms traditional code problems into interactive environments where LLM agents can learn to:
- Use tools and actions to solve problems step-by-step
- Learn generalizable tool-use behaviors
We designed an elaborate process for CodeGym environment synthesis and verification:
Gym Synthesis:
- Extract reusable code logic and functions from programming solutions
- Convert them into a library of documented tools and utilities
- Generate OpenAI Gym format environments with state, actions, transitions, and rewards
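To make the "state, actions, transitions, and rewards" structure concrete, here is a minimal sketch of what a Gym-style tool-use environment could look like. It is not taken from the CodeGym codebase: the class name `MaxValueEnv`, the `read` and `submit` actions, and the dictionary-based action format are all assumptions chosen for illustration, and the sketch does not depend on any Gym library.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MaxValueEnv:
    """Hypothetical toy environment: find the maximum of a hidden list.

    The underlying "code problem" is max(numbers); the agent only sees
    the list through the `read` tool and finishes with `submit`.
    """

    numbers: list[int]
    max_steps: int = 20
    steps: int = field(default=0, init=False)
    done: bool = field(default=False, init=False)

    def reset(self) -> dict[str, Any]:
        # Initial observation: only the list length is revealed.
        self.steps = 0
        self.done = False
        return {"length": len(self.numbers)}

    def step(self, action: dict[str, Any]) -> tuple[dict[str, Any], float, bool]:
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        self.steps += 1
        name = action.get("name")
        reward = 0.0
        if name == "read":
            # Tool call: reveal one element of the hidden state.
            index = action["index"]
            if 0 <= index < len(self.numbers):
                obs = {"value": self.numbers[index]}
            else:
                obs = {"error": "index out of range"}
        elif name == "submit":
            # Terminal action: reward 1.0 only for the correct answer.
            correct = action["answer"] == max(self.numbers)
            obs = {"correct": correct}
            reward = 1.0 if correct else 0.0
            self.done = True
        else:
            obs = {"error": f"unknown action {name!r}"}
        if self.steps >= self.max_steps:
            self.done = True  # step budget exhausted
        return obs, reward, self.done


# Example rollout with a scripted "agent" that reads every element.
env = MaxValueEnv([3, 9, 4])
obs = env.reset()
best = None
for i in range(obs["length"]):
    value = env.step({"name": "read", "index": i})[0]["value"]
    best = value if best is None else max(best, value)
obs, reward, done = env.step({"name": "submit", "answer": best})
```

In this sketch the reward is sparse (given only on `submit`), which is one plausible design; the actual reward scheme used by CodeGym environments is described in gym/README.md.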
Gym Verification:
- Generate comprehensive unit tests spanning multiple difficulty levels
- Validate environment correctness (no compilation errors, timeouts, or memory issues)
- Verify solvability by generating solution functions that successfully use the provided tools
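The solvability check above can be sketched as a small harness that runs a candidate solution function against an environment for several test cases and records which ones reach full reward. Everything here is hypothetical: `CounterEnv`, `reference_solver`, and `verify` are illustrative names, and the sketch only enforces a step budget, whereas the real pipeline also checks for compilation errors, timeouts, and memory issues.

```python
from typing import Callable, Iterable


class CounterEnv:
    """Hypothetical toy environment: reach `target` using the `increment` tool."""

    def __init__(self, target: int, max_steps: int = 50):
        self.target = target
        self.max_steps = max_steps

    def reset(self) -> int:
        self.value = 0
        self.steps = 0
        return self.value

    def step(self, action: str):
        self.steps += 1
        if action == "increment":
            self.value += 1
        solved = self.value == self.target
        done = solved or self.steps >= self.max_steps
        return self.value, (1.0 if solved else 0.0), done


def reference_solver(env: CounterEnv) -> float:
    """Hypothetical solution function that interacts only through env tools."""
    env.reset()
    reward, done = 0.0, False
    while not done:
        _, reward, done = env.step("increment")
    return reward


def verify(make_env: Callable, solver: Callable, cases: Iterable) -> dict:
    """Return {case: True/False}; a case passes if the solver earns reward 1.0."""
    report = {}
    for case in cases:
        try:
            report[case] = solver(make_env(case)) == 1.0
        except Exception:
            # Any crash in the environment or solver counts as a failed case.
            report[case] = False
    return report


report = verify(CounterEnv, reference_solver, [1, 5, 100])
```

With the default budget of 50 steps, the case `100` cannot be solved, so a harness like this would flag that environment configuration as unsolvable rather than pass it on to training.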
The example/ folder contains sample CodeGym environments to help you get started:
- example/example_envs contains sample CodeGym environments
- example/training_instance.jsonl contains sample instances for RL training
- example/raw_problems.jsonl contains sample raw coding problems for demonstrating the generation pipeline
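The two .jsonl files can be inspected with a short loader before wiring them into a training run. The sketch below assumes only that each non-empty line is one JSON object; it makes no assumption about the field names, and `read_jsonl` is a helper introduced here, not part of the repository.

```python
import json
from pathlib import Path


def read_jsonl(path, limit=None):
    """Load up to `limit` JSON objects from a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            records.append(json.loads(line))
            if limit is not None and len(records) >= limit:
                break
    return records
```

For example, `read_jsonl("example/training_instance.jsonl", limit=3)` returns the first three instances, and calling `sorted(record.keys())` on one of them shows which fields the file actually provides.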
By training in CodeGym, LLMs show stronger generalization on out-of-distribution (OOD) tool-use and multi-turn benchmarks.
We release the pipeline for environment synthesis and verification. Please refer to gym/README.md for details.
We release a highly concurrent server for launching CodeGym environments aimed at large-scale reinforcement learning. Please refer to online_server/README.md for details.
This project and dataset are released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
If you find this work useful, please cite our paper:
@article{du2025generalizable,
title={Generalizable End-to-End Tool-Use RL with Synthetic CodeGym},
author={Du, Weihua and Gong, Hailei and Ling, Zhan and Liu, Kang and Shen, Lingfeng and Yao, Xuesong and Xu, Yufei and Shi, Dingyuan and Yang, Yiming and Chen, Jiecao},
journal={arXiv preprint arXiv:2509.17325},
year={2025}
}

