Chaoqi Liu1, Xiaoshen Han1, Jiawei Gao1, Yue Zhao2, Haonan Chen1, Yilun Du1
1Harvard University 2Stanford University
- Clone with submodules so that `third_party/LIBERO` is available:

  ```shell
  git clone --recurse-submodules git@github.com:Chaoqi-LIU/oat.git
  # or, after a plain clone:
  git submodule update --init --recursive
  ```
- Install `uv` if you do not already have it. Follow the uv installation guide.
- Initialize the project and install all dependencies and local editable sources:

  ```shell
  uv sync
  uv pip install -e .
  ```
- Install `micromamba` if you do not already have it. Follow the micromamba installation guide.
- Initialize the project and install all dependencies and local editable sources:

  ```shell
  micromamba env create -f conda_env.yaml
  ```
NOTE: We encountered issues running uv on our Slurm cluster, so we also provide conda/mamba as an alternative. The example commands below use uv. If you have trouble setting it up or observe significant performance degradation, please let us know.
We provide a prebuilt `libero10` dataset on Hugging Face: `chaoqi-liu/libero10_N500.zarr`. Alternatively, follow the instructions below to build the dataset locally.
- Download the LIBERO releases (e.g., `libero_spatial`, `libero_object`, `libero_goal`, `libero_100`) into `data/libero/hdf5_datasets/`. (`libero10` is contained in `libero_100`.)

  ```shell
  uv run third_party/LIBERO/benchmark_scripts/download_libero_datasets.py --datasets libero_[spatial/object/goal/100]
  ```
- Convert each HDF5 dump into the repo's zarr format:

  ```shell
  uv run scripts/convert_libero_dataset.py --root_dir data/libero --hdf5_dir_name hdf5_datasets
  ```

  The script converts every `*.hdf5` file it finds, saves `task_N{episodes}.zarr` under `data/libero/`, and prompts before overwriting existing exports. Use `-n`/`--num_sample_demo` to limit how many demos per task if needed.
Compose a
libero10multitask zarr:uv run scripts/compose_libero_multitask_dataset.py --multitask_name libero10 --root_dir data/libero
This merges
*.zarrdatasets related tolibero10usingscripts/merge_data.py, shuffles the episodes, and writesdata/libero/libero10_N{total}.zarr.
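The merge-and-shuffle logic of the compose step can be sketched in a few lines. This is a minimal pure-Python stand-in (plain lists instead of zarr stores; the function name and task names are hypothetical, not the repo's API):

```python
import random

def compose_multitask(task_episodes, seed=0):
    """Merge per-task episode lists into one multitask dataset and
    shuffle the episode order. A simplified stand-in for what
    scripts/compose_libero_multitask_dataset.py does on zarr stores."""
    merged = [ep for episodes in task_episodes.values() for ep in episodes]
    random.Random(seed).shuffle(merged)
    return merged

# Two hypothetical tasks with 2 and 3 demo episodes each.
tasks = {
    "put_the_bowl_on_the_stove": ["ep0", "ep1"],
    "open_the_top_drawer": ["ep2", "ep3", "ep4"],
}
dataset = compose_multitask(tasks)
# len(dataset) is the total episode count, i.e. the N in libero10_N{total}.zarr.
```

The shuffle uses a fixed seed here only so the sketch is reproducible; the actual script's shuffling behavior is determined by its own implementation.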
After you have `data/libero/libero10_N{n}.zarr` ready, train the action tokenizer that OAT policies consume:

```shell
HYDRA_FULL_ERROR=1 uv run accelerate launch \
    --num_machines [num_node] \
    --multi_gpu \
    --num_processes [num_gpu] \
    scripts/run_workspace.py \
    --config-name=train_oattok \
    task/tokenizer=libero/libero10
```

Once the tokenizer checkpoint exists, train the policy that predicts action tokens and decodes them back into actions:
```shell
HYDRA_FULL_ERROR=1 MUJOCO_GL=egl uv run accelerate launch \
    --num_machines [num_node] \
    --multi_gpu \
    --num_processes [num_gpu] \
    scripts/run_workspace.py \
    --config-name=train_oatpolicy \
    task/policy=libero/libero10 \
    task.policy.lazy_eval=false \
    policy.action_tokenizer.checkpoint=[path/to/oattok.ckpt]
```

Setting `task.policy.lazy_eval=false` evaluates the policy during training every `training.rollout_every` epochs.
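To make the tokenizer-then-policy pipeline concrete: an action tokenizer maps continuous action chunks to discrete token ids and decodes them back. OAT's tokenizer is learned from data; purely as a hypothetical illustration of the encode/decode contract (not the repo's method or API), here is a uniform-binning quantizer:

```python
def make_uniform_tokenizer(low, high, num_bins=256):
    """Build encode/decode functions that quantize scalar actions in
    [low, high] into integer token ids via uniform binning. This is
    only an illustration; OAT's tokenizer is learned, not uniform."""
    width = (high - low) / num_bins

    def encode(actions):
        # Clamp, then map each continuous action to a bin index.
        return [min(num_bins - 1, max(0, int((a - low) / width))) for a in actions]

    def decode(tokens):
        # Map each token id back to its bin center.
        return [low + (t + 0.5) * width for t in tokens]

    return encode, decode

encode, decode = make_uniform_tokenizer(-1.0, 1.0, num_bins=256)
tokens = encode([-1.0, 0.0, 0.73])
recon = decode(tokens)
# For in-range actions, reconstruction error is at most half a bin width.
assert all(abs(r - a) <= 1.0 / 256 for r, a in zip(recon, [-1.0, 0.0, 0.73]))
```

The policy then only has to predict token ids; `decode` (or its learned counterpart) turns those ids back into executable actions at rollout time.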
Evaluate a trained checkpoint using `scripts/eval_policy_sim.py`:

```shell
uv run scripts/eval_policy_sim.py \
    --checkpoint [path/to/oatpolicy.ckpt] \
    --output_dir output/eval/libero10 \
    --num_exp 5  # run 5 times, so we can compute the standard error
```

The script instantiates the same LIBERO runner and dataset from `oat.config.task.policy.libero.libero10` and dumps per-checkpoint statistics plus optional videos to `output/eval/libero10`.
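Running `--num_exp 5` gives five independent success-rate estimates, from which a mean and standard error can be reported. A small sketch of that aggregation (the success rates below are made up, and the output format is not the script's actual format):

```python
import statistics

# Hypothetical success rates from 5 independent evaluation runs,
# i.e. what --num_exp 5 would produce. The numbers are made up.
success_rates = [0.82, 0.80, 0.85, 0.79, 0.84]

mean = statistics.mean(success_rates)
# Standard error of the mean: sample stdev / sqrt(n).
stderr = statistics.stdev(success_rates) / len(success_rates) ** 0.5

print(f"success rate: {mean:.3f} +/- {stderr:.3f}")
```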
Also check out `sim_env`, which provides a set of simulation benchmarks.
If you like this work, please cite:

```bibtex
@misc{liu2026oatorderedactiontokenization,
      title={OAT: Ordered Action Tokenization},
      author={Chaoqi Liu and Xiaoshen Han and Jiawei Gao and Yue Zhao and Haonan Chen and Yilun Du},
      year={2026},
      eprint={2602.04215},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2602.04215},
}
```

This project is licensed under the MIT License - see the LICENSE file for details.