
LIBERO-Pro: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization

Xueyang Zhou1, Yangming Xu1, Guiyao Tie1, Yongchao Chen2,3, Guowen Zhang1, Duanfeng Chu4, Pan Zhou1, Lichao Sun5

1 Huazhong University of Science and Technology
2 Harvard University
3 Massachusetts Institute of Technology
4 Wuhan University of Technology
5 Lehigh University


📄 Paper | 💻 Code | 🌐 Webpage | 🤗 Dataset | 📱 XHS | 💬 WeChat



✨ News ✨

  • [2025/11/05] 📊 All .bddl and .init files have been uploaded to Huggingface (supporting fast parallel evaluation): Dataset
  • [2025/10/29] 🌐 We launched the official project website for LIBERO-Pro (with more demos & details): Webpage
  • [2025/10/22] 📱 We shared a project promotion post on XHS: XHS
  • [2025/10/20] 💬 We created an official WeChat account (join discussions, get quick Q&A): WeChat
  • [2025/10/05] 🤖 We released the full LIBERO-Pro code on GitHub: Code
  • [2025/10/04] 🎉 Our paper, LIBERO-Pro: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization, is now available on arXiv: Paper

🌟 Follow Us

We are committed to continuously improving LIBERO-Pro based on your feedback. Our goal is to establish a fair and simple evaluation environment for Vision-Language-Action (VLA) models. Your input is invaluable in helping us achieve this goal!


🔍 Motivation

Recent VLA models have demonstrated impressive performance on known tasks; however, our observations suggest that such success largely stems from mechanical memorization of training scenarios rather than genuine acquisition of transferable task-solving strategies.

[Figure: success rates of OpenVLA, Pi0, Pi0.5, and UniVLA on the LIBERO Goal, Spatial, 10, and Object suites under three conditions: 🟦 Original, 🟧 + P1: Task Perturbation, 🟩 + P2: Position Perturbation.]

📉 All models achieve >0.9 on the original LIBERO tasks but collapse under LIBERO-PRO perturbations, showing poor true generalization.


🌍 Fairer Environment

LIBERO-Pro calls for a more rigorous, standardized, and transparent approach to measuring generalization, helping the community move beyond memorization and toward true understanding.

⚙️ Five Core Generalization Dimensions

| Dimension | Description | Example Evaluation |
|---|---|---|
| Object | Modifies object appearance, color, and scale to test adaptability to visual shifts. | "red cup" → "yellow cup" |
| Position | Relocates objects within feasible spatial bounds to evaluate the model's adaptability to spatial position changes. | Change the positions of "cup" and "bowl" |
| Semantic | Paraphrases natural language commands to probe linguistic robustness. | "Grasp the mug" → "Pick up the cup" |
| Task | Redefines task logic and target states to test procedural generalization. | "Pick up the mug" → "Pick up the butter" |
| Environment | Replaces working environments to evaluate cross-environment robustness. | "Main table" → "Kitchen table" |

🧩 These perturbations are combinable and configurable via YAML for scalable and controlled generalization studies.


You are welcome to join our WeChat discussion group; we answer questions in real time and also welcome more in-depth academic discussion.




Installation

Clone the official LIBERO-PRO repository by running:

git clone https://github.com/Zxy-MLlab/LIBERO-PRO/

LIBERO-PRO is built on the original LIBERO benchmark and shares its runtime environment, so no separate environment configuration is needed. Simply set up the environment following LIBERO's official requirements, as shown below:

conda create -n libero_pro python=3.8.13
conda activate libero_pro
git clone https://github.com/Zxy-MLlab/LIBERO-PRO.git
cd LIBERO-PRO
pip install -r requirements.txt
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -e .
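
After installation, a quick import check can confirm the environment works (a minimal sketch; it assumes the editable install above exposes the libero package):

# sanity_check.py: minimal post-install check (assumes `pip install -e .` exposes `libero`)
import torch
import libero

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("libero imported from:", libero.__file__)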

LIBERO-Pro Evaluation

⚡️ Quick Start

Follow the steps below to quickly set up and run LIBERO-Pro for your own evaluations.

💡 Note:
To enable stable and fast parallel evaluation, we updated libero/libero/benchmark/__init__.py and libero/libero/benchmark/libero_suite_task_map.py. If you cloned the repo before 2025/11/05, please re-download and replace these two files.

1️⃣ Download Required Files

First, download all bddl_files and init_files from our official Huggingface dataset: 👉 LIBERO-Pro Dataset
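
If you prefer to script the download, a sketch along these lines should work with huggingface_hub (the repo_id below is a placeholder; substitute the actual dataset ID from the link above):

from huggingface_hub import snapshot_download

# Placeholder repo_id: replace with the real LIBERO-Pro dataset ID on Huggingface.
local_path = snapshot_download(
    repo_id="Zxy-MLlab/LIBERO-PRO",
    repo_type="dataset",
    local_dir="libero_data",
)
print("Downloaded to:", local_path)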

2️⃣ Move Files into LIBERO-Pro Structure

Move the downloaded files into the correct LIBERO-Pro directory structure:

mv libero_data/bddl_files/* libero/libero/bddl_files/
mv libero_data/init_files/* libero/libero/init_files/
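
As a quick sanity check that the files landed correctly, you can count the copied files (a sketch assuming the directory layout above):

from pathlib import Path

# Count the copied files in each target directory.
for target, ext in [("libero/libero/bddl_files", ".bddl"),
                    ("libero/libero/init_files", ".init")]:
    n = len(list(Path(target).rglob(f"*{ext}")))
    print(f"{target}: {n} {ext} files")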

3️⃣ Configure Evaluation Settings

All evaluation parameters can be set in the file:

evaluation_config.yaml

In this evaluation mode, only one perturbation type can be active at a time. To select the desired perturbation, modify the corresponding field in the config file:

use_swap: false
use_object: false
use_language: false
use_task: true

Custom Evaluation (Optional)

To specify combined-type generalization evaluation, modify evaluation_config.yaml in your project directory.

| Parameter | Function |
|---|---|
| use_environment | Enable/disable environment generalization evaluation |
| use_swap | Enable/disable position generalization evaluation |
| use_object | Enable/disable object generalization evaluation |
| use_language | Enable/disable semantic (language) generalization evaluation |
| use_task | Enable/disable task generalization evaluation |

Note: task generalization (use_task: true) cannot be combined with others.
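
A small pre-flight check like the sketch below (ours, not part of the repo) can catch invalid flag combinations before a run, encoding the rule above:

import yaml

FLAGS = ["use_environment", "use_swap", "use_object", "use_language", "use_task"]

with open("evaluation_config.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

active = [flag for flag in FLAGS if cfg.get(flag, False)]
# use_task must run alone; the other flags may be combined.
if cfg.get("use_task", False) and len(active) > 1:
    raise ValueError(f"use_task cannot be combined with other perturbations: {active}")
print("Active perturbations:", active or "none")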

Evaluation on OpenVLA

Below is a reference code snippet for conducting LIBERO-PRO generalization evaluation on OpenVLA. Please place LIBERO-PRO in the following directory:

# 📁 openvla-oft-main
.
├── .idea/
├── experiments/
│   └── robot/
│       ├── aloha/
│       └── libero/
│           ├── experiments/
│           ├── LIBERO-PRO/ # our project
│           ├── libero_utils.py
│           ├── regenerate_libero_dataset.py
│           ├── run_libero_eval.py
│           ├── sample_libero_spatial_observation.pkl
│           ├── openvla_utils.py
│           └── robot_utils.py

Before evaluating, modify run_libero_eval.py to adapt it to LIBERO-PRO:

# NOTE: a hyphen is not valid in a Python module name, so the cloned
# LIBERO-PRO folder must be renamed (e.g., to LIBERO_PRO) or added to
# sys.path before this import will work.
from LIBERO_PRO import perturbation

# Register for temporary evaluation tasks
class TaskSuite(str, Enum):
  ...
  LIBERO_GOAL_TEMP = "libero_goal_temp"
  LIBERO_SPATIAL_TEMP = "libero_spatial_temp"
  LIBERO_10_TEMP = "libero_10_temp"
  LIBERO_OBJECT_TEMP = "libero_object_temp"
  LIBERO_GOAL_LAN = "libero_goal_lan"
  LIBERO_SPATIAL_LAN = "libero_spatial_lan"
  LIBERO_10_LAN = "libero_10_lan"
  LIBERO_OBJECT_LAN = "libero_object_lan"
  LIBERO_GOAL_OBJECT = "libero_goal_object"
  LIBERO_SPATIAL_OBJECT = "libero_spatial_object"
  LIBERO_10_OBJECT = "libero_10_object"
  LIBERO_OBJECT_OBJECT = "libero_object_object"
  LIBERO_GOAL_SWAP = "libero_goal_swap"
  LIBERO_SPATIAL_SWAP = "libero_spatial_swap"
  LIBERO_10_SWAP = "libero_10_swap"
  LIBERO_OBJECT_SWAP = "libero_object_swap"
  LIBERO_GOAL_TASK = "libero_goal_task"
  LIBERO_SPATIAL_TASK = "libero_spatial_task"
  LIBERO_10_TASK = "libero_10_task"
  LIBERO_OBJECT_TASK = "libero_object_task"
  LIBERO_GOAL_ENV = "libero_goal_env"
  LIBERO_SPATIAL_ENV = "libero_spatial_env"
  LIBERO_10_ENV = "libero_10_env"
  LIBERO_OBJECT_ENV = "libero_object_env"

TASK_MAX_STEPS = {
  ...
  TaskSuite.LIBERO_GOAL_TEMP: 300,
  TaskSuite.LIBERO_SPATIAL_TEMP: 220,
  TaskSuite.LIBERO_10_TEMP: 520,
  TaskSuite.LIBERO_OBJECT_TEMP: 280,
  TaskSuite.LIBERO_GOAL_LAN: 300,
  TaskSuite.LIBERO_SPATIAL_LAN: 220,
  TaskSuite.LIBERO_10_LAN: 520,
  TaskSuite.LIBERO_OBJECT_LAN: 280,
  TaskSuite.LIBERO_GOAL_OBJECT: 300,
  TaskSuite.LIBERO_SPATIAL_OBJECT: 220,
  TaskSuite.LIBERO_10_OBJECT: 520,
  TaskSuite.LIBERO_OBJECT_OBJECT: 280,
  TaskSuite.LIBERO_GOAL_SWAP: 300,
  TaskSuite.LIBERO_SPATIAL_SWAP: 220,
  TaskSuite.LIBERO_10_SWAP: 520,
  TaskSuite.LIBERO_OBJECT_SWAP: 280,
  TaskSuite.LIBERO_GOAL_TASK: 300,
  TaskSuite.LIBERO_SPATIAL_TASK: 220,
  TaskSuite.LIBERO_10_TASK: 520,
  TaskSuite.LIBERO_OBJECT_TASK: 280,
  TaskSuite.LIBERO_GOAL_ENV: 300,
  TaskSuite.LIBERO_SPATIAL_ENV: 220,
  TaskSuite.LIBERO_10_ENV: 520,
  TaskSuite.LIBERO_OBJECT_ENV: 280,
}

# Modify this line
def check_unnorm_key(cfg: GenerateConfig, model) -> None:
  ...
  unnorm_key = cfg.unnorm_key
  ...

# Modify this function
def eval_libero(cfg: GenerateConfig) -> float:
  ...
    with open(cfg.evaluation_config_path, "r", encoding="utf-8") as f:
        evaluation_cfg = yaml.safe_load(f)

    evaluation_cfg["bddl_files_path"] = evaluation_cfg.get("bddl_files_path", "") + "/" + cfg.task_suite_name
    evaluation_cfg["task_suite_name"] = cfg.task_suite_name

    use_swap = evaluation_cfg.get("use_swap", False)
    use_object = evaluation_cfg.get("use_object", False)
    use_language = evaluation_cfg.get("use_language", False)
    use_task = evaluation_cfg.get("use_task", False)
    use_environment = evaluation_cfg.get("use_environment", False)

    # Step 1: Check whether more than one use_xxx flag is True
    if sum([use_swap, use_object, use_language, use_task, use_environment]) > 1:
        # If more than one flag is True, use the temp environment
        bddl_file_path = evaluation_cfg.get("bddl_files_path", "") + cfg.task_suite_name + "_temp/"

        init_file_path = evaluation_cfg.get("init_file_dir", "") + cfg.task_suite_name + "_temp/"

        # Check if the directories exist and the log.txt file contents match
        if not os.path.exists(bddl_file_path) or not os.path.exists(init_file_path):
            # If directories don't exist, create them and the log.txt file
            os.makedirs(init_file_path, exist_ok=True)
            os.makedirs(bddl_file_path, exist_ok=True)

            # Create the log.txt dynamically based on current flag values
            log_content = f"{use_swap},{use_object},{use_language},{use_task},{use_environment}"
            with open(os.path.join(bddl_file_path, "log.txt"), "w") as log_file:
                log_file.write(log_content)  # Write the dynamic state to the log file

            perturbation.create_env(configs=evaluation_cfg)
        else:
            # If directories exist, check the contents of the log.txt file
            with open(os.path.join(bddl_file_path, "log.txt"), "r") as log_file:
                log_contents = log_file.read().strip()

            # Define the expected log content based on the current flags
            expected_log = f"{use_swap},{use_object},{use_language},{use_task},{use_environment}"

            # If the log contents don't match, clean up and recreate the environment
            if log_contents != expected_log:
                # Remove existing files in both directories
                for folder in [bddl_file_path, init_file_path]:
                    for root, dirs, files in os.walk(folder, topdown=False):
                        for name in files:
                            os.remove(os.path.join(root, name))
                        for name in dirs:
                            os.rmdir(os.path.join(root, name))
                # Create the environment again
                os.makedirs(init_file_path, exist_ok=True)
                os.makedirs(bddl_file_path, exist_ok=True)

                # Write the updated log content based on current flags
                with open(os.path.join(bddl_file_path, "log.txt"), "w") as log_file:
                    log_file.write(expected_log)  # Write the updated log

                perturbation.create_env(configs=evaluation_cfg)

        # Update task_suite_name with "_temp" suffix
        cfg.task_suite_name = cfg.task_suite_name + "_temp"

    # Step 2: Handle the case when only one use_xxx flag is True
    else:
        if use_swap:
            perturb_key = "use_swap"
        elif use_object:
            perturb_key = "use_object"
        elif use_language:
            perturb_key = "use_language"
        elif use_task:
            perturb_key = "use_task"
        elif use_environment:
            perturb_key = "use_environment"

        init_file_path = evaluation_cfg.get("init_file_dir", "") + cfg.task_suite_name + "_" + evaluation_cfg.get(
            "perturbation_mapping", {}).get(perturb_key, "")

        if not os.path.exists(init_file_path):
            perturbation.create_env(configs=evaluation_cfg)

        cfg.task_suite_name = cfg.task_suite_name + "_" + evaluation_cfg.get("perturbation_mapping", {}).get(perturb_key, "")
  ...
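
For reference, in single-perturbation mode the code above derives the evaluated suite name from perturbation_mapping in evaluation_config.yaml. A minimal illustration (the mapping values here are our assumption; the real ones must match the TaskSuite suffixes registered earlier, e.g. _swap, _lan):

# Illustrative only: how the suffixed suite name is derived in single-perturbation mode.
perturbation_mapping = {  # assumed values; the real mapping lives in evaluation_config.yaml
    "use_swap": "swap", "use_object": "object", "use_language": "lan",
    "use_task": "task", "use_environment": "env",
}
task_suite_name = "libero_goal"
perturb_key = "use_swap"
print(task_suite_name + "_" + perturbation_mapping[perturb_key])  # libero_goal_swap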

Note: for reasons we have not yet identified, replacing the environment can in some cases cause the objects on the table to move randomly. After extensive testing, replacing the environment with 'main_table' works reliably, and we are actively in contact with the LIBERO authors to fix this issue.

🏆 LIBERO-Pro Model Leaderboard

The following table summarizes model performance under five generalization perturbations in LIBERO-Pro. Each cell represents the normalized success rate (0.00–1.00).

| Model | G-Obj | G-Pos | G-Sem | G-Task | G-Env | S-Obj | S-Pos | S-Sem | S-Task | S-Env | 10-Obj | 10-Pos | 10-Sem | 10-Task | 10-Env | O-Obj | O-Pos | O-Sem | O-Task | O-Env | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenVLA | 0.96 | 0.00 | 0.98 | 0.00 | 0.98 | 0.97 | 0.00 | 0.97 | 0.00 | 0.89 | 0.81 | 0.00 | 0.96 | 0.00 | 0.85 | 0.98 | 0.00 | 0.98 | 0.00 | 0.00 | 0.52 |
| Pi0 | 0.94 | 0.00 | 0.93 | 0.00 | 0.39 | 0.95 | 0.00 | 0.97 | 0.00 | 0.60 | 0.79 | 0.00 | 0.82 | 0.00 | 0.27 | 0.94 | 0.00 | 0.90 | 0.00 | 0.29 | 0.44 |
| Pi0.5 | 0.97 | 0.38 | 0.97 | 0.00 | 0.46 | 0.97 | 0.20 | 0.97 | 0.01 | 0.46 | 0.92 | 0.08 | 0.93 | 0.01 | 0.46 | 0.98 | 0.17 | 0.96 | 0.01 | 0.73 | 0.53 |
| Molmoact | 0.68 | 0.00 | 0.85 | 0.00 | - | 0.90 | 0.00 | 0.88 | 0.00 | - | 0.54 | 0.00 | 0.74 | 0.06 | - | 0.92 | 0.06 | 0.96 | 0.00 | - | 0.41 |
| NORA | 0.58 | 0.00 | 0.88 | 0.00 | - | 0.92 | 0.00 | 0.91 | 0.00 | - | 0.46 | 0.00 | 0.74 | 0.00 | - | 0.86 | 0.00 | 0.92 | 0.00 | - | 0.40 |
| x-VLA | 0.68 | 0.01 | 0.98 | 0.09 | - | 0.97 | 0.00 | 0.96 | 0.00 | - | 0.62 | 0.00 | 0.95 | 0.10 | - | 0.89 | 0.02 | 0.98 | 0.08 | - | 0.46 |

(G = LIBERO-Goal, S = LIBERO-Spatial, 10 = LIBERO-10, O = LIBERO-Object; Obj = Object, Pos = Position, Sem = Semantic, Env = Environment; "-" = no reported result.)

We will continue to expand the LIBERO-PRO leaderboard with new model evaluations. Researchers are warmly invited to use LIBERO-PRO to assess their Vision-Language-Action (VLA) models and share the results with us for inclusion in the official online leaderboard.

Initial Position Perturbation Experiment

This guide provides a step-by-step procedure for reproducing the Object Position Perturbation Evaluation and replicating the results shown in Figure 6 of the paper.

💡 We have pre-packaged all necessary .init and .bddl files required for evaluation. You can easily reproduce the experiment by following the steps below.


🚀 Quick Start

1️⃣ Prepare the BDDL Files

Execute the following commands to set up the perturbed BDDL configuration:

# Navigate to the BDDL directory
cd libero/libero/bddl_files/

# Create a new folder for the perturbation experiment
mkdir -p libero_object_temp

# Copy the target perturbation configuration (e.g., x0.1)
cp -r libero_object_temp_x0.1/* libero_object_temp/

🧩 This creates the libero_object_temp directory containing all .bddl files required for the object position perturbation experiment.


2️⃣ Prepare the Initialization Files

Similarly, set up the initialization configuration directory:

# Navigate to the initialization directory
cd libero/libero/init_files/

# Create a matching subdirectory
mkdir -p libero_object_temp

# Copy the initialization configuration (e.g., x0.1)
cp -r libero_object_temp_x0.1/* libero_object_temp/

💡 Ensure that both bddl_files and init_files share consistent naming conventions (e.g., libero_object_temp_x0.1 → libero_object_temp).


3️⃣ Configure Perturbation Intensity (Optional)

You can adjust the perturbation intensity based on your experimental requirements.
The following levels are supported:

| Perturbation Axis | Available Levels | Description |
|---|---|---|
| X-axis Perturbation | x0.1, x0.2, x0.3, x0.4, x0.5 | Object translation along the X-axis |
| Y-axis Perturbation | y0.1, y0.2, y0.3, y0.4, y0.5 | Object translation along the Y-axis |

Example: to test a specific perturbation level, simply copy the corresponding configuration:

# Example: apply perturbation magnitude x0.3
cp -r libero_object_temp_x0.3/* libero_object_temp/

# Example: apply perturbation magnitude y0.5
cp -r libero_object_temp_y0.5/* libero_object_temp/

⚙️ Modify the perturbation axis and magnitude to simulate different spatial displacement conditions.
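
The level selection can also be scripted. The helper below is a sketch of ours (not part of the repo); it copies a chosen level into both bddl_files and init_files at once, keeping their naming consistent as required in step 2️⃣:

import shutil
from pathlib import Path

def set_perturbation_level(level: str, suite: str = "libero_object") -> None:
    """Copy e.g. level='x0.3' into both the bddl and init directories."""
    for base in ("libero/libero/bddl_files", "libero/libero/init_files"):
        src = Path(base) / f"{suite}_temp_{level}"
        dst = Path(base) / f"{suite}_temp"
        dst.mkdir(parents=True, exist_ok=True)
        shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+

set_perturbation_level("x0.3")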


4️⃣ Run the Evaluation

Using OpenVLA as an example, execute the following command to perform the evaluation:

# Navigate to the project root
cd libero/

# Run the perturbation evaluation
python run_libero_eval.py

The script automatically detects and loads perturbation data from libero/libero/bddl_files/libero_object_temp/ and libero/libero/init_files/libero_object_temp/.


Citation

If you use LIBERO-PRO in your research, please cite both LIBERO and LIBERO-PRO:

@article{liu2023libero,
  title={LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning},
  author={Liu, Bo and Zhu, Yifeng and Gao, Chongkai and Feng, Yihao and Liu, Qiang and Zhu, Yuke and Stone, Peter},
  journal={arXiv preprint arXiv:2306.03310},
  year={2023}
}

@article{zhou2025liberopro,
  title={LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization},
  author={Xueyang Zhou and Yangming Xu and Guiyao Tie and Yongchao Chen and Guowen Zhang and Duanfeng Chu and Pan Zhou and Lichao Sun},
  journal={arXiv preprint arXiv:2510.03827},
  year={2025}
}

License

| Component | License |
|---|---|
| Codebase | MIT License |
| Datasets | Creative Commons Attribution 4.0 International (CC BY 4.0) |

💡 LIBERO-Pro — advancing the frontier of robust and fair generalization evaluation for Vision-Language-Action Models.
