
MDriveBench: Multi-Agent Multi-Granular Driving Benchmark

TODO: Add project overview


Repository Structure and Scope

This repository implements MDriveBench, a multi-agent driving benchmark.
It was originally built on top of CoLMDriver; the benchmark infrastructure extends that codebase, and CoLMDriver is now one of several models evaluated in the repository.

MDriveBench provides:

  • Benchmark infrastructure (CARLA integration, scenarios, evaluation, analysis)
  • Multiple baseline and LLM-based driving models (TCP, CoDriving, LMDrive, UniAD, CoLMDriver, and VAD)
  • Training code for CoLMDriver components



Global Setup

General Setup

Two conda environments are needed: 'vllm' for MLLM inference and 'colmdriver' for simulation.

vLLM env

Step 1: Install conda (if not installed already)

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Step 2: Create the environment and install vLLM

conda create -n vllm python=3.10
conda activate vllm
pip install vllm
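
As an optional sanity check, you can confirm that vLLM and CUDA are visible inside this environment (a minimal sketch, assuming the vllm env is active):

python -c "import vllm, torch; print(vllm.__version__, torch.cuda.is_available())"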

CoLMDriver env

Step 1: Basic Installation for colmdriver

Get the code and create the PyTorch environment.

git clone https://github.com/marco-cos/CoLMDriver.git
cd CoLMDriver

conda create --name colmdriver python=3.7 cmake=3.22.1
conda activate colmdriver
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda install cudnn -c conda-forge

pip install -r opencood/requirements.txt
pip install -r simulation/requirements.txt
pip install openai

Step 2: Download and set up CARLA 0.9.10.1.

chmod +x simulation/setup_carla.sh
./simulation/setup_carla.sh
easy_install carla/PythonAPI/carla/dist/carla-0.9.10-py3.7-linux-x86_64.egg
mkdir external_paths
ln -s ${PWD}/carla/ external_paths/carla_root
# If you already have CARLA, just create a soft link at external_paths/carla_root
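
If you are reusing an existing CARLA 0.9.10.1 install, a minimal sketch of the soft-link setup (the source path below is a placeholder):

# Link an existing CARLA 0.9.10.1 install instead of downloading a new one
mkdir -p external_paths
ln -s /path/to/your/CARLA_0.9.10.1 external_paths/carla_root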

The file structure should be:

|--CoLMDriver
    |--external_paths
        |--carla_root
            |--CarlaUE4
            |--Co-Simulation
            |--Engine
            |--HDMaps
            |--Import
            |--PythonAPI
            |--Tools
            |--CarlaUE4.sh
            ...

Note: we install setuptools==41 because this version still provides easy_install. After installing the carla .egg, you can install the latest setuptools to avoid the 'No module named distutils_hack' error.
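
A sketch of the install order implied by this note (41.0.0 is used as a concrete instance of the 41 series; versions are indicative, not prescriptive):

pip install "setuptools==41.0.0"   # old enough to still provide easy_install
easy_install carla/PythonAPI/carla/dist/carla-0.9.10-py3.7-linux-x86_64.egg
pip install -U setuptools          # upgrade afterwards to avoid the distutils_hack import error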

Steps 3, 4, and 5 set up the perception module.

Step 3: Install Spconv (1.2.1)

We use spconv 1.2.1 to generate voxel features in the perception module.

To install spconv 1.2.1, please follow the guide in https://github.com/traveller59/spconv/tree/v1.2.1.

Or run the following commands:

# 1. Activate your environment (if not activated already)
conda activate colmdriver

# 2. Install dependencies (these are all user-space)
conda install -y cmake=3.22.1 ninja boost ccache -c conda-forge
pip install pybind11 numpy

# 3. Clone spconv recursively (submodules required!)
git clone -b v1.2.1 --recursive https://github.com/traveller59/spconv.git
cd spconv

# 4. Build the wheel (will compile in your conda CUDA toolchain)
python setup.py bdist_wheel

# 5. Install the resulting .whl (no sudo needed)
pip install dist/spconv-1.2.1-*.whl

cd ..
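
To confirm the build succeeded, a quick import check (run inside the colmdriver env):

python -c "import spconv; print('spconv import OK')"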

Step 4: Set up the codebase

# Set up
python setup.py develop

# Compile the CUDA version of the bounding-box IoU op
python opencood/utils/setup.py build_ext --inplace 

Step 5: Install pypcd

# go to another folder
cd ..
git clone https://github.com/klintan/pypcd.git
cd pypcd
pip install python-lzf
python setup.py install
cd ..

Baseline Evaluation Setup

Evaluation of baselines

Set up the environments and get the checkpoints.

Methods              TCP            CoDriving
Installation Guide   github         github
Checkpoints          google drive   google drive

The downloaded checkpoints should follow this structure:

|--CoLMDriver
    |--ckpt
        |--codriving
            |--perception
            |--planning
        |--TCP
            |--new.ckpt

TCP Environment Setup

  1. Create TCP conda environment
cd CoLMDriver
conda env create -f model_envs/tcp_codriving.yaml -n tcp_codriving
conda activate tcp_codriving
  2. Set CARLA path environment variables
export CARLA_ROOT=PATHTOYOURREPOROOT/CoLMDriver/external_paths/carla_root
export PYTHONPATH=$CARLA_ROOT/PythonAPI:$CARLA_ROOT/PythonAPI/carla:$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.10-py3.7-linux-x86_64.egg
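
As a quick check that the CARLA egg is picked up by these paths (the printed file path will differ per machine):

python -c "import carla; print(carla.__file__)"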

CoDriving Environment Setup

  1. Create CoDriving conda environment
cd CoLMDriver
conda env create -f model_envs/tcp_codriving.yaml -n tcp_codriving
conda activate tcp_codriving
  2. Set CARLA path environment variables
export CARLA_ROOT=PATHTOYOURREPOROOT/CoLMDriver/external_paths/carla_root
export PYTHONPATH=$CARLA_ROOT/PythonAPI:$CARLA_ROOT/PythonAPI/carla:$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.10-py3.7-linux-x86_64.egg

LMDrive Environment Setup

  1. Clone LMDrive into the assets directory
git clone https://github.com/opendilab/LMDrive simulation/assets/LMDrive
  2. Prepare LMDrive checkpoints
cd simulation/assets/LMDrive
mkdir -p ckpt

Download and place the following into simulation/assets/LMDrive/ckpt:

Download and place the following into CoLMDriver/ckpt/llava-v1.5-7b:

  3. Create environment and install dependencies
cd CoLMDriver
conda env create -f model_envs/lmdrive.yaml -n lmdrive
conda activate lmdrive

pip install carla-birdeye-view==1.1.1 --no-deps
pip install -e simulation/assets/LMDrive/vision_encoder
  4. Set CARLA path environment variables
export CARLA_ROOT=PATHTOYOURREPOROOT/CoLMDriver/external_paths/carla_root
export PYTHONPATH=$CARLA_ROOT/PythonAPI:$CARLA_ROOT/PythonAPI/carla:$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.10-py3.7-linux-x86_64.egg

UniAD Environment Setup

UniAD is a unified perception–prediction–planning autonomous driving model.
We evaluate it on the InterDrive benchmark using its official pretrained weights and a pre-built conda environment, which avoids dependency conflicts and ensures that anyone can run UniAD without rebuilding the environment from scratch.

The YAML file for the UniAD environment is located in:

model_envs/uniad_env.yaml

To create and activate the environment:

conda env create -f model_envs/uniad_env.yaml -n uniad_env
conda activate uniad_env

UniAD runs inside the uniad_env conda environment, which contains all required CUDA, PyTorch, CARLA, and UniAD dependencies.

Additional Files

Create a ckpt/UniAD directory if it does not exist: mkdir -p CoLMDriver/ckpt/UniAD

Download the UniAD checkpoint from https://huggingface.co/rethinklab/Bench2DriveZoo/blob/main/uniad_base_b2d.pth and place it here:

CoLMDriver/ckpt/UniAD/uniad_base_b2d.pth

Download the UniAD config file from https://github.com/Thinklab-SJTU/Bench2DriveZoo/blob/uniad/vad/adzoo/uniad/configs/stage2_e2e/base_e2e_b2d.py and place it in:

simulation/assets/UniAD/base_e2e_b2d.py
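
One possible way to fetch the checkpoint from the command line; the resolve-style URL is an assumption derived from the browser link above, so verify it before relying on it:

# Run from the repository root; the resolve URL mirrors the blob link above (assumption)
mkdir -p ckpt/UniAD
wget -O ckpt/UniAD/uniad_base_b2d.pth \
    https://huggingface.co/rethinklab/Bench2DriveZoo/resolve/main/uniad_base_b2d.pth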

VAD Environment Setup

The YAML file for the VAD environment is located in:

model_envs/vad_env.yaml

  1. Create VAD conda environment
cd CoLMDriver
conda env create -f model_envs/vad_env.yaml -n vad
conda activate vad

  2. Start a CARLA instance

CUDA_VISIBLE_DEVICES=0 ./external_paths/carla_root/CarlaUE4.sh --world-port=2000 -prefer-nvidia
  3. Run VAD on InterDrive
# CARLA must already be running on port 2000
bash scripts/eval/eval_mode.sh 0 2000 vad ideal Interdrive_all

CoLMDriver Model Setup

Step 1: Download checkpoints from Google Drive. The downloaded CoLMDriver checkpoints should follow this structure:

|--CoLMDriver
    |--ckpt
        |--colmdriver
            |--LLM
            |--perception
            |--VLM
            |--waypoints_planner

To download the checkpoints from the command line and move them into the correct directories (no GUI required):

# In the CoLMDriver repository directory, with the colmdriver conda env activated
pip install gdown
gdown 1z3poGdoomhujCNQtoQ80-BCO34GTOLb-

mkdir ckpt
mv colmdriver.zip ckpt
cd ckpt
unzip colmdriver.zip
rm colmdriver.zip

# Fix obsolete dataset dependency bug
sed -i "s|root_dir: .*|root_dir: $(pwd)|; s|test_dir: .*|test_dir: $(pwd)|; s|validate_dir: .*|validate_dir: $(pwd)|" colmdriver/percpetion/config.yaml
touch dataset_index.txt

Step 2: Run the VLM and LLM servers (from the repository root)

#Enter conda ENV
conda activate vllm
# Launch the VLM server
CUDA_VISIBLE_DEVICES=6 vllm serve ckpt/colmdriver/VLM --port 1111 --max-model-len 8192 --trust-remote-code --enable-prefix-caching

# Launch the LLM server (in a new terminal, with the vllm env activated)
CUDA_VISIBLE_DEVICES=7 vllm serve ckpt/colmdriver/LLM --port 8888 --max-model-len 4096 --trust-remote-code --enable-prefix-caching

Make sure that CUDA_VISIBLE_DEVICES is set to an available GPU; you can check GPU availability with the nvidia-smi command.

Note: make sure the selected ports (1111, 8888) are not occupied by other services. If you use different ports, update the values of the 'comm_client' and 'vlm_client' keys in simulation/leaderboard/team_code/agent_config/colmdriver_config.yaml accordingly.
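
Before launching an evaluation, you can confirm both servers respond; vLLM exposes an OpenAI-compatible API, so listing the served models is enough (adjust the ports if you changed them):

curl http://localhost:1111/v1/models   # VLM server
curl http://localhost:8888/v1/models   # LLM server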


Benchmark Evaluation on InterDrive

All models are evaluated on the InterDrive benchmark using a unified interface:

bash scripts/eval/eval_mode.sh <GPU_ID> <CARLA_PORT> <MODEL_NAME> <MODE> <SCENARIO_SET>

Where:

  • <MODEL_NAME>: one of { colmdriver, tcp, codriving, lmdrive, uniad, vad }
  • <MODE>: one of { ideal, realtime } (where supported)
  • <SCENARIO_SET>: one of { Interdrive_all, Interdrive_no_npc, Interdrive_npc }

Make sure you have:

  • The corresponding conda environment activated for each model (e.g., tcp_codriving, lmdrive, uniad_env, colmdriver, etc.)
  • Any model-specific services running (e.g., VLM/LLM servers for CoLMDriver)

Start CARLA

# Start CARLA server; change port if 2000 is already in use
CUDA_VISIBLE_DEVICES=0 ./external_paths/carla_root/CarlaUE4.sh --world-port=2000 -prefer-nvidia

If CARLA segfaults on startup, try:

conda install -c conda-forge libglvnd mesa-libgl-devel libegl libxrender libxext libxi

Example evaluation commands

# TCP, full InterDrive
bash scripts/eval/eval_mode.sh 0 2000 tcp ideal Interdrive_all

# CoDriving, full InterDrive
bash scripts/eval/eval_mode.sh 0 2000 codriving ideal Interdrive_all

# LMDrive, full InterDrive
bash scripts/eval/eval_mode.sh 0 2000 lmdrive ideal Interdrive_all

# UniAD, full InterDrive
bash scripts/eval/eval_mode.sh 0 2000 uniad ideal Interdrive_all

# CoLMDriver: full benchmark, realtime mode, and subsets
bash scripts/eval/eval_mode.sh 0 2000 colmdriver ideal Interdrive_all
bash scripts/eval/eval_mode.sh 0 2000 colmdriver realtime Interdrive_all
bash scripts/eval/eval_mode.sh 0 2000 colmdriver ideal Interdrive_no_npc
bash scripts/eval/eval_mode.sh 0 2000 colmdriver ideal Interdrive_npc

Evaluation results are saved under:

results/results_driving_<MODEL_NAME>

For example:

  • results/results_driving_colmdriver
  • results/results_driving_tcp
  • results/results_driving_lmdrive

It’s recommended to run the LLM server, VLM server, CARLA server, and evaluation script in separate terminals. CARLA processes may fail to stop cleanly; kill them manually if needed.
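
A minimal way to clean up stale CARLA processes, if needed:

# Kills any process whose command line matches CarlaUE4; use with care on shared machines
pkill -f CarlaUE4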


LLM-Driven Scenario Generation

TODO: Add documentation for LLM-driven scenario generation, including:

  • Natural-language specification of driving scenarios
  • Conversion from language to executable scenarios
  • Support for multi-agent negotiation and coordination behaviors

Results Analysis and Visualization

Results Analysis

The repository includes a comprehensive results analysis script that generates detailed reports, visualizations, and statistics about driving performance and negotiation behavior.

Basic Usage

# Basic analysis of results directory
python visualization/results_analysis.py results/results_driving_colmdriver --output-dir report

# Multiple experiment folders
python visualization/results_analysis.py results/results_driving_colmdriver exp1 exp2 --output-dir report

# Generate single markdown report for multiple experiments
python visualization/results_analysis.py results/results_driving_colmdriver exp1 exp2 --output-dir report --markdown report/combined.md

Generated Analysis

The script generates:

  • Markdown Report: Comprehensive analysis with embedded figures
  • CSV Data Tables:
    • Per-route summary
    • Category summaries
    • Negotiation statistics (scenario/agent/setting breakdowns)
    • Infractions breakdown
  • Visualizations:
    • Driving scores by scenario category
    • Success rates across traffic conditions
    • NPC impact analysis
    • Negotiation frequency, rounds, and message-length distributions
    • Agent count distribution
    • Score distributions
    • Infractions breakdown
  • Artifacts:
    • Text report summarizing negotiation behavior
    • Collected nego.json files copied into the output directory for easy sharing

Key Metrics Analyzed

  • Driving Score (DS) and Success Rate
  • Route Categories (IC/LM/LC) performance
  • Impact of NPC traffic
  • Negotiation behavior:
    • Frequency
    • Number of rounds
    • Consensus scores
    • Safety scores
  • Agent counts and interactions
  • Infractions breakdown

The analysis helps understand:

  • How different traffic conditions affect performance
  • Which scenarios trigger most negotiations
  • How negotiation patterns vary across scenarios
  • Where driving performance needs improvement

Visualizing Results

The repository provides tools to generate videos from evaluation results:

# Generate video for a specific scenario
python visualization/gen_video.py path/to/scenario/folder --output scenario.mp4

# Options:
--fps VALUE           Set video framerate (default: 10)
--width VALUE         Set output width in pixels
--height VALUE        Set output height in pixels
--font-scale VALUE    Adjust text overlay size
--min-hold VALUE     Minimum seconds to show overlay text

# Examples:
# Basic video with default settings
python visualization/gen_video.py results/results_driving_colmdriver/route_00/0000 --output route00_test.mp4

# High quality render with custom settings
python visualization/gen_video.py results/results_driving_colmdriver/route_00/0000 \
    --output route00_hq.mp4 \
    --fps 30 \
    --width 1920 \
    --height 1080 \
    --font-scale 1.2

# Process multiple scenarios
python visualization/gen_video.py results/results_driving_colmdriver/route_*/0000 \
    --output-dir videos/

Features:

  • Multi-vehicle perspective rendering
  • Negotiation overlay visualization
  • Configurable resolution and framerate
  • Automatic scenario discovery
  • Progress tracking
  • Font size and text display customization

Dataset

The dataset for training CoLMDriver is obtained from V2Xverse, which contains expert behaviors in CARLA. You may get the dataset in two ways:

The dataset should be linked or stored under external_paths/data_root/, following this structure:

|--data_root
    |--weather-0
        |--data
            |--routes_town{town_id}_{route_id}_w{weather_id}_{datetime}
                |--ego_vehicle_{vehicle_id}
                    |--2d_bbs_{direction}
                    |--3d_bbs
                    |--actors_data
                    |--affordances
                    |--bev_visibility
                    |--birdview
                    |--depth_{direction}
                    |--env_actors_data
                    |--lidar
                    |--lidar_semantic_front
                    |--measurements
                    |--rgb_{direction}
                    |--seg_{direction}
                    |--topdown
                |--rsu_{vehicle_id}
                |--log
            ...
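
If you already have the V2Xverse dataset downloaded elsewhere, a minimal sketch of linking it into the expected location (the source path below is a placeholder):

mkdir -p external_paths
ln -s /path/to/your/v2xverse_dataset external_paths/data_root
ls external_paths/data_root/weather-0/data | head   # quick structure check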

Training

Perception module

Our perception module follows CoDriving. To train the perception module from scratch or continue from a checkpoint, run the following commands:

# Single GPU training
python opencood/tools/train.py -y opencood/hypes_yaml/v2xverse/colmdriver_multiclass_config.yaml [--model_dir ${CHECKPOINT_FOLDER}]

# DDP training
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch  --nproc_per_node=2 --use_env opencood/tools/train_ddp.py -y opencood/hypes_yaml/v2xverse/colmdriver_multiclass_config.yaml [--model_dir ${CHECKPOINT_FOLDER}]

# Offline testing of perception
python opencood/tools/inference_multiclass.py --model_dir ${CHECKPOINT_FOLDER}

The training outputs can be found at opencood/logs. Arguments Explanation:

  • model_dir (optional): the path to the checkpoints, used for fine-tuning or continued training. When model_dir is given, the trainer discards the hypes_yaml and loads the config.yaml in the checkpoint folder instead; in this case, ${CONFIG_FILE} can be None (see the example below).
  • --nproc_per_node indicates the number of GPUs to use.
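
For instance, a hypothetical continued-training run (the log folder name is a placeholder):

python opencood/tools/train.py \
    -y opencood/hypes_yaml/v2xverse/colmdriver_multiclass_config.yaml \
    --model_dir opencood/logs/your_previous_run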

Planning module

Given a perception checkpoint, we freeze its parameters and train the downstream planning module end to end. The planner takes the BEV perception feature and occupancy map as input and predicts the future waypoints of the ego vehicle.

Train the planning module with a given perception checkpoint on multiple GPUs:

# Train planner
bash scripts/train/train_planner_e2e.sh $GPU_ids $num_GPUs $perception_ckpt $planner_config $planner_ckpt_resume $name_of_log $save_path

# Example
bash scripts/train/train_planner_e2e.sh 0,1 2 ckpt/colmdriver/percpetion covlm_cmd_extend_adaptive_20 None log ./ckpt/colmdriver_planner

# Offline test
bash scripts/eval/eval_planner_e2e.sh 0,1 ckpt/colmdriver/percpetion covlm_cmd_extend_adaptive_20 ckpt/colmdriver/waypoints_planner/epoch_26.ckpt ./ckpt/colmdriver_planner

VLM planner

Data generation

Our training data is also provided on Google Drive for reference. Since the images originate from the local V2Xverse dataset, you still need to download the dataset to get full access.

Lora Finetuning

We use ms-swift to fine-tune the MLLMs. For installation and details, refer to the official repo. We provide an example script in MLLMs/finetune.sh.


Acknowledgements

This implementation is based on code from several repositories.
