Safety-critical scenarios are rare yet pivotal for evaluating and enhancing the robustness of autonomous driving systems. While existing methods generate safety-critical driving trajectories, simulations, or single-view videos, they fall short of meeting the demands of advanced end-to-end autonomous driving (E2E AD) systems, which require real-world, multi-view video data. To bridge this gap, we introduce SafeMVDrive, the first framework designed to generate high-quality, safety-critical, multi-view driving videos grounded in real-world domains. SafeMVDrive strategically integrates a safety-critical trajectory generator with an advanced multi-view video generator. To tackle the challenges inherent in this integration, we first enhance the scene-understanding ability of the trajectory generator by incorporating visual context -- previously unavailable to such generators -- and leveraging a GRPO-finetuned vision-language model to achieve more realistic and context-aware trajectory generation. Second, recognizing that existing multi-view video generators struggle to render realistic collision events, we introduce a two-stage, controllable trajectory generation mechanism that produces collision-evasion trajectories, ensuring both video quality and safety-critical fidelity. Finally, we employ a diffusion-based multi-view video generator to synthesize high-quality safety-critical driving videos from the generated trajectories. Experiments on an E2E AD planner demonstrate a significant increase in collision rate when it is tested with our generated data, validating the effectiveness of SafeMVDrive in stress-testing planning modules.
SafeMVDrive has the following pipeline:

- VLM-based Adversarial Vehicle Selector: Identifies the adversarial vehicle from multi-view images.
- Two-Stage Evasion Trajectory Generator: First generates a collision trajectory, then refines it into a realistic evasion trajectory.
- Trajectory-to-Video Generator: Synthesizes multi-view videos from the generated trajectories.
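For orientation, here is a minimal sketch of how the three stages fit together. The function names (select_adversarial_vehicle, simulate_collision_trajectory, refine_to_evasion_trajectory, render_multiview_video) are hypothetical placeholders for this overview, not the repository's actual API.

# Illustrative pipeline sketch only; all names below are placeholders.
def generate_safety_critical_clip(scene):
    # Stage 1: the GRPO-finetuned VLM selects which surrounding vehicle
    # should act adversarially, based on the multi-view images.
    adversary = select_adversarial_vehicle(scene.multi_view_images, scene.agents)

    # Stage 2a: generate a trajectory in which the adversary would collide
    # with the ego vehicle.
    collision_traj = simulate_collision_trajectory(scene, adversary)

    # Stage 2b: refine it into a realistic collision-evasion trajectory,
    # which video generators can render more faithfully than a crash.
    evasion_traj = refine_to_evasion_trajectory(scene, collision_traj)

    # Stage 3: the diffusion-based generator synthesizes multi-view videos
    # conditioned on the resulting trajectories.
    return render_multiview_video(scene, evasion_traj)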
The codebase is organized into four primary modules:

- vlm-selector/: VLM-based Adversarial Vehicle Selector
- two-stage-simulator/: Two-Stage Evasion Trajectory Generator
- T2VGenerator/: Trajectory-to-Video Generator
- eval/: E2E driving module evaluation
Each module requires a separate environment.
First, prepare the weights for our three components.
- vlm-selector: Get our GRPO-finetuned Qwen2.5-VL-7B-Instruct model from: https://huggingface.co/JiaweiZhou/SafeMVDrive/tree/main/vlm-selector/checkpoint-2600.
- two-stage-simulator: Get our diffusion-based trajectory generation model trained with one-frame context from: https://huggingface.co/JiaweiZhou/SafeMVDrive/tree/main/two-stage-simulator.
- T2VGenerator: Get the pretrained weights required by UniMLVG.
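If you prefer scripting the downloads, a minimal sketch using the huggingface_hub Python API is shown below. It assumes the two SafeMVDrive checkpoints sit under the paths listed above in the JiaweiZhou/SafeMVDrive repository; the UniMLVG weight still has to be obtained separately.

from huggingface_hub import snapshot_download

# Fetch only the SafeMVDrive checkpoints into the local weights/ directory.
# The UniMLVG weight (ctsd_unimlvg_tirda_bm_nwa_60k.pth) is distributed separately.
snapshot_download(
    repo_id="JiaweiZhou/SafeMVDrive",
    allow_patterns=["vlm-selector/*", "two-stage-simulator/*"],
    local_dir="weights",
)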
Organize the directory structure as follows:
${CODE_ROOT}/
├── T2VGenerator
├── ...
├── weights
│ ├── vlm-selector
│ │ ├── checkpoint-2600
│ │ │ ├── global_step2600
│ │ │ ├── ...
│ ├── two-stage-simulator
│ │ ├── config.json
│ │ ├── iter80000.ckpt
│ ├── T2VGenerator
│ │ ├── ctsd_unimlvg_tirda_bm_nwa_60k.pth

Download the original nuScenes dataset from the official nuScenes website and organize the directory structure as follows:
${CODE_ROOT}/
├── T2VGenerator
├── ...
├── nuscenes
│ ├── v1.0-trainval-zip
│ │ ├── nuScenes-map-expansion-v1.3.zip
│ ├── can_bus
│ ├── maps
│ ├── samples
│ ├── sweeps
│ ├── v1.0-trainval
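As an optional sanity check (assuming the nuscenes-devkit is installed in whichever environment you use), you can verify the layout above by instantiating the devkit on the nuscenes directory:

from nuscenes.nuscenes import NuScenes

# Loads the v1.0-trainval metadata; this fails with a clear error if the
# directory layout above is incomplete.
nusc = NuScenes(version='v1.0-trainval', dataroot='nuscenes', verbose=True)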
The VLM selector environment was tested on CUDA 12.4.
conda create -n safemvdrive-vlm python=3.10
conda activate safemvdrive-vlm
cd vlm-selector/
bash setup.sh

cd vlm-selector/src
bash VLM_selector.sh

You can modify DATA_COUNT in the script to change the number of samples randomly selected from the nuScenes val split.
The trajectory simulator environment was tested on CUDA 11.3.
conda create -n safemvdrive-trajectory python=3.9
conda activate safemvdrive-trajectory
cd two-stage-simulator/
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113
pip install pip==24.0
pip install numpy==1.23.4  # ignore the resulting dependency conflict warning
pip install -e .
cd trajdata
pip install -r trajdata_requirements.txt
pip install -e .

Install Pplan:
git clone https://github.com/NVlabs/spline-planner.git Pplan
cd Pplan
pip install -e .

Run the two-stage simulation:
cd ${CODE_ROOT}/two-stage-simulator
bash two-stage-simulate.sh

The video generator environment was tested on CUDA 12.4.
conda create -n safemvdrive-video python=3.9
conda activate safemvdrive-video
python -m pip install torch==2.5.1 torchvision==0.20.1
cd T2VGenerator
git submodule update --init --recursive
python -m pip install -r requirements.txt
cd ${CODE_ROOT}/T2VGenerator
bash T2VGeneration.sh

We take UniAD as an example. To evaluate it on the generated adversarial dataset (or our SafeMVDrive dataset), follow these steps:
- Setup environment required by UniAD and download pretrained weight.
- Comment out the following lines (and fix the indentation accordingly) in
  /path/to/uniad/env/site-packages/nuscenes/eval/common/loaders.py:

  if scene_record['name'] in splits[eval_split]:

  else:
      raise ValueError('Error: Requested split {} which this function cannot map to the correct NuScenes version.'
                       .format(eval_split))

- The remaining required modifications for UniAD have been placed in eval/UniAD; they are intended to replace the corresponding files under {ROOT_OF_UNIAD}.
- Create a symbolic link to the generated dataset directory at {ROOT_OF_UNIAD}/data/nuscenes and execute {ROOT_OF_UNIAD}/tools/uniad_create_data.sh to extract metadata.
- Run {ROOT_OF_UNIAD}/tools/uniad_dist_eval.sh to evaluate.
By default, we use the output obj_box_col as the basis for calculating the sample-level collision rate.
To compute the scene-level collision rate instead, please modify the following lines accordingly:
planning_metrics.py (Lines 176–179)
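For reference, the difference between the two metrics can be summarized with the sketch below. It assumes per-sample obj_box_col flags grouped by scene and only illustrates the aggregation; it is not the code in planning_metrics.py.

# Illustrative aggregation only, not the actual planning_metrics.py logic.
# collisions: dict mapping scene_token -> list of per-sample obj_box_col flags (bool)

def sample_level_collision_rate(collisions):
    flags = [flag for per_scene in collisions.values() for flag in per_scene]
    return sum(flags) / len(flags)

def scene_level_collision_rate(collisions):
    # A scene counts as colliding if any of its samples collides.
    return sum(any(per_scene) for per_scene in collisions.values()) / len(collisions)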
If you find this repository helpful, please consider citing our paper:
@article{zhou2025safemvdrive,
title={SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain},
author={Zhou, Jiawei and Lyu, Linye and Tian, Zhuotao and Zhuo, Cheng and Li, Yu},
journal={arXiv preprint arXiv:2505.17727},
year={2025}
}

We would like to thank the developers of CTG and OpenDWM, upon which our work is built.




