Safety-critical scenarios are rare yet pivotal for evaluating and enhancing the robustness of autonomous driving systems. While existing methods generate safety-critical driving trajectories, simulations, or single-view videos, they fall short of meeting the demands of advanced end-to-end autonomous driving (E2E AD) systems, which require real-world, multi-view video data. To bridge this gap, we introduce SafeMVDrive, the first framework designed to generate high-quality, safety-critical, multi-view driving videos grounded in real-world domains. SafeMVDrive strategically integrates a safety-critical trajectory generator with an advanced multi-view video generator. To tackle the challenges inherent in this integration, we first enhance the scene-understanding ability of the trajectory generator by incorporating visual context -- previously unavailable to such generators -- and leveraging a GRPO-finetuned vision-language model to achieve more realistic and context-aware trajectory generation. Second, recognizing that existing multi-view video generators struggle to render realistic collision events, we introduce a two-stage, controllable trajectory generation mechanism that produces collision-evasion trajectories, ensuring both video quality and safety-critical fidelity. Finally, we employ a diffusion-based multi-view video generator to synthesize high-quality safety-critical driving videos from the generated trajectories. Experiments on an E2E AD planner demonstrate a significant increase in collision rate when it is tested with our generated data, validating the effectiveness of SafeMVDrive in stress-testing planning modules.
SafeMVDrive has the following pipeline:

- VLM-based Adversarial Vehicle Selector: Identifies the adversarial vehicle from multi-view images.
- Two-Stage Evasion Trajectory Generator: First generates a collision trajectory, then refines it into a realistic evasion trajectory.
- Trajectory-to-Video Generator: Synthesizes multi-view videos from the generated trajectories.
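For orientation, here is a minimal sketch of how the three stages fit together. The function names (select_adversarial_vehicle, simulate_collision_trajectory, refine_to_evasion_trajectory, render_multiview_video) are hypothetical placeholders for this overview, not the repository's actual API.

# Illustrative pipeline sketch only; all names below are placeholders.
def generate_safety_critical_clip(scene):
    # Stage 1: the GRPO-finetuned VLM selects which surrounding vehicle
    # should act adversarially, based on the multi-view images.
    adversary = select_adversarial_vehicle(scene.multi_view_images, scene.agents)

    # Stage 2a: generate a trajectory in which the adversary would collide
    # with the ego vehicle.
    collision_traj = simulate_collision_trajectory(scene, adversary)

    # Stage 2b: refine it into a realistic collision-evasion trajectory,
    # which video generators can render more faithfully than a crash.
    evasion_traj = refine_to_evasion_trajectory(scene, collision_traj)

    # Stage 3: the diffusion-based generator synthesizes multi-view videos
    # conditioned on the resulting trajectories.
    return render_multiview_video(scene, evasion_traj)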
The codebase is organized into four primary modules:

- vlm-selector/: VLM-based Adversarial Vehicle Selector
- two-stage-simulator/: Two-Stage Evasion Trajectory Generator
- T2VGenerator/: Trajectory-to-Video Generator
- eval/: E2E driving module evaluation
Each module requires a separate environment.
First, prepare the weights for our three components.
- vlm-selector: Get our GRPO-finetuned Qwen2.5-VL-7B-Instruct model from: https://huggingface.co/JiaweiZhou/SafeMVDrive/tree/main/vlm-selector/checkpoint-2600.
- two-stage-simulator: Get our diffusion-based trajectory generation model trained with one-frame context from: https://huggingface.co/JiaweiZhou/SafeMVDrive/tree/main/two-stage-simulator.
- T2VGenerator: Get the pretrained weights required by UniMLVG.
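If you prefer scripting the downloads, a minimal sketch using the huggingface_hub Python API is shown below. It assumes the two SafeMVDrive checkpoints sit under the paths listed above in the JiaweiZhou/SafeMVDrive repository; the UniMLVG weight still has to be obtained separately.

from huggingface_hub import snapshot_download

# Fetch only the SafeMVDrive checkpoints into the local weights/ directory.
# The UniMLVG weight (ctsd_unimlvg_tirda_bm_nwa_60k.pth) is distributed separately.
snapshot_download(
    repo_id="JiaweiZhou/SafeMVDrive",
    allow_patterns=["vlm-selector/*", "two-stage-simulator/*"],
    local_dir="weights",
)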
Organize the directory structure as follows:
${CODE_ROOT}/
├── T2VGenerator
├── ...
├── weights
│ ├── vlm-selector
│ │ ├── checkpoint-2600
│ │ │ ├── global_step2600
│ │ │ ├── ...
│ ├── two-stage-simulator
│ │ ├── config.json
│ │ ├── iter80000.ckpt
│ ├── T2VGenerator
│ │ ├── ctsd_unimlvg_tirda_bm_nwa_60k.pth

Download the original nuScenes dataset from the official nuScenes website and organize the directory structure as follows:
${CODE_ROOT}/
├── T2VGenerator
├── ...
├── nuscenes
│ ├── v1.0-trainval-zip
│ │ ├── nuScenes-map-expansion-v1.3.zip
│ ├── can_bus
│ ├── maps
│ ├── samples
│ ├── sweeps
│ ├── v1.0-trainval
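As an optional sanity check (assuming the nuscenes-devkit is installed in whichever environment you use), you can verify the layout above by instantiating the devkit on the nuscenes directory:

from nuscenes.nuscenes import NuScenes

# Loads the v1.0-trainval metadata; this fails with a clear error if the
# directory layout above is incomplete.
nusc = NuScenes(version='v1.0-trainval', dataroot='nuscenes', verbose=True)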
The VLM selector environment was tested on CUDA 12.4.
conda create -n safemvdrive-vlm python=3.10
conda activate safemvdrive-vlm
cd vlm-selector/
bash setup.sh

cd vlm-selector/src
bash VLM_selector.sh

You can modify DATA_COUNT in the script to change the number of samples randomly selected from the nuScenes val split.
The trajectory simulator environment was tested on CUDA 11.3.
conda create -n safemvdrive-trajectory python=3.9
conda activate safemvdrive-trajectory
cd two-stage-simulator/
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113
pip install pip==24.0
pip install numpy==1.23.4  # ignore the resulting dependency conflict warning
pip install -e .
cd trajdata
pip install -r trajdata_requirements.txt
pip install -e .

Install Pplan:
git clone https://github.com/NVlabs/spline-planner.git Pplan
cd Pplan
pip install -e .

Run the two-stage simulation:
cd ${CODE_ROOT}/two-stage-simulator
bash two-stage-simulate.sh

The video generator environment was tested on CUDA 12.4.
conda create -n safemvdrive-video python=3.9
conda activate safemvdrive-video
python -m pip install torch==2.5.1 torchvision==0.20.1
cd T2VGenerator
git submodule update --init --recursive
python -m pip install -r requirements.txt
cd ${CODE_ROOT}/T2VGenerator
bash T2VGeneration.sh

We take UniAD as an example. To evaluate it on the generated adversarial dataset (or our SafeMVDrive dataset), follow these steps:
- Setup environment required by UniAD and download pretrained weight.
- Comment out the following lines (and fix the indentation accordingly) in
  /path/to/uniad/env/site-packages/nuscenes/eval/common/loaders.py:

  if scene_record['name'] in splits[eval_split]:

  else:
      raise ValueError('Error: Requested split {} which this function cannot map to the correct NuScenes version.'
                       .format(eval_split))

- The remaining required modifications for UniAD have been placed in eval/UniAD; they are intended to replace the corresponding files under {ROOT_OF_UNIAD}.
- Create a symbolic link to the generated dataset directory at {ROOT_OF_UNIAD}/data/nuscenes and execute {ROOT_OF_UNIAD}/tools/uniad_create_data.sh to extract metadata.
- Run {ROOT_OF_UNIAD}/tools/uniad_dist_eval.sh to evaluate.
By default, we use the output obj_box_col as the basis for calculating the sample-level collision rate.
To compute the scene-level collision rate instead, please modify the following lines accordingly:
planning_metrics.py (Lines 176–179)
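For reference, the difference between the two metrics can be summarized with the sketch below. It assumes per-sample obj_box_col flags grouped by scene and only illustrates the aggregation; it is not the code in planning_metrics.py.

# Illustrative aggregation only, not the actual planning_metrics.py logic.
# collisions: dict mapping scene_token -> list of per-sample obj_box_col flags (bool)

def sample_level_collision_rate(collisions):
    flags = [flag for per_scene in collisions.values() for flag in per_scene]
    return sum(flags) / len(flags)

def scene_level_collision_rate(collisions):
    # A scene counts as colliding if any of its samples collides.
    return sum(any(per_scene) for per_scene in collisions.values()) / len(collisions)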
If you find this repository helpful, please consider citing our paper:
@article{zhou2025safemvdrive,
title={SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain},
author={Zhou, Jiawei and Lyu, Linye and Tian, Zhuotao and Zhuo, Cheng and Li, Yu},
journal={arXiv preprint arXiv:2505.17727},
year={2025}
}

We would like to thank the developers of CTG and OpenDWM, upon which our work is built.




