SafeMVDrive

Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain


License arXiv Dataset Project Page


Adversarial vehicle suddenly cuts in; ego vehicle slightly steers right to avoid.

Rear adversarial vehicle suddenly accelerates; ego vehicle also speeds up to evade.

Rear adversarial vehicle suddenly accelerates; ego vehicle changes lane left to evade.

Front adversarial vehicle suddenly slows down; ego vehicle changes lane and decelerates to avoid.

Abstract

Safety-critical scenarios are rare yet pivotal for evaluating and enhancing the robustness of autonomous driving systems. While existing methods generate safety-critical driving trajectories, simulations, or single-view videos, they fall short of meeting the demands of advanced end-to-end autonomous driving systems (E2E AD), which require real-world, multi-view video data. To bridge this gap, we introduce SafeMVDrive, the first framework designed to generate high-quality, safety-critical, multi-view driving videos grounded in real-world domains. SafeMVDrive strategically integrates a safety-critical trajectory generator with an advanced multi-view video generator. To tackle the challenges inherent in this integration, we first enhance the scene-understanding ability of the trajectory generator by incorporating visual context -- previously unavailable to such generators -- and leverage a GRPO-finetuned vision-language model to achieve more realistic and context-aware trajectory generation. Second, recognizing that existing multi-view video generators struggle to render realistic collision events, we introduce a two-stage, controllable trajectory generation mechanism that produces collision-evasion trajectories, ensuring both video quality and safety-critical fidelity. Finally, we employ a diffusion-based multi-view video generator to synthesize high-quality safety-critical driving videos from the generated trajectories. Experiments conducted on an E2E AD planner demonstrate a significant increase in collision rate when tested with our generated data, validating the effectiveness of SafeMVDrive in stress-testing planning modules.

Pipeline

SafeMVDrive has the following pipeline (see the pipeline figure):

  1. VLM-based Adversarial Vehicle Selector: Identifies the adversarial vehicle from multi-view images.
  2. Two-Stage Evasion Trajectory Generator: First generates a collision trajectory, then refines it into a realistic evasion trajectory.
  3. Trajectory-to-Video Generator: Synthesizes multi-view videos from the generated trajectories.

Getting Started

The codebase is organized into the following primary modules:

  • vlm-selector/: VLM-based Adversarial Vehicle Selector
  • two-stage-simulator/: Two-stage Evasion Trajectory Generator
  • T2VGenerator/: Trajectory-to-Video Generator
  • eval/: E2E driving module evaluation

Each module requires a separate environment.

1. Preparing the nuScenes Dataset and Model Weights

Model Weights

First, prepare the weights for our three components.

vlm-selector: Get our GRPO-finetuned Qwen2.5-VL-7B-Instruct model from: https://huggingface.co/JiaweiZhou/SafeMVDrive/tree/main/vlm-selector/checkpoint-2600.

two-stage-simulator: Get our diffusion-based trajectory generation model trained with one-frame context from: https://huggingface.co/JiaweiZhou/SafeMVDrive/tree/main/two-stage-simulator.

T2VGenerator: Get the pretrained weights required by UniMLVG.

Organize the directory structure as follows:

    ${CODE_ROOT}/
    ├── T2VGenerator
    ├── ...
    ├── weights
    │   ├── vlm-selector
    │   │   ├── checkpoint-2600
    │   │   │   ├── global_step2600
    │   │   │   ├── ...        
    │   ├── two-stage-simulator
    │   │   ├── config.json 
    │   │   ├── iter80000.ckpt
    │   ├── T2VGenerator
    │   │   ├── ctsd_unimlvg_tirda_bm_nwa_60k.pth
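
If you prefer a scripted download of the two SafeMVDrive checkpoints, the sketch below (not part of the repository's scripts; it assumes huggingface_hub is installed) pulls them into the weights/ layout above. The UniMLVG checkpoint ctsd_unimlvg_tirda_bm_nwa_60k.pth must still be obtained separately and placed under weights/T2VGenerator/.

    # Sketch only: download the released SafeMVDrive checkpoints from Hugging Face.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="JiaweiZhou/SafeMVDrive",
        allow_patterns=["vlm-selector/*", "two-stage-simulator/*"],
        local_dir="weights",  # yields weights/vlm-selector/checkpoint-2600/... and weights/two-stage-simulator/...
    )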

Dataset

Download the original nuScenes dataset from the official nuScenes website and organize the directory structure as follows:

    ${CODE_ROOT}/
    ├── T2VGenerator
    ├── ...
    ├── nuscenes
    │   ├── v1.0-trainval-zip
    │   │   ├── nuScenes-map-expansion-v1.3.zip
    │   ├── can_bus
    │   ├── maps
    │   ├── samples
    │   ├── sweeps
    │   ├── v1.0-trainval
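
As an optional sanity check (this assumes the nuscenes-devkit is installed in whichever environment you are using), the devkit should load the trainval tables from this layout without errors:

    # Optional sanity check: load the v1.0-trainval tables from the nuscenes/ directory above.
    from nuscenes.nuscenes import NuScenes

    nusc = NuScenes(version="v1.0-trainval", dataroot="nuscenes", verbose=True)
    print(len(nusc.scene), "scenes,", len(nusc.sample), "samples")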

2. VLM-based Adversarial Vehicle Selection

Tested on CUDA 12.4.

Setup

conda create -n safemvdrive-vlm python=3.10
conda activate safemvdrive-vlm
cd vlm-selector/
bash setup.sh

Preprocessing and VLM Inference

cd vlm-selector/src
bash VLM_selector.sh

You can modify DATA_COUNT in the script to change the number of samples randomly selected from the nuScenes val split.
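
For reference, the sketch below is an illustrative Python equivalent of what DATA_COUNT controls (the actual sampling is done inside VLM_selector.sh; the paths and the count are placeholders):

    # Illustrative only: randomly pick DATA_COUNT samples from the nuScenes val split.
    import random
    from nuscenes.nuscenes import NuScenes
    from nuscenes.utils.splits import create_splits_scenes

    DATA_COUNT = 100  # plays the same role as the variable in VLM_selector.sh

    nusc = NuScenes(version="v1.0-trainval", dataroot="nuscenes", verbose=False)
    val_scenes = set(create_splits_scenes()["val"])
    val_samples = [s for s in nusc.sample
                   if nusc.get("scene", s["scene_token"])["name"] in val_scenes]
    subset = random.sample(val_samples, DATA_COUNT)
    print(f"Selected {len(subset)} validation samples")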

3. Two-stage Evasion Trajectory Generation

Tested on CUDA 11.3.

Setup

conda create -n safemvdrive-trajectory python=3.9
conda activate safemvdrive-trajectory
cd two-stage-simulator/
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113
pip install pip==24.0
pip install numpy==1.23.4 # ignore the dependency-conflict warning
pip install -e .
cd trajdata
pip install -r trajdata_requirements.txt
pip install -e .

Install Pplan

git clone https://github.com/NVlabs/spline-planner.git Pplan
cd Pplan
pip install -e .

Evasion Trajectory Generation

cd two-stage-simulator
bash two-stage-simulate.sh

4. Trajectory-to-Video Generation

Tested on CUDA 12.4.

Setup

conda create -n safemvdrive-video python=3.9
conda activate safemvdrive-video
python -m pip install torch==2.5.1 torchvision==0.20.1
cd T2VGenerator
git submodule update --init --recursive
python -m pip install -r requirements.txt

Trajectory-to-Video Generation

cd T2VGenerator
bash T2VGeneration.sh

5. Evaluating End-to-End Autonomous Driving Models on the Generated Adversarial Dataset

We take UniAD as an example. To evaluate it on the generated adversarial dataset (or our SafeMVDrive dataset), follow these steps:

  1. Set up the environment required by UniAD and download its pretrained weights.
  2. Comment out the following lines (and fix the indentation) in /path/to/uniad/env/site-packages/nuscenes/eval/common/loaders.py:
if scene_record['name'] in splits[eval_split]:
else:
    raise ValueError('Error: Requested split {} which this function cannot map to the correct NuScenes version.'
                     .format(eval_split))
  3. The remaining required modifications for UniAD are provided in eval/UniAD; use them to replace the corresponding files under {ROOT_OF_UNIAD}.
  4. Create a symbolic link to the generated dataset directory at {ROOT_OF_UNIAD}/data/nuscenes and execute {ROOT_OF_UNIAD}/tools/uniad_create_data.sh to extract metadata (see the sketch after this list).
  5. Run {ROOT_OF_UNIAD}/tools/uniad_dist_eval.sh to evaluate.
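
For step 4, the symbolic link can be created with a plain ln -s; an equivalent Python snippet (both paths are placeholders) is:

    # Placeholder paths: point UniAD's expected data directory at the generated dataset.
    import os
    os.symlink("/path/to/generated_nuscenes", "/path/to/UniAD/data/nuscenes",
               target_is_directory=True)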

By default, we use the output obj_box_col as the basis for calculating the sample-level collision rate. To compute the scene-level collision rate instead, modify planning_metrics.py (Lines 176–179) accordingly.
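
For clarity, the difference between the two aggregations is sketched below (field names are placeholders, not the exact variables in planning_metrics.py): the sample-level rate counts every colliding sample, while the scene-level rate counts a scene as colliding if any of its samples collides.

    # Placeholder sketch of sample-level vs. scene-level collision rates.
    from collections import defaultdict

    def collision_rates(results):
        """results: list of dicts like {"scene": "scene-0103", "obj_box_col": True}."""
        sample_rate = sum(bool(r["obj_box_col"]) for r in results) / len(results)
        scene_hit = defaultdict(bool)
        for r in results:
            scene_hit[r["scene"]] |= bool(r["obj_box_col"])
        scene_rate = sum(scene_hit.values()) / len(scene_hit)
        return sample_rate, scene_rate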

Citation

If you find this repository helpful, please consider citing our paper:

@article{zhou2025safemvdrive,
  title={SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain},
  author={Zhou, Jiawei and Lyu, Linye and Tian, Zhuotao and Zhuo, Cheng and Li, Yu},
  journal={arXiv preprint arXiv:2505.17727},
  year={2025}
}

Acknowledgements

We would like to thank the developers of CTG and OpenDWM, upon which our work is built.
