Implementation of Hydra-MDP

This repo contains an implementation of the method Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation, based on the NAVSIM repo.
1. Getting started:

Start by reading the Getting started section of the original NAVSIM repo below, which gives a broader understanding of the code structure. The download and installation part is essential before proceeding. The original repo expects the directory structure shown below (example for the `mini` and `trainval` splits), so the downloaded files need to be rearranged after running the download scripts.

    download/
    ├── sensor_blobs/
    │   ├── trainval/
    │   └── mini/
    ├── navsim_logs/
    │   ├── trainval/
    │   └── mini/
    └── maps/
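The layout above can be checked before moving on. The snippet below is a minimal sketch, not part of this repo: the helper name `missing_dirs` is hypothetical, and the root folder name is an assumption that should be replaced with wherever the data was actually downloaded.

```python
from pathlib import Path

# Sub-directories taken from the layout described above.
EXPECTED_SUBDIRS = [
    "sensor_blobs/trainval",
    "sensor_blobs/mini",
    "navsim_logs/trainval",
    "navsim_logs/mini",
    "maps",
]


def missing_dirs(root, expected=EXPECTED_SUBDIRS):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Calling `missing_dirs("download")` returns an empty list once the files have been rearranged; any entries it returns are the folders that still need to be created or moved.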
2. Dataset preparation:

Training requires processing the dataset to generate preprocessed targets and labels. Our implementation pre-computes PDM metrics as targets, which takes about 30 hours on 32 cores. Because the task is computationally intensive, it is recommended to run it with a separate script that benefits from parallelization:

`bash scripts/hydra_scripts/prepare_dataset_cache.sh`

This command produces the `exp/training_cache` directory containing the preprocessed dataset. Feel free to change `TRAIN_TEST_SPLIT=navtrain` within the script based on your needs.
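As a rough planning aid, the figure of about 30 hours on 32 cores corresponds to roughly 960 core-hours. The sketch below extrapolates that to other machine sizes. It assumes ideal linear scaling with the number of cores, which real runs will not reach exactly (I/O and scheduling overhead are ignored), and the function name `estimated_hours` is hypothetical.

```python
def estimated_hours(cores, ref_hours=30.0, ref_cores=32):
    """Estimate wall-clock hours, assuming ideal linear scaling with core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    core_hours = ref_hours * ref_cores  # reference workload from the README figure
    return core_hours / cores


for n in (8, 16, 32, 64):
    print(f"{n:>3} cores: ~{estimated_hours(n):.0f} h")
```

Treat the result as an order-of-magnitude estimate when deciding how large a machine to reserve.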
3. Training model:

To use `WandB` logging, first set the `HYDRA_WANDB_ENTITY` and `HYDRA_WANDB_PROJECT` environment variables. Training can then be run with:

`scripts/hydra_scripts/train_pdm_score_only.sh`

In `train_pdm_score_only.sh`, only the final PDM score is used as the prediction target and as the cost function for selecting the best trajectory. In contrast, another example, `scripts/hydra_scripts/train_multiple_loss_targets.sh`, optimizes for multiple targets. In addition, `agent.config.cost_function_weights` can be changed for a trained model during inference.
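Before launching a long training run, it can help to confirm that both logging variables are set. The following is a small sketch of such a check. The helper `missing_wandb_vars` is hypothetical and not part of this repo; only the two variable names come from the text above.

```python
import os

REQUIRED_VARS = ("HYDRA_WANDB_ENTITY", "HYDRA_WANDB_PROJECT")


def missing_wandb_vars(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_wandb_vars()
if missing:
    print("Set before training: " + ", ".join(missing))
```

Passing a plain dictionary instead of the process environment makes the check easy to reuse in tests or launcher scripts.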
4. Model inference:

4.1 Local score computation:

First, prepare the metric cache by running

`bash scripts/evaluation/run_metric_caching.sh`

and then compute the metrics with

`bash scripts/hydra_scripts/local_evaluation.sh`

which produces a `.csv` file with per-scenario and aggregated metrics.

4.2 Submission generation:

Running

`bash scripts/hydra_scripts/prepare_submission.sh`

produces a `.pkl` file that can be sent to the leaderboard (currently disabled).

Hydra-MDP selects the best trajectory from a fixed trajectory dictionary based on the cost function score. For ease of reproducibility, we provide precomputed dictionaries within this repo in the `navsim/agents/hydra/trajectory_vocab/real` directory. However, users can prepare their own dictionary with the `navsim/agents/hydra/prepare_trajectories_bank.py` script.
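
The selection step can be illustrated with a short NumPy sketch. This is not the code used by the agent in this repo. It assumes the cost function is a plain weighted sum of per-trajectory scores, whereas the actual implementation may combine them differently (for example in log space). The function name, array shapes and metric names (`pdm_score`, `imitation`) are hypothetical.

```python
import numpy as np


def select_trajectory(vocab, scores, weights):
    """Pick the dictionary trajectory with the highest weighted score.

    vocab:   (K, T, 3) array of K candidate trajectories (assumed x, y, heading).
    scores:  dict mapping metric name -> (K,) array of per-trajectory scores.
    weights: dict mapping metric name -> float weight (hypothetical names).
    """
    total = np.zeros(vocab.shape[0])
    for name, weight in weights.items():
        total += weight * np.asarray(scores[name], dtype=float)
    best = int(np.argmax(total))
    return best, vocab[best]


# Toy example: 4 candidate trajectories with 8 poses each.
rng = np.random.default_rng(0)
vocab = rng.normal(size=(4, 8, 3))
scores = {
    "pdm_score": np.array([0.2, 0.9, 0.5, 0.7]),
    "imitation": np.array([0.8, 0.1, 0.6, 0.3]),
}
best, traj = select_trajectory(vocab, scores, {"pdm_score": 1.0, "imitation": 0.5})
print(best, traj.shape)
```

Changing the weights changes which candidate wins without retraining, which is the kind of adjustment that `agent.config.cost_function_weights` is described as allowing at inference time.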
The following models were evaluated on the `navhard_two_stage` split (number of successful scenarios: 5912):

| metric | train_multiple_loss_targets | hydra_pdm_score_only | train_multiple_loss_targets_1024 | imitation_loss_only |
|---|---|---|---|---|
| token | extended_pdm_score_combined | extended_pdm_score_combined | extended_pdm_score_combined | extended_pdm_score_combined |
| no_at_fault_collisions_stage_one | 0.9544 | 0.96 | 0.97 | 0.9533 |
| drivable_area_compliance_stage_one | 0.9489 | 0.9289 | 0.9378 | 0.6978 |
| driving_direction_compliance_stage_one | 0.9933 | 0.9989 | 0.9911 | 0.9756 |
| traffic_light_compliance_stage_one | 1.0 | 0.9978 | 0.9978 | 0.9933 |
| ego_progress_stage_one | 0.8352 | 0.8149 | 0.835 | 0.8295 |
| time_to_collision_within_bound_stage_one | 0.9556 | 0.96 | 0.9644 | 0.9378 |
| lane_keeping_stage_one | 0.9489 | 0.9267 | 0.9689 | 0.9311 |
| history_comfort_stage_one | 0.9778 | 0.9756 | 0.9756 | 0.9756 |
| pdm_score_no_masking_stage_one | 0.7619 | 0.7442 | 0.7635 | 0.5404 |
| pdm_score_proxy_stage_one | 0.8012 | 0.7705 | 0.7994 | 0.5906 |
| two_frame_extended_comfort_stage_one | 0.6 | 0.5467 | 0.6044 | 0.6578 |
| no_at_fault_collisions_stage_two | 0.8233 | 0.8337 | 0.8115 | 0.7967 |
| drivable_area_compliance_stage_two | 0.8315 | 0.8302 | 0.8232 | 0.6397 |
| driving_direction_compliance_stage_two | 0.8783 | 0.8917 | 0.8719 | 0.7932 |
| traffic_light_compliance_stage_two | 0.9826 | 0.9787 | 0.9794 | 0.9809 |
| ego_progress_stage_two | 0.8594 | 0.8154 | 0.8509 | 0.8409 |
| time_to_collision_within_bound_stage_two | 0.787 | 0.8018 | 0.7791 | 0.7609 |
| lane_keeping_stage_two | 0.482 | 0.4734 | 0.4556 | 0.445 |
| history_comfort_stage_two | 0.9527 | 0.9609 | 0.9634 | 0.9646 |
| two_frame_extended_comfort_stage_two | 0.5686 | 0.5879 | 0.6091 | 0.6533 |
| score | 0.3536 | 0.3445 | 0.3454 | 0.1786 |
| wandb_run | link | link | link | link |
| checkpoint | download | download | download | download |
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking

Daniel Dauner<sup>1,2</sup>, Marcel Hallgarten<sup>1,5</sup>, Tianyu Li<sup>3</sup>, Xinshuo Weng<sup>4</sup>, Zhiyu Huang<sup>4,6</sup>, Zetong Yang<sup>3</sup>,
Hongyang Li<sup>3</sup>, Igor Gilitschenski<sup>7,8</sup>, Boris Ivanovic<sup>4</sup>, Marco Pavone<sup>4,9</sup>, Andreas Geiger<sup>1,2</sup>, and Kashyap Chitta<sup>1,2</sup>

<sup>1</sup>University of Tübingen, <sup>2</sup>Tübingen AI Center, <sup>3</sup>OpenDriveLab at Shanghai AI Lab, <sup>4</sup>NVIDIA Research,
<sup>5</sup>Robert Bosch GmbH, <sup>6</sup>Nanyang Technological University, <sup>7</sup>University of Toronto, <sup>8</sup>Vector Institute, <sup>9</sup>Stanford University

Advances in Neural Information Processing Systems (NeurIPS), 2024
Track on Datasets and Benchmarks

🔥 NAVSIM gathers simulation-based metrics (such as progress and time to collision) for end-to-end driving by unrolling simplified bird's eye view abstractions of scenes for a short simulation horizon. It operates under the condition that the policy has limited influence on the environment, which enables efficient, open-loop metric computation while being better aligned with closed-loop evaluations than traditional displacement errors.
This branch contains the code for NAVSIM v2, used in the 2025 NAVSIM challenge. For NAVSIM v1, as well as its navtest leaderboard, please check the v1.1 branch.
- Download and installation
- Understanding and creating agents
- Understanding the data format and classes
- Dataset splits vs. filtered training / test splits
- Understanding the Extended PDM Score
- Understanding the traffic simulation
- Submitting to the Leaderboard

- `[2025/04/28]` NAVSIM v2.2 release (official devkit version for AGC 2025)
  - Release of `private_test_hard` dataset (see splits) for the Hugging Face NAVSIM v2 End-to-End Driving Challenge 2025 Leaderboard.
    - The submission deadline is 2025-05-11 00:00:00 UTC
    - You are limited to one upload per day on the challenge leaderboard, which should take approximately 2 hours to evaluate after a successful submission.
  - Fixed bug in `openscene_meta_datas` for `navhard` and `warmup`. ⚠️ IMPORTANT: If you used `navhard_two_stage/openscene_meta_datas` or `warmup_two_stage/openscene_meta_datas` to evaluate your model, please re-download and use the new data.
- `[2025/04/24]` NAVSIM v2.1.2 release
  - Release of `navhard_two_stage` dataset (see splits)
  - Updated Extended Predictive Driver Model Score (EPDMS) for the Hugging Face Warmup leaderboard. See metrics for details regarding the implementation.
- `[2025/04/13]` NAVSIM v2.1.1 release
  - Updated dataset for the warmup leaderboard with minor fixes
- `[2025/04/08]` NAVSIM v2.1 release
  - Added new dataset for the Hugging Face Warmup leaderboard (see submission)
  - Introduced support for two-stage reactive traffic agents (see traffic simulation)
- `[2025/02/28]` NAVSIM v2.0 release
  - Extends the PDM Score with more metrics and penalties (see metrics)
  - Adds a new two-stage pseudo closed-loop simulation (see metrics)
  - Adds support for reactive traffic agent policies (see traffic simulation)
- `[2024/09/03]` NAVSIM v1.1 release
  - Leaderboard for `navtest` on Hugging Face
  - Release of baseline checkpoints on Hugging Face
  - Updated docs for submission and paper
- `[2024/04/21]` NAVSIM v1.0 release (official devkit version for AGC 2024)
  - Parallelization of metric caching / evaluation
  - Adds Transfuser baseline (see agents)
  - Adds standardized training and test filtered splits (see splits)
  - Visualization tools (see tutorial_visualization.ipynb)
- `[2024/04/03]` NAVSIM v0.4 release
  - Support for test phase frames of competition
  - Download script for trainval
  - Egostatus MLP Agent and training pipeline
- `[2024/03/25]` NAVSIM v0.3 release
  - Adds code for Leaderboard submission
- `[2024/03/11]` NAVSIM v0.2 release
  - Easier installation and download
  - mini and test data split integration
  - Privileged `Human` agent
- `[2024/02/20]` NAVSIM v0.1 release (initial demo)
  - OpenScene-mini sensor blobs and annotation logs
  - Naive `ConstantVelocity` agent

All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The datasets (including nuPlan and OpenScene) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.
@inproceedings{Dauner2024NEURIPS,
author = {Daniel Dauner and Marcel Hallgarten and Tianyu Li and Xinshuo Weng and Zhiyu Huang and Zetong Yang and Hongyang Li and Igor Gilitschenski and Boris Ivanovic and Marco Pavone and Andreas Geiger and Kashyap Chitta},
title = {NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2024},
}

@misc{Contributors2024navsim,
title={NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking},
author={NAVSIM Contributors},
howpublished={\url{https://github.com/autonomousvision/navsim}},
year={2024}
}

- SLEDGE | tuPlan garage | CARLA garage | Survey on E2EAD
- PlanT | KING | TransFuser | NEAT

