VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning

Liyun Zhu¹,², Qixiang Chen¹, Xi Shen³, Xiaodong Cun²
¹Australian National University ²GVC Lab, Great Bay University ³Intellindust
🌐 Project Website ｜ 📑 Paper ｜ 🤗 Data

This repository contains the official implementation of our paper: VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning.

We propose VAU-R1, a Reinforcement Fine-Tuning (RFT) framework that improves the reasoning ability of Multimodal Large Language Models (MLLMs) for video anomaly understanding (VAU). Specifically, we adopt Group Relative Policy Optimization (GRPO) to optimize the model with task-specific rewards, such as answer format, accuracy, and temporal Intersection-over-Union (IoU). We decompose the VAU task into four complementary tasks to facilitate comprehensive reasoning: multiple-choice QA, temporal anomaly grounding, anomaly reasoning, and anomaly classification.
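To illustrate the "group relative" part of GRPO, here is a minimal sketch of how the rewards of several responses sampled for the same prompt can be turned into advantages by normalizing within the group. The function name, the epsilon value, and the use of the population standard deviation are assumptions made for illustration; this is not the code in this repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for one prompt.

    Hypothetical helper: each reward is compared against the group mean and
    scaled by the group standard deviation, so no learned value model is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up total rewards (e.g. format + accuracy) for four sampled responses.
rewards = [2.0, 1.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one; a group with identical rewards yields all-zero advantages.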

Get Started

1. Environment Setup

We use Qwen2-VL and Qwen2.5-VL as our base models. Installation is simple:

pip install transformers
pip install qwen_vl_utils
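As an optional sanity check after installation, the following sketch reports which of the two packages cannot be found in the current Python environment. The helper name `missing_packages` is hypothetical, and the check only looks for importable modules; it does not verify versions or GPU support.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names of the two packages installed above.
required = ["transformers", "qwen_vl_utils"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```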

2. Prepare the training data

We construct VAU-Bench, a unified benchmark built from MSAD, UCF-Crime, and ECVA, enriched with Chain-of-Thought (CoT) annotations, including: (i) video descriptions, (ii) temporal boundaries, (iii) multiple-choice QA, and (iv) reasoning rationales.

Please download the original video files from UCF-Crime, ECVA, and MSAD for our experiments. Our Chain-of-Thought annotations for these three datasets can be found in the annotations/ folder or at the link here.

Training

To run RFT training for the multiple-choice QA task, use scripts/training/run_grpo_video_qa.sh:

sh scripts/training/run_grpo_video_qa.sh
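For intuition about the format and accuracy rewards used in the multiple-choice QA task, here is a minimal sketch. It assumes an R1-style response layout of `<think>...</think><answer>...</answer>` and single-letter options; both the tag names and the function names are assumptions for illustration and may differ from what the training script actually uses.

```python
import re

# Assumed response layout: reasoning in <think>, final choice in <answer>.
FORMAT_PATTERN = re.compile(
    r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL
)
# First option letter inside <answer>, optionally wrapped in parentheses.
OPTION_PATTERN = re.compile(r"<answer>\s*\(?([A-Za-z])\b")


def format_reward(response: str) -> float:
    """1.0 if the whole response follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_PATTERN.fullmatch(response.strip()) else 0.0


def accuracy_reward(response: str, correct_option: str) -> float:
    """1.0 if the extracted option letter matches the ground truth, else 0.0."""
    match = OPTION_PATTERN.search(response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).upper() == correct_option.upper() else 0.0


# Made-up model response for illustration.
response = "<think>The car runs a red light.</think>\n<answer>B</answer>"
print(format_reward(response), accuracy_reward(response, "B"))
```

Keeping the two rewards separate lets a response earn credit for following the output structure even when the chosen option is wrong.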

To run RFT training for the temporal anomaly grounding task, use scripts/training/run_grpo_video_tag.sh:

sh scripts/training/run_grpo_video_tag.sh
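The temporal IoU reward for grounding measures how well a predicted anomaly interval overlaps the annotated one. A minimal sketch follows, assuming intervals are given as (start, end) pairs in seconds; the function name and the handling of degenerate intervals are assumptions for illustration, not the repository's implementation.

```python
def temporal_iou(pred, gt):
    """Intersection-over-Union of two time intervals given as (start, end).

    Hypothetical helper: returns 0.0 for empty or inverted intervals.
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    if pred_end <= pred_start or gt_end <= gt_start:
        return 0.0
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union


# Made-up example: prediction 2s-6s against ground truth 4s-8s.
print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

In this example the intervals share 2 seconds out of a 6-second union, so the IoU is 1/3. Because the value lies in [0, 1], it can be used directly as a reward.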

Evaluation

Please follow the evaluation scripts in the scripts/evaluation folder to evaluate model performance on the four tasks.
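As a rough illustration of how per-video grounding scores can be aggregated, the sketch below computes the mean IoU and the fraction of videos whose IoU reaches a given threshold. These are metrics commonly reported for temporal grounding; the thresholds, the function name, and the sample values are assumptions, and the evaluation scripts may compute something different.

```python
def summarize_grounding(ious, thresholds=(0.3, 0.5, 0.7)):
    """Aggregate per-video temporal IoU scores into summary metrics.

    Hypothetical helper: returns the mean IoU and, for each threshold,
    the fraction of videos with IoU at or above it.
    """
    if not ious:
        raise ValueError("ious must contain at least one score")
    summary = {"mIoU": sum(ious) / len(ious)}
    for threshold in thresholds:
        hits = sum(1 for iou in ious if iou >= threshold)
        summary[f"R@{threshold}"] = hits / len(ious)
    return summary


# Made-up IoU values for illustration only; these are not results.
example_ious = [0.2, 0.5, 0.8, 1.0]
print(summarize_grounding(example_ious))
```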

Citation

If you find VAU-R1 useful in your research, please consider citing our work or starring our repo.

@misc{zhu2025vaur1,
      title={VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning}, 
      author={Liyun Zhu and Qixiang Chen and Xi Shen and Xiaodong Cun},
      year={2025},
      eprint={2505.23504},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.23504}, 
}

Acknowledgement

This codebase is built on top of VideoChat-R1, and we thank the authors for their work.
