NeurIPS 2025
Mojtaba Nafez, Mobina Poulaei*, Nikan Vasei*, Bardia Soltani Moakhar, Mohammad Sabokrou, Mohammad Hossein Rohban
Weakly Supervised Video Anomaly Detection (WSVAD) has achieved notable advancements, yet existing models remain vulnerable to adversarial attacks, limiting their reliability. Because supervision is weak, with only video-level labels available, traditional adversarial defenses such as adversarial training are ineffective, as video-level perturbations provide too weak a training signal.
FrameShield introduces a pseudo-anomaly generation method called Spatiotemporal Region Distortion (SRD), which creates localized synthetic anomalies in normal videos while maintaining temporal consistency. Combined with the model's otherwise noisy pseudo-labels, these synthetic samples reduce overall label noise and enable effective adversarial training.
FrameShield substantially enhances robustness across benchmarks, outperforming state-of-the-art methods by an average of 71.0% AUROC under adversarial settings.
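To make the SRD idea concrete, here is a minimal illustrative sketch, not the repository's actual implementation (region selection, distortion type, and scheduling all differ): the hypothetical `srd_pseudo_anomaly` helper distorts one fixed spatial region across every frame of a normal clip, so the patch looks anomalous while staying temporally consistent.

```python
# Illustrative sketch of SRD only; the repo's real implementation differs in
# region selection, distortion type, and scheduling.
import numpy as np

def srd_pseudo_anomaly(clip: np.ndarray, rng=None) -> np.ndarray:
    """Distort one fixed spatial region across all frames of a normal clip.

    clip: (T, H, W, C) uint8 frames. Keeping the same region over time is what
    preserves temporal consistency while making the patch look anomalous.
    """
    rng = rng or np.random.default_rng()
    t, h, w, _ = clip.shape
    ph, pw = h // 4, w // 4                      # patch size: an assumption
    y = int(rng.integers(0, h - ph))
    x = int(rng.integers(0, w - pw))
    out = clip.copy()
    patch = out[:, y:y + ph, x:x + pw].astype(np.int16)
    noise = rng.integers(-64, 64, patch.shape)   # simple photometric distortion
    out[:, y:y + ph, x:x + pw] = np.clip(patch + noise, 0, 255).astype(np.uint8)
    return out
```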
```
FrameShield/
│
├── clip/              # Base CLIP implementation
├── configs/           # Dataset-specific configuration files
├── datasets/          # Dataset and DataLoader creation scripts
├── models/            # X-CLIP and FrameShield model definitions
├── runners/           # Bash scripts for training/evaluation
├── utils/             # Logging, checkpointing, and helper utilities
│
├── main.py            # Standard (non-adversarial) training / testing
├── main_advtrain.py   # Adversarial training / pseudo-label generation entry point
├── main_attack.py     # Adversarial attack evaluation
│
└── requirements.txt   # Python dependencies
```
First, clone the repository and move into it:

```bash
git clone https://github.com/rohban-lab/FrameShield.git
cd FrameShield
```

Next, create a Python environment; below is an example using conda. Python 3.10 is recommended, as it is the version used during development:

```bash
conda create -n FS python=3.10
conda activate FS
```

Finally, install the required dependencies:

```bash
pip install -r requirements.txt
```

FrameShield supports the benchmark datasets used in the paper:
| Dataset | Source | Notes |
|---|---|---|
| ShanghaiTech | Train (Kaggle) / Test (Kaggle) | Official ShanghaiTech University website |
| TAD | Train+Test (Kaggle) | Official repository |
| UCF Crime | Project Website | Preprocessed for FrameShield |
| MSAD | Project Website | Apply for the dataset directly on their website. |
| UCSD-Ped2 | Official Paper | Preprocessed for FrameShield |
Each dataset should be placed under a root directory, which is specified in your config file:

```yaml
DATA:
  ROOT: '../SHANGHAI/'   # <-- the root directory
  TRAIN_FILE: 'configs/shanghai/SHANGHAI_train.txt'
  VAL_FILE: 'configs/shanghai/SHANGHAI_test.txt'
  DATASET: shanghai
  ...
```

See example configs in `configs/`.
Note: Depending on your dataset format and how clean the frame indices are, you may need to adjust how frames are loaded in the dataloaders; see the sketch below for the usual pattern.
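A common pattern is resolving frame paths from the config's `FILENAME_TMPL`; here is a hypothetical sketch (the helper name, the example directory, and the 1-based indexing are assumptions, not the repo's code):

```python
import os

# Hypothetical helper: builds a frame path from the config's FILENAME_TMPL.
# Adjust if your extracted frames are 0-indexed or named differently.
def frame_path(video_dir: str, idx: int, tmpl: str = "img_{:05d}.jpg") -> str:
    return os.path.join(video_dir, tmpl.format(idx))

# Hypothetical video directory under the dataset root:
print(frame_path("../SHANGHAI/01_001", 1))  # ../SHANGHAI/01_001/img_00001.jpg
```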
| Model | Dataset | Link | Notes |
|---|---|---|---|
| Backbone | Kinetics-400 | Google Drive | Initial weights for the PromptMIL stage. |
Download the weights and specify their path using the `--pretrained PATH` argument:

```bash
python ... --pretrained ../weights/k400_16_8.pth
```
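Before training, it can be worth confirming the download is intact; a quick, hypothetical sanity check (the checkpoint's internal layout is an assumption):

```python
import torch

# Hypothetical sanity check: only confirms the file deserializes on CPU.
state = torch.load("../weights/k400_16_8.pth", map_location="cpu")
inner = state.get("model", state) if isinstance(state, dict) else state
print(f"Checkpoint loaded; {len(inner)} top-level entries")
```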
For each task, you can choose either of these methods:

- Run the pre-implemented bash scripts located in `runners/`. `NUM_GPUs` indicates the number of GPUs you want to use for the run.
- Run the python commands manually. Regarding the arguments:
  - `-cfg`: path to your config file
  - `--batch-size`: training batch size
  - `--accumulation-steps`: the optimizer's gradient accumulation steps
  - `--output`: output directory for saving logs
  - `--pretrained`: path to the pretrained weights
  - `--only_test`: run clean testing or pseudo-label generation only
Below are examples for each task using both methods.
To use the `train.sh` bash file:

```bash
bash runners/train.sh NUM_GPUs
```

To run manually using a python command:

```bash
python -m torch.distributed.launch --rdzv_endpoint=localhost:29450 --nproc_per_node=1 main.py -cfg configs/traffic/traffic_server.yaml --batch-size 1 --accumulation-steps 8 --output output/train --pretrained ../weights/k400_16_8.pth
```

To use the `test.sh` bash file:
```bash
bash runners/test.sh NUM_GPUs
```

To run manually using a python command:

```bash
python -m torch.distributed.launch --rdzv_endpoint=localhost:29450 --nproc_per_node=1 main.py -cfg configs/traffic/traffic_server.yaml --output output/test --pretrained ../weights/best.pth --only_test
```

To use the `genlabels.sh` bash file:
```bash
bash runners/genlabels.sh NUM_GPUs
```

To run manually using a python command:

```bash
python -m torch.distributed.launch --nproc_per_node=1 --rdzv_endpoint=localhost:29450 main_advtrain.py -cfg configs/traffic/traffic_advtrain.yaml --batch-size 1 --accumulation-steps 8 --output output/gen_pseudo_labels --pretrained ../weights/best.pth --only_test
```

To use the `advtrain.sh` bash file:
```bash
bash runners/advtrain.sh NUM_GPUs
```

To run manually using a python command:

```bash
python -m torch.distributed.launch --nproc_per_node=1 --rdzv_endpoint=localhost:29450 main_advtrain.py -cfg configs/traffic/traffic_advtrain.yaml --batch-size 1 --accumulation-steps 8 --output output/adv_train --pretrained ../weights/best.pth
```

To use the `attack.sh` bash file:
```bash
bash runners/attack.sh NUM_GPUs
```

To run manually using a python command:

```bash
python -m torch.distributed.launch --nproc_per_node=1 --rdzv_endpoint=localhost:29450 main_attack.py -cfg configs/traffic/traffic_attack.yaml --batch-size 1 --accumulation-steps 8 --output output/attack --pretrained ../weights/best.pth
```

You can access sample configs, text files, and label lists for each of the datasets in `configs/`.
```yaml
DATA:
  ROOT: <str>                 # Root directory of the dataset
  TRAIN_FILE: <str>           # Path to the training dataset text file
  VAL_FILE: <str>             # Path to the validation/test dataset text file
  DATASET: <str>              # Dataset name (e.g., "shanghai", "traffic", "ucf")
  NUM_CLIPS: <int>            # Number of temporal clips (chunks) per video
  NUM_FRAMES: <int>           # Number of frames per clip
  FRAME_INTERVAL: <int>       # Frame sampling interval
  NUM_CLASSES: <int>          # Number of classes (e.g., 2 for normal/anomaly)
  LABEL_LIST: <str>           # Path to label list file (class names)
  FILENAME_TMPL: <str>        # Template for frame filenames (e.g., "img_{:05d}.jpg")
MODEL:
  ARCH: <str>                 # Model backbone architecture (e.g., "ViT-B/32")
TRAIN:
  BATCH_SIZE: <int>           # Batch size for training
  ACCUMULATION_STEPS: <int>   # Gradient accumulation steps
  AUTO_RESUME: <bool>         # Resume automatically from the latest checkpoint
ADV_TRAIN:
  EPS: <float>                # Perturbation strength (epsilon) for adversarial attacks
  LOSS: <str>                 # Loss type ("ce" for cross-entropy, "mil" for multiple-instance learning)
  PSEUDO_LABEL: <bool>        # Use pseudo-labels generated from the model
  PSEUDO_ANOMALY: <bool>      # Use pseudo-anomalies from SRD
```
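As a quick way to inspect a config before launching a run, here is a minimal sketch assuming the files are plain YAML readable with PyYAML (the repo may route configs through its own loader, and not every file defines every section):

```python
import yaml

# Load and inspect one of the configs used in the example commands above.
with open("configs/traffic/traffic_server.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["DATA"]["DATASET"], cfg["DATA"].get("NUM_FRAMES"))
print("epsilon:", cfg.get("ADV_TRAIN", {}).get("EPS"))  # may be absent here
```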
OpenReview:
```bibtex
@inproceedings{
nafez2025frameshield,
title={FrameShield: Adversarially Robust Video Anomaly Detection},
author={Mojtaba Nafez and Mobina Poulaei and Nikan Vasei and Bardia Soltani Moakhar and Mohammad Sabokrou and Mohammad Hossein Rohban},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=7FLKzOqsKd}
}
```

arXiv:
```bibtex
@misc{
nafez2025frameshieldadversariallyrobustvideo,
title={FrameShield: Adversarially Robust Video Anomaly Detection},
author={Mojtaba Nafez and Mobina Poulaei and Nikan Vasei and Bardia Soltani Moakhar and Mohammad Sabokrou and MohammadHossein Rohban},
year={2025},
eprint={2510.21532},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2510.21532},
}
```