Official Code for Steam: Sparse Transformer and Explicit Attention Module for Multimodal Object Detection
The code will be released after our paper is accepted!
Steam is a novel multimodal object detection framework with a sparse transformer and an explicit attention module. An illustration of the proposed framework is shown in the following figure.
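For intuition only, the following PyTorch sketch shows one generic way an attention block can fuse RGB and infrared feature maps. It is not the Steam implementation, and every class and parameter name in it is illustrative:

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    # Illustrative sketch only, NOT the Steam module: RGB tokens attend to
    # IR tokens, producing fused features of the same (B, C, H, W) shape.
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb, ir):
        b, c, h, w = rgb.shape
        q = rgb.flatten(2).transpose(1, 2)   # queries from RGB: (B, H*W, C)
        kv = ir.flatten(2).transpose(1, 2)   # keys/values from IR: (B, H*W, C)
        fused, _ = self.attn(q, kv, kv)      # cross-modal attention
        fused = self.norm(fused + q)         # residual connection + norm
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Example: fuse 64-channel RGB and IR feature maps of size 32x32.
m = CrossModalAttention(channels=64)
out = m(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])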
If you find our research beneficial to your work, please cite our paper.
@ARTICLE{11006394,
  author={Lan, Xiaoxiong and Liu, Shenghao and Zhang, Sude and Zhang, Zhiyong},
  journal={IEEE Sensors Journal},
  title={Steam: Sparse Transformer and Explicit Attention Module for Multimodal Object Detection},
  year={2025},
  volume={25},
  number={13},
  pages={24873-24885}
}
Training and testing environments:
- Ubuntu 20.04.5 LTS
- NVIDIA GeForce RTX 3090
Step 1: Clone the Steam repository
To get started, clone our repository and navigate to the project directory:
git clone https://github.com/lanxx314/steam
cd steam
Step 2: Create a conda virtual environment and activate it
We recommend setting up a conda environment with the following commands:
conda create -n steam python=3.9 -y
conda activate steam
pip install -r requirements.txt
conda install pytorch==1.10.1 cudatoolkit==11.3.1 torchvision==0.11.2 -c pytorch
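Before building the extension below, it is worth confirming that PyTorch and CUDA were installed correctly (a minimal check, run inside the steam environment):

import torch
print(torch.__version__)          # expected: 1.10.1
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # expected: True on an RTX 3090 machine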
cd utils/nms_rotated
python setup.py develop  # or "pip install -v -e ."
Step 3: Install DOTA_devkit
DOTA_devkit is a tool for splitting high-resolution images and evaluating oriented bounding boxes (OBB). You can clone the latest version of the YOLO_OBB repository:
cd yolo_obb/DOTA_devkit
sudo apt-get install swig
swig -c++ -python polyiou.i
python setup.py build_ext --inplace
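If the build succeeds, the SWIG extension should be importable from the DOTA_devkit directory. A minimal sanity check, assuming DOTA_devkit's usual polyiou bindings:

# Run from inside the DOTA_devkit directory after the build.
# Function names follow DOTA_devkit's polyiou bindings; treat as a sketch.
import polyiou

square = polyiou.VectorDouble([0, 0, 1, 0, 1, 1, 0, 1])
print(polyiou.iou_poly(square, square))  # identical polygons -> 1.0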
Step 4: Prepare the dataset
You can organize your dataset in the following directory structure (a quick layout check follows the tree):
root
├── DataSet
│ ├── rgb
│ │ ├── train
│ │ │ ├── images
│ │ │ ├── labels
│ │ ├── val
│ │ │ ├── images
│ │ │ ├── labels
│ │ ├── test
│ │ │ ├── images
│ │ │ ├── labels
│ ├── ir
│ │ ├── train
│ │ │ ├── images
│ │ │ ├── labels
│ │ ├── val
│ │ │ ├── images
│ │ │ ├── labels
│ │ ├── test
│ │ │ ├── images
│ │ │ ├── labels
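As mentioned above, a small Python sketch can verify the layout before training (the root name DataSet mirrors the tree above and may need adjusting):

from pathlib import Path

root = Path("DataSet")  # adjust to your dataset root
for modality in ("rgb", "ir"):
    for split in ("train", "val", "test"):
        for sub in ("images", "labels"):
            d = root / modality / split / sub
            print(f"{d}: {'ok' if d.is_dir() else 'MISSING'}")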
Step 5: Train
You can train on public or custom data with the following command:
python train.py --batch-size 16
Step 6: Test
Evaluate the performance on the test set:
python test.py --save-json --name 'test'
Evaluate the performance on the validation set:
python val.py --save-json --name 'val'
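In YOLOv5-style codebases such as this one, --save-json typically writes the detections to a COCO-style JSON file. A minimal sketch for inspecting such a file; the path below is a guess, so check your runs directory for the actual location:

import json
from pathlib import Path

# Hypothetical output path; adjust to where test.py actually saves its JSON.
pred_file = Path("runs/test/test/predictions.json")
preds = json.loads(pred_file.read_text())
print(len(preds), "detections")
print(preds[0])  # typically {"image_id": ..., "category_id": ..., "bbox": [...], "score": ...}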
To train and validate our method on LLVIP and aligned-FLIR, adapt our module to the CFT codebase.
Our code mainly builds on ultralytics, yolo_obb, and CFT. Many thanks to the authors!
