Steam: Sparse Transformer and Explicit Attention Module for Multimodal Object Detection

Xiaoxiong Lan¹, Shenghao Liu¹, Sude Zhang¹, Zhiyong Zhang¹
¹ Sun Yat-Sen University

Intro

Official Code for Steam: Sparse Transformer and Explicit Attention Module for Multimodal Object Detection

The code will be released after our paper is accepted!

Overview

Steam is a multimodal object detection framework built around a sparse transformer and an explicit attention module. The overall architecture is illustrated in the figure below.

[Figure: overview of the proposed Steam framework]

Citation

If you find our research beneficial to your work, please cite our paper.

@ARTICLE{11006394,
  author={Lan, Xiaoxiong and Liu, Shenghao and Zhang, Sude and Zhang, Zhiyong},
  journal={IEEE Sensors Journal}, 
  title={Steam: Sparse Transformer and Explicit Attention Module for Multimodal Object Detection}, 
  year={2025},
  volume={25},
  number={13},
  pages={24873-24885}}

Getting Started

Training and testing environment:

  • Ubuntu 20.04.5 LTS
  • NVIDIA GeForce RTX 3090

Step 1: Clone the steam repository

To get started, clone our repository and navigate to the project directory:

git clone https://github.com/lanxx314/steam
cd steam

Step 2: Create a conda virtual environment and activate it

We recommend setting up a conda environment. Use the following commands:

conda create -n steam python=3.9 -y
conda activate steam
pip install -r requirements.txt
conda install pytorch==1.10.1 cudatoolkit==11.3.1 torchvision==0.11.2 -c pytorch
cd utils/nms_rotated
python setup.py develop  # or "pip install -v -e ."
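
After installing, you can run a quick sanity check of the PyTorch/CUDA setup (a minimal sketch; assumes the environment created above is active):

import torch
import torchvision

# Verify the versions installed above and that the GPU is visible.
print("torch:", torch.__version__)              # expect 1.10.1
print("torchvision:", torchvision.__version__)  # expect 0.11.2
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 3090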

Step 3: Install DOTA_devkit

DOTA_devkit is a tool for splitting high-resolution images and evaluating oriented bounding boxes (OBB). You can clone the latest version of the YOLO_OBB repository and build it:

cd yolo_obb/DOTA_devkit
sudo apt-get install swig
swig -c++ -python polyiou.i
python setup.py build_ext --inplace
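
Once built, you can check that the polyiou extension works (a minimal sketch; the two quadrilaterals below are made-up coordinates, and the import assumes you run it from the DOTA_devkit directory):

import polyiou

# Each polygon is given as 8 values: x1, y1, x2, y2, x3, y3, x4, y4.
poly_a = polyiou.VectorDouble([0, 0, 100, 0, 100, 100, 0, 100])
poly_b = polyiou.VectorDouble([50, 50, 150, 50, 150, 150, 50, 150])

# IoU of the two oriented boxes (intersection area / union area).
print("polygon IoU:", polyiou.iou_poly(poly_a, poly_b))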

Step 4: Prepare the dataset

Organize your dataset in the following directory structure:

root
├── DataSet
│   ├── rgb
│   │   ├── train
│   │   │   ├── images
│   │   │   ├── labels
│   │   ├── val
│   │   │   ├── images
│   │   │   ├── labels
│   │   ├── test
│   │   │   ├── images
│   │   │   ├── labels
│   ├── ir
│   │   ├── train
│   │   │   ├── images
│   │   │   ├── labels
│   │   ├── val
│   │   │   ├── images
│   │   │   ├── labels
│   │   ├── test
│   │   │   ├── images
│   │   │   ├── labels
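
A quick consistency check of the rgb/ir splits can be helpful (a minimal sketch; assumes the layout above, that the script runs from the directory containing DataSet, and that paired images share the same file name in both modalities):

from pathlib import Path

root = Path("DataSet")
for split in ("train", "val", "test"):
    rgb = {p.name for p in (root / "rgb" / split / "images").glob("*")}
    ir = {p.name for p in (root / "ir" / split / "images").glob("*")}
    missing = rgb ^ ir  # files present in one modality but not the other
    print(f"{split}: {len(rgb)} rgb, {len(ir)} ir, {len(missing)} unpaired")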

Step 5: Train

You can train on public or custom data with the following command:

python train.py --batch-size 16

Step 6: Test

Evaluate the performance on the test set:

python test.py --save-json --name 'test'

Evaluate the performance on the validation set:

python val.py --save-json --name 'val'

To train and validate our method on LLVIP and aligned-FLIR, adapt our module to the CFT codebase.

Acknowledgment

Our code mainly builds on ultralytics, yolo_obb, and CFT. Many thanks to the authors!
