dlutor/TMF

Two-Stage Dynamic Fusion Framework for Multimodal Classification Tasks

Preparation

Install the required dependencies:

pip install -r requirements.txt

Datasets

Please download the datasets manually from the following sources and place them into the specified directories:

MVSA: Download from MVSA (Kaggle) and place the data in datasets/MVSA_Single.

UPMC Food101: Download from UPMC Food101 (Kaggle) and place the images in datasets/food101.

CrisisMMD: Download from CrisisMMD v2.0 and place data_image in datasets/CrisisMMD.

N24News: Download from N24News and place imgs in datasets/N24News.
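
Because the datasets are placed by hand, a quick check before training can catch a misplaced folder. The following is a minimal sketch, not part of the repository: the helper name missing_dataset_dirs is hypothetical, and it assumes the script is run from the repository root and that an empty directory should count as missing.

```python
from pathlib import Path

# Directory names are taken from the download instructions above.
EXPECTED_DATASET_DIRS = (
    "datasets/MVSA_Single",
    "datasets/food101",
    "datasets/CrisisMMD",
    "datasets/N24News",
)


def missing_dataset_dirs(repo_root="."):
    """Return the expected dataset directories that are absent or empty.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_DATASET_DIRS:
        path = root / rel
        # A directory counts as missing if it does not exist or has no entries.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    absent = missing_dataset_dirs()
    if absent:
        print("Missing or empty dataset directories:")
        for rel in absent:
            print(f"  {rel}")
    else:
        print("All dataset directories are present.")
```

The check only looks at the top-level directories; it does not verify the file layout inside each dataset.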

Stage 1: Train and Test

Run the following shell scripts to train and test the baseline models:

bash ./shells/train_MVSA.sh
bash ./shells/trainCrisisMMD_h.sh
bash ./shells/trainfood101.sh
bash ./shells/trainfood101_vit.sh
bash ./shells/trainN24News_a.sh
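
To run all Stage 1 scripts in one go, a small wrapper along these lines may help. It is a sketch under stated assumptions: the function run_stage1 and its dry_run flag are hypothetical, the scripts are assumed to be independent of each other, and the wrapper is assumed to be launched from the repository root.

```python
import subprocess

# Script paths copied from the list above, in the same order.
STAGE1_SCRIPTS = (
    "./shells/train_MVSA.sh",
    "./shells/trainCrisisMMD_h.sh",
    "./shells/trainfood101.sh",
    "./shells/trainfood101_vit.sh",
    "./shells/trainN24News_a.sh",
)


def run_stage1(scripts=STAGE1_SCRIPTS, dry_run=False):
    """Run each script with bash, stopping at the first failure.

    Hypothetical helper for illustration. Returns the argv lists that
    were executed (or, with dry_run=True, that would be executed).
    """
    executed = []
    for script in scripts:
        argv = ["bash", script]
        executed.append(argv)
        if dry_run:
            # Only print the command; nothing is launched.
            print(" ".join(argv))
            continue
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, check=True)
    return executed


if __name__ == "__main__":
    run_stage1(dry_run=True)
```

Running the file as written only prints the five commands; call run_stage1() without dry_run to actually launch them.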

Stage 2: Regression-based Fusion

Enter the stage 2 directory:

cd stage2

Run the command for the target dataset:

MVSA

python stage2.py --output_dir ../saved --name MVSA_Single --dataset MVSA_Single \
--model KNet  \
--nlayers 1  \
--n_nodes 128  \
--top_k_logits 200  \
--epochs 100  \
--batch_size 128  \
--lr 1e-3 \
--gpu 0 \
--noise 0 \
--data_nums 0

CrisisMMD

python stage2.py --output_dir ../saved --name CrisisMMD --dataset CrisisMMD \
--model KNet  \
--nlayers 1  \
--n_nodes 128  \
--top_k_logits 200  \
--epochs 100  \
--batch_size 128  \
--lr 1e-3 \
--gpu 0 \
--noise 0 \
--data_nums 0

Food101

python stage2.py --output_dir ../saved --name food101 --dataset food101 \
--model KNet  \
--nlayers 1  \
--n_nodes 128  \
--top_k_logits 200  \
--epochs 100  \
--batch_size 128  \
--lr 1e-3 \
--gpu 0 \
--noise 0 \
--data_nums 0

N24News

python stage2.py --output_dir ../saved --name N24News --dataset N24News \
--model KNet  \
--nlayers 1  \
--n_nodes 128  \
--top_k_logits 200  \
--epochs 100  \
--batch_size 128  \
--lr 1e-3 \
--gpu 0 \
--noise 0 \
--data_nums 0
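
The four Stage 2 commands differ only in the value passed to --name and --dataset, so they can be generated from one template. The sketch below is illustrative rather than part of the repository: build_stage2_command is a hypothetical helper, it only assembles the argument list without inspecting stage2.py, and it assumes --data_nums 0 for every dataset.

```python
import shlex

# Values passed to --name and --dataset in the commands above.
STAGE2_DATASETS = ("MVSA_Single", "CrisisMMD", "food101", "N24News")

# Hyperparameters shared by the commands above.
# Assumption: --data_nums is 0 for every dataset.
SHARED_ARGS = {
    "--model": "KNet",
    "--nlayers": "1",
    "--n_nodes": "128",
    "--top_k_logits": "200",
    "--epochs": "100",
    "--batch_size": "128",
    "--lr": "1e-3",
    "--gpu": "0",
    "--noise": "0",
    "--data_nums": "0",
}


def build_stage2_command(dataset, output_dir="../saved"):
    """Return the argv list for one Stage 2 run (hypothetical helper)."""
    if dataset not in STAGE2_DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    argv = [
        "python", "stage2.py",
        "--output_dir", output_dir,
        "--name", dataset,
        "--dataset", dataset,
    ]
    # Append the shared flags in the order used by the commands above.
    for flag, value in SHARED_ARGS.items():
        argv += [flag, value]
    return argv


if __name__ == "__main__":
    # Print one shell-quoted command per dataset; run them from stage2/.
    for name in STAGE2_DATASETS:
        print(shlex.join(build_stage2_command(name)))
```

The returned list can be passed directly to subprocess.run from inside the stage2 directory, or the printed lines can be pasted into a shell.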

About

Multimodal Text-Image Classification code
