This repository is the official PyTorch implementation of MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding (WACV 2020) by Geondo Park, Chihye Han, Wonjun Yoon, and Daeshik Kim.
Download the dataset files: COCO, Flickr
We use the splits produced by Andrej Karpathy. Download the JSON file from here, then put each file in the same directory as its corresponding dataset (see get_data_path in utils.py).
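As a quick sanity check after downloading, the split file can be inspected before training. The sketch below is not part of this repository: it assumes the commonly distributed Karpathy layout (a top-level `images` list whose entries carry a `split` field and a `sentences` list with `raw` captions), and the helper names and the file path in the usage comment are hypothetical.

```python
import json
from collections import Counter


def load_split_file(path):
    """Load a Karpathy-style split JSON file (path is supplied by the caller)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def count_images_per_split(dataset):
    """Return a Counter mapping split name -> number of images.

    Assumes each entry of dataset["images"] has a "split" key.
    """
    return Counter(img["split"] for img in dataset["images"])


def captions_for_split(dataset, split):
    """Collect the raw caption strings of every image in the given split.

    Assumes each image has a "sentences" list whose items have a "raw" key.
    """
    return [
        sentence["raw"]
        for img in dataset["images"]
        if img["split"] == split
        for sentence in img["sentences"]
    ]


# Hypothetical usage; replace the path with wherever you placed the file:
# dataset = load_split_file("path/to/dataset_coco.json")
# print(count_images_per_split(dataset))
```

If the counts or keys differ from what you expect, check the file against get_data_path in utils.py before starting a training run.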
To train a new model on the COCO dataset, run train_coco.py. All training hyperparameters have default values, so only the options you want to change need to be passed.
python train_coco.py -hop 10 -name <model_name> -p-coeff 0.1
To train a new model on the Flickr dataset, run train_flickr.py. All training hyperparameters have default values, so only the options you want to change need to be passed.
python train_flickr.py -hop 10 -name <model_name> -p-coeff 0.1
- Our code is based on vsepp.
@inproceedings{park2020mhsan,
title={MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding},
author={Park, Geondo and Han, Chihye and Yoon, Wonjun and Kim, Daeshik},
booktitle={The IEEE Winter Conference on Applications of Computer Vision},
pages={1518--1526},
year={2020}
}