
MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation


📘 EMNLP 2025

Khai Le-Duc*, Tuyen Tran*, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh**, Thanh Nguyen-Tang**

*Equal contribution   |   **Equal supervision


If you find this work useful, please consider starring the repo and citing our paper!


🧠 Abstract

Multilingual speech translation (ST) in the medical domain enhances patient care by enabling effective communication across language barriers, alleviating workforce shortages, and improving diagnosis and treatment — especially in global health emergencies.

In this work, we introduce MultiMed-ST, the first large-scale multilingual medical speech translation dataset, spanning all translation directions across five languages:
🇻🇳 Vietnamese, 🇬🇧 English, 🇩🇪 German, 🇫🇷 French, 🇨🇳 Traditional & Simplified Chinese.

With 290,000 samples, MultiMed-ST represents:

  • 🧩 the largest medical MT dataset to date
  • 🌐 the largest many-to-many multilingual ST dataset across all domains

We also conduct, to the best of our knowledge, the most comprehensive analysis in ST research to date, covering:

  • ✅ Empirical baselines
  • 🔄 Bilingual vs. multilingual study
  • 🧩 End-to-end vs. cascaded models
  • 🎯 Task-specific vs. multi-task seq2seq approaches
  • 🗣️ Code-switching analysis
  • 📊 Quantitative & qualitative error analysis

All code, data, and models are publicly available: 👉 GitHub Repository

MultiMed-ST Poster


🧰 Repository Overview

This repository provides scripts for:

  • 🎙️ Automatic Speech Recognition (ASR)
  • 🌍 Machine Translation (MT)
  • 🔄 Speech Translation (ST) — both cascaded and end-to-end seq2seq models

It includes:

  • ⚙️ Model preparation & fine-tuning
  • 🚀 Training & inference scripts
  • 📊 Evaluation & benchmarking utilities

📦 Dataset & Models

You can explore and download all fine-tuned models for MultiMed-ST directly from our Hugging Face repository:

🔹 Whisper ASR Fine-tuned Models

| Language | Model |
| --- | --- |
| Chinese | whisper-small-chinese |
| English | whisper-small-english |
| French | whisper-small-french |
| German | whisper-small-german |
| Multilingual | whisper-small-multilingual |
| Vietnamese | whisper-small-vietnamese |
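As a minimal sketch of how one of these ASR checkpoints could be loaded with Hugging Face `transformers`: the `leduckhai/` namespace below is an assumption (only the model names appear above), so check the project's Hugging Face page for the exact repo ids.

```python
# Sketch: transcribing an audio clip with one of the fine-tuned Whisper
# checkpoints via the transformers ASR pipeline. The "leduckhai/" namespace
# is an assumption -- verify the real repo ids on Hugging Face.

ASR_LANGUAGES = {"chinese", "english", "french", "german", "multilingual", "vietnamese"}

def asr_model_id(language: str) -> str:
    """Build the (assumed) Hub repo id for a language's fine-tuned ASR model."""
    if language not in ASR_LANGUAGES:
        raise ValueError(f"no fine-tuned Whisper model for {language!r}")
    return f"leduckhai/whisper-small-{language}"

def transcribe(audio_path: str, language: str = "english") -> str:
    """Run ASR on an audio file; the import is deferred so the helper above
    stays usable without the heavy transformers dependency installed."""
    from transformers import pipeline
    asr = pipeline("automatic-speech-recognition", model=asr_model_id(language))
    return asr(audio_path)["text"]
```

For the multilingual checkpoint, `transcribe("clinic_visit.wav", "multilingual")` would be the analogous call.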
🔹 LLaMA-based MT Fine-tuned Models

| Source → Target | Model |
| --- | --- |
| Chinese → English | llama_Chinese_English |
| Chinese → French | llama_Chinese_French |
| Chinese → German | llama_Chinese_German |
| Chinese → Vietnamese | llama_Chinese_Vietnamese |
| English → Chinese | llama_English_Chinese |
| English → French | llama_English_French |
| English → German | llama_English_German |
| English → Vietnamese | llama_English_Vietnamese |
| French → Chinese | llama_French_Chinese |
| French → English | llama_French_English |
| French → German | llama_French_German |
| French → Vietnamese | llama_French_Vietnamese |
| German → Chinese | llama_German_Chinese |
| German → English | llama_German_English |
| German → French | llama_German_French |
| German → Vietnamese | llama_German_Vietnamese |
| Vietnamese → Chinese | llama_Vietnamese_Chinese |
| Vietnamese → English | llama_Vietnamese_English |
| Vietnamese → French | llama_Vietnamese_French |
| Vietnamese → German | llama_Vietnamese_German |
🔹 m2m100_418M MT Fine-tuned Models

| Source → Target | Model |
| --- | --- |
| de → en | m2m100_418M-finetuned-de-to-en |
| de → fr | m2m100_418M-finetuned-de-to-fr |
| de → vi | m2m100_418M-finetuned-de-to-vi |
| de → zh | m2m100_418M-finetuned-de-to-zh |
| en → de | m2m100_418M-finetuned-en-to-de |
| en → fr | m2m100_418M-finetuned-en-to-fr |
| en → vi | m2m100_418M-finetuned-en-to-vi |
| en → zh | m2m100_418M-finetuned-en-to-zh |
| fr → de | m2m100_418M-finetuned-fr-to-de |
| fr → en | m2m100_418M-finetuned-fr-to-en |
| fr → vi | m2m100_418M-finetuned-fr-to-vi |
| fr → zh | m2m100_418M-finetuned-fr-to-zh |
| vi → de | m2m100_418M-finetuned-vi-to-de |
| vi → en | m2m100_418M-finetuned-vi-to-en |
| vi → fr | m2m100_418M-finetuned-vi-to-fr |
| vi → zh | m2m100_418M-finetuned-vi-to-zh |
| zh → de | m2m100_418M-finetuned-zh-to-de |
| zh → en | m2m100_418M-finetuned-zh-to-en |
| zh → fr | m2m100_418M-finetuned-zh-to-fr |
| zh → vi | m2m100_418M-finetuned-zh-to-vi |
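A cascaded ST system pairs an ASR transcript with one of these MT checkpoints. The sketch below shows the translation step with `transformers`; the repo ids follow the `m2m100_418M-finetuned-<src>-to-<tgt>` naming in the table above, and the `leduckhai/` namespace is an assumption, not confirmed by this README.

```python
# Sketch: the MT half of a cascaded ST pipeline, using a fine-tuned
# m2m100_418M checkpoint. The "leduckhai/" namespace is an assumption --
# verify the real repo ids on Hugging Face.

MT_LANGS = {"de", "en", "fr", "vi", "zh"}

def mt_model_id(src: str, tgt: str) -> str:
    """Build the (assumed) Hub repo id for a translation direction."""
    if src not in MT_LANGS or tgt not in MT_LANGS or src == tgt:
        raise ValueError(f"unsupported direction {src} -> {tgt}")
    return f"leduckhai/m2m100_418M-finetuned-{src}-to-{tgt}"

def translate(text: str, src: str, tgt: str) -> str:
    """Translate one sentence. M2M100 requires setting the tokenizer's
    src_lang and forcing the target-language BOS token at generation time."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
    model_id = mt_model_id(src, tgt)
    tokenizer = M2M100Tokenizer.from_pretrained(model_id)
    model = M2M100ForConditionalGeneration.from_pretrained(model_id)
    tokenizer.src_lang = src
    batch = tokenizer(text, return_tensors="pt")
    out = model.generate(**batch, forced_bos_token_id=tokenizer.get_lang_id(tgt))
    return tokenizer.batch_decode(out, skip_special_tokens=True)[0]
```

Chaining `translate(transcribe(audio), src, tgt)` with an ASR front end would give the cascaded ST setup described in the overview.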

👨‍💻 Core Developers

  1. Khai Le-Duc, University of Toronto, Canada
     📧 duckhai.le@mail.utoronto.ca | 🔗 https://github.com/leduckhai

  2. Tuyen Tran, Hanoi University of Science and Technology, Vietnam
     📧 tuyencbt@gmail.com

  3. Nguyen Kim Hai Bui, Eötvös Loránd University, Hungary
     📧 htlulem185@gmail.com

🧾 Citation

If you use our dataset or models, please cite:

📄 arXiv:2504.03546

```bibtex
@inproceedings{le2025multimedst,
  title={MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation},
  author={Le-Duc, Khai and Tran, Tuyen and Tat, Bach Phan and Bui, Nguyen Kim Hai and Anh, Quan Dang and Tran, Hung-Phong and Nguyen, Thanh Thuy and Nguyen, Ly and Phan, Tuan Minh and Tran, Thi Thu Phuong and others},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={11838--11963},
  year={2025}
}
```
