Query by Vocal Imitation (QVIM) enables users to search a database of sounds via a vocal imitation of the desired sound. This offers sound designers an intuitively expressive way of navigating large sound effects databases.
We invite participants to submit systems that accept a vocal imitation query and retrieve a perceptually similar recording from a large database of sound effects.
Important Dates
- Challenge start: April 1, 2025
- Challenge end: June 15, 2025
- Challenge results announcement: July 15, 2025
For more details, please have a look at our website.
For updates, please register here.
This repository contains the baseline system for the AES AIMLA Challenge 2025. The architecture and the training procedure are based on "Improving Query-by-Vocal Imitation with Contrastive Learning and Audio Pretraining" (DCASE2024 Workshop).
- The training loop is implemented using PyTorch and PyTorch Lightning.
- Logging is implemented using Weights and Biases.
- It uses a MobileNetV3 (MN) pretrained on AudioSet to encode audio recordings.
- The system is trained on VimSketch and evaluated on the public evaluation dataset described on our website. (A minimal sketch of the retrieval step is given below.)
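To make the retrieval step concrete, here is a minimal sketch, not the actual baseline code: an encoder (in the baseline, the pretrained MobileNetV3) maps both the vocal imitation and every reference sound to a fixed-size embedding, and references are ranked by cosine similarity to the query. The function name `rank_references` and the tensor shapes are illustrative assumptions.

```python
# Minimal retrieval sketch (illustrative, not the baseline implementation).
# Assumes an encoder has already produced fixed-size embeddings for the
# imitation query and for every reference sound in the database.
import torch
import torch.nn.functional as F

def rank_references(query_emb: torch.Tensor, ref_embs: torch.Tensor) -> torch.Tensor:
    """Return reference indices sorted by descending cosine similarity.

    query_emb: (D,) embedding of the vocal imitation.
    ref_embs:  (N, D) embeddings of the sound-effects database.
    """
    query = F.normalize(query_emb.unsqueeze(0), dim=-1)  # (1, D), unit norm
    refs = F.normalize(ref_embs, dim=-1)                 # (N, D), unit norm
    sims = (query @ refs.T).squeeze(0)                   # (N,) cosine similarities
    return torch.argsort(sims, descending=True)          # best match first
```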
Prerequisites
- Linux (tested on Ubuntu 24.04)
- conda, e.g., Miniconda3-latest-Linux-x86_64.sh
- Clone this repository.
git clone https://github.com/qvim-aes/qvim-baseline.git
- Create and activate a conda environment with Python 3.10:
conda env create -f environment.yml
conda activate qvim-baseline
- Install 7z, e.g.,
# (on Linux)
sudo apt install p7zip-full
# (on Windows)
conda install -c conda-forge 7zip
For Linux users: do not use the conda package p7zip - it is based on the outdated version 16.02 of 7zip; to extract the dataset, you need a more recent version.
- If you have not used Weights and Biases for logging before, you can create a free account. On your machine, run
wandb login
and copy your API key into the command line when prompted.
To start the training, run the following commands:
cd MAIN_FOLDER_OF_THIS_REPOSITORY
export PYTHONPATH=$(pwd)/src
python src/qvim_mn_baseline/ex_qvim.py
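The training objective follows the contrastive approach of the cited paper. As a rough, generic sketch, the loss below is an NT-Xent-style objective over a batch of paired imitation/reference embeddings; the function name, temperature value, and symmetric formulation are illustrative assumptions, not the exact baseline loss.

```python
# Generic contrastive (NT-Xent-style) loss sketch for paired
# imitation/reference embeddings. An assumption about the general form
# of the objective, not the exact loss used in the baseline.
import torch
import torch.nn.functional as F

def contrastive_loss(imit_embs: torch.Tensor, ref_embs: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """imit_embs, ref_embs: (B, D) embeddings of B matching pairs."""
    imit = F.normalize(imit_embs, dim=-1)
    ref = F.normalize(ref_embs, dim=-1)
    logits = imit @ ref.T / temperature                        # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = match
    # Symmetric cross-entropy: imitation->reference and reference->imitation.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```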
| Model Name | MRR (exact match) | NDCG (category match) |
|---|---|---|
| random | 0.0444 | ~0.337 |
| 2DFT | 0.1262 | 0.4793 |
| MN baseline | 0.2726 | 0.6463 |
- The Mean Reciprocal Rank (MRR) is the metric used to select submitted systems for the subjective evaluation. The MRR is the inverse rank $\frac{1}{r_i}$ of the reference sound $i$, averaged over all imitation queries $Q$:

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i \in Q} \frac{1}{r_i}$$
- The Normalized Discounted Cumulative Gain (NDCG) measures a system's ability to retrieve sounds of the imitated category (i.e., how well the system retrieves an arbitrary dog bark when a specific dog bark was imitated). The NDCG will not be used for ranking. A sketch of both metrics is given after this list.
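For reference, here is a rough sketch of how both metrics can be computed, assuming a similarity matrix `sims` of shape (Q, N) between Q imitations and N reference sounds; the names `target_idx` and `categories` are illustrative, and this is not the official evaluation code.

```python
# Sketch of MRR and category-match NDCG over a (Q, N) similarity matrix.
# target_idx[q]: index of the imitated reference for query q (hypothetical name).
# categories[n]: category label of reference sound n (hypothetical name).
import numpy as np

def mean_reciprocal_rank(sims: np.ndarray, target_idx: np.ndarray) -> float:
    order = np.argsort(-sims, axis=1)                   # descending similarity
    ranks = np.array([np.where(order[q] == target_idx[q])[0][0] + 1
                      for q in range(sims.shape[0])])   # 1-based rank r_i
    return float(np.mean(1.0 / ranks))

def ndcg_category(sims: np.ndarray, target_idx: np.ndarray,
                  categories: np.ndarray) -> float:
    scores = []
    for q in range(sims.shape[0]):
        order = np.argsort(-sims[q])
        # Binary relevance: 1 if the retrieved sound shares the imitated category.
        rel = (categories[order] == categories[target_idx[q]]).astype(float)
        discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
        dcg = float(np.sum(rel * discounts))
        ideal = float(np.sum(np.sort(rel)[::-1] * discounts))  # ideal ordering
        scores.append(dcg / ideal if ideal > 0 else 0.0)
    return float(np.mean(scores))
```

Note that this NDCG sketch uses binary relevance (category match or not), so the ideal ordering simply places all same-category sounds first.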
For questions or inquiries, please contact paul.primus@jku.at.
@inproceedings{Greif2024,
author = "Greif, Jonathan and Schmid, Florian and Primus, Paul and Widmer, Gerhard",
title = "Improving Query-By-Vocal Imitation with Contrastive Learning and Audio Pretraining",
booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)",
address = "Tokyo, Japan",
month = "October",
year = "2024",
pages = "51--55"
}