AES AIMLA Challenge 2025 Baseline System

Query-by-Vocal Imitation Challenge

Query by Vocal Imitation (QVIM) enables users to search a database of sounds via a vocal imitation of the desired sound. This offers sound designers an intuitively expressive way of navigating large sound effects databases.

We invite participants to submit systems that accept a vocal imitation query and retrieve a perceptually similar recording from a large database of sound effects.
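A common way to frame such a system is to embed both the vocal imitation and every sound effect into a shared space and rank the database by similarity to the query. The sketch below is illustrative only (the function name, embedding shapes, and cosine-similarity choice are assumptions, not part of the baseline):

```python
import numpy as np

def retrieve(query_emb, item_embs, top_k=3):
    """Rank database items by cosine similarity to a query embedding.

    query_emb: (d,) embedding of the vocal imitation
    item_embs: (n, d) embeddings of the sound-effect database
    Returns the indices of the top_k items and their similarity scores.
    """
    q = query_emb / np.linalg.norm(query_emb)
    items = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    scores = items @ q                      # cosine similarity per item
    ranking = np.argsort(-scores)[:top_k]   # best-matching items first
    return ranking, scores[ranking]
```

Any architecture that produces such a ranking per query can, in principle, be evaluated with the metrics described below.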

Important Dates

  • Challenge start: April 1, 2025
  • Challenge end: June 15, 2025
  • Challenge results announcement: July 15, 2025

For more details, please have a look at our website.

For updates, please register here.

Baseline System

This repository contains the baseline system for the AES AIMLA Challenge 2025. The architecture and the training procedure are based on "Improving Query-by-Vocal Imitation with Contrastive Learning and Audio Pretraining" (DCASE2024 Workshop).

Getting Started

Prerequisites

  1. Clone this repository:
git clone https://github.com/qvim-aes/qvim-baseline.git
  2. Create and activate a conda environment with Python 3.10:
conda env create -f environment.yml
conda activate qvim-baseline
  3. Install 7z, e.g.,
# (on linux)
sudo apt install p7zip-full
# (on windows)
conda install -c conda-forge 7zip

For Linux users: do not use the conda package p7zip - it is based on the outdated 7-Zip version 16.02; to extract the dataset, you need a more recent version.

  4. If you have not used Weights and Biases for logging before, you can create a free account. On your machine, run wandb login and copy your API key from this link to the command line.

Training

To start the training, run the following commands from the repository root.

cd MAIN_FOLDER_OF_THIS_REPOSITORY
export PYTHONPATH=$(pwd)/src
python src/qvim_mn_baseline/ex_qvim.py

Evaluation Results

| Model Name  | MRR (exact match) | NDCG (category match) |
|-------------|-------------------|-----------------------|
| random      | 0.0444            | ~0.337                |
| 2DFT        | 0.1262            | 0.4793                |
| MN baseline | 0.2726            | 0.6463                |
  • The Mean Reciprocal Rank (MRR) is the metric used to select submitted systems for the subjective evaluation. The MRR averages the inverse rank $\frac{1}{r_i}$ of the reference sound $i$ over all imitations $Q$:

$$\textrm{MRR} = \frac{1}{\lvert Q \rvert} \sum_{i=1}^{\lvert Q \rvert} \frac{1}{r_i}$$
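The formula above maps directly to a few lines of Python. This is a sketch of the metric itself, not the challenge's official evaluation script; `ranks` holds the rank $r_i$ (1-based) at which each imitation's reference sound was retrieved:

```python
def mean_reciprocal_rank(ranks):
    """MRR = (1/|Q|) * sum over imitations of 1/r_i.

    ranks: iterable of 1-based ranks of the reference sound,
           one entry per imitation in the query set Q.
    """
    ranks = list(ranks)
    return sum(1.0 / r for r in ranks) / len(ranks)
```

For example, if three imitations retrieve their reference sounds at ranks 1, 2, and 4, the MRR is (1 + 1/2 + 1/4) / 3 = 7/12.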

  • The Normalized Discounted Cumulative Gain (NDCG) measures a system's ability to retrieve sounds of the imitated category (i.e., how well it retrieves an arbitrary dog bark when a specific dog bark was imitated). The NDCG will not be used for ranking.
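For intuition, an NDCG with binary relevance can be sketched as follows; whether the challenge uses exactly this discounting (log base 2) and binary category-match relevance is an assumption on my part:

```python
import math

def ndcg(relevances):
    """NDCG over one ranked list, with binary relevance.

    relevances: list of 0/1 flags, in retrieval order, where 1 means
                the retrieved sound belongs to the imitated category.
    """
    # DCG: relevant items are discounted by log2 of their position
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))
    # IDCG: DCG of the ideal ordering (all relevant items first)
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

A ranking that places all category matches first scores 1.0; pushing matches further down the list lowers the score.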

Contact

For questions or inquiries, please contact paul.primus@jku.at.

Citation

@inproceedings{Greif2024,
    author = "Greif, Jonathan and Schmid, Florian and Primus, Paul and Widmer, Gerhard",
    title = "Improving Query-By-Vocal Imitation with Contrastive Learning and Audio Pretraining",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)",
    address = "Tokyo, Japan",
    month = "October",
    year = "2024",
    pages = "51--55"
}
