This is the repository of LLaQo, a Large Language Query-based music coach that leverages audio language modeling to provide detailed and formative assessments of music performances.
Our environment `lam2` is downloadable from here. After downloading, simply run `source /path/to/your/envs/lam2/bin/activate`. Alternatively, install the dependencies via pip using `requirement.txt`.
Checkpoints: please access them from here. The folder contains the Vicuna-7b model, our LLaQo checkpoint, and the audio encoder.
For the Gradio inference demo, after setting up the environment and placing `ckpts/` under the root directory, run:
python LLaQo-chat.py
For our new NeuroPiano dataset, please refer to the HF repository as well as its analysis report. For the other datasets, please see the following table: it points to the audio data at its original source and to our metadata files, which contain the instruction-tuned QA pairs. Additionally, the `qagen/` directory contains the processing prompts for CROCUS and expert_novice.
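As a minimal sketch of how such instruction-tuned QA metadata can be consumed, the snippet below parses a small JSON fragment pairing an audio file with question-answer annotations. The field names (`audio`, `qa`, `question`, `answer`) and the file name are illustrative assumptions, not the repository's actual schema; consult the metadata files linked above for the real format.

```python
import json

# Hypothetical metadata fragment: one performance recording with its
# instruction-tuned QA pairs. Field names are assumptions for illustration.
sample = """
[
  {
    "audio": "performance_001.wav",
    "qa": [
      {
        "question": "How stable is the tempo in this performance?",
        "answer": "Largely steady, with slight rushing in the middle section."
      }
    ]
  }
]
"""

entries = json.loads(sample)

# Iterate over recordings and print each QA pair alongside its audio file.
for entry in entries:
    for pair in entry["qa"]:
        print(f'{entry["audio"]} | Q: {pair["question"]} | A: {pair["answer"]}')
```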
The codebase is adapted from APT, which was in turn adapted from BLIP-2 and the LAVIS codebase.
@INPROCEEDINGS{Zhang2025LLaQo,
author={Zhang, Huan and Cheung, Vincent K.M. and Nishioka, Hayato and Dixon, Simon and Furuya, Shinichi},
booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment},
year={2025},
pages={1-5},
doi={10.1109/ICASSP49660.2025.10890522}}
