# ErrorAlign

Text-to-text alignment algorithm for speech recognition error analysis.

ErrorAlign helps you dig deeper into your speech recognition projects by accurately aligning each word in a reference transcript with the model-generated transcript. Unlike traditional methods such as Levenshtein-based alignment, it is not restricted to one-to-one alignment: it can map a single reference word to multiple words or subwords in the model output. This enables quick and reliable identification of error patterns in rare words, names, or domain-specific terms that matter most for your application.
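For contrast, the traditional Levenshtein-based word alignment mentioned above can be sketched in a few lines. This is an illustration, not part of this library; note how the one-to-one constraint forces a merged word like "Something" to pair with a single reference word, obscuring the split/merge error:

```python
def levenshtein_align(ref, hyp):
    """Word-level Levenshtein alignment (illustrative baseline, not error-align).

    Strictly one-to-one: each reference word maps to at most one hypothesis
    word, so split or merged words surface as blunt substitutions.
    """
    r, h = ref.split(), hyp.split()
    # Edit-distance DP table: D[i][j] = distance between r[:i] and h[:j].
    D = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        D[i][0] = i
    for j in range(len(h) + 1):
        D[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,      # delete r[i-1]
                          D[i][j - 1] + 1,      # insert h[j-1]
                          D[i - 1][j - 1] + cost)  # match/substitute
    # Backtrace to recover the edit operations.
    ops, i, j = [], len(r), len(h)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (r[i - 1] != h[j - 1]):
            op = "MATCH" if r[i - 1] == h[j - 1] else "SUBSTITUTE"
            ops.append((op, r[i - 1], h[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(("DELETE", r[i - 1], None))
            i -= 1
        else:
            ops.append(("INSERT", None, h[j - 1]))
            j -= 1
    return ops[::-1]
```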
→ Update [2025-12-10]: As of version 0.1.0b5, error-align includes a word-level pass that efficiently identifies unambiguous matches, along with C++ extensions that accelerate beam search and backtrace construction. The combined speedup is ~15× over the pure-Python implementation ⚡
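The word-level pre-pass idea can be sketched as follows. This is not the library's actual implementation; it is a minimal illustration under the assumption that an "unambiguous match" is a word occurring exactly once in both the reference and the hypothesis, which lets such pairs be pinned as anchors before any expensive search runs on the spans between them:

```python
from collections import Counter

def unambiguous_anchors(ref_words, hyp_words):
    """Illustrative sketch (NOT the error-align implementation): pin words
    that occur exactly once in both sequences, splitting the alignment
    problem into smaller independent spans for the expensive search."""
    ref_counts, hyp_counts = Counter(ref_words), Counter(hyp_words)
    anchors = []
    for i, w in enumerate(ref_words):
        if ref_counts[w] == 1 and hyp_counts[w] == 1:
            anchors.append((i, hyp_words.index(w)))
    # Keep only anchors with strictly increasing hyp indices,
    # so the spans between anchors stay ordered and independent.
    kept, last_j = [], -1
    for i, j in anchors:
        if j > last_j:
            kept.append((i, j))
            last_j = j
    return kept
```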
**Contents:** Installation | Quickstart | Citation and Research
## Installation

```shell
pip install error-align
```
## Quickstart

```python
from error_align import error_align

ref = "Some things are worth noting!"
hyp = "Something worth nothing period?"

alignments = error_align(ref, hyp)
```

Resulting alignments:

```
Alignment(SUBSTITUTE: "Some" -> "Some"-),
Alignment(SUBSTITUTE: "things" -> -"thing"),
Alignment(DELETE: "are"),
Alignment(MATCH: "worth" == "worth"),
Alignment(SUBSTITUTE: "noting" -> "nothing"),
Alignment(INSERT: "period")
```

## Citation and Research

```bibtex
@article{borgholt2025text,
  title={A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems},
  author={Borgholt, Lasse and Havtorn, Jakob and Igel, Christian and Maal{\o}e, Lars and Tan, Zheng-Hua},
  journal={arXiv preprint arXiv:2509.24478},
  year={2025}
}
```
To reproduce results from the paper:

1. Install with the extra evaluation dependencies (only supported with Python 3.12):

   ```shell
   pip install error-align[evaluation]
   ```

2. Clone this repository:

   ```shell
   git clone https://github.com/corticph/error-align.git
   ```

3. Navigate to the evaluation directory:

   ```shell
   cd error-align/evaluation
   ```

4. Transcribe a dataset for evaluation. For example:

   ```shell
   python transcribe_dataset.py --model_name whisper --dataset_name commonvoice --language_code fr
   ```

5. Run the evaluation script on the output file. For example:

   ```shell
   python evaluate_dataset.py --transcript_file transcribed_data/whisper_commonvoice_test_fr.parquet
   ```
Notes:

- To reproduce results on the `primock57` dataset, first run: `python prepare_primock57.py`.
- Use the `--help` flag to see all available options for `transcribe_dataset.py` and `evaluate_dataset.py`.
- All results reported in the paper are based on the test sets.
Collaborators: