Given two biological sequences, a query sequence (q) of length n and a target sequence (t) of length m, the goal of sequence alignment is to find a sequence of operations that maps one sequence to another while minimizing a given cost function. Classical dynamic programming (DP) algorithms solve this problem by computing and storing multiple DP matrices, three for gap-affine and five for dual gap-affine alignments, each of size (n+1) × (m+1). These matrices are later traced back to recover the optimal alignment.
This repository contains implementations of the Singletrack backtrace algorithm presented here. Singletrack enables efficient backtracking in gap-affine and dual gap-affine sequence alignment requiring storing only a single DP matrix while preserving exact alignment results. This reduces memory usage by 3× for gap-affine and 5× for dual gap-affine alignments and improves memory hierarchy utilization on modern hardware, often translating into faster execution.
The repository provides three implementations of Singletrack, each with its own README explaining usage:
- basic: A basic C++ implementation of the classical dynamic programming algorithm with and without using Singletrack.
- WFA2-lib: Singletrack integrated into WFA2-lib, commit
2ec28919af3a1b545acfb38e9bdefd160a87f266. - KSW2: Singletrack integrated into KSW2, commit
289609bd9e5381a13b16239d0a7703f1ff03f9ca.
The datasets used in our paper are available in Zenodo.
All aligners in this repository expect a dataset file containing one sequence per line, organized as pairs. The first sequence in each pair (target) starts with >, and the second (query) starts with <. For example:
>AA
<AG
>ACAC
<CTG
This repository includes a tool for generating synthetic sequence datasets, located in tools/python.
Usage example:
python3 tools/python/generate_dataset.py -n 1000 -l 100 -e 0.1This command generates a dataset of 1,000 pairs of DNA sequences, each approximately 100 bases long, with an error rate of 10%. The error rate indicates the proportion of edit operations (insertions, deletions, or substitutions) needed to transform one sequence into its paired sequence.
Lorién López-Villellas, Cristian Iñiguez, Albert Jiménez-Blanco, Quim Aguado-Puig, Miquel Moretó, Jesús Alastruey-Benedé, Pablo Ibáñez, Santiago Marco-Sola. Singletrack: An Algorithm for Improving Memory Consumption and Performance of Gap-Affine Sequence Alignment, bioRXiv, 2025.
@article{LpezVillellas2025,
title = {Singletrack: An Algorithm for Improving Memory Consumption and Performance of Gap-Affine Sequence Alignment},
url = {http://dx.doi.org/10.1101/2025.10.31.685625},
DOI = {10.1101/2025.10.31.685625},
publisher = {Cold Spring Harbor Laboratory},
author = {López-Villellas, Lorién and Iñiguez, Cristian and Jiménez-Blanco, Albert and Aguado-Puig, Quim and Moretó, Miquel and Alastruey-Benedé, Jesús and Ibáñez, Pablo and Marco-Sola, Santiago},
year = {2025},
month = nov
}