FastAAExtractor is a simple, fast tool for extracting amino acid sequences from bacterial genomes using coordinate tables.
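Extracting a protein from a coordinate table comes down to three steps: slice the gene's region out of the genome, reverse-complement it when the gene sits on the minus strand, and translate the codons into amino acids. The Python sketch below illustrates that idea. It is not FastAAExtractor's code. It assumes 1-based, inclusive coordinates, a strand given as "+" or "-", and a genome already loaded into a dict mapping contig names to sequences. The names extract_protein, translate and reverse_complement are hypothetical.

```python
# Hypothetical sketch, not FastAAExtractor's implementation.
# Assumes 1-based, inclusive coordinates and strand given as "+" or "-".

# Standard genetic code, codons enumerated in TCAG order.
# (NCBI table 11, used for bacteria, shares these amino acid assignments.)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(cds: str) -> str:
    """Translate full codons: '*' marks stops, 'X' marks ambiguous codons.

    A trailing partial codon is ignored.
    """
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def extract_protein(genome: dict, contig: str, start: int, end: int, strand: str) -> str:
    """Slice a 1-based inclusive region, orient it by strand, and translate it."""
    region = genome[contig][start - 1:end]
    if strand == "-":
        region = reverse_complement(region)
    return translate(region)


if __name__ == "__main__":
    # prints MK*
    print(extract_protein({"contig1": "CCATGAAATAGCC"}, "contig1", 3, 11, "+"))
```

One simplification to be aware of: bacterial genes that begin with GTG or TTG come out here as V or L at position 1, whereas a production tool may render that initiator codon as M.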
- Easy batch processing: just point to your genome and coordinate folders
- Automatic matching: genome and coordinate files are paired by name
- Parallel extraction for speed
- Flexible gene selection: a comma-separated list or a file of gene names
- Works on Windows, macOS, and Linux
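One way to picture pairing by name is matching file stems, so that E_coli.fasta goes with E_coli.tsv. The sketch below assumes exactly that rule: same stem, .fasta or .fa for genomes, .tsv for coordinate tables. FastAAExtractor's actual matching logic may differ, and pair_files is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical matching rule: a genome and a coordinate table pair up when
# their file names share the same stem (E_coli.fasta <-> E_coli.tsv).
GENOME_SUFFIXES = {".fasta", ".fa"}


def pair_files(genome_dir: str, coords_dir: str) -> list:
    """Return (genome_path, coords_path) tuples for every stem found in both folders."""
    tables = {p.stem: p for p in Path(coords_dir).glob("*.tsv") if p.is_file()}
    pairs = []
    # Sorting keeps the output order stable across operating systems
    for genome in sorted(Path(genome_dir).iterdir()):
        if genome.is_file() and genome.suffix.lower() in GENOME_SUFFIXES:
            table = tables.get(genome.stem)
            if table is not None:
                pairs.append((genome, table))
            else:
                print(f"warning: no coordinate table for {genome.name}")
    return pairs
```

Each pair is independent of the others, so a list like this is the kind of work a parallel run could hand to a worker pool such as concurrent.futures.ProcessPoolExecutor(max_workers=...). This illustrates the idea only and does not describe the tool's internals.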
pip install -r requirements.txt
pip install -e .
Example:
fasta_aa_extractor --genome-dir input/genomes/ --coords-dir input/coords/ --genes "acrA,acrB,tolC" --parallel --max-workers 8 --output-dir results/
- --genome-dir: folder with .fasta or .fa files
- --coords-dir: folder with .tsv files (named to match the genomes)
- --genes: comma-separated list, or @genes.txt to read the names from a file
- --parallel: enables parallel processing
- --max-workers: number of parallel jobs (default: 4)
- --output-dir: where to save results
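The --genes option accepts either an inline list or an @file reference. A minimal parser for both forms might look like the sketch below. It assumes the gene file holds names separated by newlines and/or commas, which this README does not specify, and parse_genes is a hypothetical name. Duplicates are dropped while the original order is kept.

```python
from pathlib import Path


def parse_genes(arg: str) -> list:
    """Parse a --genes value such as "acrA,acrB,tolC" or "@genes.txt".

    Assumption (not documented by the tool): a gene file lists names
    separated by newlines and/or commas.
    """
    if arg.startswith("@"):
        # Text mode normalizes Windows line endings to "\n"
        raw = Path(arg[1:]).read_text().replace("\n", ",")
    else:
        raw = arg
    names = (name.strip() for name in raw.split(","))
    # dict.fromkeys drops duplicates while preserving first-seen order
    return list(dict.fromkeys(name for name in names if name))
```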
Output:
- Creates one .faa file per genome-gene pair, named GenomeName_GeneName.faa
- Standard FASTA format, ready for BLAST, alignment, etc.
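Writing one protein per GenomeName_GeneName.faa file needs only a header line and the wrapped sequence. This sketch assumes a header of the form >GenomeName_GeneName and 60-residue lines, a common FASTA convention. The tool's real header may carry more information, and write_faa is a hypothetical helper.

```python
from pathlib import Path


def write_faa(out_dir: str, genome_name: str, gene: str, protein: str, width: int = 60) -> Path:
    """Write one protein to <out_dir>/<genome_name>_<gene>.faa in FASTA format.

    The header ">genome_gene" and the 60-residue line width are assumptions;
    the tool's real header may contain additional fields.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{genome_name}_{gene}.faa"
    # Break the sequence into fixed-width lines, as most FASTA writers do
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    path.write_text(f">{genome_name}_{gene}\n" + "\n".join(lines) + "\n")
    return path
```

For instance, write_faa("results/", "E_coli", "acrA", protein) would create results/E_coli_acrA.faa.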
Using a single folder for both genomes and coordinate tables:
fasta_aa_extractor --genome-dir input/ --coords-dir input/ --genes "acrA,acrB" --parallel --output-dir proteins/
- See QUICKSTART.md for more details
- For advanced options, run:
fasta_aa_extractor --help
Simple. Fast. No CSVs. Just point and extract!