🧬 FastAAExtractor

FastAAExtractor is a simple, fast tool for extracting amino acid sequences from bacterial genomes using coordinate tables.
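
To make the core idea concrete, here is a minimal sketch of coordinate-based extraction. It is not the tool's actual implementation: the function name `extract_protein`, the 1-based inclusive `start`/`end` convention, and the `+`/`-` strand values are all assumptions made for illustration, since the README does not describe the coordinate table layout. It uses the standard codon assignments and treats every codon literally, so an alternative start codon such as GTG would come out as V rather than M.

```python
from itertools import product

# Standard codon assignments, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_protein(genome_seq, start, end, strand="+"):
    """Translate the gene at start..end (assumed 1-based, inclusive)."""
    dna = genome_seq[start - 1:end].upper()
    if strand == "-":
        # Reverse-complement genes annotated on the minus strand.
        dna = dna.translate(COMPLEMENT)[::-1]
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `extract_protein("CCATGAAATGAGG", 3, 11)` translates the codons ATG, AAA, TGA and stops at the stop codon.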

Features

Easy batch processing: just point to your genome and coordinate folders
Automatic matching: genome and coordinate files are paired by name
Parallel extraction for speed
Flexible gene selection
Works on Windows, Mac, Linux
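
The name-based pairing listed above could work roughly as follows. This is a sketch under stated assumptions, not the project's code: `pair_by_stem` is a hypothetical helper, and "paired by name" is taken to mean that a genome file and a coordinate file share the same base name (for example `strainA.fasta` and `strainA.tsv`).

```python
from pathlib import Path

# Genome extensions mentioned in this README.
GENOME_SUFFIXES = {".fasta", ".fa"}


def pair_by_stem(genome_files, coords_files):
    """Return {base_name: (genome_path, coords_path)} for matching files."""
    genomes = {
        Path(f).stem: Path(f)
        for f in genome_files
        if Path(f).suffix.lower() in GENOME_SUFFIXES
    }
    coords = {
        Path(f).stem: Path(f)
        for f in coords_files
        if Path(f).suffix.lower() == ".tsv"
    }
    # Keep only base names present on both sides, in a stable order.
    shared = sorted(genomes.keys() & coords.keys())
    return {stem: (genomes[stem], coords[stem]) for stem in shared}
```

Files without a partner are silently skipped in this sketch; a real tool might prefer to warn about them.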

Quick Start

1. Install

pip install -r requirements.txt
pip install -e .

2. Extract from a folder of genomes

fasta_aa_extractor --genome-dir input/genomes/ --coords-dir input/coords/ --genes "acrA,acrB,tolC" --parallel --max-workers 8 --output-dir results/

--genome-dir: folder with .fasta or .fa files
--coords-dir: folder with .tsv files (named to match genomes)
--genes: comma-separated list of gene names, or @genes.txt to read them from a file
--parallel: enables fast parallel processing
--max-workers: number of parallel jobs (default: 4)
--output-dir: where to save results
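
As an illustration of the two --genes forms, the sketch below accepts either a comma-separated string or an `@file` reference. The helper name `parse_genes` and the one-gene-per-line file layout are assumptions; the README does not specify the file format.

```python
from pathlib import Path


def parse_genes(spec):
    """Turn 'acrA,acrB' or '@genes.txt' into an ordered list of gene names."""
    if spec.startswith("@"):
        # Assumed layout: one gene name per line.
        names = Path(spec[1:]).read_text(encoding="utf-8").splitlines()
    else:
        names = spec.split(",")
    genes = []
    for name in names:
        name = name.strip()
        # Drop blanks and duplicates while preserving order.
        if name and name not in genes:
            genes.append(name)
    return genes
```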

3. Output

Creates one .faa file per genome-gene pair: GenomeName_GeneName.faa
Standard FASTA format, ready for BLAST, alignment, etc.
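
The naming scheme above can be sketched as a small writer function. `write_faa` is a hypothetical helper; the header text and the 60-character line width are assumptions, since the README only specifies the file name pattern and that the output is standard FASTA.

```python
from pathlib import Path


def write_faa(output_dir, genome_name, gene_name, protein, width=60):
    """Write one protein to <output_dir>/<GenomeName>_<GeneName>.faa."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{genome_name}_{gene_name}.faa"
    # Wrap the sequence into fixed-width lines, as is conventional for FASTA.
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    record = "\n".join([f">{genome_name}_{gene_name}", *lines])
    path.write_text(record + "\n", encoding="utf-8")
    return path
```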

Example

fasta_aa_extractor --genome-dir input/ --coords-dir input/ --genes "acrA,acrB" --parallel --output-dir proteins/
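
For readers curious how --parallel and --max-workers might map onto code, here is a hedged sketch using the standard library's `concurrent.futures`. `run_jobs` is a hypothetical name, and the choice of a thread pool is an assumption made to keep the example short; the tool itself may use processes or a different scheme entirely.

```python
from concurrent.futures import ThreadPoolExecutor


def run_jobs(worker, jobs, parallel=False, max_workers=4):
    """Apply worker to every job and return results in input order."""
    jobs = list(jobs)
    if not parallel:
        return [worker(job) for job in jobs]
    # pool.map preserves the order of the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, jobs))
```

Each job here would typically be one genome and coordinate file pair. For CPU-bound translation work, swapping in `ProcessPoolExecutor` is the usual alternative, provided the worker function is importable at module level.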

Need help?

See QUICKSTART.md for more details.
For advanced options, run: fasta_aa_extractor --help

Simple. Fast. No CSVs. Just point and extract!

vihaankulkarni29/FastaAAExtractor
