SubScan is a lightweight Python command-line tool that extracts amino acid substitutions from EMBOSS pairwise alignment files. It simplifies comparison between protein sequences, especially for antimicrobial resistance (AMR) gene variation studies across lab isolates.
- ✅ Extracts amino acid substitutions from pairwise EMBOSS alignments
- ✅ Outputs clean reports in Excel (.xlsx) format
- ✅ Automatically labels gene names using filenames
- ✅ Processes single files or entire directories
- ✅ Cross-platform support (Windows, macOS, Linux)
- ✅ Command-line interface with flexible options
- Antimicrobial resistance (AMR) gene comparisons
- Mutational analysis across multiple protein alignments
- Easy visualization and tabulation of substitutions
- High-throughput screening of sequence variants
- EMBOSS alignment files (`.txt`, `.aln`, `.align`)
- Each file contains 3-line aligned blocks:

      Ref     1 MIKLSTLAIL 10
                ||||||.|||
      Query   1 MIKLSTTAIL 10

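The 3-line block layout above can be read with a small parser. The following is a minimal sketch under stated assumptions, not SubScan's actual implementation: the function name `find_substitutions`, the regular expression, and the choice to number substitutions by reference position are all illustrative.

```python
import re

# Matches a sequence line such as "Ref     1 MIKLSTLAIL 10".
# Assumed layout: label, start position, aligned residues ('-' for gaps), end position.
SEQ_LINE = re.compile(r"^(\S+)\s+(\d+)\s+([A-Za-z*-]+)\s+(\d+)\s*$")


def find_substitutions(text):
    """Return substitutions such as 'L7T' (reference residue, position, query residue)."""
    # Keep only sequence lines; the match line ("||||.|||") does not fit the pattern.
    seq_lines = [m for m in (SEQ_LINE.match(line) for line in text.splitlines()) if m]
    substitutions = []
    # Sequence lines are assumed to alternate: reference, query, reference, query, ...
    for ref, qry in zip(seq_lines[0::2], seq_lines[1::2]):
        pos = int(ref.group(2))  # reference position of the first column in this block
        for r, q in zip(ref.group(3).upper(), qry.group(3).upper()):
            if r == "-":
                continue  # insertion in the query: no reference position is consumed
            if q != "-" and r != q:
                substitutions.append(f"{r}{pos}{q}")
            pos += 1
    return substitutions


block = """\
Ref                1 MIKLSTLAIL     10
                     ||||||.|||
Query              1 MIKLSTTAIL     10
"""
print(find_substitutions(block))  # ['L7T']
```

Gaps are skipped rather than reported here, since the tool is described as reporting substitutions only; how SubScan itself treats gaps is not stated.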
A single Excel spreadsheet with:
| Gene Name | Substitution |
|---|---|
| acrB | I348T |
| tetA | L219F |
| marR | P20L |
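
A report with these two columns could be assembled roughly as follows. This is a hedged sketch rather than the tool's real code: `build_rows` and `write_report` are hypothetical names, and taking the gene name from the filename stem (`acrB.aln` becomes `acrB`) is an assumption about how the filename-based labelling works.

```python
from pathlib import Path


def build_rows(results):
    """Flatten {alignment path: [substitutions]} into one row per substitution."""
    rows = []
    for path, substitutions in results.items():
        gene = Path(path).stem  # "alignments/acrB.aln" -> "acrB" (assumed convention)
        for sub in substitutions:
            rows.append({"Gene Name": gene, "Substitution": sub})
    return rows


def write_report(rows, output="Amino_Acid_Substitution_Report.xlsx"):
    """Write the rows to an Excel file using pandas with the openpyxl engine."""
    import pandas as pd  # imported here so build_rows works without pandas installed

    frame = pd.DataFrame(rows, columns=["Gene Name", "Substitution"])
    frame.to_excel(output, index=False, engine="openpyxl")


rows = build_rows({"alignments/acrB.aln": ["I348T"], "alignments/tetA.aln": ["L219F"]})
print(rows)
```

Calling `write_report(rows)` would then produce the spreadsheet, provided pandas and openpyxl are installed.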
- Python 3.7 or higher
- pip (Python package installer)

Install the dependencies:

    pip install -r requirements.txt

Or install manually:

    pip install pandas openpyxl

Process a single alignment file:

    python SubScan.py alignments/acrB.aln

Process all alignment files in a directory:

    python SubScan.py alignments/

Specify custom output filename:

    python SubScan.py alignments/ -o my_results.xlsx

Command-line options:

    usage: SubScan.py [-h] [-o OUTPUT] input

    positional arguments:
      input                 Path to alignment file or directory containing alignment files

    optional arguments:
      -h, --help            Show help message and exit
      -o OUTPUT, --output OUTPUT
                            Output Excel filename (default: Amino_Acid_Substitution_Report.xlsx)

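For readers who want to script around the tool, the interface documented above corresponds to an `argparse` setup along these lines. This is a sketch reconstructed from the help text only; `build_parser` is a hypothetical name and the real script may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented SubScan.py options."""
    parser = argparse.ArgumentParser(prog="SubScan.py")
    parser.add_argument(
        "input",
        help="Path to alignment file or directory containing alignment files",
    )
    parser.add_argument(
        "-o",
        "--output",
        default="Amino_Acid_Substitution_Report.xlsx",
        help="Output Excel filename (default: %(default)s)",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["alignments/", "-o", "my_results.xlsx"])
print(args.input, args.output)  # alignments/ my_results.xlsx
```

On Python 3.10 and later, `argparse` prints the heading `options:` instead of `optional arguments:`, so the help output may differ slightly from the text shown above.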
Example run on a single file:

    python SubScan.py data/acrB.aln

Output:

    🔬 SubScan - Amino Acid Substitution Analyzer
    ==================================================
    📂 Processing file: acrB.aln
    ✓ Found 1 substitution(s)
    ✅ Report saved as 'Amino_Acid_Substitution_Report.xlsx'
    📊 Processed 1 file(s), found 1 result(s)

Example run on a directory with a custom output filename:

    python SubScan.py alignments/ -o amr_analysis_2024.xlsx

Project layout:

    SubScan/
    ├── SubScan.py          # Main application script
    ├── requirements.txt    # Python dependencies
    ├── README.md           # This file
    ├── LICENSE             # MIT License
    ├── .gitignore          # Git ignore rules
    └── examples/           # Sample data and documentation
        ├── sample_alignment.aln
        └── USAGE.md

Test with sample data:

    python SubScan.py examples/sample_alignment.aln

Contributions are welcome! Please feel free to submit a Pull Request.

Requirements:

- Python 3.7+ (pandas 2.0.0 and later require Python 3.8 or newer)
- pandas >= 2.0.0
- openpyxl >= 3.1.0

Issue: "No module named 'pandas'"

    pip install pandas openpyxl

Issue: "Permission denied"
- Ensure you have write permissions in the output directory
- Try specifying a different output path using `-o`

Issue: "No files found in directory"
- Check that alignment files have extensions: `.txt`, `.aln`, or `.align`
- Verify the directory path is correct

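When diagnosing the "No files found in directory" case, it can help to list which files in a folder carry one of the accepted extensions. The snippet below is a rough diagnostic sketch, not SubScan's own discovery logic: `collect_alignment_files` is a hypothetical helper, and the case-insensitive, non-recursive matching is an assumption.

```python
import tempfile
from pathlib import Path

# Extensions named in this README.
ALIGNMENT_EXTENSIONS = {".txt", ".aln", ".align"}


def collect_alignment_files(target):
    """Return the alignment files for a path that is either a file or a directory."""
    path = Path(target)
    if path.is_file():
        return [path]
    if not path.is_dir():
        raise FileNotFoundError(f"No such file or directory: {target}")
    # Assumption: match extensions case-insensitively and do not recurse into subfolders.
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in ALIGNMENT_EXTENSIONS
    )


# Demonstration on a throwaway directory with two matching files and one that is ignored.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("acrB.aln", "tetA.TXT", "notes.md"):
        (Path(tmp) / name).write_text("")
    found = [p.name for p in collect_alignment_files(tmp)]

print(found)  # ['acrB.aln', 'tetA.TXT']
```

An empty result for your own directory suggests the files need renaming or the path is wrong.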
This project is licensed under the MIT License - see the LICENSE file for details.
Vihaan Kulkarni – Bioinformatics Enthusiast
- Built with pandas for data manipulation
- Uses openpyxl for Excel generation
- Designed for EMBOSS alignment output format
For issues, questions, or contributions, please open an issue on GitHub.
Note: This is a standalone command-line tool. The original Google Colab version is no longer maintained. For local usage, please follow the installation instructions above.