The bioseq package facilitates bioinformatics operations on DNA, RNA, and amino acid sequences and provides an object-oriented framework for sequence manipulation. It supports:
- Filtering fastq sequences
- Operations on DNA and RNA sequences: transcription, complementation, reversal, reverse complementation, and GC content calculation
- Analysis of amino acid sequences, including hydrophobicity calculation
Before use, ensure that Python is installed along with the required dependencies (biopython, numpy):
pip install biopython numpy

filtered = filter_fastq("example.fastq", gc_bounds=(40, 60), length_bounds=(50, 150), quality_threshold=20)
if isinstance(filtered, dict):
    for name, (seq, qual) in filtered.items():
        print(f"ID: {name}, Seq: {seq[:10]}..., Quality: {qual[:10]}...")
else:
    print(filtered)
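
To make the three filter parameters more concrete, here is a minimal sketch of how such a filter might behave. It is not the package's implementation: the names `sketch_filter`, `gc_percent`, and `mean_quality` are hypothetical, and the sketch assumes Phred+33 quality encoding, inclusive bounds, and input already parsed into a dict of `name -> (sequence, quality)`.

```python
from typing import Dict, Tuple

Record = Tuple[str, str]  # (sequence, quality string)


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    gc = sum(base in "GCgc" for base in seq)
    return 100.0 * gc / len(seq)


def mean_quality(qual: str) -> float:
    """Mean Phred score, assuming Phred+33 encoding (an assumption of this sketch)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - 33 for ch in qual) / len(qual)


def sketch_filter(
    records: Dict[str, Record],
    gc_bounds: Tuple[float, float] = (0, 100),
    length_bounds: Tuple[int, int] = (0, 2**32),
    quality_threshold: float = 0,
) -> Dict[str, Record]:
    """Keep reads whose GC content, length, and mean quality all pass (bounds inclusive)."""
    kept = {}
    for name, (seq, qual) in records.items():
        if not gc_bounds[0] <= gc_percent(seq) <= gc_bounds[1]:
            continue
        if not length_bounds[0] <= len(seq) <= length_bounds[1]:
            continue
        if mean_quality(qual) < quality_threshold:
            continue
        kept[name] = (seq, qual)
    return kept


reads = {
    "read1": ("GCGCATAT" * 7, "I" * 56),  # 50% GC, length 56, Phred 40
    "read2": ("ATATATAT" * 7, "I" * 56),  # 0% GC, fails the GC bounds
    "read3": ("GCGCATAT" * 7, "#" * 56),  # Phred 2, fails the quality threshold
    "read4": ("GCAT", "IIII"),            # too short
}
passed = sketch_filter(reads, gc_bounds=(40, 60), length_bounds=(50, 150), quality_threshold=20)
print(list(passed))
```

Keeping each criterion in its own small function makes it easier to test the thresholds independently of file parsing.
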
from bio_seq import DNASequence, RNASequence
# DNA example
dna = DNASequence("GATTACA")
print(dna) # GATTACA
print(dna.complement()) # CTAATGT
print(dna.reverse()) # ACATTAG
print(dna.reverse_complement()) # TGTAATC
print(dna.count_gc()) # 28.57
rna = dna.transcribe()
print(rna) # GAUUACA
# RNA example
rna = RNASequence("AUGCGU")
print(rna) # AUGCGU
print(rna.complement()) # UACGCA

from bio_seq import AminoAcidSequence
protein = AminoAcidSequence("MILVFW")
print(protein) # MILVFW
print(protein.calculate_hydrophobicity()) # 2.72

The bio_seq package is now based on an OOP approach, which makes bioinformatics analyses more modular and extensible. It can still be enhanced with additional functionality and more advanced features as needed.
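
As an illustration of what a modular, extensible OOP layout can look like, here is a minimal sketch of a small class hierarchy. It is not the package's actual code: every `Sketch*` class name is hypothetical, and the hydrophobicity method assumes a mean Kyte-Doolittle score, which happens to be consistent with the 2.72 value quoted for `MILVFW`.

```python
class SketchSequence:
    """Hypothetical base class holding behaviour shared by all sequence types."""

    def __init__(self, seq: str) -> None:
        self.seq = seq.upper()

    def __str__(self) -> str:
        return self.seq

    def __len__(self) -> int:
        return len(self.seq)


class SketchNucleicAcid(SketchSequence):
    """Shared nucleic acid operations; subclasses only supply the complement table."""

    complement_map: dict = {}

    def complement(self):
        return type(self)("".join(self.complement_map[base] for base in self.seq))

    def reverse(self):
        return type(self)(self.seq[::-1])

    def reverse_complement(self):
        return self.complement().reverse()

    def count_gc(self) -> float:
        """GC content as a percentage, rounded to two decimals."""
        if not self.seq:
            return 0.0
        gc = sum(base in "GC" for base in self.seq)
        return round(100 * gc / len(self.seq), 2)


class SketchDNA(SketchNucleicAcid):
    complement_map = {"A": "T", "T": "A", "G": "C", "C": "G"}

    def transcribe(self) -> "SketchRNA":
        # Coding-strand convention: replace T with U.
        return SketchRNA(self.seq.replace("T", "U"))


class SketchRNA(SketchNucleicAcid):
    complement_map = {"A": "U", "U": "A", "G": "C", "C": "G"}


class SketchProtein(SketchSequence):
    # Kyte-Doolittle hydropathy scale (assumed here; the package may use another scale).
    KYTE_DOOLITTLE = {
        "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
        "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
        "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
        "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
    }

    def calculate_hydrophobicity(self) -> float:
        """Mean hydropathy over all residues, rounded to two decimals."""
        if not self.seq:
            return 0.0
        total = sum(self.KYTE_DOOLITTLE[aa] for aa in self.seq)
        return round(total / len(self.seq), 2)


print(SketchDNA("GATTACA").reverse_complement())
print(SketchProtein("MILVFW").calculate_hydrophobicity())
```

In this layout, adding a new sequence type means adding one subclass with its own table or method, while the shared operations stay in the base classes.
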
Guys, I can't take it anymore...
