🧬DORi:

Determine the characteristics of amino acid sequences, obtain processed DNA or RNA, and remove low-quality fastq sequences.

DORi contains tools written as homework for the Python course at the Bioinformatics Institute (OOP, API, testing, multiprocessing).

🐠dori.py:

This module aims to make life easier for experimenters and bioinformaticians who work with nucleic acid and amino acid sequences (both long and short), as well as with fastq sequences.

The module includes:

RNASequence/DNASequence/AminoAcidSequence/BiologicalSequence/NucleicAcidSequence - classes for working with the three main bioinformatics data types: DNA, RNA, and proteins. Methods:

complement - returns the complementary sequence
reverse - returns the reversed sequence
gc_content - calculates the GC content of a DNASequence or RNASequence
transcribe - returns the transcribed sequence (DNASequence only)
count_mol_weight - calculates the molecular weight of an AminoAcidSequence
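
To illustrate what these methods compute, here is a minimal standalone sketch. It is not the code from dori.py: the plain-string interface, the function bodies, and the choice to report GC content as a percentage are all assumptions made for the example.

```python
# Hypothetical standalone helpers, not the actual dori.py classes.
DNA_COMPLEMENT = str.maketrans("ATGCatgc", "TACGtacg")
DNA_TO_RNA = str.maketrans("Tt", "Uu")


def complement(dna: str) -> str:
    """Return the complementary DNA strand, preserving letter case."""
    return dna.translate(DNA_COMPLEMENT)


def reverse(seq: str) -> str:
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def gc_content(seq: str) -> float:
    """Return the share of G and C bases as a percentage (assumed unit)."""
    if not seq:
        return 0.0
    gc_count = sum(base in "GCgc" for base in seq)
    return 100 * gc_count / len(seq)


def transcribe(dna: str) -> str:
    """Return the RNA version of a coding DNA strand (T replaced by U)."""
    return dna.translate(DNA_TO_RNA)


print(complement("ATGC"), reverse("ATGC"), gc_content("ATGC"), transcribe("ATGC"))
```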

filter_read - filters a fastq file by the specified parameters (read length, GC content, quality)
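
The three filtering criteria can be sketched for a single read as follows. This is only an illustration: the function name `passes_filters`, its parameter names and defaults, and the use of Phred+33 quality encoding are assumptions, not the signature of filter_read.

```python
# Hypothetical per-read check; filter_read in dori.py may differ in its details.
def mean_phred33(quality: str) -> float:
    """Mean quality score of a read, assuming Phred+33 encoding."""
    return sum(ord(char) - 33 for char in quality) / len(quality)


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the read."""
    return 100 * sum(base in "GCgc" for base in seq) / len(seq)


def passes_filters(
    seq: str,
    quality: str,
    length_bounds: tuple = (0, 2**32),
    gc_bounds: tuple = (0, 100),
    quality_threshold: float = 0,
) -> bool:
    """Return True if the read satisfies all three criteria."""
    if not seq or len(seq) != len(quality):
        return False  # empty or malformed record
    if not length_bounds[0] <= len(seq) <= length_bounds[1]:
        return False
    if not gc_bounds[0] <= gc_percent(seq) <= gc_bounds[1]:
        return False
    return mean_phred33(quality) >= quality_threshold


# "I" encodes a score of 40 and "!" a score of 0 in Phred+33.
print(passes_filters("ATGC", "IIII", quality_threshold=20))
print(passes_filters("ATGC", "!!!!", quality_threshold=20))
```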

telegram_logger - a decorator that sends information about a function run through a Telegram bot. The report includes the result of the run, the running time, and a log file with the stdout and stderr output. It is implemented directly on top of the Telegram API, without dedicated libraries.
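
The general idea of such a decorator can be sketched with the standard library alone. This is not the telegram_logger implementation: the names `notify` and `send_telegram_message` are hypothetical, the message wording is invented, and the sketch reports only success or failure and the run time (it does not capture stdout or stderr into a log file). The only external assumption is the Telegram Bot API `sendMessage` method.

```python
import functools
import time
import urllib.parse
import urllib.request


def send_telegram_message(token: str, chat_id: str, text: str) -> None:
    """Send a text message through the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    with urllib.request.urlopen(url, data=data, timeout=10):
        pass  # the response body is not needed here


def notify(send):
    """Decorator factory: report the outcome and run time via send(text)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception as error:
                elapsed = time.perf_counter() - start
                send(
                    f"{func.__name__} failed after {elapsed:.2f} s: "
                    f"{type(error).__name__}: {error}"
                )
                raise  # the caller still sees the original exception
            elapsed = time.perf_counter() - start
            send(f"{func.__name__} finished in {elapsed:.2f} s")
            return result

        return wrapper

    return decorator


# Offline demonstration: collect messages in a list instead of sending them.
messages = []


@notify(messages.append)
def add(a, b):
    return a + b


add(1, 2)
print(messages)
```

To send real messages, `send` could be `functools.partial(send_telegram_message, token, chat_id)` with your own bot token and chat id. Passing the sender in as an argument keeps the decorator testable without network access.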

run_genscan - queries the Genscan website to predict possible CDS, exons, and introns in the sequence of interest (input: file or string). Output: a GenscanOutput object. Intron and Exon classes are also implemented to make the returned data more convenient to use.

MeasureTime - a context manager for measuring the running time of a code block
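
A timing context manager of this kind can be sketched in a few lines. The class name `Timer` and its `elapsed` attribute are hypothetical and chosen for the example; how MeasureTime reports its result may be different.

```python
import time


class Timer:
    """Hypothetical timing context manager, not the MeasureTime class itself."""

    def __enter__(self):
        self.elapsed = None
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Record the duration even if the block raised an exception.
        self.elapsed = time.perf_counter() - self.start
        return False  # do not suppress exceptions


with Timer() as timer:
    total = sum(range(100_000))

print(f"Block took {timer.elapsed:.4f} s")
```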

🧬bio_files_processor:

This module works with bioinformatics files, in particular fasta, gbk, and blast output files.

convert_multiline_fasta_to_oneline - converts multi-line sequence entries in a fasta file into single-line entries.
select_genes_from_gbk_to_fasta - selects a given number of genes before and after the gene of interest and saves their protein sequences (translation) to a fasta file.
change_fasta_start_pos - shifts the starting position of the sequence and writes the shifted sequence to a file.
parse_blast_output - works with files produced by blast. For each sequence analyzed in blast, it takes the best database match. The output is a file listing the names of the proteins closest to the analyzed sequences.
OpenFasta - a context manager for iterating over a fasta file. It opens the file and returns individual FASTA records, each with an id, a description, and a sequence. It is used in the same way as the built-in open context manager.
FastaRecord - a dataclass for storing a single FASTA record
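
The FASTA-related tools above share two simple ideas: joining wrapped sequence lines into one record, and rotating a sequence to a new start. The sketch below shows both under stated assumptions. The names `Record`, `read_fasta`, and `shift_start` are hypothetical, the id is assumed to be the header text up to the first space, and the sequence is assumed to be circular when the start is shifted. It is not the code from bio_files_processor.py.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Record:
    """Hypothetical stand-in for a FASTA record: id, description, sequence."""

    id: str
    description: str
    seq: str


def make_record(header: str, chunks: List[str]) -> Record:
    # Assumption: the id is everything before the first space in the header.
    seq_id, _, description = header.partition(" ")
    return Record(seq_id, description, "".join(chunks))


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one record per header, joining multi-line sequences."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield make_record(header, chunks)


def shift_start(seq: str, shift: int) -> str:
    """Rotate a sequence so that position `shift` becomes the new start."""
    if not seq:
        return seq
    shift %= len(seq)
    return seq[shift:] + seq[:shift]


example = [">seq1 test sequence", "ATG", "CCC", ">seq2", "GG"]
for record in read_fasta(example):
    print(record.id, record.seq, shift_start(record.seq, 1))
```

Because `read_fasta` accepts any iterable of lines, it can be given an open file object directly, for example inside a `with open(path) as handle:` block.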

🌲custom_random_forest.py:

RandomForestClassifierCustom - a custom implementation of a random forest classifier. The class supports multiple threads (the n_jobs parameter) to speed up fitting and prediction.

🛠️test_dori.py:

The test_dori.py script contains functions for testing the tools presented above.

Example:

Examples of running some programs are presented in Showcases.ipynb.

I would like to express my deep gratitude to the team of the Bioinformatics Institute: thanks to them, this repository was born and began to grow❤️
