BioSeqTool

English version

BioSeqTool

NASeqTool is a bioinformatics tool for analyzing Biological sequences.

The repository consists of a main script are used for DNA/RNA sequence processing and sequencing data processing. The scripts are written in the Python programming language.

Installation

Clone the repository to a local device.

git clone git@github.com:Valeriisht/BioSeqTool.git

The utility was designed to run on Python version 3.9 or newer.

BioSeqTool

A utility for manipulating Biologucal molecules. The script is written using object-oriented programming.

Biological sequences

Usage

You will need:

DNA/RNA or amino sequences are list (or string when performing an operation on a single sequence), which are required parameters for function calls.

The utility has five Classes:

Polymer Sequence
- Nuclein Acid Sequence
  - DNA Sequence
  - RNA Sequence
- Amino Acid Sequence

The utility can perform the following operations:

Function reverse in Nuclein Acid Sequence class

Returns the complementary DNA strand in the 3' to 5' direction. For an RNA molecule, cDNA is returned.

Function reverse_complement in Nuclein Acid Sequence class

Returns the complementary DNA/cDNA strand in the 5' to 3' direction.

Function gc_content in Nuclein Acid Sequence class

Counts the GC composition of the sequence and returns the percentage of GC nucleotides.

Function Motif in Amino Sequence class.

Finds a specific motif in a protein molecule

Function transcribe in DNA Sequence class.

The meaningful DNA sequence is translated into an mRNA molecule.

Function is_na_sequence in RNA Sequence class.

Checks whether the sequence is DNA or RNA.

Function length_cds in RNA Sequence class

Determines the length of the CDS coding sequence.

filter_fastq_tool

The utility for analyzing sequencing data in fastq format. The function is written using the functionality from BioPython

Using:

SeqIO is a Biopython module that allows to read and write sequence files in various formats such as FASTA, FASTQ, GenBank
SeqUtils provides various functions for working with biological sequences (includes tools for analyzing and manipulating DNA, RNA, and protein sequences - GC-content, weight)
SeqRecord is an object that contains a sequence and information about it. It allows the storage of biological sequences

The utility performs the steps below:

Filtering the sequence with regard to gc_composition. The gc- composition is counted and determines if the value is within the thresholds. By default, the threshold values are defined from 0 to 100.
Filtering the sequence with regard to length. Checks whether the length of the sequence is within the specified threshold values. By default, the threshold values are defined from 0 to 2**32.
Sequence filtering with respect to sequence quality. Checks whether the quality on the phred33 scale is within the specified threshold values. The sequence quality is translated from phred33 scale to ASCII. By default, the threshold value is 0.

The result of the program is an output_file containing only those reads that have passed each of the three filters.

bio_files_processor

This utility is located in the root of the repository and is designed to work with large .gbk, .fasta, .txt files.

Usage

You will need:

A .fasta file for the convert_multiline_fasta_to_oneline function.
A .txt file for the parse_blast_output function.
A .gbk file for the select_genes_from_gbk_to_fasta function.

The utility performs the following actions:

Function convert_multiline_fasta_to_oneline

Converts fasta format from multiline to single line format

The function accepts 2 arguments (input_fasta and output_fasta - the default parameter). Reads the input fasta file, saves it to a new fasta file where each sequence is written to a single line.

Function parse_blast_output

Parses the file resulting after alignment in the Blast program. Pulling the name of the best match with the database and saving the results in a single file.

The function takes 2 arguments (input_file, output_file) as input. The program reads the given txt file, for each QUERY best match against the base on the first line from the Description column. The set of obtained proteins is saved to a new file with one column sorted alphabetically.

Function select_genes_from_gbk_to_fasta

Detect a certain number of genes before and after each gene of interest and save their protein sequence into a new fasta file. The fasta-file can be sent to the input of BLAST explicitly.

The function takes as input input_gbk (path to the input GBK file), genes (genes of interest near which neighbors are searched), n_before, n_after (number of genes before and after (>=1)), output_fasta (name of the output file).

Name		Name	Last commit message	Last commit date
Latest commit History 139 Commits
image		image
.gitignore		.gitignore
NASeqTool.py		NASeqTool.py
README.md		README.md
bio_files_processor.py		bio_files_processor.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BioSeqTool

Installation

BioSeqTool

Biological sequences

Usage

The utility has five Classes:

The utility can perform the following operations:

Function `reverse` in Nuclein Acid Sequence class

Function `reverse_complement` in Nuclein Acid Sequence class

Function `gc_content` in Nuclein Acid Sequence class

Function `Motif` in Amino Sequence class.

Function `transcribe` in DNA Sequence class.

Function `is_na_sequence` in RNA Sequence class.

Function `length_cds` in RNA Sequence class

filter_fastq_tool

The utility performs the steps below:

bio_files_processor

Usage

The utility performs the following actions:

Function `convert_multiline_fasta_to_oneline`

Function `parse_blast_output`

Function `select_genes_from_gbk_to_fasta`

About

Uh oh!

Releases

Packages

Languages

Valeriisht/BioSeqTool

Folders and files

Latest commit

History

Repository files navigation

BioSeqTool

Installation

BioSeqTool

Biological sequences

Usage

The utility has five Classes:

The utility can perform the following operations:

Function reverse in Nuclein Acid Sequence class

Function reverse_complement in Nuclein Acid Sequence class

Function gc_content in Nuclein Acid Sequence class

Function Motif in Amino Sequence class.

Function transcribe in DNA Sequence class.

Function is_na_sequence in RNA Sequence class.

Function length_cds in RNA Sequence class

filter_fastq_tool

The utility performs the steps below:

bio_files_processor

Usage

The utility performs the following actions:

Function convert_multiline_fasta_to_oneline

Function parse_blast_output

Function select_genes_from_gbk_to_fasta

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Function `reverse` in Nuclein Acid Sequence class

Function `reverse_complement` in Nuclein Acid Sequence class

Function `gc_content` in Nuclein Acid Sequence class

Function `Motif` in Amino Sequence class.

Function `transcribe` in DNA Sequence class.

Function `is_na_sequence` in RNA Sequence class.

Function `length_cds` in RNA Sequence class

Function `convert_multiline_fasta_to_oneline`

Function `parse_blast_output`

Function `select_genes_from_gbk_to_fasta`

Packages