84 changes: 83 additions & 1 deletion README.md
@@ -1 +1,83 @@
# misc_module

This module contains assignments for the BI course on Python. The scripts cover tasks on OOP, APIs, and various functions useful for basic bioinformatics.

The following can be found in `bio_files_processor.py`:

```
convert_multiline_fasta_to_oneline
```

Example:

`convert_multiline_fasta_to_oneline(input_fasta='example_multiline_fasta.fasta')`

Input file:

![before](https://github.com/sme229/misc_module/assets/104040609/65e68a7a-a47c-4335-8d10-a88387fa3bdd)

After conversion to a single line:

![after](https://github.com/sme229/misc_module/assets/104040609/c85e4283-295e-4689-a156-5c464cec2164)
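
The conversion shown in the screenshots can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `fasta_to_oneline`, the optional `output_fasta` parameter, and the `_oneline.fasta` default suffix are assumptions made for the example.

```python
from pathlib import Path


def fasta_to_oneline(input_fasta, output_fasta=None):
    """Join the wrapped sequence lines of each FASTA record into one line."""
    input_path = Path(input_fasta)
    if output_fasta is None:
        # Hypothetical default: write next to the input with a suffix.
        output_path = input_path.with_name(input_path.stem + "_oneline.fasta")
    else:
        output_path = Path(output_fasta)

    with open(input_path) as src, open(output_path, "w") as dst:
        chunks = []  # sequence pieces of the record currently being read
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if chunks:
                    # Flush the previous record's sequence as a single line.
                    dst.write("".join(chunks) + "\n")
                    chunks = []
                dst.write(line + "\n")
            else:
                chunks.append(line)
        if chunks:
            dst.write("".join(chunks) + "\n")
    return output_path
```

Header lines are copied unchanged; only the sequence lines between two headers are concatenated.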

```
OpenFasta
```

This is a context manager for reading FASTA files.

- It returns records as `FastaRecord` objects
- It provides `read_record` and `read_records` methods

Input and output example:

```fasta
>GTD326487.1 Species anonymous 24 chromosome
ATCGACTACGACTAGCATCACGATCACGATACG
ATGCATCAGTAGCACTAGATCA
```

```python
id = 'GTD326487.1'
description = 'Species anonymous 24 chromosome'
sequence = 'ATCGACTACGACTAGCATCACGATCACGATACGATGCATCAGTAGCACTAGATCA'
```
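
A context manager with this behaviour could look roughly like the sketch below. The class name `OpenFastaSketch`, the dataclass layout of `FastaRecord`, and the choice to return `None` at end of file are assumptions for illustration; only the method names `read_record` and `read_records` and the three record fields come from the description above.

```python
from dataclasses import dataclass


@dataclass
class FastaRecord:
    id: str
    description: str
    sequence: str


class OpenFastaSketch:
    """Context manager that reads FastaRecord objects from a FASTA file."""

    def __init__(self, path):
        self.path = path
        self._handle = None
        self._pending_header = None  # header read ahead while finishing a record

    def __enter__(self):
        self._handle = open(self.path)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._handle.close()
        return False  # do not swallow exceptions

    def read_record(self):
        """Return the next record, or None at end of file (assumed behaviour)."""
        header = self._pending_header
        self._pending_header = None
        while header is None:
            line = self._handle.readline()
            if not line:
                return None
            line = line.strip()
            if line.startswith(">"):
                header = line
        chunks = []
        while True:
            line = self._handle.readline()
            if not line:
                break
            line = line.strip()
            if line.startswith(">"):
                self._pending_header = line  # belongs to the next record
                break
            chunks.append(line)
        # Text before the first space is the id, the rest is the description.
        record_id, _, description = header[1:].partition(" ")
        return FastaRecord(record_id, description, "".join(chunks))

    def read_records(self):
        """Return all remaining records as a list."""
        records = []
        while (record := self.read_record()) is not None:
            records.append(record)
        return records
```

Used as `with OpenFastaSketch("example.fasta") as fasta: record = fasta.read_record()`, where the file name is a placeholder.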

The following can be found in `biopython_fastq_filter.py`:

```
fastq_filter
```

This function uses BioPython and filters FASTQ sequences by GC content, sequence length, and quality score.
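
The three filtering criteria can be illustrated with a small predicate. To stay runnable without BioPython installed, this sketch works on a plain sequence string and a list of Phred scores; with BioPython those would typically come from `record.seq` and `record.letter_annotations["phred_quality"]` of records parsed by `SeqIO.parse(path, "fastq")`. The names `gc_percent` and `passes_filters`, the parameter names, the default bounds, and the use of the mean quality are assumptions, not the module's actual signature.

```python
def gc_percent(sequence):
    """Percentage of G and C bases in a sequence (0 for an empty one)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return 100 * (upper.count("G") + upper.count("C")) / len(upper)


def passes_filters(sequence, qualities, gc_bounds=(0, 100),
                   length_bounds=(0, 2**32), quality_threshold=0):
    """Return True if a read satisfies all three criteria (bounds inclusive)."""
    if not sequence:
        return False  # nothing to evaluate
    mean_quality = sum(qualities) / len(qualities)
    return (gc_bounds[0] <= gc_percent(sequence) <= gc_bounds[1]
            and length_bounds[0] <= len(sequence) <= length_bounds[1]
            and mean_quality >= quality_threshold)
```

A read is kept only when all three checks pass, so tightening any one bound can only reduce the number of reads that survive.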


```
BiologicalSequence
```

This is an abstract class with the following subclasses:

- `NucleicAcidSequence`, which has `complement` and `gc_content` methods. It is the parent class of `DNASequence` and `RNASequence`.
- `AminoAcidSequence`, which has an `amino_acid_frequency` method.
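
The class hierarchy described above might be laid out as in the following sketch. The class and method names are taken from this README; everything else (the constructor, the `check_alphabet` abstract method, the `complement_map` attribute, and returning frequencies as fractions) is an assumption made to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from collections import Counter


class BiologicalSequence(ABC):
    """Abstract base; the required method here is an assumed example."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __len__(self):
        return len(self.sequence)

    @abstractmethod
    def check_alphabet(self):
        """Return True if the sequence uses only valid symbols."""


class NucleicAcidSequence(BiologicalSequence):
    complement_map = {}  # overridden by DNASequence / RNASequence

    def check_alphabet(self):
        return set(self.sequence.upper()) <= set(self.complement_map)

    def complement(self):
        # Build a new object of the same concrete class (DNA stays DNA).
        pairs = self.complement_map
        return type(self)("".join(pairs[base] for base in self.sequence.upper()))

    def gc_content(self):
        """Fraction of G and C bases, between 0 and 1."""
        if not self.sequence:
            return 0.0
        upper = self.sequence.upper()
        return (upper.count("G") + upper.count("C")) / len(upper)


class DNASequence(NucleicAcidSequence):
    complement_map = {"A": "T", "T": "A", "G": "C", "C": "G"}


class RNASequence(NucleicAcidSequence):
    complement_map = {"A": "U", "U": "A", "G": "C", "C": "G"}


class AminoAcidSequence(BiologicalSequence):
    alphabet = set("ACDEFGHIKLMNPQRSTVWY")

    def check_alphabet(self):
        return set(self.sequence.upper()) <= self.alphabet

    def amino_acid_frequency(self):
        """Map each residue to its fraction of the sequence."""
        counts = Counter(self.sequence.upper())
        total = len(self.sequence)
        return {residue: count / total for residue, count in counts.items()}
```

Keeping `complement` and `gc_content` in the shared parent means the DNA and RNA classes differ only in their complement table.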


```
telegram_logger
```

This function sends a message from a Telegram bot about the status of a process:

![Untitled](https://github.com/sme229/misc_module/assets/104040609/141f1cd1-1430-48c7-b8ab-dda41db214ea)
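
One way such status reporting can be structured is as a decorator, sketched below. Whether `telegram_logger` is actually a decorator is an assumption, as are the name `status_logger`, the `send` callable, and the message wording. The sketch deliberately takes any callable that accepts a string, so it runs without network access.

```python
import functools
import time


def status_logger(send):
    """Decorator factory: report success or failure of a call via `send`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = func(*args, **kwargs)
            except Exception as error:
                # Report the failure, then let the exception propagate.
                send(f"{func.__name__} failed: {type(error).__name__}: {error}")
                raise
            elapsed = time.monotonic() - start
            send(f"{func.__name__} finished in {elapsed:.1f} s")
            return result
        return wrapper
    return decorator
```

For real use, `send` could be a small function that posts the text to the Telegram Bot API `sendMessage` method with the bot token and chat ID; for local testing, `print` or `list.append` works as well.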


```
run_genscan
```

This is a Python API for the GENSCAN web tool: http://hollywood.mit.edu/GENSCAN.html


In `custom_random_forest.py` there is a `RandomForestClassifierCustom` class that can run on a user-specified number of threads, which speeds it up.