Hw 18 #4

Alisa411 · 2024-05-01T12:13:47Z

No description provided.

iam28th

Alisa, good afternoon!

1. Refactoring

The README and the repository as a whole are quite tidy.
The one thing I would add is explanations for the examples in the notebook.

I have several complaints about the code: types are not annotated everywhere, the formatting has slipped in places, and the imports are not in the correct order; for the last point I recommend setting up and using isort or ruff. -3 points here.

Another -1 point for the inaccurate contents of requirements.txt.

11/15

2. Forest

-1 point for overcomplicating/duplicating the code: it is better to run the pool with a single worker than to add a separate if branch.
Otherwise everything is great!

24/25

3. Tests

Everything is fine here, except that there are 2 fewer tests than the assignment asked for, so 8.4/10 points.

In total you have 43.4/50 for this homework, good work!

Good luck with your defense at IB and in your future career!

iam28th · 2024-05-06T14:28:53Z

README.md

+To install fast_seqs tools you need to clone the git repository using the following command:
+```bash
+git clone git@github.com:Alisa411/fast_seqs.git \
+cd fast_seqs


This is not enough to get it working: the dependencies also need to be installed via pip.

iam28th · 2024-05-06T14:30:59Z

HW18_main.py

+import requests
+import os
+import sys
+import io
+import datetime
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+from random import choice
+from Bio import SeqIO
+from Bio.SeqUtils import gc_fraction
+from bs4 import BeautifulSoup


Suggested change

-import requests
-import os
-import sys
-import io
-import datetime
-from abc import ABC, abstractmethod
-from dataclasses import dataclass
-from random import choice
-from Bio import SeqIO
-from Bio.SeqUtils import gc_fraction
-from bs4 import BeautifulSoup
+import datetime
+import io
+import os
+import sys
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+from random import choice
+import requests
+from Bio import SeqIO
+from Bio.SeqUtils import gc_fraction
+from bs4 import BeautifulSoup

iam28th · 2024-05-06T19:23:37Z

HW18_main.py

+        Methods:
+            __len__(): Returns the length of the biological sequence.
+            __getitem__(index): Gets the character at the specified index or returns a subsequence.
+            __str__(): Returns a string representation of the biological sequence.


Overall, these dunder methods probably did not need much commenting: they do more or less what is always expected of them (this applies especially to __str__).

iam28th · 2024-05-06T19:24:38Z

HW18_main.py

+        gc_count = sum(base in 'GCgc' for base in self.sequence)
+        total_count = len(self.sequence)
+        gc_content = gc_count / total_count if total_count > 0 else 0
+        return gc_content * 100 if as_percentage else gc_content


These classes are missing type hints.
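A minimal sketch of what the requested type hints could look like for the gc_content method quoted above. The class name NucleicAcidSequence is taken from the commit log; the constructor signature and the False default for as_percentage are assumptions, since the diff does not show them.

```python
class NucleicAcidSequence:
    """Minimal stand-in for the reviewed class; only gc_content is shown."""

    def __init__(self, sequence: str) -> None:
        # Assumed constructor: the real class may take other arguments.
        self.sequence = sequence

    def gc_content(self, as_percentage: bool = False) -> float:
        """Return the GC fraction of the sequence, or a percentage if requested."""
        # Booleans sum as 0/1, so this counts G and C in either case.
        gc_count: int = sum(base in "GCgc" for base in self.sequence)
        total_count: int = len(self.sequence)
        # Use 0.0 for an empty sequence so the return type is always float.
        gc_content: float = gc_count / total_count if total_count > 0 else 0.0
        return gc_content * 100 if as_percentage else gc_content
```

With annotations in place, a checker such as mypy can flag callers that pass a non-boolean flag or treat the result as a string.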

iam28th · 2024-05-06T19:27:10Z

HW18_main.py

+
+    def transcribe(self):
+        transcribed_seq = ''.join(self.TRANSCRIBE_DICT.get(base, base) for base in self.sequence)
+        return transcribed_seq


A matter of taste, but I would return directly, without the temporary variable.

By the way, nice use of the second argument to dict.get!
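A sketch of the direct-return variant, written as a free function so that it runs on its own. The contents of TRANSCRIBE_DICT are an assumption (the diff only shows that the dictionary exists); the second argument to dict.get makes any base without a mapping pass through unchanged.

```python
# Assumed mapping: the real TRANSCRIBE_DICT is not shown in the diff.
TRANSCRIBE_DICT: dict[str, str] = {"T": "U", "t": "u"}


def transcribe(sequence: str) -> str:
    """Return the transcribed sequence; unmapped bases are kept as-is."""
    # dict.get(base, base) falls back to the base itself when there is no entry.
    return "".join(TRANSCRIBE_DICT.get(base, base) for base in sequence)
```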

iam28th · 2024-05-06T19:36:14Z

custom_random_forest.py

+            results = [self._fit_tree(arg) for arg in args]
+        else:
+            with Pool(n_jobs) as pool:
+                results = pool.map(self._fit_tree, args)


There is no need for the extra if: Pool should work correctly with n_jobs == 1.
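A minimal sketch of the single code path suggested here. The worker _fit_tree is a hypothetical stand-in that just squares its argument, not the author's tree-fitting code. The "fork" start method is pinned only so that the sketch runs as a plain script on POSIX systems; that is an assumption of this example, not something the reviewed code does.

```python
from multiprocessing import get_context


def _fit_tree(seed: int) -> int:
    """Hypothetical stand-in for fitting one tree; returns a value derived from the seed."""
    return seed * seed


def fit_all(seeds: list[int], n_jobs: int = 1) -> list[int]:
    """Map _fit_tree over seeds with n_jobs workers, with no special case for n_jobs == 1."""
    # A pool with one worker runs the tasks one after another in that worker,
    # so the same code path covers both the serial and the parallel case.
    with get_context("fork").Pool(n_jobs) as pool:
        # pool.map returns results in the same order as the inputs.
        return pool.map(_fit_tree, seeds)
```

The trade-off is that even with n_jobs == 1 a worker process is started and the arguments are pickled, so the single-path version is simpler to read but carries a small fixed overhead.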

iam28th · 2024-05-06T19:37:27Z

custom_random_forest.py

+            args = [(tree, feat_ids, X) for tree, feat_ids in zip(self.trees, self.feat_ids_by_tree)]
+            with Pool(n_jobs) as pool:
+                results = pool.map(self._predict_proba_tree, args)
+                probas = np.sum(results, axis=0)


Here too, the first if branch visually complicates the code quite a lot.

iam28th · 2024-05-06T19:38:36Z

custom_random_forest.py

+        n_jobs = self.n_jobs if n_jobs is None else n_jobs
+        probas = self.predict_proba(X, n_jobs)
+        predictions = np.argmax(probas, axis=1)
+        return predictions


This file is missing type annotations.

iam28th · 2024-05-06T19:39:27Z

requirements.txt

+urllib3==2.2.1
+
+
+


The empty lines are unnecessary,
and there seems to be a lot of extra content here (it looks like you do not use pandas or scipy).

iam28th · 2024-05-06T19:46:01Z

test_module.py

+    url = "http://argonaute.mit.edu/cgi-bin/genscanw_py.cgi"
+    response = requests.post(url)
+
+    assert response.status_code == 200


Here, either rename the test (to make clear that it specifically checks server availability) or use the same (global) variable that run_genscan uses.

The fact that you hardcoded the correct URL in the test does not necessarily mean that you also hardcoded the correct URL elsewhere :)
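One way to follow the second option, sketched under assumptions: GENSCAN_URL is a hypothetical name for the shared module-level constant, and the HTTP call is injected as a function so that the sketch runs offline. In the real test the injected callable could be `lambda u: requests.post(u).status_code`.

```python
from typing import Callable

# Hypothetical shared constant: run_genscan and the test would both read it,
# so a typo in the URL cannot hide in only one of the two places.
GENSCAN_URL = "http://argonaute.mit.edu/cgi-bin/genscanw_py.cgi"


def server_is_available(post_status: Callable[[str], int], url: str = GENSCAN_URL) -> bool:
    """Return True if a POST to `url` reports HTTP status 200."""
    return post_status(url) == 200


def fake_post_status(url: str) -> int:
    """Offline stand-in for the HTTP call: only GENSCAN_URL counts as reachable."""
    return 200 if url == GENSCAN_URL else 404


def test_genscan_server_is_available() -> None:
    # The name states what is checked, and the URL comes from the shared constant.
    assert server_is_available(fake_post_status)
```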

Alisa411 and others added 30 commits October 8, 2023 15:26

Initial commit

0f4d9a9

add gc_content function

ed1312f

add seq_length function

d3316d0

add comment for seq_length function

4bafafa

add mean_encoding_offset function

a777063

fix return in seq_length function

862ae1a

fix var in gc_content function and add main function

b3e5d9f

delete example of usage

40e7678

move file from fast_seqs to data_processing_script folder

88c6097

fix bags in transcribe, complement, reverse_complement functions and …

bb73cd0

…for cycle

move fastq.py to the data_processing_script directory

2916b38

add das_protein_tools.py file

4029176

fix bugs in the functions and delete build_scoring_matrix and needlem…

d032f19

…an_wunsch

add dictionaries as protein_dict.py file

517c30f

add set of letters corresponding to amino acids

cea8cba

correct the name of AA_LETTERS set

63ce202

fix bugs corresponded to import protein_dict.py

939fa7b

create a new dictionary for dna/rna sequences

03d00be

add dictionaries for transcribe and complement functions

74c3c87

delete dictionaries

f5c0e56

add import of dna_rna_dict

477a767

add dictionaries for dna and rna letters

9f96328

delete function determining protein sequence

ede9207

add dockstrings to each function

f85eea1

correct input as only dna or rna sequences in main and transcribe fun…

bf40edd

…ctions

fix bug in main function: now cannot take sequence with both 'U' and …

6574845

…'T' nucleotides

add import of dna_rna_dict module

0ed9813

add main_script.py file for writing main functions

33526e0

correct the main function

d8918a2

change the name file to fastq_tools.py

b5cf7bd

Alisa411 and others added 29 commits February 25, 2024 14:21

Update README.md

d3bb160

Add example of fastq file

fe02ea2

Update HW14_main.py

f07c6fb

Add new script with class RandomForestClassifierCustom

2659999

Update class RandomForestClassifierCustom

7f4b1f7

update filter_fastq function

ba632c6

update def gc_content in class NucleicAcidSequence

38318a8

update def complement in class NucleicAcidSequence

e53adbf

add run_genscan function

53fc56a

add telegram_logger function

8cd3f1a

update telegram_logger function

6d4e2f5

update telegram_logger function

23a5b8d

rename main script

bba0e79

add FastaRecord and OpenFasta classes

2a8f04c

removed HW14_main.py

e89fa7f

delete unnecessary libraries

4192cf1

add Showcases notebook

4506583

add test file

5fb0a93

add 6 tests

f239bd2

rename test_module.py

9cb1b02

add dataclasses library

716c797

add data folder with fasta files

9e4c5c6

move example_fastq.fastq to the data folder

637a831

add requirements.txt file

1744b17

update examples in Showcases.ipynb

9778cd3

Delete test.py

64edd8b

Update README.md

67bb0cc

Update HW18_main.py

e6e1ec9

Update custom_random_forest.py

6fa6be2

iam28th reviewed May 6, 2024
