Feedback by ArtemVaska · Pull Request #1 · ArtemVaska/UltimateBioinformaticsTools

ArtemVaska · 2023-10-08T10:20:31Z

No description provided.


albidgy

General comments:

The work is really good and carefully done. I opened it in PyCharm, and it complained about almost nothing (in terms of code style).
The README lacks examples for the dna_rna_tools and protein_tools functions. The part about fastq filtering, though, is excellent.
The commit messages are great!
Take note of how the read quality check could have been done differently.
Try to write type hints (Typing) a bit more carefully: in some functions not all possible types are listed.
Most of the remarks have been addressed, but some of Nikita's comments on amino acid sequence processing have not.

Points:

3 FASTQ filters 3/3
Main function 1/1
README 1.8/2 (-0.2 for missing examples)
Repository structure and code quality 2.9/3 (-0.1 for Typing)
Improvements to the DNA/RNA and protein tools 0.8/1

Total: 9.5 points

albidgy · 2023-10-24T14:33:02Z

README.md

+```python
+import ultimate_tools as ut
+```


In that case you could also have mentioned that the project has to be downloaded from GitHub with git clone... But that's a minor point.

albidgy · 2023-10-24T14:33:06Z

src/fastq_tools.py

@@ -0,0 +1,167 @@
+from typing import List, Tuple, Dict
+


It's better to leave 2 blank lines here. PEP8 allows either, but the variant with 2 blank lines is more common.

albidgy · 2023-10-24T14:33:08Z

src/fastq_tools.py

+    'I': [73, 40]
+    }
+
+EXAMPLE_FASTQ = {


Don't put the example straight into the code; it looks cluttered. Better to move it to the README.

albidgy · 2023-10-24T14:33:10Z

src/fastq_tools.py

+    - float: GC-content in percentages
+    """
+
+    return (seq.upper().count('G') + seq.upper().count('C')) / len(seq) * 100


Nice that you accounted for letter case.

albidgy · 2023-10-24T14:33:12Z

src/fastq_tools.py

+    for key in seqs:
+        qual_seq = seqs[key][1]
+        qual_list = []
+        for value in qual_seq:
+            qual_list.append(ASCII_Q_SCORE[value][1])
+        mean_qual = sum(qual_list) / len(qual_list)
+        if mean_qual >= quality_threshold:
+            quality_in_bounds[key] = True
+        else:
+            quality_in_bounds[key] = False
+    return quality_in_bounds


This solution is correct but not the best choice. Rather than creating a dictionary, it would have been better to google how this can be automated. PHRED encoding represents quality as a specific character, and its value can be obtained by converting the character to its Unicode code point. The logic of the quality_seq function could then look like this:

def mean_quality_seq(qual_seq):
    sum_phred = 0
    for elem in qual_seq:
        sum_phred += ord(elem) - 33
    mean_quality = sum_phred / len(qual_seq)
    return mean_quality

Then add the logic for checking the quality condition here, or move it into a separate function.
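One way to attach the threshold check to the snippet above is sketched here. The names `mean_quality` and `passes_quality` are hypothetical, and the Phred+33 offset is assumed, as in the reviewer's snippet.

```python
def mean_quality(qual_seq: str, offset: int = 33) -> float:
    """Return the mean Phred score of an ASCII-encoded quality string."""
    if not qual_seq:
        raise ValueError("quality string is empty")
    # ord() gives the code point; subtracting the offset yields the Phred score
    return sum(ord(symbol) - offset for symbol in qual_seq) / len(qual_seq)


def passes_quality(qual_seq: str, quality_threshold: float = 0) -> bool:
    """Return True if the mean quality reaches the threshold."""
    return mean_quality(qual_seq) >= quality_threshold
```

With this encoding `'I'` maps to 40 and `'!'` maps to 0, which matches the `'I': [73, 40]` entry in the ASCII_Q_SCORE dictionary from the diff.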

albidgy · 2023-10-24T14:33:27Z

ultimate_tools.py

+    - Dict[str, Tuple[str]]: a dictionary with filtered sequences
+    """
+
+    if isinstance(gc_bounds, (int, float)):  # input check


By the way, a fun fact: starting with Python 3.10 you can also write it like this:

Suggested change

if isinstance(gc_bounds, (int, float)): # input check

if isinstance(gc_bounds, int | float): # input check

albidgy · 2023-10-24T14:33:29Z

ultimate_tools.py

+    if gc_bounds[1] > 100 or \
+            (gc_bounds[0] > gc_bounds[1]) or \
+            (length_bounds[0] > length_bounds[1]) or \
+            not 0 <= quality_threshold <= 40:  # bounds check
+        raise ValueError('The bounds are indicated incorrectly!')


Nice that you check the input data for correctness.

albidgy · 2023-10-24T14:33:31Z

ultimate_tools.py

+    quality_in_bounds = (
+        fastq_tools.is_quality_in_bounds(seqs, quality_threshold))


I don't understand why the outer parentheses are needed here; it works the same without them.

Suggested change

quality_in_bounds = (
    fastq_tools.is_quality_in_bounds(seqs, quality_threshold))

quality_in_bounds = fastq_tools.is_quality_in_bounds(seqs, quality_threshold)

albidgy · 2023-10-24T14:33:33Z

ultimate_tools.py

+        fastq_tools.is_quality_in_bounds(seqs, quality_threshold))
+
+    filtered_fastq_seqs = {}
+    for key in seqs:


In general, the checks in this function are somewhat suboptimal. Each of the 3 check functions iterates over the dictionary of reads. It would be better to move the for loop into the body of the main function. Then the intermediate collections holding True/False would not be needed, and the decision would be made for each sequence individually.
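A minimal sketch of the single-pass layout, not the author's implementation. It assumes `seqs` maps a read name to a `(sequence, quality)` tuple, that reads are non-empty, and that qualities are Phred+33. The helper names and default bounds are hypothetical.

```python
from typing import Dict, Tuple


def gc_content(seq: str) -> float:
    """GC content in percent, ignoring letter case."""
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq) * 100


def mean_quality(qual_seq: str) -> float:
    """Mean Phred score, assuming Phred+33 encoding."""
    return sum(ord(symbol) - 33 for symbol in qual_seq) / len(qual_seq)


def filter_fastq_seqs(
    seqs: Dict[str, Tuple[str, str]],
    gc_bounds: Tuple[float, float] = (0, 100),
    length_bounds: Tuple[int, int] = (0, 2 ** 32),
    quality_threshold: float = 0,
) -> Dict[str, Tuple[str, str]]:
    """Keep reads that pass all three checks, deciding read by read."""
    filtered = {}
    for name, (seq, qual_seq) in seqs.items():
        # Each read is checked once; no intermediate True/False containers
        if not gc_bounds[0] <= gc_content(seq) <= gc_bounds[1]:
            continue
        if not length_bounds[0] <= len(seq) <= length_bounds[1]:
            continue
        if mean_quality(qual_seq) < quality_threshold:
            continue
        filtered[name] = (seq, qual_seq)
    return filtered
```

Because a read is dropped at the first failed check, the remaining checks are skipped for it.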

albidgy · 2023-10-24T14:33:35Z

ultimate_tools.py

+        if (seqs_gc_and_len_in_bounds[key][0] and
+                seqs_gc_and_len_in_bounds[key][1] and
+                quality_in_bounds[key]) is True:


There is no need to write is True here. The condition of an if statement is a check that evaluates to True/False; if the value is True, execution enters the body of the if. Your functions already hold boolean True/False values. For example:

a = 2
if a == 2:  # -> True
    ...  # some action

So your code can be written like this:

Suggested change

if (seqs_gc_and_len_in_bounds[key][0] and
        seqs_gc_and_len_in_bounds[key][1] and
        quality_in_bounds[key]) is True:

if (seqs_gc_and_len_in_bounds[key][0] and
        seqs_gc_and_len_in_bounds[key][1] and
        quality_in_bounds[key]):

In other words, the original condition is effectively

if (True and True and True) == True:
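A small illustration of the point: for real booleans the two spellings agree, and they diverge only when the operands are truthy non-boolean values. The function names are hypothetical.

```python
def keep_read(gc_ok, length_ok, quality_ok) -> bool:
    # Plain truthiness test, as in the suggested change
    if gc_ok and length_ok and quality_ok:
        return True
    return False


def keep_read_is_true(gc_ok, length_ok, quality_ok) -> bool:
    # Identity test against True, as in the original code
    if (gc_ok and length_ok and quality_ok) is True:
        return True
    return False
```

With `True`/`False` inputs both functions return the same result. With a truthy non-boolean such as `1`, the `and` chain evaluates to `1`, and `1 is True` is false, so the `is True` version would silently reject the read.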

ArtemVaska and others added 27 commits October 7, 2023 16:11

Initial commit

c282b23

Create directory src with modules

0df23b6

Add dna_tools.py, protein_tools.py, fastq_tools.py

Add import and constants to fastq_tools.py

729cfe3

Add calculate_gc_content function to fastq_tools.py

886123b

Add is_gc_and_length_in_bounds function to fastq_tools.py

c37cda3

Add is_quality_in_bounds function to fastq_tools.py

f9503fa

Update import in fastq_tools.py

224fbad

Add import to ultimate_tools.py

bc8004c

Update import and add filter_fastq_seqs function in ultimate_tools.py

851554a

Update docstring of filter_fastq_seqs function in ultimate_tools.py

81d03e9

Rename dna_tools.py to dna_rna_tools.py and add constants

9932d8d

Add transcribe, complement, reverse, reverse_complement functions to …

d03ccd1

…dna_rna_tools.py

Update import and add run_dna_rna_tools function in ultimate_tools.py

04f6cfd

Add import and constants in protein_tools.py

52f00c3

Add is_protein_valid function to protein_tools.py

422a5d6

Add get_protein_rnas_number function to protein_tools.py

ddb605a

Add get_length_of_protein function to protein_tools.py

a1a377b

Add count_aa function to protein_tools.py

29c7ef2

Add get_fracture_of_aa function to protein_tools.py

a6aa31b

Update import and add run_ultimate_protein_tools to ultimate_tools.py

3117e5e

Update docstrings in ultimate_tools.py

aacd2b7

Add "Possible commands" to run_dna_rna_tools and run_ultimate_protein_tools functions

Update docstring of run_dna_rna_tools in ultimate_tools.py

f2a6af0

Beautify ultimate_tools.py

4a6a596

Update formatting in docstrings of some functions in ultimate_tools.py

100ada0

Update run_ultimate_protein_tools function in ultimate_tools.py

ce8596f

Update README.md

f78c03b

Update README.md

d66396f

albidgy reviewed Oct 24, 2023

ArtemVaska wants to merge 27 commits into main from dev
