File input for BSAT #2

grishchenkoira · 2023-10-17T19:41:11Z

In this branch we configured the scripts to work with files as input.

The FASTQ module has been improved.
A new script, bio_files_processor.py, has been created. It works with standard formats from genomic databases.

P.S. The commit was made to the main branch so as not to accept the previous request and not lose comments on it.

nvaulin

Hi!

Good work.

Good README. In general, in READMEs this large you don't have to accompany absolutely everything with examples; covering part of it is enough.
Good commit names.
Reading and writing FASTQ files in the filter does not seem to work quite as it should. At the very least, all the sequences end up in the output in full. Almost everything seems to work correctly, but something needs to be tweaked somewhere.
The other two functions also work well, which is great. Well done.
There are a great many comments on the code. Please look through all of them. Almost everything is fine, but small details like these keep the work from being ideal. Where comments repeat, I stopped marking them multiple times.

Points

FASTQ module improvement: 1/2 points
convert_multiline_fasta_to_oneline: 4/4 points
select_genes_from_gbk_to_fasta: 4/4 points

-0.5 for overall code quality: things like checks, illogical naming, and so on. I tried to mark everywhere how it could be done better.

In any case, I really liked the work :)

Total: 8.5 points

nvaulin · 2023-10-21T14:10:44Z

Bio_Seq_Analysis_Tool.py

+from modules_for_BSAT import fastq_analysis as fq
+from modules_for_BSAT  import dna_rna_analysis as na
+from modules_for_BSAT import protein_analysis as pa


Looks nice! Though `na` is probably not a great name; it immediately reads like something from statistics.

nvaulin · 2023-10-21T14:11:26Z

Bio_Seq_Analysis_Tool.py

+        if operation in OPERATION_DICT.keys():
+            analysis.append(OPERATION_DICT[operation](seq))
+        else:
+            raise ValueError(f'Wrong operation!')


Suggested change

-            raise ValueError(f'Wrong operation!')
+            raise ValueError(f'Unsupported operation {operation}!')

Let's add a bit more care for the user :)
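A minimal sketch of how the more informative message could look in context. The contents of `OPERATION_DICT` and the `run_operation` helper are hypothetical stand-ins, since the real dictionary in Bio_Seq_Analysis_Tool.py is not shown in this thread.

```python
# Hypothetical stand-ins for the real operations in OPERATION_DICT.
OPERATION_DICT = {
    "reverse": lambda seq: seq[::-1],
    "length": len,
}


def run_operation(operation: str, seq: str):
    """Apply a named operation to a sequence or fail with a helpful message."""
    # Membership tests work on the dict itself, so .keys() is not needed.
    if operation in OPERATION_DICT:
        return OPERATION_DICT[operation](seq)
    supported = ", ".join(sorted(OPERATION_DICT))
    raise ValueError(
        f"Unsupported operation {operation!r}! Supported: {supported}"
    )
```

Listing the supported names in the message is an extra that goes beyond the suggestion; it can be dropped if the dictionary is large.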

nvaulin · 2023-10-21T14:12:30Z

Bio_Seq_Analysis_Tool.py

+def analyse_fastq(input_path: str, 
+                  gc_bounds: Union[int, float, Tuple [int], Tuple [float]] = (0, 100), 
+                  length_bounds: Union[int, Tuple [int]] = (0, 2**32),
+                  quality_threshold: float = 0.0, filtered_file_name: Union[None, str] = None) -> Dict[str,str]:


The annotation is really powerful. Just the name: we are filtering the reads rather than analysing them, so filter_fastq would fit better.

nvaulin · 2023-10-21T14:13:37Z

Bio_Seq_Analysis_Tool.py

+    :raises ValueError: if sequence not RNA or DNA, also if the argument values are outside the allowed ones
+    """
+    seqs, path_to_file = fq.read_fastq(input_path)
+    file_name = path_to_file.split("/")[-1]


This is OK, but what if the person is on Windows? There is a really handy tool for this:

Suggested change

-    file_name = path_to_file.split("/")[-1]
+    file_name = os.path.basename(path_to_file)
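A short sketch of why splitting on "/" is fragile. In the standard library the function lives at `os.path.basename`. The example uses `ntpath` and `posixpath`, the Windows and POSIX flavours of `os.path`, so it behaves the same on any machine; the paths themselves are made up for illustration.

```python
import ntpath
import posixpath

# Hypothetical example paths.
windows_path = r"C:\data\reads.fastq"
posix_path = "/home/user/data/reads.fastq"

# Splitting on "/" leaves a Windows-style path untouched.
naive_name = windows_path.split("/")[-1]

# os.path is whichever of these two modules matches the running platform.
windows_name = ntpath.basename(windows_path)
posix_name = posixpath.basename(posix_path)

print(naive_name)    # C:\data\reads.fastq
print(windows_name)  # reads.fastq
print(posix_name)    # reads.fastq
```

In the script itself a plain `os.path.basename(path_to_file)` is enough, since `os.path` picks the right flavour for the user's system.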

nvaulin · 2023-10-21T14:14:07Z

Bio_Seq_Analysis_Tool.py

+    seqs, path_to_file = fq.read_fastq(input_path)
+    file_name = path_to_file.split("/")[-1]
+    if type(gc_bounds) == float or type(gc_bounds) == int:
+        gc_bounds = (0,gc_bounds)


Suggested change

-        gc_bounds = (0,gc_bounds)
+        gc_bounds = (0, gc_bounds)
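Beyond the spacing, the type check in this hunk could also be written with `isinstance`. The sketch below is one possible way to do it, not something requested in the review; `normalize_bounds` is a hypothetical helper name.

```python
from typing import Tuple, Union

Number = Union[int, float]


def normalize_bounds(
    bounds: Union[Number, Tuple[Number, Number]]
) -> Tuple[Number, Number]:
    """Turn a single upper limit into a (0, upper) pair; pass pairs through."""
    # isinstance accepts a tuple of types, replacing two type(...) == checks.
    if isinstance(bounds, (int, float)):
        return (0, bounds)
    lower, upper = bounds
    return (lower, upper)
```

The same helper could serve both `gc_bounds` and `length_bounds`, which would keep the two checks consistent.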

nvaulin · 2023-10-21T15:29:34Z

Bio_Files_Processor.py

+        for key, item in gene_and_seq.items():
+            new_seq_fasta.write(key + '\n')
+            new_seq_fasta.write(item + '\n')
+    return 'All sequences processed!'


👍
In IT this is usually done with print, and a code of 1 or 0 is returned.
Although here it is not really necessary to return anything at all :)

nvaulin · 2023-10-21T15:30:58Z

Bio_Files_Processor.py

+    gene_and_seq = dict()
+    with open (input_fasta) as seq_fasta:
+        for line in seq_fasta:
+            if gene == 1:


It is not entirely clear what gene being 1 or 0 means. I understand that for you it is a flag, and that is OK here, but it could be expressed more informatively. Something like is_new_gene with True / False, along those lines.
Again, it is fine as is, just food for thought.
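As an illustration of the naming idea, here is a rough sketch of joining multi-line FASTA records with a descriptive boolean. The loop body of the original function is not shown in this thread, so `join_multiline_fasta` and its logic are assumptions rather than the author's implementation.

```python
from typing import Dict, Iterable, Optional


def join_multiline_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into {header: sequence} with one-line sequences."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        # A descriptive boolean reads more clearly than a 0/1 integer flag.
        is_new_record = line.startswith(">")
        if is_new_record:
            header = line
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records
```

Passing an open file object as `lines` works as well, since iterating over a file yields its lines.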

nvaulin · 2023-10-21T15:39:11Z

Bio_Files_Processor.py

+    if input_gbk.find('.gbk') == 0:
+        raise ValueError(f'Wrong file format in input!')
+    if os.path.exists(os.path.join('.', 'Analyzed_data')) == False:
+        os.mkdir(os.path.join('.', 'Analyzed_data'))


Similar comments apply here.
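The earlier comments this refers to are not visible in this thread, so the following is only a guess at what they might cover. It sketches one way to write the two checks from the hunk; `prepare_output_dir` is a hypothetical helper name.

```python
import os


def prepare_output_dir(input_gbk: str, output_dir: str = "Analyzed_data") -> str:
    """Validate the input extension and make sure the output folder exists."""
    # str.find returns -1 when the substring is missing and 0 only when the
    # string starts with it, so endswith states the intended check directly.
    if not input_gbk.endswith(".gbk"):
        raise ValueError(
            f"Wrong file format in input: {input_gbk!r}, expected .gbk"
        )
    # exist_ok=True replaces an explicit `os.path.exists(...) == False` test.
    os.makedirs(output_dir, exist_ok=True)
    return output_dir
```

Naming the offending file in the error message follows the same idea as the "Unsupported operation" suggestion earlier in the review.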

nvaulin · 2023-10-21T15:46:49Z

Bio_Files_Processor.py

+    for el in genes_for_search:
+        genes_for_search_in_gbk += [gn for gn in genes_gbk if el in gn]


This is a bit of a festival of two-letter variables)))
But overall it is fine :)
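For comparison, the same loop with longer names might look like the sketch below. The wrapper function `find_matching_genes` and the new variable names are hypothetical; the behavior is meant to match the two lines in the hunk.

```python
from typing import List


def find_matching_genes(genes_for_search: List[str], genes_gbk: List[str]) -> List[str]:
    """Return every gene name from the GBK file that contains a query string."""
    matching_genes: List[str] = []
    for query in genes_for_search:
        matching_genes += [
            gene_name for gene_name in genes_gbk if query in gene_name
        ]
    return matching_genes
```

Note that a gene matching several queries appears once per matching query, as in the original loop.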

nvaulin · 2023-10-21T16:03:31Z

README.md

+
+## Contact
+
+*This is the repo for the 5th homework of the BI Python 2023 course*


Not only that one :)

grishchenkoira added 25 commits October 7, 2023 23:46

Innitial commit for dna_rna_analysis.py

91e416f

Innitial commit for fastq_analysis.py

52772ec

Innitial commit for protein_analysis

edf755b

Add python script with all functions for this module

b247115

Add python script with all functions for this module

78dc3b5

Add python script with all functions for protein module

c6ade48

Initail commit for main script of Bio_Seq_Analysis_Tool

caee4ae

Add forder with required modules for Bio_Seq_Analysis_Tool.py

10e061b

Add python script with all functions into Bio_Seq_Analysis_Tool

1de06fd

Add README for Bio_Seq_Analysis_Tool module with detailed description…

be10c57

… of this one

Add def for read FASTQ-seq and def for creating file with filtered FA…

247f2ab

…STQ-seq

Add into def analyze_fastq reading data from a file and writing retur…

f24c092

…n to new file

Fixs bag in def read_fastq and write_fastq

8222e3f

Fix bags in def analyse_fastq

8333b66

Include boundaries for analysis in fastq_analysis in Bio_seq_Analysis…

c7b0739

…_Tool.py

Initial commit for Bio_Files_Processor.py

6c5a702

Add import of standard modules in Bio_Files_Processor.py

8665b41

Add function 'convert_multiline_fasta_to_oneline' into Bio_Files_Proc…

f9af641

…essor.py

Add function 'select_genes_from_gbk_to_fasta' into Bio_Files_Processo…

16fa99a

…r.py

Add corrections to the description of the functions into Bio_Files_Pr…

52eceb8

…ocessor.py

Fix output_fasta parametr in 'Convert...' function into Bio_Files_Pro…

4735049

…cessor.py

Fix input parametr in 'Convert...' function into Bio_Files_Processor.py

fea880d

Add data format check in select_genes_from_gbk_to_fasta

0aea07f

Add data format check in convert_multiline_fasta_to_oneline

640a391

Add information about Bio_Files_Processor.py into README.md

180e726

grishchenkoira closed this Oct 17, 2023

grishchenkoira reopened this Oct 17, 2023

grishchenkoira closed this Oct 17, 2023

grishchenkoira deleted the file_input_for_BSAT branch October 17, 2023 19:46

grishchenkoira restored the file_input_for_BSAT branch October 17, 2023 19:47

grishchenkoira deleted the file_input_for_BSAT branch October 17, 2023 19:47

grishchenkoira restored the file_input_for_BSAT branch October 17, 2023 19:48

grishchenkoira reopened this Oct 17, 2023

grishchenkoira closed this Oct 17, 2023

grishchenkoira reopened this Oct 17, 2023

grishchenkoira added 3 commits October 17, 2023 22:52

Delete modules_for_BSAT/.ipynb_checkpoints/protein_analysis-checkpoin…

db4d2ea

…t.py

Delete modules_for_BSAT/.ipynb_checkpoints/dna_rna_analysis-checkpoin…

fe68710

…t.py

Delete modules_for_BSAT/.ipynb_checkpoints/fastq_analysis-checkpoint.py

9e52d57

grishchenkoira changed the title ~~File input for bsat~~ File input for BSAT Oct 17, 2023

grishchenkoira added 2 commits October 18, 2023 23:00

Rewritten code to add complete protein sequence in select_genes_from_…

a1d5be1

…gbk_to_fasta

Rewrite function to generate a FASTA-file

dca7a00

nvaulin reviewed Oct 21, 2023
