-
Notifications
You must be signed in to change notification settings - Fork 0
Dev hw5 to dev #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
file for new functions
added functionality for file processing
update complement function for better adequacy
modified: modules/fastq_tools.py
writing filtered sequence to a file is wrapped in a function
added pycache and .ipynb
modified: modules/fastq_tools.py
reading sequence for filtering from a file is wrapped into a function
update transcribe function for better adequacy
added convert_multiline_fasta_to_oneline
added info about bio_files_processor.py and its functions
added functions for bio_files_processor: write_genes_seq_to_fasta and find_genes_with_neighbors
added parse_blast_output
modified: modules/fastq_tools.py
bug fixed in output file creation
|
Привет! 🧚 Хорошая работа, сделано много! Замечания оставила в комментариях. Распределение баллов:
Итоговые баллы за задание: 85б + 1 доп Ты молодец! Учти все замечания, и дальше только лучше! |
| from os import path | ||
| from sys import exit | ||
| from pathlib import Path | ||
| import modules.dna_rna_tools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Для четкости лучше импортировать конкретные функции, которые в модуле используются.
|
|
||
| path_to_write = Path("filtered", f"{output_fastq}") | ||
| if path_to_write.is_file(): | ||
| is_overwtire = input(f"The file {path_to_write} already exists, want to overwrite it? Y/N") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Лучше такие побочные эффекты не прописывать в модулях - представим, что мы хотим использовать какую-то функцию, а программа вдруг зависает и ждет от нас какие-то значения для ввода. Лучше вызывать ошибку или предупреждение в таком случае (например, использование warnings.warn). Cкрытые побочные эффекты являются частой причиной сложно отлаживаемых ошибок в коде
| status = True | ||
|
|
||
| with open(input_fastq, "r") as file: | ||
| while status: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Всегда осторожно с такими циклами - легко улететь в бесконечный.
| rez_gc_bounds = {} | ||
|
|
||
| for key in seqs.keys(): | ||
| if (gc_bounds[0] <= modules.fastq_tools.gc_count(seqs[key][0])) and (modules.fastq_tools.gc_count(seqs[key][0]) <= gc_bounds[1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Почему бы не импортировать сразу модуль fastq_tools?
Код очень сложно читаем.
| if (gc_bounds[0] <= modules.fastq_tools.gc_count(seqs[key][0])) and (modules.fastq_tools.gc_count(seqs[key][0]) <= gc_bounds[1]): | ||
| rez_gc_bounds[key] = seqs[key] | ||
|
|
||
| rez_length_bounds = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Супер неоптимально - мы записываем в три разных переменных результаты проверки одной последовательности. Лучше сразу просто проверить последовательность на все фильтры.
| line = file.readline().strip() | ||
|
|
||
| with open(path_to_write, "w") as file_w: | ||
| for k in rez.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Почему бы сразу не итеррироваться по значениям?
for k, seq in rez.items()
| 'translation': current_translation | ||
| } | ||
|
|
||
| in_cds = line.startswith(' CDS') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
А если будет меньше пробелов?
| # For future use: | ||
| # to avoid new-parsing the if the data is already collected in a convenient form (dictionary) | ||
| with open(f'{input_gbk.split(".")[0]}.json', 'w', encoding='utf-8') as f: | ||
| json.dump(genes_parsed, f, ensure_ascii=False, indent=4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Можно сохранит просто в формате .txt
| file_w.write(k+'\n') | ||
| file_w.write(''.join(rez[k])+'\n') | ||
|
|
||
| def parse_blast_output(input_gbk: str, genes: Union[int, tuple, list], output_fasta: str, n_before: int = 1, n_after: int = 1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Получился какой-то гибрид - две функции в одной. Должны быть две различные функции.
| 'translation': current_translation | ||
| } | ||
|
|
||
| #print(genes_parsed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Пустой список получился.
Dev hw5 to dev (hw4)