Completed HW6_Files #2
base: main
Rewrite README with full description of BioSeqTools
SidorinAnton left a comment
Overall, not bad!
Main points:
- Junk in the repository! .DS_Store and __pycache__. Use .gitignore.
- read and readlines. In general this can work, but if the file is very large, we may crash with an error such as MemoryError. So it is still more reliable to read files (especially bioinformatics ones) line by line ))
    for amino_acid, count in amino_acid_counts.items():
        percentage = round(((count / total_amino_acids) * 100), 2)
        amino_acid_percentages[amino_acid] = percentage
    return f'Amino acids percentage of the sequence {seq}: {amino_acid_percentages}'
Why do you return a string? ))
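A minimal sketch of what the reviewer is hinting at: return the data itself (a dict) and let the caller build any message. The function name `amino_acid_percentages` and the use of `collections.Counter` are illustrative assumptions, not the code from the PR.

```python
from collections import Counter


def amino_acid_percentages(seq):
    """Return {amino_acid: percentage} for seq, rounded to 2 decimals (hypothetical rewrite)."""
    if not seq:
        return {}  # avoid dividing by zero on an empty sequence
    counts = Counter(seq)
    total = len(seq)
    return {aa: round(count / total * 100, 2) for aa, count in counts.items()}


# Formatting happens at the call site, so the dict stays reusable
result = amino_acid_percentages("AAGC")
print(f"Amino acids percentage of the sequence AAGC: {result}")
```

Returning a dict means other code can sort, filter or compare the percentages without parsing a string back.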
    weight = 18.02  # for the H and OH at the termini
    for amino_acid in seq:
        weight += amino_acid_weights[amino_acid]
    return f'Molecular weight of the sequence {seq}: {round(weight, 2)} Da'
Why do you return a string? ))
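The same idea for the molecular weight: a sketch that returns a float in daltons. `RESIDUE_WEIGHTS` here is a hypothetical two-entry table of approximate average residue masses used only for illustration; the homework's own `amino_acid_weights` dict would be passed in instead.

```python
# Hypothetical, partial table of approximate average residue masses (Da)
RESIDUE_WEIGHTS = {"G": 57.05, "A": 71.08}

WATER = 18.02  # H and OH at the termini, as in the original code


def molecular_weight(seq, weights=RESIDUE_WEIGHTS):
    """Return the molecular weight of seq in Da as a float rounded to 2 decimals."""
    weight = WATER + sum(weights[aa] for aa in seq)
    return round(weight, 2)


mw = molecular_weight("GA")
print(f"Molecular weight of the sequence GA: {mw} Da")
```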
    # Substitute all found values into the formula and calculate pI
    pI = total_pK / count
    return f"Isoelectric point for the sequence {sequence}: {pI}"
Why do you return a string? ))
    sequences = {}
    current_id = None

    for line in fasta_lines:
        line = line.strip()

        if line.startswith('>'):
            current_id = line[1:]
            sequences[current_id] = ''
        else:
            if current_id:
                sequences[current_id] += line

    if output_fasta is None:
        output_fasta = input_fasta + ".fasta"

    with open(output_fasta, 'w') as output_file:
        for seq_id, sequence in sequences.items():
            output_file.write(f'>{seq_id}\n{sequence}\n')
Could be simpler. Something along these lines:

    with open(INPUT) as inp_fa, open(OUTPUT, "w") as opt_fa:
        for line in inp_fa:
            if line.startswith(">"):
                opt_fa.write("\n")  # The first line could be checked more carefully; then there would be no \n at the start
                opt_fa.write(line)
                continue
            opt_fa.write(line.strip())
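The comment above mentions that the first header could be handled more carefully. A minimal sketch of that refinement, keeping the original function name and its `output_fasta=None` default; the `in_sequence` flag is an illustrative choice, not code from the PR. The file is still read one line at a time, and the output has no leading blank line and ends with a newline.

```python
def convert_multiline_fasta_to_oneline(input_fasta, output_fasta=None):
    """Stream input_fasta line by line, writing each sequence on a single line."""
    if output_fasta is None:
        output_fasta = input_fasta + ".fasta"
    with open(input_fasta) as inp_fa, open(output_fasta, "w") as opt_fa:
        in_sequence = False  # True while the current sequence line is still open
        for line in inp_fa:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if in_sequence:
                    opt_fa.write("\n")  # close the previous sequence line
                    in_sequence = False
                opt_fa.write(line + "\n")
            else:
                opt_fa.write(line)
                in_sequence = True
        if in_sequence:
            opt_fa.write("\n")  # terminate the last sequence line
```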
    def convert_multiline_fasta_to_oneline(input_fasta, output_fasta=None):
        with open(input_fasta, 'r') as input_file:
            fasta_lines = input_file.readlines()
Not great: if the file is very large, we may crash with an error ))
Better to iterate over the lines.
                                 n_after=1, output_fasta=None):
        with open(input_gbk, 'r') as gbk_file:
            gbk_data = gbk_file.read()
Not great: if the file is very large, we may crash with an error ))
Better to iterate over the lines.
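One way to apply this to the GenBank function: scan the file lazily and keep only what is needed, here the gene names of CDS features. `iter_cds_genes` is a hypothetical helper; it assumes the standard GenBank feature-table layout (feature keys starting in column 6, qualifiers indented further, `ORIGIN` before the sequence) and does not reproduce the PR's neighbour-selection logic (`n_after` etc.), which could then run on the much smaller list of names.

```python
def iter_cds_genes(input_gbk):
    """Yield gene names from /gene="..." qualifiers of CDS features, one line at a time."""
    current_feature = None
    with open(input_gbk) as gbk_file:
        for line in gbk_file:
            # Feature keys start in column 6 (after 5 spaces); qualifiers are indented further
            if line.startswith("     ") and len(line) > 5 and not line[5].isspace():
                current_feature = line.split()[0]
            elif line.startswith("ORIGIN"):
                break  # only the nucleotide sequence follows, so stop reading
            else:
                stripped = line.strip()
                if current_feature == "CDS" and stripped.startswith('/gene="'):
                    yield stripped[len('/gene="'):].rstrip('"')
```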
    def change_fasta_start_pos(input_fasta, shift, output_fasta):
        with open(input_fasta, 'r') as input_file:
            lines = input_file.readlines()
Not great: if the file is very large, we may crash with an error ))
Better to iterate over the lines.
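A sketch of `change_fasta_start_pos` that iterates over the file instead of calling `readlines()`. It assumes a single-record FASTA and that a positive `shift` makes the sequence start `shift` positions later (wrapping around); both are assumptions about the homework spec. The joined sequence still has to sit in memory for the rotation, but the raw list of lines does not.

```python
def change_fasta_start_pos(input_fasta, shift, output_fasta):
    """Rotate a single-record FASTA so that the sequence starts at index `shift`."""
    header = None
    seq_parts = []
    with open(input_fasta) as input_file:
        for line in input_file:  # lazy iteration instead of readlines()
            line = line.strip()
            if line.startswith(">"):
                header = line
            elif line:
                seq_parts.append(line)
    if header is None:
        raise ValueError(f"no FASTA header found in {input_fasta}")
    sequence = "".join(seq_parts)
    if sequence:
        shift %= len(sequence)  # accept negative or longer-than-sequence shifts
    rotated = sequence[shift:] + sequence[:shift]
    with open(output_fasta, "w") as output_file:
        output_file.write(f"{header}\n{rotated}\n")
```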
    def parse_blast_output(input_file, output_file=None):
        with open(input_file, 'r') as f:
            lines = f.read().splitlines()
Not great: if the file is very large, we may crash with an error ))
Better to iterate over the lines.
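For the BLAST report the same streaming pattern could look like the sketch below. `iter_first_hits` is hypothetical and assumes BLAST pairwise text output, where each query block starts with `Query=` and each alignment title line starts with `>`; the homework's exact extraction rule (which column or line counts as the description) may differ.

```python
def iter_first_hits(blast_txt):
    """Yield (query, first_alignment_title) pairs, reading the report line by line."""
    query = None
    waiting_for_hit = False
    with open(blast_txt) as f:
        for line in f:
            if line.startswith("Query="):
                query = line[len("Query="):].strip()
                waiting_for_hit = True
            elif waiting_for_hit and line.startswith(">"):
                yield query, line[1:].strip()
                waiting_for_hit = False  # only the first hit per query is kept
```

The caller can then write out, for example, `sorted(title for _, title in iter_first_hits(path))` without ever holding the whole report in memory.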
    def read_fastq_file(file_path):
        with open(file_path, 'r') as file:
            lines = file.readlines()
Not great: if the file is very large, we may crash with an error ))
Better to iterate over the lines.
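For FASTQ, records can be pulled four lines at a time with `itertools.islice`, so only one record is in memory at once. This sketch turns `read_fastq_file` into a generator of `(name, sequence, quality)` tuples, which is a change of return type compared with the PR; it assumes the common 4-line FASTQ layout without wrapped sequence or quality lines.

```python
from itertools import islice


def read_fastq_file(file_path):
    """Yield (name, sequence, quality) tuples one record at a time."""
    with open(file_path) as file:
        while True:
            record = [line.rstrip("\n") for line in islice(file, 4)]
            if not record:
                break  # end of file
            if len(record) < 4:
                raise ValueError("truncated FASTQ record at end of file")
            header, sequence, _plus, quality = record
            yield header[1:], sequence, quality  # drop the leading '@'
```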
    # Usage example
    run_filter_fastq('./HW6_Files/example_fastq.fastq', gc_bounds=(10, 30), quality_threshold=30, output_filename='filtered_output.fastq')
Nah, this isn't needed here )))
If you want, you can move it to a separate file, for example.
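Besides a separate file, another common Python option (not the one the reviewer proposed) is an `if __name__ == "__main__":` guard, so the example call runs only when the module is executed as a script and not on import. `run_filter_fastq` below is a hypothetical stand-in with made-up defaults; the real function lives in the module under review.

```python
def run_filter_fastq(input_path, gc_bounds=(0, 100), quality_threshold=0,
                     output_filename=None):
    """Hypothetical stand-in for the real filtering function; just echoes its arguments."""
    return input_path, gc_bounds, quality_threshold, output_filename


if __name__ == "__main__":
    # Executed only when this file is run directly, never when it is imported
    run_filter_fastq('./HW6_Files/example_fastq.fastq', gc_bounds=(10, 30),
                     quality_threshold=30, output_filename='filtered_output.fastq')
```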