Development branch #1
base: main
SidorinAnton
left a comment
Overall not bad, but you did overcomplicate things a bit :)
Imports. If you had imported the functions directly, your code would not have become so bloated.
That is, instead of additional_modules.run_fastq.gc_filter you would just write gc_filter.
Also, imports inside functions are almost never used. It does happen, but only in specific cases.
The filter itself. I wrote this in an inline comment, but I'll repeat it here: replacing values with "T" doesn't make much sense.
If you want exactly this kind of implementation (where the filters return a dictionary of reads), you could simply take the intersection of the keys.
Roughly:
a = {"1": 1, "2": 2}
b = {"2": 2, "3": 3}
c = {"2": 2, "4": 4}
# only "2" passed all the filters
intersection = a.keys() & b.keys() & c.keys()
print(intersection)
Yes, you would have had to google how to compute an intersection, but it is the first link on Google.
| amino_acid_weights = { | ||
| 'A': 89, 'R': 174, 'N': 132, 'D': 133, 'C': 121, | ||
| 'E': 147, 'Q': 146, 'G': 75, 'H': 155, 'I': 131, | ||
| 'L': 131, 'K': 146, 'M': 149, 'F': 165, 'P': 115, | ||
| 'S': 105, 'T': 119, 'W': 204, 'Y': 181, 'V': 117 | ||
| } | ||
| most_frequent_codon_for_amino_acid_e_coli = { | ||
| 'A': 'GCT', 'R': 'CGT', 'N': 'AAC', 'D': 'GAC', 'C': 'TGC', | ||
| 'E': 'GAA', 'Q': 'CAG', 'G': 'GGC', 'H': 'CAC', 'I': 'ATC', | ||
| 'L': 'CTG', 'K': 'AAA', 'M': 'ATG', 'F': 'TTC', 'P': 'CCG', | ||
| 'S': 'TCT', 'T': 'ACC', 'W': 'TGG', 'Y': 'TAC', 'V': 'GTT', | ||
| 'a': 'gct', 'r': 'cgt', 'n': 'aac', 'd': 'gac', 'c': 'tgc', | ||
| 'e': 'gaa', 'q': 'cag', 'g': 'ggc', 'h': 'cac', 'i': 'atc', | ||
| 'l': 'ctg', 'k': 'aaa', 'm': 'atg', 'f': 'ttc', 'p': 'ccg', | ||
| 's': 'tct', 't': 'acc', 'w': 'tgg', 'y': 'tac', 'v': 'gtt' | ||
| } | ||
| dict_class_acid = { | ||
| 'hydrophilic': ['t', 'q', 'r', 's', 'y', 'd', 'e', 'g', | ||
| 'c', 'n', 'h', 'k', 'T', 'Q', 'R', 'S', | ||
| 'Y', 'D', 'E', 'G', 'C', 'N', 'H', 'K'], | ||
| 'hydrophobic': ['V', 'W', 'P', 'w', 'v', 'p', 'i', 'F', | ||
| 'f', 'm', 'A', 'a', 'L', 'M', 'l', 'I']} | ||
| dict_charge_acid = { | ||
| 'negative_charge': ['E', 'D', 'e', 'd'], | ||
| 'positive_charge': ['K', 'R', 'H', 'k', 'r', 'h'], | ||
| 'neutral_charge': ['V', 'W', 'P', 'w', 'v', 'p', 'i', 'F', 'f', 'm', 'A', | ||
| 'a', 'L', 'M', 'l', 'I', 'S', 's', 'T', 't', 'N', 'n', | ||
| 'Q', 'q', 'C', 'c', 'Y', 'y', 'G', 'g']} | ||
| aminoacid_dict = { | ||
| 'GLY': 'G', 'ALA': 'A', 'VAL': 'V', 'LEU': 'L', 'ILE': 'I', 'MET': 'M', | ||
| 'PRO': 'P', 'PHE': 'F', 'TRP': 'W', 'SER': 'S', 'THR': 'T', 'ASN': 'N', 'GLN': 'Q', | ||
| 'TYR': 'Y', 'CYS': 'C', 'LYS': 'K', 'ARG': 'R', 'HIS': 'H', 'ASP': 'D', 'GLU': 'E', | ||
| 'gly': 'g', 'ala': 'a', 'val': 'v', 'leu': 'l', 'ile': 'i', 'met': 'm', | ||
| 'pro': 'p', 'phe': 'f', 'trp': 'w', 'ser': 's', 'thr': 't', 'asn': 'n', 'gln': 'q', | ||
| 'tyr': 'y', 'cys': 'c', 'lys': 'k', 'arg': 'r', 'his': 'h', 'asp': 'd', 'glu': 'e', | ||
| } |
These are all constants, so their names should be in uppercase.
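A minimal sketch of what the renaming could look like. The table is a shortened subset of the one in the diff, and both the uppercase name `AMINO_ACID_WEIGHTS` and the function name `molecular_weight` are assumptions made for illustration.

```python
# Module-level constants are conventionally written in UPPER_SNAKE_CASE (PEP 8).
# Shortened subset of the weights table from the diff, for illustration only.
AMINO_ACID_WEIGHTS = {'A': 89, 'R': 174, 'N': 132, 'D': 133, 'C': 121}


def molecular_weight(sequence: str) -> int:
    """Sum residue weights; the function name is hypothetical."""
    return sum(AMINO_ACID_WEIGHTS[aa] for aa in sequence.upper())


print(molecular_weight('arn'))  # 89 + 174 + 132 = 395
```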
| :return: int | ||
| """ | ||
| sequence_upper = sequence.upper() | ||
| molecular_weight = sum(amino_acid_weights.get(aa, 0) for aa in sequence_upper) |
Why .get instead of []? :)
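One way to see the difference behind this question: `.get(aa, 0)` counts an unknown symbol as weight 0, so a typo in the input goes unnoticed, while `[]` raises `KeyError`. The sketch below uses a shortened table, and the names `WEIGHTS`, `weight_lenient` and `weight_strict` are illustrative, not taken from the pull request.

```python
WEIGHTS = {'A': 89, 'G': 75}  # shortened, illustrative table


def weight_lenient(sequence: str) -> int:
    # .get(aa, 0) silently counts unknown symbols as weight 0
    return sum(WEIGHTS.get(aa, 0) for aa in sequence.upper())


def weight_strict(sequence: str) -> int:
    # [] raises KeyError on an unknown symbol, so bad input is noticed
    return sum(WEIGHTS[aa] for aa in sequence.upper())


print(weight_lenient('AGX'))  # 164: 'X' is ignored without any warning
try:
    weight_strict('AGX')
except KeyError as err:
    print(f'unknown residue: {err}')
```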
| if percent: | ||
| result_dict = {"Percentage of positively charged amino acids": | ||
| (round((amount_positive * 100) / len(amino_seq))), | ||
| "Percentage of neutrally charged amino acids": | ||
| (round((amount_neutral * 100) / len(amino_seq))), | ||
| "Percentage of negatively charged amino acids": | ||
| (round((amount_negative * 100) / len(amino_seq)))} | ||
| else: | ||
| result_dict = {"Number of positively charged amino acids": amount_positive, | ||
| "Number of neutrally charged amino acids": amount_neutral, | ||
| "Number of negatively charged amino acids": amount_negative} |
It's not great that the returned dictionary has different keys depending on the mode. You will most likely call this function for more than just display, so in another function you will need to look values up by key.
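A possible shape for a function whose keys do not depend on the `percent` flag. This is a sketch under assumptions: the name `count_charges` and the short keys are hypothetical, the charged residues follow `dict_charge_acid` from the diff, and the input is assumed to contain only valid amino acid letters (everything that is not charged is counted as neutral).

```python
def count_charges(amino_seq: str, percent: bool = False) -> dict:
    """Return charge counts (or percentages) under the same keys either way."""
    seq = amino_seq.upper()
    counts = {
        'positive': sum(aa in 'KRH' for aa in seq),
        'negative': sum(aa in 'ED' for aa in seq),
    }
    # Assumes valid input: whatever is not charged is treated as neutral
    counts['neutral'] = len(seq) - counts['positive'] - counts['negative']
    if percent and seq:
        return {key: round(value * 100 / len(seq)) for key, value in counts.items()}
    return counts


# The caller reads the same key in both modes
print(count_charges('KdeA')['negative'])                # 2
print(count_charges('KdeA', percent=True)['negative'])  # 50
```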
| if percent: | ||
| result_dict = {'Percentage of hydrophilic amino acids': | ||
| (round((amount_hydrophilic * 100) / len(amino_seq), 2)), | ||
| 'Percentage of hydrophobic amino acids': | ||
| (round((amount_hydrophobic * 100) / len(amino_seq), 2))} | ||
| else: | ||
| result_dict = {'Number of hydrophilic amino acids': amount_hydrophilic, | ||
| 'Number of hydrophobic amino acids': amount_hydrophobic} |
Same story here: the keys differ.
| dna_rna_complement_dict = {'A': 'U', 'G': 'C', 'C': 'G', 'T': 'A', | ||
| 'a': 'u', 'g': 'c', 'c': 'g', 't': 'a'} | ||
| dna_rna_transcribe = {'A': 'A', 'G': 'G', 'C': 'C', 'T': 'U', | ||
| 'a': 'a', 'g': 'g', 'c': 'c', 't': 'u'} | ||
| DNA_COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', | ||
| 'a': 't', 't': 'a', 'c': 'g', 'g': 'c'} | ||
| RNA_COMPLEMENT = {'A': 'U', 'U': 'A', 'C': 'G', 'G': 'C', |
:)
The top two dictionaries are constants too :)
| :param args: str or list of sequence of nucleotides and name of function | ||
| :return:str or list of processed sequence | ||
| """ | ||
| import additional_modules.run_dna_rna_tools |
Oh no, please don't... imports inside a function are fairly rare.
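A small sketch of the usual layout, with the import at the top of the module and the name imported directly. The standard library stands in for the project module so the example runs on its own; `average_quality` is a hypothetical function.

```python
# Imports go at the top of the module, and importing the name directly
# keeps call sites short.
# In the project this would presumably look like:
#     from additional_modules.run_fastq import gc_filter
# The standard library stands in here so the sketch is self-contained.
from statistics import mean


def average_quality(scores: list) -> float:
    # `mean(...)` instead of `statistics.mean(...)`, and no import in the body
    return mean(scores)


print(average_quality([30.0, 40.0, 35.0]))  # 35.0
```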
| :param quality_threshold: threshold value of average read quality for filtering. Default = 0 | ||
| :return: dictionary consisting of filtered fastq sequences matching all filters | ||
| """ | ||
| import additional_modules.run_fastq |
Oh no, please don't... imports inside a function are fairly rare.
| if type(sequence) == list: | ||
| for nucleotides in sequence: | ||
| for nucleotide in nucleotides: | ||
| if nucleotide in additional_modules.run_dna_rna_tools.dna_rna_complement_dict.keys() and nucleotides in additional_modules.run_dna_rna_tools.dna_rna_complement_dict.values(): | ||
| raise ValueError('use correct sequence') | ||
| if ('U' or 'u') in nucleotides and (function == 'transcribe' or function == 'complement_transcribe'): | ||
| raise ValueError('use correct sequence') | ||
| else: | ||
| for nucleotides in sequence: | ||
| if nucleotides in additional_modules.run_dna_rna_tools.dna_rna_complement_dict.keys() and nucleotides in additional_modules.run_dna_rna_tools.dna_rna_complement_dict.values(): | ||
| raise ValueError('use correct sequence') | ||
| if ('U' or 'u') in nucleotides and (function == 'transcribe' or function == 'complement_transcribe'): | ||
| raise ValueError('use correct sequence') |
- Use the isinstance function for type checks.
- The code looks rather overcomplicated :)
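A simplified sketch of how the check could be restructured. It is not a drop-in replacement for the code in the diff: it only validates DNA input, and `DNA_ALPHABET` and `validate_dna` are hypothetical names. Note also that in the diff `('U' or 'u') in nucleotides` evaluates to `'U' in nucleotides`, so a lowercase `u` is never detected; a set comparison avoids that pitfall.

```python
DNA_ALPHABET = set('ATGCatgc')  # assumed set of allowed DNA symbols


def validate_dna(sequences):
    """Raise ValueError if any sequence contains a non-DNA symbol."""
    # isinstance is the idiomatic type check; wrapping a single string in a
    # list lets both input shapes go through the same loop
    if isinstance(sequences, str):
        sequences = [sequences]
    for seq in sequences:
        # every character of the sequence must belong to the alphabet
        if not set(seq) <= DNA_ALPHABET:
            raise ValueError('use correct sequence')
    return sequences


print(validate_dna('ATgc'))
print(validate_dna(['AT', 'gc']))
```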
| seqs_qual_filtered = additional_modules.run_fastq.quality_filter(seqs, quality_threshold=dict_bounds['quality_threshold']) | ||
| seqs_filtered = {} | ||
| for keys, values in seqs.items(): | ||
| if seqs_gc_filtered[keys] == 'T' and seqs_len_filtered[keys] == 'T' and seqs_qual_filtered[keys] == 'T': |
This is very strange behavior, to be honest :)
Essentially, each filter could simply return a dictionary containing only the reads that passed it. You would then take the intersection, and only the reads that passed all 3 filters would remain.
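The suggestion could be sketched roughly as follows. The read layout (`name -> (sequence, quality)`), the function signatures and the name `filter_fastq` are assumptions for illustration; the quality filter is omitted for brevity and would be intersected the same way. Reads are assumed to be non-empty.

```python
def gc_filter(seqs, gc_bounds=(0, 100)):
    """Keep reads whose GC percentage lies within gc_bounds."""
    passed = {}
    for name, (seq, qual) in seqs.items():
        gc = 100 * sum(base in 'GCgc' for base in seq) / len(seq)
        if gc_bounds[0] <= gc <= gc_bounds[1]:
            passed[name] = (seq, qual)
    return passed


def length_filter(seqs, length_bounds=(0, 2**32)):
    """Keep reads whose length lies within length_bounds."""
    return {name: read for name, read in seqs.items()
            if length_bounds[0] <= len(read[0]) <= length_bounds[1]}


def filter_fastq(seqs, gc_bounds=(0, 100), length_bounds=(0, 2**32)):
    by_gc = gc_filter(seqs, gc_bounds)
    by_length = length_filter(seqs, length_bounds)
    # dict key views support set operations: keep names present in both
    keep = by_gc.keys() & by_length.keys()
    return {name: seqs[name] for name in seqs if name in keep}


reads = {'r1': ('GGCC', 'IIII'), 'r2': ('ATAT', 'IIII'), 'r3': ('GC', 'II')}
print(filter_fastq(reads, gc_bounds=(50, 100), length_bounds=(3, 10)))
# {'r1': ('GGCC', 'IIII')}
```

With this layout there is no need for "T" markers: a read is kept exactly when its name survives every filter.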
| return additional_modules.run_dna_rna_tools.reverse_complement(sequence) | ||
| def run_fastq(seqs: dict, gc_bounds: Union[int, float, tuple] = (0, 100), |
Hm? Run fastq? :)
It looks like the name is clearly missing at least the word filter :)
HW5_Slepov. Add module sequence_processing