Development branch #1

IuriiSl · 2023-10-08T11:23:06Z

HW5_Slepov. Add module sequence_processing

Fixed line breaks

SidorinAnton

Not bad overall, but you overcomplicated things a bit :)
Imports. If you had imported the functions directly, your code wouldn't have bloated :)
That is, instead of additional_modules.run_fastq.gc_filter you would just write gc_filter.
Also, imports inside functions are almost never used. It does happen, but only in specific cases.
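
A minimal sketch of the two import styles. The standard-library `statistics` module stands in for `additional_modules.run_fastq` here (an assumption made only so the snippet runs on its own), and both function names are hypothetical:

```python
# Imports live at module level, not inside the functions.
import statistics             # style 1: import the whole module
from statistics import mean   # style 2: import the function directly


def average_quality_long(scores):
    # Every call has to spell out the module path.
    return statistics.mean(scores)


def average_quality_short(scores):
    # The imported name is used directly, so the call stays short.
    return mean(scores)


long_result = average_quality_long([30, 40])
short_result = average_quality_short([30, 40])
```

Both functions do the same work; the second style only shortens the call sites, which matters more as the module path gets longer.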

The filter itself. I wrote this in an inline comment, but I'll repeat it here: replacing values with "T" doesn't make much sense :)
If you want exactly this implementation (where the filters return a dictionary of reads), you could simply intersect the keys :)
Roughly:

a = {"1": 1, "2": 2}
b = {"2": 2, "3": 3}
c = {"2": 2, "4": 4}
# "2" passed all the filters
intersection = a.keys() & b.keys() & c.keys()
print(intersection)

Yes, you would have had to google how to do an intersection, but it's the first link on Google :)

SidorinAnton · 2023-10-30T12:49:42Z

additional_modules/run_aminoacid_seq.py

+amino_acid_weights = {
+        'A': 89, 'R': 174, 'N': 132, 'D': 133, 'C': 121,
+        'E': 147, 'Q': 146, 'G': 75, 'H': 155, 'I': 131,
+        'L': 131, 'K': 146, 'M': 149, 'F': 165, 'P': 115,
+        'S': 105, 'T': 119, 'W': 204, 'Y': 181, 'V': 117
+    }
+most_frequent_codon_for_amino_acid_e_coli = {
+        'A': 'GCT', 'R': 'CGT', 'N': 'AAC', 'D': 'GAC', 'C': 'TGC',
+        'E': 'GAA', 'Q': 'CAG', 'G': 'GGC', 'H': 'CAC', 'I': 'ATC',
+        'L': 'CTG', 'K': 'AAA', 'M': 'ATG', 'F': 'TTC', 'P': 'CCG',
+        'S': 'TCT', 'T': 'ACC', 'W': 'TGG', 'Y': 'TAC', 'V': 'GTT',
+        'a': 'gct', 'r': 'cgt', 'n': 'aac', 'd': 'gac', 'c': 'tgc',
+        'e': 'gaa', 'q': 'cag', 'g': 'ggc', 'h': 'cac', 'i': 'atc',
+        'l': 'ctg', 'k': 'aaa', 'm': 'atg', 'f': 'ttc', 'p': 'ccg',
+        's': 'tct', 't': 'acc', 'w': 'tgg', 'y': 'tac', 'v': 'gtt'
+    }
+dict_class_acid = {
+    'hydrophilic': ['t', 'q', 'r', 's', 'y', 'd', 'e', 'g',
+                    'c', 'n', 'h', 'k', 'T', 'Q', 'R', 'S',
+                    'Y', 'D', 'E', 'G', 'C', 'N', 'H', 'K'],
+    'hydrophobic': ['V', 'W', 'P', 'w', 'v', 'p', 'i', 'F',
+                    'f', 'm', 'A', 'a', 'L', 'M', 'l', 'I']}
+dict_charge_acid = {
+    'negative_charge': ['E', 'D', 'e', 'd'],
+    'positive_charge': ['K', 'R', 'H', 'k', 'r', 'h'],
+    'neutral_charge': ['V', 'W', 'P', 'w', 'v', 'p', 'i', 'F', 'f', 'm', 'A',
+                       'a', 'L', 'M', 'l', 'I', 'S', 's', 'T', 't', 'N', 'n',
+                       'Q', 'q', 'C', 'c', 'Y', 'y', 'G', 'g']}
+aminoacid_dict = {
+    'GLY': 'G', 'ALA': 'A', 'VAL': 'V', 'LEU': 'L', 'ILE': 'I', 'MET': 'M',
+    'PRO': 'P', 'PHE': 'F', 'TRP': 'W', 'SER': 'S', 'THR': 'T', 'ASN': 'N', 'GLN': 'Q',
+    'TYR': 'Y', 'CYS': 'C', 'LYS': 'K', 'ARG': 'R', 'HIS': 'H', 'ASP': 'D', 'GLU': 'E',
+    'gly': 'g', 'ala': 'a', 'val': 'v', 'leu': 'l', 'ile': 'i', 'met': 'm',
+    'pro': 'p', 'phe': 'f', 'trp': 'w', 'ser': 's', 'thr': 't', 'asn': 'n', 'gln': 'q',
+    'tyr': 'y', 'cys': 'c', 'lys': 'k', 'arg': 'r', 'his': 'h', 'asp': 'd', 'glu': 'e',
+}


These are all constants, so they should be named in uppercase

SidorinAnton · 2023-10-30T12:50:15Z

additional_modules/run_aminoacid_seq.py

+    :return: int
+    """
+    sequence_upper = sequence.upper()
+    molecular_weight = sum(amino_acid_weights.get(aa, 0) for aa in sequence_upper)


Why .get instead of []? :)
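
One likely reason for the question: with `.get(aa, 0)` an unknown character silently contributes zero weight, while `[]` raises `KeyError` and makes bad input visible. A sketch with a trimmed weights table (`WEIGHTS` and both function names are hypothetical, not the PR's code):

```python
# Trimmed table for illustration only; the PR's dictionary has 20 entries.
WEIGHTS = {'A': 89, 'G': 75, 'C': 121}


def weight_lenient(sequence):
    # Unknown residues are silently counted as 0.
    return sum(WEIGHTS.get(aa, 0) for aa in sequence.upper())


def weight_strict(sequence):
    # Unknown residues raise KeyError, so invalid input is noticed.
    return sum(WEIGHTS[aa] for aa in sequence.upper())


lenient = weight_lenient("AGX")   # 'X' is ignored in the sum
try:
    strict = weight_strict("AGX")
except KeyError as error:
    strict = f"invalid residue: {error}"
```

Which behavior is right depends on whether the sequence has already been validated before this point; if it has, `[]` costs nothing and documents that assumption.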

SidorinAnton · 2023-10-30T12:50:46Z

additional_modules/run_aminoacid_seq.py

+    if percent:
+        result_dict = {"Percentage of positively charged amino acids":
+                           (round((amount_positive * 100) / len(amino_seq))),
+                       "Percentage of neutrally charged amino acids":
+                           (round((amount_neutral * 100) / len(amino_seq))),
+                       "Percentage of negatively charged amino acids":
+                           (round((amount_negative * 100) / len(amino_seq)))}
+    else:
+        result_dict = {"Number of positively charged amino acids": amount_positive,
+                       "Number of neutrally charged amino acids": amount_neutral,
+                       "Number of negatively charged amino acids": amount_negative}


It's not great that the returned dictionary has different keys. You will most likely call this function for more than just display. So in another function you'll need to pull values out by key
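
A sketch of one way to keep the keys stable, so that callers can read the result the same way in both modes. The names `POSITIVE`, `NEGATIVE` and `count_charges` are hypothetical, and the sketch assumes a non-empty sequence that contains only valid amino acid letters (everything that is not charged is counted as neutral):

```python
POSITIVE = set("KRH")
NEGATIVE = set("ED")


def count_charges(amino_seq, percent=False):
    """Return charge statistics under the same three keys in both modes."""
    seq = amino_seq.upper()
    counts = {
        "positive": sum(aa in POSITIVE for aa in seq),
        "negative": sum(aa in NEGATIVE for aa in seq),
    }
    # Assumption: every remaining residue is neutral.
    counts["neutral"] = len(seq) - counts["positive"] - counts["negative"]
    if percent:
        # Only the values change; the keys stay the same.
        return {key: round(value * 100 / len(seq)) for key, value in counts.items()}
    return counts
```

Human-readable labels such as "Percentage of positively charged amino acids" can then be attached in the code that prints the result, rather than baked into the keys.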

SidorinAnton · 2023-10-30T12:51:50Z

additional_modules/run_aminoacid_seq.py

+    if percent:
+        result_dict = {'Percentage of hydrophilic amino acids':
+                           (round((amount_hydrophilic * 100) / len(amino_seq), 2)),
+                       'Percentage of hydrophobic amino acids':
+                           (round((amount_hydrophobic * 100) / len(amino_seq), 2))}
+    else:
+        result_dict = {'Number of hydrophilic amino acids': amount_hydrophilic,
+                       'Number of hydrophobic amino acids': amount_hydrophobic}


Same story here: the keys differ

SidorinAnton · 2023-10-30T12:52:32Z

additional_modules/run_dna_rna_tools.py

+dna_rna_complement_dict = {'A': 'U', 'G': 'C', 'C': 'G', 'T': 'A',
+                           'a': 'u', 'g': 'c', 'c': 'g', 't': 'a'}
+dna_rna_transcribe = {'A': 'A', 'G': 'G', 'C': 'C', 'T': 'U',
+                                 'a': 'a', 'g': 'g', 'c': 'c', 't': 'u'}
+DNA_COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C',
+                  'a': 't', 't': 'a', 'c': 'g', 'g': 'c'}
+RNA_COMPLEMENT = {'A': 'U', 'U': 'A', 'C': 'G', 'G': 'C',


:)
The top two dictionaries are constants too :)

SidorinAnton · 2023-10-30T13:02:51Z

sequence_processing.py

+    :param args: str or list of sequence of nucleotides and name of function
+    :return:str or list of processed sequence
+    """
+    import additional_modules.run_dna_rna_tools


Oh no, please don't... imports inside a function are fairly rare

SidorinAnton · 2023-10-30T13:02:55Z

sequence_processing.py

+    :param quality_threshold: threshold value of average read quality for filtering. Default = 0
+    :return: dictionary consisting of filtered fastq sequences matching all filters
+    """
+    import additional_modules.run_fastq


Oh no, please don't... imports inside a function are fairly rare

SidorinAnton · 2023-10-30T13:04:21Z

sequence_processing.py

+    if type(sequence) == list:
+        for nucleotides in sequence:
+            for nucleotide in nucleotides:
+                if nucleotide in additional_modules.run_dna_rna_tools.dna_rna_complement_dict.keys() and nucleotides in additional_modules.run_dna_rna_tools.dna_rna_complement_dict.values():
+                    raise ValueError('use correct sequence')
+                if ('U' or 'u') in nucleotides and (function == 'transcribe' or function == 'complement_transcribe'):
+                    raise ValueError('use correct sequence')
+    else:
+        for nucleotides in sequence:
+            if nucleotides in additional_modules.run_dna_rna_tools.dna_rna_complement_dict.keys() and nucleotides in additional_modules.run_dna_rna_tools.dna_rna_complement_dict.values():
+                raise ValueError('use correct sequence')
+            if ('U' or 'u') in nucleotides and (function == 'transcribe' or function == 'complement_transcribe'):
+                raise ValueError('use correct sequence')


Use the isinstance function for type checks

The code feels kind of overcomplicated :)
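
Note also that `('U' or 'u') in nucleotides` evaluates to `'U' in nucleotides`, because `'U' or 'u'` is just `'U'`, so lowercase `u` is never checked. A possible simplification, assuming the intent is to reject sequences that are neither valid DNA nor valid RNA and to reject RNA input for the transcription functions (all names below are hypothetical):

```python
DNA_ALPHABET = set("ACGTacgt")
RNA_ALPHABET = set("ACGUacgu")
DNA_ONLY_FUNCTIONS = {"transcribe", "complement_transcribe"}


def validate(sequence, function):
    # Wrap a single string in a list so one loop covers both input types.
    sequences = [sequence] if isinstance(sequence, str) else sequence
    for seq in sequences:
        letters = set(seq)
        is_dna = letters <= DNA_ALPHABET   # subset test: only DNA letters
        is_rna = letters <= RNA_ALPHABET   # subset test: only RNA letters
        if not (is_dna or is_rna):
            raise ValueError("use correct sequence")
        if function in DNA_ONLY_FUNCTIONS and not is_dna:
            raise ValueError("use correct sequence")
```

Normalising the input to a list first removes the duplicated `if`/`else` branches, and the set comparisons replace the per-character loops.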

SidorinAnton · 2023-10-30T13:06:15Z

sequence_processing.py

+    seqs_qual_filtered = additional_modules.run_fastq.quality_filter(seqs, quality_threshold=dict_bounds['quality_threshold'])
+    seqs_filtered = {}
+    for keys, values in seqs.items():
+        if seqs_gc_filtered[keys] == 'T' and seqs_len_filtered[keys] == 'T' and seqs_qual_filtered[keys] == 'T':


This is very odd behavior, to be honest :)
You could simply return 3 dictionaries containing only the filtered reads. Then you'd take the intersection => only the reads that passed all 3 filters would remain :)
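
A sketch of that approach with made-up reads. The three `*_filtered` dictionaries are written by hand here and stand in for the outputs of the filter functions, which are assumed to keep only the reads that passed:

```python
# Hypothetical input: read name -> (sequence, quality string).
seqs = {
    "read1": ("ACGT", "IIII"),
    "read2": ("GGCC", "!!!!"),
    "read3": ("AT", "II"),
}

# Stand-ins for the filter outputs; each holds only the reads that passed.
seqs_gc_filtered = {"read1": seqs["read1"], "read3": seqs["read3"]}
seqs_len_filtered = {"read1": seqs["read1"], "read2": seqs["read2"]}
seqs_qual_filtered = {"read1": seqs["read1"], "read3": seqs["read3"]}

# Key views support set operations, so & gives the names present in all three.
passed = seqs_gc_filtered.keys() & seqs_len_filtered.keys() & seqs_qual_filtered.keys()

# Rebuild the result in the original read order.
seqs_filtered = {name: seqs[name] for name in seqs if name in passed}
```

Iterating over `seqs` rather than over `passed` keeps the reads in their input order, since a set has no defined order.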

SidorinAnton · 2023-10-30T13:08:05Z

sequence_processing.py

+        return additional_modules.run_dna_rna_tools.reverse_complement(sequence)
+
+
+def run_fastq(seqs: dict, gc_bounds: Union[int, float, tuple] = (0, 100),


Hmm? Run fastq? :)
Looks like at least the word filter is clearly missing here :)

IuriiSl and others added 6 commits October 8, 2023 12:11

Add file run_aminoacid_seq.py

410c271

Add file run_dna_rna_tools.py

fd7289f

Add file run_fastq.py

799df57

Add file sequence_processing.py

80f16d7

Edited README.md

779acec

Update README.md

cb98979

Fixed line breaks

SidorinAnton reviewed Oct 30, 2023
