Bioinf dev tools hw6 #2

VovaGrig · 2023-10-24T00:16:26Z

No description provided.

SidorinAnton

Overall, not bad!
The main point (about files): in a context manager (a with block) you can open several files at once, separated by commas.

SidorinAnton · 2023-11-20T19:52:45Z

src/fastq_filter.py

+        count = 0
+        for line in seqs_file:
+            count += 1
+            if count == 1 or (count - 1) % 4 == 0:
+                seqs[line.strip()] = []
+            if count % 2 == 0 and count % 4 != 0:
+                seqs[list(seqs)[-1]].append(line.strip())
+            if count % 4 == 0:
+                seqs[list(seqs)[-1]].append(line.strip())


Very complicated ))
We can reset the counter, and then explicitly compare it with 0, 1 and 3.
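
A possible sketch of the counter-reset idea, assuming well-formed four-line FASTQ records. The function name `parse_fastq` and the in-memory example input are hypothetical, not taken from the PR:

```python
import io


def parse_fastq(handle):
    """Collect FASTQ records as {name: [sequence, quality]}."""
    seqs = {}
    name = None
    count = 0  # position inside the current four-line record
    for line in handle:
        line = line.strip()
        if count == 0:  # header line
            name = line
            seqs[name] = []
        elif count == 1:  # sequence line
            seqs[name].append(line)
        elif count == 3:  # quality line
            seqs[name].append(line)
        # count == 2 is the "+" separator line and is skipped
        count = (count + 1) % 4  # reset after every fourth line
    return seqs


# Hypothetical two-record input used only for illustration
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nFFFF\n")
records = parse_fastq(example)
```

Keeping the current record name in a variable also removes the need for `list(seqs)[-1]`, which rebuilds a list of all keys on every line.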

SidorinAnton · 2023-11-20T19:54:02Z

src/fastq_filter.py

+    if verbose != True and verbose != False:
+        raise ValueError("Invalid *verbose* argument given")


IMHO, this check is not needed. Moreover, if you do want to check explicitly for bool, it is better to use isinstance.
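
If the check is kept, an isinstance version might look like the sketch below. The helper name `check_verbose` is hypothetical; the error type and message follow the diff above:

```python
def check_verbose(verbose):
    """Return verbose unchanged if it is a real bool, otherwise raise."""
    # isinstance rejects values such as 1 or 0, which the comparison
    # `verbose != True and verbose != False` lets through (1 == True).
    if not isinstance(verbose, bool):
        raise ValueError("Invalid *verbose* argument given")
    return verbose
```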

SidorinAnton · 2023-11-20T19:54:39Z

src/fastq_filter.py

+            raise ValueError("Invalid quality sequence given")
+    if verbose != True and verbose != False:
+        raise ValueError("Invalid *verbose* argument given")
+    return seqs, gc_bounds, length_bounds, quality_threshold, verbose, output_filename


In cases like this it is better to return a dictionary.
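
A rough sketch of what returning a dictionary could look like. The function name `collect_arguments` and all default values are assumptions made for illustration; only the six field names come from the diff:

```python
def collect_arguments(
    seqs,
    gc_bounds=(0, 100),         # assumed default
    length_bounds=(0, 2**32),   # assumed default
    quality_threshold=0,        # assumed default
    verbose=False,
    output_filename=None,
):
    """Return the checked arguments by name instead of as a six-item tuple."""
    return {
        "seqs": seqs,
        "gc_bounds": gc_bounds,
        "length_bounds": length_bounds,
        "quality_threshold": quality_threshold,
        "verbose": verbose,
        "output_filename": output_filename,
    }


args = collect_arguments({"@read1": ["ACGT", "IIII"]}, verbose=True)
```

The caller then reads `args["gc_bounds"]` rather than relying on the position of each value in the tuple.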

SidorinAnton · 2023-11-20T19:58:25Z

src/fastq_filter.py

+        gc_result = is_gc_good(seq, gc_bounds, verbose)
+        if gc_result:
+            len_result = is_len_good(seq, length_bounds, verbose)
+            if len_result:
+                qual_result = is_qual_good(seq_qual, quality_threshold, verbose)
+                if qual_result:
+                    seqs_filtered[seq_name] = seqs[seq_name]


good? ))) Something like passed would fit better.

Not critical here, but this architecture is not great: if 20 more filters are added, the code will drift far to the right.
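
One way to keep the nesting flat is to put the checks in a list and combine them with `all()`. This is only a sketch: the `*_passed` helpers are hypothetical simplified stand-ins for the PR's own functions (without the verbose argument), they assume non-empty sequences, and the quality check assumes phred33 encoding:

```python
def gc_passed(seq, gc_bounds):
    gc_percent = 100 * sum(base in "GCgc" for base in seq) / len(seq)
    return gc_bounds[0] <= gc_percent <= gc_bounds[1]


def length_passed(seq, length_bounds):
    return length_bounds[0] <= len(seq) <= length_bounds[1]


def quality_passed(seq_qual, quality_threshold):
    mean_quality = sum(ord(char) - 33 for char in seq_qual) / len(seq_qual)
    return mean_quality >= quality_threshold


def filter_seqs(seqs, gc_bounds, length_bounds, quality_threshold):
    # Adding a filter means adding one entry here, not one more nesting level
    checks = [
        lambda seq, qual: gc_passed(seq, gc_bounds),
        lambda seq, qual: length_passed(seq, length_bounds),
        lambda seq, qual: quality_passed(qual, quality_threshold),
    ]
    seqs_filtered = {}
    for seq_name, (seq, seq_qual) in seqs.items():
        # all() stops at the first failed check, like the nested ifs did
        if all(check(seq, seq_qual) for check in checks):
            seqs_filtered[seq_name] = seqs[seq_name]
    return seqs_filtered
```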

SidorinAnton · 2023-11-20T19:59:34Z

src/fastq_filter.py

+    return seqs_filtered
+
+
+NEW_LINE = "\n"  # needed for output in f-strings


Hm? )))
You can use \n in f-strings without any problem ...

print(f"Lol{1234}\nKek")

Thanks! I think I googled this, and apparently it is not allowed before 3.12, so I did what was suggested there:
https://stackoverflow.com/questions/44780357/how-can-i-use-newline-n-in-an-f-string-to-format-output
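
For context, the limit before Python 3.12 concerns backslashes inside the braces of an f-string, not in its literal text. A small sketch of the difference, with made-up variable names:

```python
items = ["first", "second"]

# A backslash in the literal part of an f-string works on any version
# that has f-strings.
greeting = f"Lol{1234}\nKek"

# Before Python 3.12 a backslash inside the braces is a SyntaxError,
# e.g. f"{'\n'.join(items)}". Building the value beforehand avoids that
# without a module-level NEW_LINE constant.
joined = "\n".join(items)
report = f"Items:\n{joined}"
```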

SidorinAnton · 2023-11-20T20:12:00Z

src/protein_tools.py

+def three_one_letter_code(sequences: (tuple[str] or list[str])) -> list:
+    """
+    Reverse the protein sequences from one-letter to three-letter format and vice-versa
+
+    Case 1: get three-letter sequence\n
+    Use one-letter amino-acids sequences of any letter case
+
+    Case 2: get one-letter sequence\n
+    Use three-letter amino-acid separated by "-" sequences.
+    Please note that sequences without "-" are parsed as one-letter code sequences\n
+    Example: for sequence "Ala" function will return "Ala-leu-ala"
+
+    Arguments:
+    - sequences (tuple[str] or list[str]): protein sequences to convert\n
+    Example: ["WAG", "MkqRe", "msrlk", "Met-Ala-Gly", "Met-arg-asn-Trp-Ala-Gly", "arg-asn-trp"]
+
+    Return:
+    - list: one-letter/three-letter protein sequences\n
+    Example: ["Met-Ala-Gly", "Met-arg-asn-Trp-Ala-Gly", "arg-asn-trp", "WAG", "MkqRe", "rlk"]
+    """
+    inversed_sequences = []
+    for sequence in sequences:
+        inversed_sequence = []
+        if "-" not in sequence:
+            for letter in sequence:
+                if letter.islower():
+                    inversed_sequence.append(
+                        dictionaries.AMINO_ACIDS[letter.capitalize()].lower()
+                    )
+                else:
+                    inversed_sequence.append(dictionaries.AMINO_ACIDS[letter])
+            inversed_sequences.append("-".join(inversed_sequence))
+        else:
+            aa_splitted = sequence.split("-")
+            for aa in aa_splitted:
+                aa_index = list(dictionaries.AMINO_ACIDS.values()).index(
+                    aa.capitalize()
+                )
+                if aa[0].islower():
+                    inversed_sequence.append(
+                        list(dictionaries.AMINO_ACIDS.keys())[aa_index].lower()
+                    )
+                else:
+                    inversed_sequence.append(
+                        list(dictionaries.AMINO_ACIDS.keys())[aa_index]
+                    )
+            inversed_sequences.append("".join(inversed_sequence))
+    return inversed_sequences


The architecture is not great ))
If you want to convert 1 -> 3 and 3 -> 1, it is better to write 2 separate functions.
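
A sketch of the split into two functions. The names `one_to_three` and `three_to_one` are hypothetical, and `AMINO_ACIDS` here is a small illustrative subset standing in for `dictionaries.AMINO_ACIDS`:

```python
# Illustrative subset: one-letter code -> three-letter code
AMINO_ACIDS = {"A": "Ala", "G": "Gly", "M": "Met", "W": "Trp"}
# Reverse lookup built once, instead of list(...).index(...) per residue
THREE_TO_ONE = {three: one for one, three in AMINO_ACIDS.items()}


def one_to_three(sequence):
    """Convert a one-letter sequence to "-"-separated three-letter codes."""
    residues = []
    for letter in sequence:
        three = AMINO_ACIDS[letter.upper()]
        residues.append(three.lower() if letter.islower() else three)
    return "-".join(residues)


def three_to_one(sequence):
    """Convert "-"-separated three-letter codes to a one-letter sequence."""
    letters = []
    for aa in sequence.split("-"):
        one = THREE_TO_ONE[aa.capitalize()]
        letters.append(one.lower() if aa[0].islower() else one)
    return "".join(letters)
```

With separate functions the caller states the direction, so a single residue such as "Ala" is no longer guessed to be a one-letter sequence just because it has no "-".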

SidorinAnton · 2023-11-20T20:12:18Z

src/protein_tools.py

+    - dictionary: sequences (str] as keys , starting positions for presented motif (list) as values\n
+        Example: {"AMGAGW": [2], "GAWSGRAGA": [0, 7]}
+    """
+    new_line = "\n"


What for? )))

SidorinAnton · 2023-11-20T20:17:22Z

src/protein_tools.py

+    if nucl_acids == "RNA":
+        del nucl_acid_seqs["DNA"]
+    if nucl_acids == "DNA":
+        del nucl_acid_seqs["RNA"]


Why? )))
If you want to return only one nucleic acid, it makes sense to return the value (the list) directly.
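
A minimal sketch of returning the list directly. The helper name `select_nucl_acid` is hypothetical, and returning the whole dictionary for any other value of `nucl_acids` is an assumption about the intended behavior:

```python
def select_nucl_acid(nucl_acid_seqs, nucl_acids):
    """Return one list for "RNA" or "DNA", otherwise the whole dictionary."""
    if nucl_acids in ("RNA", "DNA"):
        # No del needed: the input dictionary is left untouched
        return nucl_acid_seqs[nucl_acids]
    return nucl_acid_seqs


# Made-up data for illustration
translated = {"RNA": ["AUGGCU"], "DNA": ["ATGGCT"]}
rna_only = select_nucl_acid(translated, "RNA")
```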

SidorinAnton · 2023-11-20T20:26:38Z

bio_files_processor.py

+    with open(input_fasta, "r") as input:
+        with open(output_path, "w"):
+            while True:
+                line = input.readline().strip()
+                print(line)
+                if not line:
+                    break
+                if not line.startswith(">"):
+                    line = line[shift:] + line[:shift]
+                with open(output_path, "a") as output:
+                    output.write(line + "\n")


I don't quite understand why the files could not be opened for reading and writing at once, so that the output does not have to be reopened for appending later: with open(...) as ..., open(..., "w") as ...:

Why while? Why not simply iterate over the lines? ))
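
Both remarks combined might look like the sketch below. The function name `shift_fasta` and the temporary example files are hypothetical. Skipping blank lines instead of stopping at the first one is a choice made for this sketch; it differs from the `break` in the diff:

```python
import os
import tempfile


def shift_fasta(input_fasta, shift, output_path):
    """Write a copy of the FASTA file with each sequence line rotated by shift."""
    # Both files are opened in one with statement; the output is opened once
    with open(input_fasta) as fasta_in, open(output_path, "w") as fasta_out:
        for line in fasta_in:  # iterate over lines instead of while/readline
            line = line.strip()
            if not line:
                continue
            if not line.startswith(">"):
                line = line[shift:] + line[:shift]
            fasta_out.write(line + "\n")


# Small demonstration on a temporary file
with tempfile.TemporaryDirectory() as tmp_dir:
    source = os.path.join(tmp_dir, "in.fasta")
    target = os.path.join(tmp_dir, "out.fasta")
    with open(source, "w") as handle:
        handle.write(">seq1\nACGTT\n")
    shift_fasta(source, 2, target)
    with open(target) as handle:
        shifted = handle.read()
```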

SidorinAnton · 2023-11-20T20:28:38Z

bio_files_processor.py

+    with open(input_fasta, "r") as input:
+        with open(output_path, "w"):
+            read = []
+            while True:
+                line = input.readline().strip()
+                if not line:
+                    break
+                if line.startswith(">"):
+                    line += "\n"
+                    if read:
+                        with open(output_path, "a") as output:
+                            output.write("".join(read) + "\n")
+                    read = [line]
+                else:
+                    read.append(line)
+            with open(output_path, "a") as output:
+                output.write("".join(read))


Complicated again ))

with open(INPUT) as inp_fa, open(OUTPUT, "w") as opt_fa:
    for line in inp_fa:
        if line.startswith(">"):
            opt_fa.write("\n")  # The first-line check can be done more carefully, then there is no \n at the start
            opt_fa.write(line)
            continue
        opt_fa.write(line.strip())
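
A possible version with the more careful first-line check mentioned in the comment. It is a sketch that works on any pair of file-like objects; the name `multiline_to_oneline` and the in-memory example are hypothetical:

```python
import io


def multiline_to_oneline(fasta_in, fasta_out):
    """Join the sequence lines of every FASTA record into a single line."""
    first_record = True
    for line in fasta_in:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if not first_record:
                fasta_out.write("\n")  # finish the previous sequence line
            fasta_out.write(line + "\n")
            first_record = False
        else:
            fasta_out.write(line)
    if not first_record:
        fasta_out.write("\n")  # finish the last sequence line


# In-memory example instead of real files
source = io.StringIO(">s1\nAC\nGT\n>s2\nTT\nAA\n")
target = io.StringIO()
multiline_to_oneline(source, target)
oneline = target.getvalue()
```

With real files the call would sit inside `with open(INPUT) as inp_fa, open(OUTPUT, "w") as opt_fa:` as in the snippet above.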

VovaGrig added 27 commits October 7, 2023 17:57

Create bioinf_tools.py

0ebdc00

Create bioinf_modules with dna_rna_tools.py, protein_tools.py, dictio…

bb282ed

…naries.py

Increase performance,add dockstrings to dna_rna_tools.py, modify dict…

a9106af

…ionaries.py

Increase performance, delete lower case in all dictionaries, update p…

eb189cb

…torein_tools.py

Create fastq_tools.py, functions and dockstrings within

0acfa30

Rename bioinf_modules to src

7671daa

Increase performance, update bioinf_tools.py

cd6bdcc

Rename fastq_tools.py to fastq_filter.py, update bioinf_tools.py

9edb943

Update README.md

3e68daa

Update README.md

c84a045

Update README.md

34b1e88

Update README.md

c2cac33

Update README.md

4b614be

Delete useless files

f243add

Add fixes from PR

62bb4b8

Add fasta file parsing to dict in fastq_filter.py

24afd1a

Fix fasta file parsing to dict in fastq_filter.py

419d187

Fix fasta file parsing to dict in fastq_filter.py

683a12a

Add save_filtered_seqs function in fastq_filter.py

b95b0f2

Update docstrings

ec4623d

Update docstrings

ccca0f3

Add bio_files_processor.py with 3 functions

93addf3

Update bio_files_processor.py with 3

ca5abed

Update fastq_filter.py

313ed97

Update bio_files_processor.py

f8bcddf

Update bio_files_processor.py

49756b1

Update README.md

21dce21

SidorinAnton reviewed Nov 20, 2023

2 participants
