
Iterate #27

Merged
khetherin merged 44 commits into EBIvariation:main from khetherin:iterate
Jan 22, 2026


Collaborator

@khetherin khetherin commented Jan 19, 2026

- Created a function called create_coordinate_range
- Created a test called test_create_coordinate_range
…pty only the mandatory VCF headers are printed
…checking for duplicates and merging duplicates
@khetherin khetherin requested review from tcezard January 19, 2026 10:28
# Conflicts:
#	README.md
#	convert_gvf_to_vcf/convertGVFtoVCF.py
#	convert_gvf_to_vcf/vcfline.py
#	tests/test_vcfline.py
generate_info_field_symbolic_allele, generate_info_field_for_imprecise_variant, generate_info_field_for_precise_variant.
added new function convert_to_ci_bound to avoid repetition
Comment on lines +481 to +495
merge_or_kept_vcf_objects = get_list_of_merged_vcf_objects(list_of_vcf_objects, samples)
# identify if duplicates are present after merging
has_dups, chrom_pos_list = has_duplicates(merge_or_kept_vcf_objects)
# while duplicates are present, merge, then re-check for dups
max_iterations = 100
iteration = 0
list_of_vcf_objects_to_be_filtered = merge_or_kept_vcf_objects
while has_dups and iteration < max_iterations:
    filtered_merge_or_kept_vcf_objects = filter_duplicates_by_merging(chrom_pos_list, has_dups,
                                                                      list_of_vcf_objects,
                                                                      list_of_vcf_objects_to_be_filtered, samples)
    has_dups, chrom_pos_list = has_duplicates(filtered_merge_or_kept_vcf_objects)
    iteration += 1
    list_of_vcf_objects_to_be_filtered = filtered_merge_or_kept_vcf_objects
logger.info(f"Iteration of merge (remove dups): {iteration}")
Member

Why is the merging algorithm not capable of resolving all the duplicates in one pass?

Collaborator Author

The merge function compares the current line with the previous line (it is limited to 2 lines in its comparison and merge). For example:
The lines in the file: lineA, lineB, lineC
After the merge: lineAandB, lineC
After another iteration: lineAandBandC
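
A toy model of that behaviour may help to see why a second pass is needed. This is not the project's code: the records are hypothetical (chrom, pos, ids) tuples, `merge_pairs_once` is an illustrative name, and it assumes the comparison restarts after each merge, which reproduces the sequence above.

```python
def merge_pairs_once(records):
    """One pass that merges at most two adjacent records at a time (toy model)."""
    result = []
    i = 0
    while i < len(records):
        current = records[i]
        following = records[i + 1] if i + 1 < len(records) else None
        # Two records count as duplicates when chrom and pos match.
        if following is not None and current[:2] == following[:2]:
            result.append((current[0], current[1], current[2] + following[2]))
            i += 2  # both records consumed; a third duplicate is left untouched
        else:
            result.append(current)
            i += 1
    return result


lines = [("chr1", 100, ["A"]), ("chr1", 100, ["B"]), ("chr1", 100, ["C"])]
first_pass = merge_pairs_once(lines)        # lineAandB, lineC
second_pass = merge_pairs_once(first_pass)  # lineAandBandC
```

With three duplicates in a row, the first pass still leaves two records at the same position, so the caller has to loop until no duplicates remain.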

Member

> The merge function compares the current line with the previous line (it is limited to 2 lines in its comparison and merge).

OK, so now that you have identified the limitation, you should work on removing it rather than engineering something around it.
The issue is in get_list_of_merged_vcf_objects where you compare and merge separately.
A better algorithm would be:

  • For every line, starting with line 2:
    • take the current and previous lines and compare them
    • if they are equal:
      • merge them and set the merge result as the previous line
    • otherwise, set the current line as the previous line
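
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code in get_list_of_merged_vcf_objects: the function name `merge_adjacent_duplicates`, the `same_site` and `merge` callables, and the (chrom, pos, ids) tuples are all hypothetical stand-ins. It also assumes that duplicates are adjacent in the input, and it makes explicit one step the list leaves implicit: the previous line is written to the output whenever it is replaced.

```python
def merge_adjacent_duplicates(records, same_site, merge):
    """Collapse runs of adjacent duplicate records in a single pass."""
    merged = []
    previous = None
    for current in records:
        if previous is not None and same_site(previous, current):
            # Keep the merge result as "previous" so a third, fourth, ...
            # duplicate is folded into the same record.
            previous = merge(previous, current)
        else:
            if previous is not None:
                merged.append(previous)  # the previous record is final
            previous = current
    if previous is not None:
        merged.append(previous)  # flush the last record
    return merged


# Hypothetical records: (chrom, pos, ids)
def same_site(a, b):
    return a[:2] == b[:2]


def merge(a, b):
    return (a[0], a[1], a[2] + b[2])


lines = [
    ("chr1", 100, ["A"]),
    ("chr1", 100, ["B"]),
    ("chr1", 100, ["C"]),
    ("chr1", 200, ["D"]),
]
single_pass = merge_adjacent_duplicates(lines, same_site, merge)
```

Because the merge result is carried forward as the previous line, a run of any length is collapsed in one pass, and the outer `while has_dups` loop would no longer be needed under these assumptions.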

khetherin and others added 4 commits January 22, 2026 09:50
Co-authored-by: Timothee Cezard <tcezard@ebi.ac.uk>
Co-authored-by: Timothee Cezard <tcezard@ebi.ac.uk>
@khetherin khetherin merged commit 140cf35 into EBIvariation:main Jan 22, 2026
1 check passed
@khetherin khetherin deleted the iterate branch January 22, 2026 10:45