shameful huge commit complete overhaul #33

Open
jsture wants to merge 3 commits into main from 32-update-pipeline-to-ml-corrected

jsture commented Dec 19, 2025

  • Use ML-corrected WGS (23411, not 23410), which comes with some major changes
  • Fixed the "find VCF block file" step to use the available UKB helper_file and generate the reference table only once
  • Set array_required_elements to True
  • Moved all filtering before the checkpoint. Likely a massive speedup with much less I/O
  • Print counts only after checkpoints
  • Removed all code related to VEP
  • .annotations and .setlist files are no longer generated here
  • Fixed some writing / uploading logic to avoid hadoop fs snafus
  • Also moved the bad setlist logic into matrixtables.py. Deprecated the broken functions
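
To illustrate the "filter before checkpoint" point, here is a toy sketch in plain Python. It is not the pipeline code. The `checkpoint` and `passes_qc` helpers and the `call_rate` threshold are made up for the example. It only shows that filtering first yields the same rows while writing far less to disk.

```python
import json
import os
import tempfile


def checkpoint(rows, path):
    """Write rows to disk and read them back, like a pipeline checkpoint.

    Returns the reloaded rows and the number of bytes written.
    """
    with open(path, "w") as fh:
        json.dump(rows, fh)
    with open(path) as fh:
        reloaded = json.load(fh)
    return reloaded, os.path.getsize(path)


def passes_qc(row):
    # Hypothetical QC rule, for illustration only.
    return row["call_rate"] >= 0.95


# Toy variant table: 1 in 10 rows passes the hypothetical filter.
rows = [
    {"variant": i, "call_rate": 0.99 if i % 10 == 0 else 0.50}
    for i in range(1000)
]

with tempfile.TemporaryDirectory() as tmp:
    # Old order: checkpoint everything, then filter.
    late, bytes_late = checkpoint(rows, os.path.join(tmp, "late.json"))
    late = [r for r in late if passes_qc(r)]

    # New order: filter first, then checkpoint only the survivors.
    early, bytes_early = checkpoint(
        [r for r in rows if passes_qc(r)], os.path.join(tmp, "early.json")
    )

# Same rows either way, but the early-filter checkpoint is much smaller.
print(len(late), len(early), bytes_early < bytes_late)
```

In the real pipeline the saving should scale with how many variants the filters drop, so the actual speedup still needs to be measured on a real run.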

jsture self-assigned this Dec 19, 2025
jsture linked an issue Dec 19, 2025 that may be closed by this pull request
"This notebook performs extraction, quality control, and formatting of UK Biobank Whole Genome Sequencing (WGS) data for specific genes of interest. It is optimized for the **DRAGEN 500k Release** (pVCF format).\n",
"\n",
"### VEP is gone\n",
"VEP functionality has been removed from this notebook since it was brittle and heavy. Currently, .annotation and .setlist files have to be built manually until someone steps up and vibe codes an actual notebook based on the exported _variants.tsv\n",

can we leave it in tho. i will be needing it soon

Development

Successfully merging this pull request may close these issues.

Update pipeline to ML-corrected
