This repository contains several files, each tuned for specific requirements:

- Throughput_PairedSingleSampleWf_optimized.inputs.json → WGS Throughput JSON file
- Latency_PairedSingleSampleWf_optimized.inputs.json → WGS Latency JSON file
- Exome_2T_PairedSingleSampleWf_optimized.inputs.json → WES Throughput JSON file
- Exome_56T_PairedSingleSampleWf_optimized.inputs.json → WES Latency JSON file
- PairedSingleSampleWf_noqc_nocram_optimized.wdl → WGS WDL optimized for on-prem
- Exome_PairedSingleSampleWf_noqc_nocram_optimized.wdl → WES WDL optimized for on-prem
In each of the following WDL files, update the dataset paths to reflect where the datasets reside in your cluster:

- PairedSingleSampleWf_noqc_nocram_optimized.wdl
- PairedSingleSampleWf_noqc_nocram_withcleanup_optimized.wdl
- Exome_PairedSingleSampleWf_noqc_nocram_optimized.wdl
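To find the lines that need editing, it can help to list every absolute path that appears in a WDL file. The sketch below is a heuristic under stated assumptions: it assumes the hard-coded paths are absolute POSIX paths with at least two components (for example `/opt/tools/gatk`), and `find_hardcoded_paths` is a hypothetical helper, not part of this repository.

```python
import re
from typing import List, Tuple

# Hypothetical heuristic: an absolute path with at least two components,
# not preceded by a word character, '.', ':' or '/' (so URLs such as
# https://host/a/b are not reported). Single-component paths like /tmp are skipped.
ABS_PATH = re.compile(r"(?<![\w.:/])(/[\w.\-]+(?:/[\w.\-]+)+)")


def find_hardcoded_paths(wdl_text: str) -> List[Tuple[int, str]]:
    """Return (line number, path) pairs for absolute paths found in WDL source."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(wdl_text.splitlines(), start=1):
        for match in ABS_PATH.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

For example, `find_hardcoded_paths(open("PairedSingleSampleWf_noqc_nocram_optimized.wdl").read())` prints candidate lines to review by hand; paths built from variables or relative paths will not be reported.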
In the JSON files, update the paths to the datasets and tools to reflect where they reside in your cluster. For example, in Latency_PairedSingleSampleWf_optimized.inputs.json, update the tools directory path.
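If the inputs JSON files point at a common directory prefix, the paths can be rewritten programmatically instead of by hand. This is a minimal sketch, assuming every path value is a string (or a list of strings) that begins with the old prefix; the prefixes and the `rewrite_paths` / `update_inputs_file` helpers are hypothetical.

```python
import json
from typing import Any


def rewrite_paths(value: Any, old_prefix: str, new_prefix: str) -> Any:
    """Recursively replace a leading path prefix in every string of parsed inputs JSON."""
    if isinstance(value, str):
        # Only strings that start with the old prefix are changed; numbers and
        # other values are returned untouched.
        if value.startswith(old_prefix):
            return new_prefix + value[len(old_prefix):]
        return value
    if isinstance(value, list):
        return [rewrite_paths(v, old_prefix, new_prefix) for v in value]
    if isinstance(value, dict):
        return {k: rewrite_paths(v, old_prefix, new_prefix) for k, v in value.items()}
    return value


def update_inputs_file(path: str, old_prefix: str, new_prefix: str) -> None:
    """Rewrite one inputs JSON file in place."""
    with open(path) as fh:
        inputs = json.load(fh)
    with open(path, "w") as fh:
        json.dump(rewrite_paths(inputs, old_prefix, new_prefix), fh, indent=2)
        fh.write("\n")
```

Usage, with hypothetical directories: `update_inputs_file("Latency_PairedSingleSampleWf_optimized.inputs.json", "/old/tools/", "/mnt/cluster/tools/")`. Ending both prefixes with `/` avoids accidentally matching a sibling directory such as `/old/tools2`.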
The datasets used for tuning the WGS workflow can be obtained from https://console.cloud.google.com/storage/browser/broad-public-datasets/NA12878/unmapped/.
Contact Broad/Intel for access to the WES data needed for this workflow.
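The links in this README are Cloud Console browser pages. Command-line tools such as `gsutil` expect `gs://bucket/prefix` URIs instead, and the browser URL maps onto one by dropping the `/storage/browser/` part. A small sketch of that conversion (the function name is hypothetical):

```python
from urllib.parse import urlparse

CONSOLE_HOST = "console.cloud.google.com"
BROWSER_PREFIX = "/storage/browser/"


def console_url_to_gs_uri(url: str) -> str:
    """Convert a Cloud Console storage-browser URL into the matching gs:// URI."""
    parsed = urlparse(url)
    if parsed.netloc != CONSOLE_HOST or not parsed.path.startswith(BROWSER_PREFIX):
        raise ValueError(f"not a Cloud Console storage browser URL: {url}")
    # Everything after /storage/browser/ is "<bucket>/<object prefix>";
    # any ?project=... query string is ignored.
    return "gs://" + parsed.path[len(BROWSER_PREFIX):]
```

Assuming `gsutil` is installed, the resulting URI can then be copied locally, for example `gsutil -m cp -r gs://broad-public-datasets/NA12878/unmapped/ .`.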
The other reference files and resource files can be downloaded from:

| Category | Data Type | Filename | File Path |
|---|---|---|---|
| Reference Genome | ref_dict | Homo_sapiens_assembly38.dict | https://console.cloud.google.com/storage/browser/broad-references/hg38/v0 |
| | ref_fasta | Homo_sapiens_assembly38.fasta | |
| | ref_fasta_index | Homo_sapiens_assembly38.fasta.fai | |
| | ref_alt | Homo_sapiens_assembly38.fasta.64.alt | |
| | ref_sa | Homo_sapiens_assembly38.fasta.64.sa | |
| | ref_amb | Homo_sapiens_assembly38.fasta.64.amb | |
| | ref_bwt | Homo_sapiens_assembly38.fasta.64.bwt | |
| | ref_ann | Homo_sapiens_assembly38.fasta.64.ann | |
| | ref_pac | Homo_sapiens_assembly38.fasta.64.pac | |
| | contamination_sites_ud | Homo_sapiens_assembly38.contam.UD | |
| | contamination_sites_bed | Homo_sapiens_assembly38.contam.bed | |
| | contamination_sites_mu | Homo_sapiens_assembly38.contam.mu | |
| Resource Files | dbSNP_vcf | Homo_sapiens_assembly38.dbsnp138.vcf | |
| | dbSNP_vcf_index | Homo_sapiens_assembly38.dbsnp138.vcf.idx | |
| | known_snps_sites_vcf | Mills_and_1000G_gold_standard.indels.hg38.vcf.gz | |
| | known_snps_sites_vcf_index | Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi | |
| | known_indels_sites_VCFs | Mills_and_1000G_gold_standard.indels.hg38.vcf.gz | |
| | | Homo_sapiens_assembly38.known_indels.vcf.gz | |
| | known_indels_sites_indices | Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi | |
| | | Homo_sapiens_assembly38.known_indels.vcf.gz.tbi | |
| Interval Files | wgs_calling_interval_list | wgs_calling_regions.hg38.interval_list (see note below) | |
| | wgs_coverage_interval_list | wgs_coverage_regions.hg38.interval_list | |
| | wgs_evaluation_interval_list | wgs_evaluation_regions.hg38.interval_list | |
| Small Test Input Datasets | flowcell_unmapped_bams | H06HDADXX130110.1.ATCACGAT.20k_reads.bam | |
| | | H06HDADXX130110.2.ATCACGAT.20k_reads.bam | |
| | | H06JUADXX130110.1.ATCACGAT.20k_reads.bam | |

NOTE: The Exome Interval file whole_exome_illumina_coding_v1.Homo_sapiens_assembly38.targets.interval_list is hosted at https://console.cloud.google.com/storage/browser/gatk-test-data/intervals/.
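Before launching a run, it may save time to confirm that every reference, resource, and interval file from the table has been staged. The sketch below assumes all files sit in a single local directory (a hypothetical layout; adjust if yours is split across directories); the filenames are copied from the table, with the duplicated Mills files listed once.

```python
from pathlib import Path
from typing import Iterable, List

# Filenames from the reference, resource, and interval rows of the table.
REFERENCE_FILES = [
    "Homo_sapiens_assembly38.dict",
    "Homo_sapiens_assembly38.fasta",
    "Homo_sapiens_assembly38.fasta.fai",
    "Homo_sapiens_assembly38.fasta.64.alt",
    "Homo_sapiens_assembly38.fasta.64.sa",
    "Homo_sapiens_assembly38.fasta.64.amb",
    "Homo_sapiens_assembly38.fasta.64.bwt",
    "Homo_sapiens_assembly38.fasta.64.ann",
    "Homo_sapiens_assembly38.fasta.64.pac",
    "Homo_sapiens_assembly38.contam.UD",
    "Homo_sapiens_assembly38.contam.bed",
    "Homo_sapiens_assembly38.contam.mu",
    "Homo_sapiens_assembly38.dbsnp138.vcf",
    "Homo_sapiens_assembly38.dbsnp138.vcf.idx",
    "Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
    "Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
    "Homo_sapiens_assembly38.known_indels.vcf.gz",
    "Homo_sapiens_assembly38.known_indels.vcf.gz.tbi",
    "wgs_calling_regions.hg38.interval_list",
    "wgs_coverage_regions.hg38.interval_list",
    "wgs_evaluation_regions.hg38.interval_list",
]


def missing_files(directory: str, filenames: Iterable[str]) -> List[str]:
    """Return the filenames that are not present as regular files in directory."""
    root = Path(directory)
    return [name for name in filenames if not (root / name).is_file()]
```

For a WES run, `whole_exome_illumina_coding_v1.Homo_sapiens_assembly38.targets.interval_list` could be appended to the list. Calling `missing_files("/path/to/references", REFERENCE_FILES)` (hypothetical path) returns an empty list when everything is in place.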
For on-prem runs, the workflow uses non-dockerized tools, which can be downloaded from:

- GATK: https://github.com/broadinstitute/gatk/releases
- SAMtools: http://www.htslib.org/download/
- Picard: https://broadinstitute.github.io/picard/
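Because the tools are not containerized, each node must be able to find them. As a rough pre-flight check, the sketch below looks for a `gatk` launcher and a `samtools` executable, plus a `picard.jar` file, in a list of directories. The directory layout and the `check_tools` helper are assumptions; adapt the names to however the tools are installed on your cluster.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, List, Optional


def check_tools(search_dirs: List[str]) -> Dict[str, Optional[str]]:
    """Report where each on-prem tool was found in search_dirs (None if missing)."""
    search_path = os.pathsep.join(search_dirs)
    found: Dict[str, Optional[str]] = {
        # GATK 4 release archives include a `gatk` launcher script.
        "gatk": shutil.which("gatk", path=search_path),
        # A built SAMtools install provides a `samtools` executable.
        "samtools": shutil.which("samtools", path=search_path),
        "picard.jar": None,
    }
    # Picard is distributed as a jar, so look for the file itself rather
    # than an executable on the search path.
    for directory in search_dirs:
        jar = Path(directory) / "picard.jar"
        if jar.is_file():
            found["picard.jar"] = str(jar)
            break
    return found
```

Any entry that comes back as `None` points to a tool whose path still needs to be set in the JSON inputs or installed on the node.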