PhaseME is a toolset for assessing the quality of per-read phasing information and for reducing the errors introduced during phasing.
1- Your VCF file must be read-based phased. Such a file can be generated with, e.g., WhatsHap.
2- Run PhaseME with Python 3 on Linux to obtain statistics and improve the quality of the phase blocks. The only dependency is NumPy.
tar -xzf precomputed/pairlist.tar.gz
python phaseme.py improver my.vcf output_prefix
If you only want the quality assessment report, use quality instead of improver.
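Step 1 above asks for a read-based phased VCF. As a quick sanity check before running PhaseME, the following minimal Python sketch counts how many genotypes in the first sample column use the phased separator `|`. It is not part of PhaseME: the function name `count_phased_genotypes` and the in-memory example records are hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF.

```python
def count_phased_genotypes(vcf_lines):
    """Return (phased, total) genotype counts for the first sample column."""
    phased = total = 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        genotype = fields[9].split(":")[keys.index("GT")]
        total += 1
        # "0|1" is phased, "0/1" is unphased.
        if "|" in genotype:
            phased += 1
    return phased, total


# Hypothetical records, used only to make the sketch runnable on its own.
records = [
    ["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT:PS", "0|1:100"],
    ["chr1", "200", ".", "C", "T", "50", "PASS", ".", "GT:PS", "1|0:100"],
    ["chr1", "300", ".", "G", "A", "50", "PASS", ".", "GT", "1/1"],
]
example_vcf = ["##fileformat=VCFv4.2\n"] + ["\t".join(r) + "\n" for r in records]

phased, total = count_phased_genotypes(example_vcf)
print(f"{phased} of {total} genotypes are phased")
```

For a real file, pass an open file handle instead of the example list, e.g. `with open("my.vcf") as fh: count_phased_genotypes(fh)`. A count of zero phased genotypes suggests the VCF has not been through a read-based phasing step yet.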
Please try our sample data, found in the folder example, to verify that the pipeline is installed correctly.
python phaseme.py improver example/my.vcf example/out
The output consists of a quality assessment report, example/out/quality.csv, and an improved version of the input phased VCF, example/out/improved.vcf.
In the quick start section, precomputed linkage information is used. Here, individual-specific linkage information is used instead. To take full advantage of PhaseME, a few steps are needed before running it.
1- Download the 1000 Genomes reference panel haplotypes.
wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.tgz
Warning: This download is larger than 10 GB.
2- Download SHAPEIT.
3- Now, run PhaseME as follows.
python phaseme.py improver my.vcf output_prefix /path/to/shapeit /path/to/1000G/dataset
PhaseME can also assess and improve phasing results using parental data instead of linkage information. For this, prepare a three-sample VCF containing the SNVs of the son, mother, and father, in this order. This can be done with, e.g., bcftools merge. Before merging, you may need to run bgzip, tabix, and bcftools index on each of the three samples.
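The preparation described above can be sketched as follows. This is a hedged illustration, not a PhaseME script: it only builds and prints the command lines, the helper name `trio_merge_commands` and the file names are hypothetical, and it assumes three single-sample, uncompressed VCFs with distinct sample names. Note that plain `bgzip` replaces each input file with its `.gz` version. The merged sample columns are expected to follow the order of the input files, which can be checked afterwards with `bcftools query -l trio.vcf`.

```python
import shlex


def trio_merge_commands(son, mother, father, out_vcf):
    """Build argv lists that compress, index, and merge three VCFs."""
    commands = []
    compressed = []
    # Keep the son, mother, father order expected for the trio VCF.
    for vcf in (son, mother, father):
        gz = vcf + ".gz"
        commands.append(["bgzip", vcf])                # writes vcf + ".gz"
        commands.append(["tabix", "-p", "vcf", gz])    # index the compressed VCF
        compressed.append(gz)
    # -Ov requests uncompressed VCF output.
    commands.append(["bcftools", "merge", "-Ov", "-o", out_vcf, *compressed])
    return commands


# Hypothetical file names, for illustration only.
cmds = trio_merge_commands("son.vcf", "mother.vcf", "father.vcf", "trio.vcf")
for argv in cmds:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each printed line can be pasted into a shell, or the argv lists can be passed to `subprocess.run(argv, check=True)` once the tools are installed.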
To obtain quality insights:
python phaseme.py quality example/trio.vcf example/out_trio_q trio
To improve the phasing results:
python phaseme.py improver example/trio.vcf example/out_trio trio
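Because the trio mode depends on the son, mother, father column order, it may be worth confirming the sample columns before running either command. The sketch below reads the sample names from the `#CHROM` header line. It is an independent helper, not part of PhaseME; the function name `vcf_sample_names` and the sample names in the example header are hypothetical.

```python
def vcf_sample_names(vcf_lines):
    """Return the sample names listed on the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed; samples start at column 10.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the records without finding a header line
    return []


# Hypothetical header, used only to make the sketch runnable on its own.
header = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
     "son", "mother", "father"]
)
example_header = ["##fileformat=VCFv4.2\n", header + "\n"]

samples = vcf_sample_names(example_header)
print(samples)
if len(samples) != 3:
    print("Expected exactly three sample columns (son, mother, father).")
```

For a real file, pass an open handle, e.g. `with open("example/trio.vcf") as fh: vcf_sample_names(fh)`, and compare the result with the sample names you expect for the son, mother, and father.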
To use PhaseME on a Mac, please see the folder mac.
Please see and cite our manuscript: "PhaseME: automatic assessment of phasing quality and phasing improvement", GigaScience, 2020.
PhaseME has been registered in BioTools.