Validating variants
Benjamin L. Moore edited this page Apr 11, 2018 · 4 revisions
Full details of how variants are validated are given in the Platinum Genomes (PG) manuscript.
As a simplified overview, the steps in building the PG truthset are:
- Sequence all pedigree individuals (originally done on an Illumina HiSeq 2000 instrument with TruSeq PCR-Free sample prep, to a depth of around 50x per individual).
- Use highest confidence SNVs to infer inheritance of the four parental haplotypes through the lower pedigree. Pedigree haplotyping was done using the MERLIN software, with manual curation to reduce the number of crossovers.
- Process sequencing data with a variety of different pipelines, each tuned for sensitivity. These pipelines included BWA + GATK HaplotypeCaller, Illumina ISAS (Isaac aligner + Strelka variant caller), BWA + FreeBayes, and BWA + Platypus.
- Use the known haplotype inheritance to filter out candidate variants that violate Mendelian inheritance.
- Apply k-mer filtering to screen artefacts and errors. This step builds 51-mers representing local expected haplotype sequence and looks for exact matches in the aligned reads of individuals who have inherited each variant.
- Use homozygous reference calls over all individuals to generate a confident regions track.
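The inheritance and k-mer filters above can be illustrated with a minimal Python sketch. This is not the PG implementation: the haplotype labels `A`-`D`, the function names, the in-memory read lists and the `min_reads` threshold are all assumptions made for illustration, and the actual method is described in the manuscript. The first function asks whether any placement of the alt allele on the four parental haplotypes explains every observed genotype; the others build a 51-mer for the expected alt haplotype and look for exact matches in the reads of each carrier.

```python
from itertools import product

# Hypothetical labels for the four parental haplotypes:
# A/B from the father, C/D from the mother.
HAPLOTYPES = ("A", "B", "C", "D")


def consistent_assignments(inherited, alt_counts):
    """Return every placement of the alt allele on the parental haplotypes
    that reproduces all observed genotypes.

    inherited:  sample -> pair of haplotype labels carried at this locus
    alt_counts: sample -> observed number of alt alleles (0, 1 or 2)
    An empty result means the candidate violates Mendelian inheritance.
    """
    matches = []
    for bits in product((0, 1), repeat=len(HAPLOTYPES)):
        carries_alt = dict(zip(HAPLOTYPES, bits))
        if all(carries_alt[h1] + carries_alt[h2] == alt_counts[sample]
               for sample, (h1, h2) in inherited.items()):
            matches.append(carries_alt)
    return matches


def variant_kmer(reference, pos, ref, alt, k=51):
    """Build a k-mer of the expected alt haplotype, roughly centred on the
    variant. `pos` is the 0-based offset of `ref` within `reference`."""
    if reference[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the reference sequence")
    if len(alt) >= k:
        raise ValueError("alt allele is too long for a k-mer of this size")
    haplotype = reference[:pos] + alt + reference[pos + len(ref):]
    flank = (k - len(alt)) // 2
    start = max(0, pos - flank)
    kmer = haplotype[start:start + k]
    if len(kmer) < k:
        raise ValueError("not enough flanking sequence for a full k-mer")
    return kmer


def count_exact_matches(kmer, reads):
    """Count reads containing the k-mer exactly. Reads are assumed to be
    given on the reference strand, as aligners report them."""
    return sum(1 for read in reads if kmer in read)


def kmer_supported(kmer, reads_by_carrier, min_reads=1):
    """True if every carrier has at least `min_reads` exact matches.
    The threshold is an illustrative assumption."""
    return bool(reads_by_carrier) and all(
        count_exact_matches(kmer, reads) >= min_reads
        for reads in reads_by_carrier.values()
    )
```

In this sketch a candidate would be kept only if `consistent_assignments` returns at least one placement and `kmer_supported` is true for the individuals predicted to carry the variant.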
For more information or to cite Platinum Genomes, see:
- Eberle, MA et al. (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Research, 27:157-164. doi:10.1101/gr.210500.116