
Validating variants

Benjamin L. Moore edited this page Apr 11, 2018 · 4 revisions

Full details of how variants are validated are given in the Platinum Genomes (PG) manuscript.

As a simplified overview, the steps in building the PG truthset are:

  1. Sequence all individuals in the pedigree (originally on an Illumina HiSeq 2000 instrument with TruSeq PCR-Free sample preparation, to a depth of around 50x per individual).
  2. Use the highest-confidence SNVs to infer how the four parental haplotypes are inherited through the lower pedigree. Pedigree haplotyping was done with the MERLIN software, with manual curation to reduce the number of crossovers.
  3. Process the sequencing data with several pipelines, each tuned for sensitivity. These included BWA + GATK HaplotypeCaller, Illumina ISAS (Isaac aligner + Strelka variant caller), BWA + FreeBayes and BWA + Platypus.
  4. Use the known haplotype inheritance to filter out candidate variants that violate Mendelian inheritance.
  5. Apply k-mer filtering to screen out artefacts and errors. This step builds 51-mers representing the local expected haplotype sequence and looks for exact matches in the aligned reads of the individuals who inherited each variant.
  6. Use homozygous-reference calls across all individuals to generate a confident regions track.
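
Step 4 above can be illustrated with a minimal Python sketch. It assumes a biallelic variant, genotypes given as alt-allele counts (0, 1 or 2), and a per-locus inheritance map recording which paternal and which maternal haplotype each child received. The haplotype labels `F1`, `F2`, `M1`, `M2` and the function name are hypothetical, and grandparents are ignored. The check tries every way of placing the alt allele on the four parental haplotypes and keeps the variant only if at least one placement explains every genotype. It is not the validation code used to build the PG truthset.

```python
from itertools import product

# Hypothetical labels for the four parental haplotypes (two paternal, two maternal).
PARENTAL_HAPLOTYPES = ("F1", "F2", "M1", "M2")


def consistent_with_inheritance(parent_gts, child_gts, inheritance):
    """Return True if some placement of the alt allele on the four parental
    haplotypes explains every observed genotype at this locus.

    parent_gts:  {"father": n_alt, "mother": n_alt}, n_alt in {0, 1, 2}
    child_gts:   {child_id: n_alt}
    inheritance: {child_id: (paternal_hap, maternal_hap)}, e.g. ("F1", "M2")
    """
    # Try all 2**4 ways of assigning ref (0) or alt (1) to the parental haplotypes.
    for alleles in product((0, 1), repeat=len(PARENTAL_HAPLOTYPES)):
        carries = dict(zip(PARENTAL_HAPLOTYPES, alleles))
        # Each parent's genotype must equal the alt count on its two haplotypes.
        if carries["F1"] + carries["F2"] != parent_gts["father"]:
            continue
        if carries["M1"] + carries["M2"] != parent_gts["mother"]:
            continue
        # Each child's genotype must equal the alt count on the haplotypes it inherited.
        if all(
            carries[pat] + carries[mat] == child_gts[child]
            for child, (pat, mat) in inheritance.items()
        ):
            return True
    return False


if __name__ == "__main__":
    inheritance = {"c1": ("F1", "M1"), "c2": ("F2", "M1"), "c3": ("F1", "M2")}
    parents = {"father": 1, "mother": 0}
    # Consistent: the alt allele sits on F1, which c1 and c3 inherited.
    print(consistent_with_inheritance(parents, {"c1": 1, "c2": 0, "c3": 1}, inheritance))  # prints True
    # Violation: the father is heterozygous, yet c1 and c2 would need the alt allele on both F1 and F2.
    print(consistent_with_inheritance(parents, {"c1": 1, "c2": 1, "c3": 0}, inheritance))  # prints False
```

Brute-forcing the 16 assignments keeps the logic transparent; a real implementation would also need to handle multi-allelic sites, missing genotypes and the grandparent generation.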
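
The k-mer filter in step 5 can be sketched in the same spirit. The helper names, the plain-string inputs and the `min_reads` threshold are assumptions for illustration only. The sketch applies just the candidate variant to the reference (ignoring any other variants on the same haplotype), cuts out the k-mer centred on it, and requires an exact match in at least `min_reads` reads from every individual who inherited the variant.

```python
def haplotype_kmer(reference, pos, ref_allele, alt_allele, k=51):
    """Return the k-mer centred on a variant in the expected haplotype sequence.

    reference: reference sequence string; pos: 0-based start of ref_allele.
    Only this single variant is applied; other nearby variants are ignored.
    """
    if reference[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("ref_allele does not match the reference at pos")
    # Expected haplotype: reference with the alt allele substituted in.
    haplotype = reference[:pos] + alt_allele + reference[pos + len(ref_allele):]
    # Centre the window on the middle of the alt allele.
    start = pos + len(alt_allele) // 2 - k // 2
    if start < 0 or start + k > len(haplotype):
        raise ValueError("not enough flanking sequence for this k")
    return haplotype[start:start + k]


def kmer_supported(kmer, reads_by_carrier, min_reads=1):
    """Return True if every carrier has at least min_reads reads that contain
    the k-mer as an exact substring.

    reads_by_carrier: {individual_id: [read_sequence, ...]}, with read sequences
    in reference orientation (as SEQ is stored in SAM/BAM).
    """
    return all(
        sum(kmer in read for read in reads) >= min_reads
        for reads in reads_by_carrier.values()
    )


if __name__ == "__main__":
    # Toy example with k=5 so the sequences stay readable (the text uses k=51).
    kmer = haplotype_kmer("ACGTACGTAC", 4, "A", "G", k=5)
    print(kmer)  # prints GTGCG
    print(kmer_supported(kmer, {"c1": ["TTGTGCGTT"]}))  # prints True
    # c2 only has a reference-matching read, so the variant fails the filter.
    print(kmer_supported(kmer, {"c1": ["TTGTGCGTT"], "c2": ["ACGTACGTAC"]}))  # prints False
```

Requiring exact matches makes the check strict: a read with a sequencing error inside the window will not count as support, which is one reason a larger `k` trades sensitivity for specificity.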
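
For step 6, one simple way to picture a confident regions track is as the intersection of every individual's homozygous-reference intervals on a chromosome. The sketch below assumes those intervals are already available as sorted, non-overlapping, half-open `(start, end)` pairs (for example, derived from reference blocks). The input format and function names are hypothetical, and the sketch leaves out any further steps the full method may include.

```python
from functools import reduce


def intersect_two(a, b):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # keep only non-empty overlaps
            out.append((start, end))
        # Advance whichever interval finishes first; it cannot overlap anything further.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def confident_regions(homref_by_individual):
    """Intervals (on one chromosome) where every individual has a
    homozygous-reference call.

    homref_by_individual: {individual_id: [(start, end), ...]}, each list sorted
    and non-overlapping.
    """
    if not homref_by_individual:
        return []
    return reduce(intersect_two, homref_by_individual.values())


if __name__ == "__main__":
    calls = {
        "a": [(0, 100), (200, 300)],
        "b": [(50, 250)],
        "c": [(0, 1000)],
    }
    print(confident_regions(calls))  # prints [(50, 100), (200, 250)]
```

The two-pointer merge runs in time linear in the number of intervals, so folding it over all individuals stays cheap even for whole-chromosome call sets.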