Hybrid truthset

Moore, Ben edited this page Aug 14, 2017 · 1 revision

Platinum Genomes v2017.1 introduces new hybrid truthsets. These combine the strengths of the Platinum Genomes (PG) validation by inheritance with the diversity of the Genome in a Bottle (GiaB) high confidence calls to give a more comprehensive characterisation of the NA12878/hg001 sample.

Inputs

Input truthsets were:

  • Platinum Genomes v2017.1 NA12878
  • Genome in a Bottle v3.3.2 hg001

Method

In brief, the method for building this truthset was:

  1. Find records that are exclusive to GiaB using hap.py and RTG Tools vcfeval
  2. Merge GiaB-exclusive records with the PG v2017.1 NA12878 VCF
  3. Use a modified version of k-mer validation to look for exact k-mer support in the lower CEPH 1463 pedigree (11 children) which are expected to have inherited the respective haplotype
  4. For the small number of unphased GiaB records, try all possible haplotypes and use inherited k-mer counts to phase the NA12878 record
  5. Add confidence blocks spanning validated truth variants to the PG v2017.1 confidence regions
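
As an illustration of steps 3 and 4, the following is a minimal Python sketch of phasing one unphased record by inherited k-mer support. It is not the Platinum Genomes implementation: all function names, the toy sequences, and the scoring rule (count the children whose k-mers exactly cover the haplotype they are expected to have inherited) are assumptions made for this example. It also ignores reverse complements, sequencing errors, and the k-mers each child carries from its other parent.

```python
from collections import Counter
from itertools import permutations


def kmers(seq, k):
    """Return all k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Toy k-mer counter over a list of read strings (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def haplotype_supported(hap_seq, child_counts, k):
    """True if every k-mer of the haplotype sequence is seen in the child."""
    return all(child_counts[km] > 0 for km in kmers(hap_seq, k))


def phase_by_inheritance(alleles, flank_a, flank_b, children, k):
    """Try each assignment of the two alleles to haplotypes A and B.

    alleles:  the two allele strings of the unphased record
    flank_a:  (left, right) sequence around the record on haplotype A
    flank_b:  (left, right) sequence around the record on haplotype B
    children: list of (inherited_haplotype, kmer_counts) pairs, where
              inherited_haplotype is "A" or "B"
    Returns the best assignment (allele_on_A, allele_on_B) and all scores.
    """
    scores = {}
    for on_a, on_b in permutations(alleles, 2):
        seqs = {
            "A": flank_a[0] + on_a + flank_a[1],
            "B": flank_b[0] + on_b + flank_b[1],
        }
        # Score = number of children whose k-mers exactly support the
        # haplotype they are expected to have inherited.
        scores[(on_a, on_b)] = sum(
            haplotype_supported(seqs[hap], counts, k)
            for hap, counts in children
        )
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: the haplotypes differ at a neighbouring, already phased SNP
# (last base of the left flank), which makes the two phasings distinguishable.
K = 5
children = [
    ("A", count_kmers(["ACGTAGTTGCA"], K)),  # child inheriting haplotype A
    ("B", count_kmers(["ACGTCTTTGCA"], K)),  # child inheriting haplotype B
]
best, scores = phase_by_inheritance(
    ("G", "T"), ("ACGTA", "TTGCA"), ("ACGTC", "TTGCA"), children, K
)
print(best, scores)
```

In this toy case, only the assignment that places `G` on haplotype A and `T` on haplotype B is supported by both children, so it is selected.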

This process was performed independently for the hg19/GRCh37 and hg38 truthsets. The resulting files cover more variation than either input truthset alone: the hg19 versions contain over 80,000 more indels than either input truthset, bringing the total to over 600,000.
