Hybrid truthset
Moore, Ben edited this page Aug 14, 2017
Platinum Genomes v2017.1 introduces new hybrid truthsets that combine the strengths of Platinum Genomes (PG) validation by inheritance with the diversity of the Genome in a Bottle (GiaB) high-confidence calls, giving a more comprehensive characterisation of the NA12878/hg001 sample.
Input truthsets were:
- Platinum Genomes v2017.1 NA12878
- Genome in a Bottle v3.3.2 hg001
In brief, the method for building this truthset was:
- Find records which are exclusive to GiaB using hap.py and RTG Tools vcfeval
- Merge GiaB-exclusive records with the PG v2017.1 NA12878 VCF
- Use a modified version of k-mer validation to look for exact k-mer support among the children in the lower generation of the CEPH 1463 pedigree (11 children) that are expected to have inherited the respective haplotype
- For the small number of unphased GiaB records, try all possible haplotype assignments and use inherited k-mer counts to phase the NA12878 record
- Add confidence blocks spanning validated truth variants to the PG 2017.1 confidence regions
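The k-mer validation and phasing steps above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the Platinum Genomes implementation: the helper names (apply_variant, allele_kmers, phase_het), the per-child k-mer Counter inputs, the min_count threshold and the per-locus inheritance map are all hypothetical, and the other parent (NA12877) is assumed to be homozygous reference at the site. Here a variant's ALT-specific k-mers are those on the ALT haplotype that do not occur anywhere in the reference window, and a child "supports" ALT only if every one of them is seen at least min_count times. For an unphased heterozygous record, both phases are tried and a phase is kept only if every child's observed support matches what its inheritance predicts.

```python
from collections import Counter
from typing import Dict, Optional, Set


def apply_variant(ref_window: str, pos: int, ref: str, alt: str) -> str:
    """Return `ref_window` with REF at 0-based offset `pos` replaced by ALT."""
    if ref_window[pos:pos + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference window")
    return ref_window[:pos] + alt + ref_window[pos + len(ref):]


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers of `seq` (empty if `seq` is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def allele_kmers(ref_window: str, pos: int, ref: str, alt: str, k: int) -> Set[str]:
    """k-mers found on the ALT haplotype but nowhere in the REF window."""
    return kmers(apply_variant(ref_window, pos, ref, alt), k) - kmers(ref_window, k)


def has_support(counts: Counter, needed: Set[str], min_count: int) -> bool:
    """Exact support (hypothetical rule): every ALT-specific k-mer seen >= min_count times."""
    return bool(needed) and all(counts[km] >= min_count for km in needed)


def phase_het(
    ref_window: str, pos: int, ref: str, alt: str, k: int,
    child_counts: Dict[str, Counter], inherited: Dict[str, int],
    min_count: int = 2,
) -> Optional[str]:
    """Try both phases of a het ALT; return the one all children agree with, else None.

    `inherited[child]` is the NA12878 haplotype (0 or 1) the child received at
    this locus; haplotype 0 is the first allele of the phased GT ("1|0").
    Assumes the other parent is homozygous REF, so a child carries ALT only
    if it inherited the ALT-bearing NA12878 haplotype.
    """
    needed = allele_kmers(ref_window, pos, ref, alt, k)
    observed = {c: has_support(child_counts.get(c, Counter()), needed, min_count)
                for c in inherited}
    consistent = []
    for gt, alt_hap in (("1|0", 0), ("0|1", 1)):
        if all(observed[c] == (inherited[c] == alt_hap) for c in inherited):
            consistent.append(gt)
    return consistent[0] if len(consistent) == 1 else None


# Toy example (hypothetical data): a G>A SNV at offset 6 of a 12 bp window.
window = "AAACCCGGGTTT"
alt_kmers = allele_kmers(window, 6, "G", "A", k=4)
inherited = {"child1": 0, "child2": 1, "child3": 1, "child4": 0}
counts = {c: Counter({km: 3 for km in alt_kmers})
          for c, h in inherited.items() if h == 1}
print(phase_het(window, 6, "G", "A", 4, counts, inherited))  # 0|1
```

Requiring agreement from every child is a deliberately strict choice for the sketch: a single child whose k-mer support contradicts its inheritance leaves the record unphased (None) rather than forcing a call.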
This process was performed independently for the hg19/GRCh37 and hg38 truthsets. The resulting files cover more variation than either input truthset alone: the hg19 versions contain over 80,000 more indels than either input truthset (bringing the total to over 600,000).
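The final step of the method, adding confidence blocks around validated truth variants to the PG 2017.1 confidence regions, is essentially an interval union per chromosome. The minimal Python sketch below assumes BED-style 0-based, half-open intervals; the function names and the pad parameter are hypothetical, and the exact extent of the blocks used for the real truthset is not stated here, so each block simply covers the variant's REF allele plus an optional flank.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end), as in BED files


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or abut."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def extend_confidence(
    regions: Dict[str, List[Interval]],
    variants: List[Tuple[str, int, str]],
    pad: int = 0,
) -> Dict[str, List[Interval]]:
    """Add a block covering each validated variant's REF allele, then merge.

    `variants` holds (chrom, VCF POS, REF); POS is 1-based, so the REF allele
    occupies [POS - 1, POS - 1 + len(REF)) in BED coordinates. `pad` is a
    hypothetical flank added on both sides of each block.
    """
    out = {chrom: list(ivs) for chrom, ivs in regions.items()}
    for chrom, pos, ref in variants:
        start = max(0, pos - 1 - pad)
        end = pos - 1 + len(ref) + pad
        out.setdefault(chrom, []).append((start, end))
    return {chrom: merge_intervals(ivs) for chrom, ivs in out.items()}


# Toy example (hypothetical coordinates).
pg_regions = {"chr1": [(100, 200), (300, 400)]}
validated = [("chr1", 251, "A"), ("chr1", 201, "ACGT"), ("chr2", 10, "G")]
print(extend_confidence(pg_regions, validated))
# {'chr1': [(100, 204), (250, 251), (300, 400)], 'chr2': [(9, 10)]}
```

Merging abutting intervals mirrors the default behaviour of bedtools merge on a sorted BED file, which is one practical way to perform the same union outside Python.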
For more information or to cite Platinum Genomes, see:
- Eberle, MA et al. (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Research, 27:157-164. doi:10.1101/gr.210500.116