Benchmarking

Platinum Genomes (PG) truthsets can be used to benchmark, for example, the quality of sequencing runs or the performance of variant calling algorithms. Tools such as hap.py compare the haplotypes represented by the records in a VCF, which allows a fair comparison of a query (e.g. a new sequencing run of the NA12878 or NA12877 cell lines) against the PG gold standard truthsets.

A simple hap.py call might be:

hap.py -r /path/to/reference.fasta -o output -f confident_regions.bed.gz \
  truth.vcf.gz query.vcf.gz

Files are created with the prefix output and contain a range of summary statistics. The most important are precision (the proportion of the query that is correct; TP/(TP+FP)) and recall (also known as sensitivity; the proportion of the truthset that is matched by records in the query; TP/(TP+FN)). In this example, output.summary.csv contains a high-level overview of precision and recall for ALL (unfiltered query VCF) and PASS (all filters applied).
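
To make the two formulas concrete, here is a minimal Python sketch that computes precision and recall from TP, FP and FN counts. The function name precision_recall and the counts in the example are hypothetical and chosen only for illustration; they are not taken from hap.py or from any real benchmarking run.

```python
import math


def precision_recall(tp, fp, fn):
    """Return (precision, recall) from TP, FP and FN counts.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    A ratio with a zero denominator is returned as NaN.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else math.nan
    recall = tp / (tp + fn) if (tp + fn) > 0 else math.nan
    return precision, recall


if __name__ == "__main__":
    # Made-up counts: 980 matched calls, 20 false positives, 70 missed truth variants.
    precision, recall = precision_recall(tp=980, fp=20, fn=70)
    print(f"precision={precision:.4f} recall={recall:.4f}")
```

With these made-up counts, precision is 980/1000 and recall is 980/1050, so a query can be almost entirely correct while still missing a noticeable share of the truthset.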

Resources

For more information or to cite Platinum Genomes, see:

Eberle, MA et al. (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Research, 27:157-164. doi:10.1101/gr.210500.116
