
Benchmarking

Moore, Ben edited this page Aug 14, 2017 · 4 revisions

Platinum Genomes (PG) truthsets can be used, for example, to benchmark the quality of sequencing runs or the performance of variant calling algorithms. Tools such as hap.py compare the haplotypes represented by records in a VCF rather than the records themselves, which allows a fair comparison of a query (e.g. a new sequencing run of the NA12878 or NA12877 cell lines) against the PG gold standard truthsets.

A simple hap.py call might be:

hap.py -r /path/to/reference.fasta -o output -f confident_regions.bed.gz \
  truth.vcf.gz query.vcf.gz

hap.py creates files with the prefix output that contain a range of summary statistics. The most important are precision (the proportion of query calls that are correct; TP/(TP+FP)) and recall (also known as sensitivity; the proportion of the truthset that is matched by records in the query; TP/(TP+FN)). In this example, output.summary.csv would contain a high-level overview of precision and recall for ALL (the unfiltered query VCF) and PASS (only query records that pass all filters).
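To make the two formulas concrete, here is a minimal Python sketch that computes precision and recall from raw TP, FP and FN counts. It is not part of hap.py: the function name `precision_recall` and the example counts are hypothetical and serve only as illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) computed from raw counts.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    A metric is returned as NaN when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical counts, for illustration only (not real benchmark results).
precision, recall = precision_recall(tp=990, fp=10, fn=20)
print(f"precision={precision:.4f} recall={recall:.4f}")
```

The same calculation can be used to sanity-check the precision and recall values reported in output.summary.csv against the TP, FP and FN counts for a given row.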
