Benchmarking
Platinum Genomes (PG) truthsets can be used, for example, to benchmark the quality of sequencing runs or the performance of variant calling algorithms. Tools such as hap.py are able to compare haplotypes represented by records in a VCF and so allow a fair comparison of a query (i.e. a new sequencing run of the NA12878 or NA12877 cell lines) against the PG gold standard truthsets.
A simple hap.py call might be:
hap.py -r /path/to/reference.fasta -o output -f confident_regions.bed.gz \
truth.vcf.gz query.vcf.gz
hap.py writes its results to files with the prefix output. These contain many summary statistics, the most important being precision (the proportion of query calls that are correct; TP/(TP+FP)) and recall (also known as sensitivity; the proportion of the truthset that is matched by records in the query; TP/(TP+FN)). In this example, output.summary.csv would contain a high-level overview of precision and recall at ALL (unfiltered query VCF) and PASS (all filters applied).
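To make the two formulas concrete, here is a minimal Python sketch that computes precision and recall from raw counts. The function names and the example counts are illustrative assumptions and are not part of hap.py or its output; hap.py reports these metrics itself, so this is only meant to show the arithmetic.

```python
import math


def precision(tp: int, fp: int) -> float:
    """Proportion of query calls that are correct: TP / (TP + FP)."""
    called = tp + fp
    # With no query calls at all, precision is undefined.
    return tp / called if called else math.nan


def recall(tp: int, fn: int) -> float:
    """Proportion of truthset records matched by the query: TP / (TP + FN)."""
    truth_total = tp + fn
    # With an empty truthset, recall is undefined.
    return tp / truth_total if truth_total else math.nan


if __name__ == "__main__":
    # Made-up counts for illustration only, not real benchmark results.
    tp, fp, fn = 990, 10, 110
    print(f"precision = {precision(tp, fp):.3f}")
    print(f"recall    = {recall(tp, fn):.3f}")
```

A query with many false positives lowers precision without affecting recall, while missed truthset variants (false negatives) lower recall without affecting precision, which is why the two are reported together.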
For more information or to cite Platinum Genomes, see:
- Eberle, MA et al. (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Research, 27:157-164. doi:10.1101/gr.210500.116