qProfiler2

Christina.xu edited this page Jun 1, 2020 · 24 revisions

Introduction

qprofiler2 is a standalone Java application that produces summary metrics for common file types used in next-generation sequencing. It can process BAM, FASTQ, and VCF files; in all cases the output is an XML file containing basic summary statistics. It is a newer version of qprofiler with many more features, including a VCF mode and a vastly expanded BAM mode.

Requirements

  • Java 1.8
  • Multi-core machine (ideally) and 10GB of RAM

Building qprofiler2

#First clone the adamajava repository using "git clone"
git clone https://github.com/AdamaJava/adamajava

#Then move into the adamajava folder
cd adamajava

#Run gradle to build qprofiler2 and its dependent jar files
./gradlew :qprofiler2:build

This creates the qprofiler2 jar file, along with its dependent jars, in the qprofiler2/build/flat folder.

Running qprofiler2

java -jar qprofiler2.jar -h
usage: qprofiler2 [option...] --log logfile --loglevel INFO --output outputfile --input inputfile1 --input inputfile2 ... [-ntP 4 -ntC 16] 

Option                  Description                           
-------------------------------------------------                          
--format                VCF mode only; group VCF records according to user specified format fields
--fullBamHeader         Output whole BAM header in XML report; default is to only output HD and SQ lines                       
--help                  Shows this help message.            
--index                 File containing index data relating to --input file               
--input                 File containing data to be profiled (currently limited to BAM/SAM,FASTQ, VCF).     
--log                   Log output file.       
--loglevel <LEVEL>      Logging level, e.g. INFO, DEBUG. Default=INFO.
--maxRecords <Integer>  Only process the first <Integer> records in the BAM file.                       
--ntConsumer <Integer>  count of Consumer threads created to process the input file (BAM files only).                    
--ntProducer <Integer>  count of Producer threads created to write the output file. Default=1.
--output                XML report file output by qprofiler2. Default=qprofiler2.xml.
--validation            How strict to be when reading a SAM or BAM. Possible values: {STRICT, LENIENT, SILENT}.                    
--version               Print version info.    

NOTE:

  • BAM files mapped by BWA may need to be run with the optional parameter --validation SILENT; otherwise Picard will throw an exception.
  • If --output is not specified, output will be written to a default file (qprofiler2.xml) in the current directory.
  • If --ntConsumer and --ntProducer are not specified, qprofiler2 will run in single-threaded mode.
  • When running multi-threaded, we suggest using more consumer than producer threads, with a recommended ratio of 6:1, although the best setting depends on your machine. For example, the following command uses a single thread to read the input file and 12 threads to process reads:
java -jar qprofiler2.jar -ntC 12 --input $somedir/$bam --output $somedir/${bam}.qp2.xml --log $somedir/${bam}.qp2.log

Please specify the BAM index file if multiple producer threads are going to be used, e.g.:

java -jar qprofiler2.jar -ntC 12 -ntP 2 --index $somedir/${bam}.bai --input $somedir/$bam --output $somedir/${bam}.qp2.xml --log $somedir/${bam}.qp2.log
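
The suggested 6:1 consumer-to-producer ratio can be turned into command-line flags with a small shell helper. The sketch below is only an illustration: the function name qp2_thread_flags is hypothetical and not part of qprofiler2, and the rule it applies (one producer per seven threads, at least one of each) is an assumed reading of the recommendation above, not a documented default.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with qprofiler2): split a thread budget
# into consumer and producer counts at roughly the suggested 6:1 ratio.
qp2_thread_flags() {
  local total="$1"

  # Assumption: one producer for every seven threads, but never fewer than one.
  local producers=$(( total / 7 ))
  if [ "$producers" -lt 1 ]; then
    producers=1
  fi

  # The remaining threads become consumers, again never fewer than one.
  local consumers=$(( total - producers ))
  if [ "$consumers" -lt 1 ]; then
    consumers=1
  fi

  echo "-ntC ${consumers} -ntP ${producers}"
}
```

For example, qp2_thread_flags 14 prints "-ntC 12 -ntP 2", which matches the command above. Remember that --index is still needed whenever the result contains more than one producer thread.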

Output

XML Validation

qprofiler2 provides a schema file to help you validate the XML output. This XSD file is published in the GitHub repository: https://github.com/AdamaJava/adamajava/blob/master/qprofiler2/src/org/qcmg/qprofiler2/qprofiler2.xsd

xmllint --noout --schema ~/PATH/Schema.xsd file.xml  
or
java -jar xsd11-validator.jar -sf my.xsd -if my.xml   

xmllint cannot validate against XSD 1.1 schemas. For those, you can try https://www.dropbox.com/s/939jv39ihnluem0/xsd11-validator.jar

BAM Mode

Below is an excerpt of the XML output file in BAM mode:

<qProfiler finishTime="2019-05-22 16:31:47" operatingSystem="Linux" startTime="2019-05-22 12:16:12" user="me" validationSchema="qprofiler_2_0.xsd" version="2.0 (b3a23f83)">
 <bamReport file="/myDir/tumour.normal.2rg.bam" finishTime="2019-05-22 16:31:41"  md5sum="D98A32C19DF282228E7BC61DC8543FEC" startTime="2019-05-22 12:16:13" uuid="babbb684-38bd-43d5-ba24-281b38d5e662">
   <bamHeader>
     <headerRecords TAG="HD" description="The header line">...</headerRecords>
     <headerRecords TAG="SQ" description="Reference sequence dictionary">...</headerRecords>
   </bamHeader>
   <bamSummary>
     <readGroups>
        <readGroup name="8e523d07-e989-4fdc-900a-d5b9e857bbf7">...</readGroup>
        <readGroup name="87b8c254-7fa2-43d5-9463-f07c13378502">...</readGroup>
        <readGroup name="b7e7c4c1-3a2e-46a7-9377-691c016517b6">...</readGroup>
     </readGroups>
     <sequenceMetrics name="Overall">...</sequenceMetrics>
     <sequenceMetrics name="OverallBaseLost">...</sequenceMetrics>
   </bamSummary>
   <bamMetrics>
     <QNAME>...</QNAME>
     <FLAG>...</FLAG>
     <RNAME>...</RNAME>
     <POS>...</POS>
     <MAPQ>...</MAPQ>
     <CIGAR>...</CIGAR>
     <TLEN>...</TLEN>
     <SEQ>...</SEQ>
     <QUAL>...</QUAL>
     <TAG>...</TAG>
   </bamMetrics>
  </bamReport>
</qProfiler>
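
As a quick check on a report like the one above, the read group names can be pulled out with standard text tools. This is a rough sketch, not a proper XML parser: the function name qp2_read_groups is hypothetical, and it assumes each readGroup tag and its name attribute sit on one line, with name as the first attribute, as in the excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with qprofiler2): list the read group
# names found in a qprofiler2 BAM-mode XML report.
# Assumption: '<readGroup name="..."' appears on a single line, as in the
# excerpt above. This is plain text matching, not real XML parsing.
qp2_read_groups() {
  local report="$1"
  # Keep only the opening tag up to the closing quote of the name attribute,
  # then strip everything except the attribute value.
  grep -o '<readGroup name="[^"]*"' "$report" \
    | sed 's/.*name="\([^"]*\)"/\1/'
}

# Usage (file name is illustrative):
# qp2_read_groups tumour.normal.2rg.bam.qp2.xml
```

The enclosing readGroups element is not matched, because the pattern requires the name attribute to follow the tag name directly.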