qProfiler2
qprofiler2 is a standalone Java application that produces summary metrics for common file types used in next-generation sequencing. It can process BAM, FASTQ, and VCF files; in all cases the output is an XML file containing basic summary statistics. It is a newer version of qprofiler with many more features, including a VCF mode and a vastly expanded BAM mode.
- Java 1.8
- Multi-core machine (ideally) and 10GB of RAM
# First clone the adamajava repository using "git clone"
git clone https://github.com/AdamaJava/adamajava
# Then move into the adamajava folder
cd adamajava
# Run gradle to build qprofiler2 and its dependent jar files
./gradlew :qprofiler2:build
This creates the qprofiler2 jar file, along with its dependent jars, in the qprofiler2/build/flat folder.
java -jar qprofiler2.jar -h
usage: qprofiler2 [option...] --log logfile --loglevel INFO --output outputfile --input inputfile1 --input inputfile2 ... [-ntP 4 -ntC 16]
Option                  Description
-------------------------------------------------
--format                VCF mode only; group VCF records according to user specified format fields
--fullBamHeader         Output whole BAM header in XML report; default is to only output HD and SQ lines
--help                  Shows this help message.
--index                 File containing index data relating to --input file
--input                 File containing data to be profiled (currently limited to BAM/SAM,FASTQ, VCF).
--log                   Log output file.
--loglevel <LEVEL>      Logging level, e.g. INFO, DEBUG. Default=INFO.
--maxRecords <Integer>  Only process the first <Integer> records in the BAM file.
--ntConsumer <Integer>  count of Consumer threads created to process the input file (BAM files only).
--ntProducer <Integer>  count of Producer threads created to write the output file. Default=1.
--output                XML report file output by qprofiler2. Default=qprofiler2.xml.
--validation            How strict to be when reading a SAM or BAM. Possible values: {STRICT, LENIENT, SILENT}.
--version               Print version info.
NOTE:
- BAM files mapped by BWA may need to be run with the optional parameter --validation SILENT, otherwise Picard will throw an exception.
- If --output is not specified, output will be written to a default file (qprofiler2.xml) in the current directory.
- If --ntConsumer and --ntProducer are not specified, qprofiler2 will run in single-threaded mode.
- When running multi-threaded, we suggest more consumer than producer threads, with a recommended ratio of 6:1, although the best split depends on your machine. For example, the following command uses a single thread to read the input file and 12 threads to process the reads:
java -jar qprofiler2.jar -ntC 12 --input $somedir/$bam --output $somedir/${bam}.qp2.xml --log $somedir/${bam}.qp2.log
Please specify the BAM index file if multiple producer threads are going to be used, e.g.:
java -jar qprofiler2.jar -ntC 12 -ntP 2 --index $somedir/${bam}.bai --input $somedir/$bam --output $somedir/${bam}.qp2.xml --log $somedir/${bam}.qp2.log
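The thread-count guidance above can be sketched as a small shell helper. This is only an illustration: the function name qp2_command, the rule of one producer per seven threads, and the assumption that the index sits next to the BAM as <bam>.bai are not part of qprofiler2. It prints a command line instead of running it, so the result can be checked first.

```shell
# Hypothetical helper: print a qprofiler2 command line for one BAM file.
# Usage: qp2_command <bam> <total threads>
# The body runs in a subshell "( ... )" so its variables stay local.
qp2_command() (
  bam=$1
  threads=$2
  # Assumed heuristic for the suggested 6:1 ratio: one producer per
  # 7 threads (6 consumers + 1 producer), with at least one of each.
  producers=$(( threads / 7 ))
  if [ "$producers" -lt 1 ]; then producers=1; fi
  consumers=$(( threads - producers ))
  if [ "$consumers" -lt 1 ]; then consumers=1; fi
  cmd="java -jar qprofiler2.jar -ntC $consumers -ntP $producers"
  # Multiple producer threads need the BAM index (assumed at <bam>.bai).
  if [ "$producers" -gt 1 ]; then
    cmd="$cmd --index ${bam}.bai"
  fi
  echo "$cmd --input $bam --output ${bam}.qp2.xml --log ${bam}.qp2.log"
)
```

Calling qp2_command with a BAM path and 14 threads prints a command equivalent to the two-producer example above; with 7 threads it drops the --index option because only one producer is used.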
qprofiler2 provides a schema file that can be used to validate the XML output. The XSD file is published in the GitHub repository: https://github.com/AdamaJava/adamajava/blob/master/qprofiler2/src/org/qcmg/qprofiler2/qprofiler2.xsd
xmllint --noout --schema ~/PATH/Schema.xsd file.xml
or
java -jar xsd11-validator.jar -sf my.xsd -if my.xml
Note that xmllint does not validate against XSD 1.1 schemas. As an alternative, you can try https://www.dropbox.com/s/939jv39ihnluem0/xsd11-validator.jar
Below is an example of the XML output file in BAM mode:
<qProfiler finishTime="2019-05-22 16:31:47" operatingSystem="Linux" startTime="2019-05-22 12:16:12" user="me" validationSchema="qprofiler_2_0.xsd" version="2.0 (b3a23f83)">
<bamReport file="/myDir/tumour.normal.2rg.bam" finishTime="2019-05-22 16:31:41" md5sum="D98A32C19DF282228E7BC61DC8543FEC" startTime="2019-05-22 12:16:13" uuid="babbb684-38bd-43d5-ba24-281b38d5e662">
<bamHeader>
<headerRecords TAG="HD" description="The header line">...</headerRecords>
<headerRecords TAG="SQ" description="Reference sequence dictionary">...</headerRecords>
</bamHeader>
<bamSummary>
<readGroups>
<readGroup name="8e523d07-e989-4fdc-900a-d5b9e857bbf7">...</readGroup>
<readGroup name="87b8c254-7fa2-43d5-9463-f07c13378502">...</readGroup>
<readGroup name="b7e7c4c1-3a2e-46a7-9377-691c016517b6">...</readGroup>
</readGroups>
<sequenceMetrics name="Overall">...</sequenceMetrics>
<sequenceMetrics name="OverallBaseLost">...</sequenceMetrics>
</bamSummary>
<bamMetrics>
<QNAME>...</QNAME>
<FLAG>...</FLAG>
<RNAME>...</RNAME>
<POS>...</POS>
<MAPQ>...</MAPQ>
<CIGAR>...</CIGAR>
<TLEN>...</TLEN>
<SEQ>...</SEQ>
<QUAL>...</QUAL>
<TAG>...</TAG>
</bamMetrics>
</bamReport>
</qProfiler>
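As a quick way to inspect such a report, the read group names can be pulled out with a short shell function. This is a text-level sketch, not an XML parser: the function name qp2_read_groups is hypothetical, and it assumes each readGroup element starts on its own line, as in the layout shown above.

```shell
# Hypothetical helper: list the read group names in a qprofiler2 BAM report.
# Reads the XML on stdin and prints one name per line.
# Assumes one <readGroup name="..."> element per line; the plural
# <readGroups> wrapper does not match because it has no name attribute.
qp2_read_groups() {
  sed -n 's/.*<readGroup name="\([^"]*\)".*/\1/p'
}
```

For a report written to a file, usage would be along the lines of: qp2_read_groups < report.qp2.xml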