Commit

Merge pull request #317 from DEploid-dev/doc
Doc
shajoezhu authored Nov 18, 2019
2 parents 31ccd50 + 3aebb03 commit c15dfb9
Showing 11 changed files with 203 additions and 158 deletions.
2 changes: 1 addition & 1 deletion DEploid.wiki
3 changes: 1 addition & 2 deletions Makefile.am
@@ -8,7 +8,7 @@ EXTRA_DIST = bootstrap \
COMPILEDATE = $(shell date -u | sed -e "s/ /-/g")
distdir = $(PACKAGE)-$(VERSION)

bin_PROGRAMS = dEploid dEploid_dbg utilities
bin_PROGRAMS = dEploid dEploid_dbg

man1_MANS = docs/_build/man/dEploid.1

@@ -90,4 +90,3 @@ clean-local-check:
utilities: utilities/dataExplore.r utilities/interpretDEploid.r
sed -i'.bak' -e '/#!\/usr\/bin\/env Rscript/d' -e '/rm(list=ls())/d' utilities/dataExplore.r ; echo "#!/usr/bin/env Rscript" > tmpTxt; echo "rm(list=ls()); dEploidRootDir=\"$(PWD)\"" >> tmpTxt ; cat utilities/dataExplore.r >> tmpTxt ; mv tmpTxt utilities/dataExplore.r; chmod a+x utilities/dataExplore.r;
sed -i'.bak' -e '/#!\/usr\/bin\/env Rscript/d' -e '/rm(list=ls())/d' utilities/interpretDEploid.r; echo "#!/usr/bin/env Rscript" > tmpTxt; echo "rm(list=ls()); dEploidRootDir=\"$(PWD)\"" >> tmpTxt ; cat utilities/interpretDEploid.r >> tmpTxt ; mv tmpTxt utilities/interpretDEploid.r; chmod a+x utilities/interpretDEploid.r

4 changes: 3 additions & 1 deletion docs/FAQ.md
@@ -126,4 +126,6 @@ utilities/interpretDEploid.r -vcf data/exampleData/PG0400-C.eg.vcf.gz \
Benchmark
---------

Please refer to our paper [Zhu et.al (2017)](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btx530/4091117) section [3 Validation and performance](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btx530/4091117#96977811).
Please refer to our paper [Zhu et al. (2017)](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btx530/4091117) section [3 Validation and performance](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btx530/4091117#96977811) for benchmarks of the inferred number of strains, proportions and haplotype quality.

For the enhanced version, DEploid-IBD, we compared our results against [Zhu et al. (2017)](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btx530/4091117) and conducted further experiments and validations; see [Zhu et al. (2019)](https://elifesciences.org/articles/40845#s2).
108 changes: 108 additions & 0 deletions docs/Output.md
@@ -0,0 +1,108 @@
Making sense of the output
==========================


Output files
------------

``dEploid`` writes text files whose names begin with the prefix specified by flag **-o**.

***prefix***<a></a>**.log**

The log file records the ``dEploid`` version, input file paths, parameters used and proportion estimates at the final iteration.

***prefix***<a></a>**.llk**

Log likelihood of the MCMC chain.

***prefix***<a></a>**.prop**

MCMC updates of the proportion estimates.

***prefix***<a></a>**.hap**

Haplotypes at the final iteration in plain text file.

***prefix***<a></a>**.vcf**

When flag ``-vcfOut`` is turned on, haplotypes are saved at the final iteration in VCF format.

***prefix***<a></a>**.single[i]**

When flag ``-exportPostProb`` is turned on, posterior probabilities of strain [i] at the final iteration are saved.
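
As a rough illustration of the naming scheme above, the following Python sketch lists the file names one might expect for a given prefix. The function `expected_outputs` and its parameters are hypothetical helpers, not part of ``dEploid``. The indices used for **.single[i]** are supplied by the caller, because the numbering convention is not stated here.

```python
def expected_outputs(prefix, vcf_out=False, export_post_prob=False, strain_indices=()):
    """List output file names for a given -o prefix (illustrative helper only)."""
    # Files listed above without any condition attached.
    names = [f"{prefix}.{ext}" for ext in ("log", "llk", "prop", "hap")]
    # Listed above as written when -vcfOut is turned on.
    if vcf_out:
        names.append(f"{prefix}.vcf")
    # Listed above as written when -exportPostProb is turned on, one per strain.
    # The caller chooses the indices; this sketch does not assume 0- or 1-based.
    if export_post_prob:
        names.extend(f"{prefix}.single{i}" for i in strain_indices)
    return names


print(expected_outputs("PG0390-CNopanel", vcf_out=True))
```

A list like this can be handy for checking that a run has completed before passing the prefix on to downstream scripts.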

### DEploid-IBD

When flag ``-ibd`` is used, ``dEploid`` first learns the number of strains and their proportions with an identity by descent model ('DEploid-IBD'). It then fixes the number of strains and proportions, and trains the haplotypes using the original DEploid algorithm ('DEploid-classic'). The staged outputs are labelled with ".ibd" and ".classic" respectively, appended to the prefix.


### DEploid-BEST

When flag ``-best`` is used, 'DEploid-BEST' executes the deconvolution algorithms in an optimised sequence to best report the number of strains, proportions and haplotypes. The program ('DEploid-Lasso') first learns the number of strains with an optimised reference panel; ".chooseK" is appended to the prefix for these outputs (NOTE: likelihood is not tracked in this case). It ('DEploid-IBD') then fixes the number of strains and tunes the strain proportions with an identity by descent model; ".ibd" is appended to the prefix for these outputs. Finally, the program ('DEploid-Lasso') fixes the number of strains and proportions, and uses the optimised reference panel again to train and report the haplotypes; ".final" is appended to the prefix for these outputs. When ``-vcfOut`` is applied, only the final haplotypes are written in VCF format.
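
The stage labels described in the two sections above can be summarised in a small lookup. This is only a sketch: the helper `staged_prefixes` and the mode names are hypothetical, and the assumption that each label is simply concatenated onto the prefix (before the file extension) should be checked against an actual run.

```python
# Stage labels as described in the text; the mode names are made up for this sketch.
STAGE_LABELS = {
    "default": [""],                        # no -ibd or -best: a single unlabelled stage
    "ibd": [".ibd", ".classic"],            # -ibd: DEploid-IBD, then DEploid-classic
    "best": [".chooseK", ".ibd", ".final"], # -best: Lasso, then IBD, then Lasso again
}


def staged_prefixes(prefix, mode):
    """Return one prefix per stage, assuming labels are appended to the prefix."""
    return [prefix + label for label in STAGE_LABELS[mode]]


for stage_prefix in staged_prefixes("PG0390-CPanel", "best"):
    print(stage_prefix)
```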


Example of output interpretation
------------------------------

### Example 1. Standard deconvolution output


```bash
$ ./dEploid -vcf data/exampleData/PG0390-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-noPanel -o PG0390-CNopanel -seed 1
$ utilities/interpretDEploid.r -vcf data/exampleData/PG0390-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-dEprefix PG0390-CNopanel \
-o PG0390-CNopanel -ring

```

![interpretDEploidFigure.1](_static/PG0390-CNopanel.interpretDEploidFigure.1.png "Output figure 1")



The top three figures are the same as the figures shown in :ref:`data example <sec-eg>`, with the small addition of the inferred WSAF marked in blue in the top right figure.

- The bottom left figure shows the relative proportion change history of the MCMC chain.
- The middle figure shows the correlation between the expected and observed allele frequency in the sample.
- The right figure shows changes in MCMC likelihood.
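
The proportion history in the bottom left figure presumably comes from the ***prefix***.prop file. The Python sketch below shows one way to inspect such a file directly. It assumes a whitespace-delimited table with one row per recorded MCMC update and one column per strain; that layout is an assumption, not something stated above. The helper names `parse_rows` and `final_proportions` are hypothetical.

```python
def parse_rows(lines):
    """Parse whitespace-delimited numeric lines into a list of float rows."""
    rows = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(x) for x in fields])
    return rows


def final_proportions(lines):
    """Return the last recorded row of proportions, largest first."""
    rows = parse_rows(lines)
    if not rows:
        raise ValueError("no proportion rows found")
    return sorted(rows[-1], reverse=True)


# In-memory stand-in for the contents of a .prop file (made-up numbers).
example = ["0.50\t0.50\n", "0.30\t0.70\n", "0.25\t0.75\n"]
print(final_proportions(example))
```

For a real run, the same function could be given an open file handle, for example `final_proportions(open("PG0390-CNopanel.prop"))`, provided the file layout matches the assumption above.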


![interpretDEploidFigure.2](_static/PG0390-CNopanel.interpretDEploidFigure.2.png "Output figure 2")


This panel figure shows the within-sample allele frequencies across all 14 chromosomes. Expected and observed WSAF are marked in blue and red respectively.


### Example 2. Haplotype painting from a given panel


``dEploid`` can take its output haplotypes, and calculate the posterior probability of each deconvoluted strain with the reference panel. In this example, the reference panel includes four lab strains: 3D7 (red), Dd2 (dark orange), HB3 (orange) and 7G8 (yellow).

```bash
$ ./dEploid -vcf data/exampleData/PG0390-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-panel data/exampleData/labStrains.eg.panel.txt \
-o PG0390-CPanel -seed 1 -k 3
$ ./dEploid -vcf data/exampleData/PG0390-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-panel data/exampleData/labStrains.eg.panel.txt \
-o PG0390-CPanel \
-painting PG0390-CPanel.hap \
-initialP 0.8 0 0.2 -k 3
$ utilities/interpretDEploid.r -vcf data/exampleData/PG0390-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-dEprefix PG0390-CPanel \
-o PG0390-CPanel -ring

```

![PG0390fwdBwdRing](_static/PG0390-CPanel.ring.png "PG0390-CPanel.ring.png")


### Example 3. Deconvolution followed by IBD painting

In addition to lab-mixed samples, here we show an example of ``dEploid`` deconvoluting the field sample PD0577-C.

![PD0577inbreeding](_static/PD0577-CPanel.IBD.ring.png "PD0577-CPanel.IBD.ring.png")
104 changes: 64 additions & 40 deletions docs/_build/man/dEploid.1
@@ -1,6 +1,6 @@
.\" Man page generated from reStructuredText.
.
.TH "DEPLOID" "1" "Oct 22, 2018" "v0.6-beta" "DEploid"
.TH "DEPLOID" "1" "Nov 16, 2019" "v0.6-beta" "DEploid"
.SH NAME
dEploid \-
.
@@ -559,48 +559,62 @@ Figure on the right show allele frequency within sample, compare against the pop
.SS Output files
.sp
\fBdEploid\fP outputs text files with user\-specified prefix with flag \fB\-o\fP\&.
.INDENT 0.0
.TP
.B \fIprefix\fP\&.log
.sp
\fB\fIprefix\fP\fP\fB\&.log\fP
.sp
Log file records \fBdEploid\fP version, input file paths, parameter used and proportion estimates at the final iteration.
.TP
.B \fIprefix\fP\&.llk
.sp
\fB\fIprefix\fP\fP\fB\&.llk\fP
.sp
Log likelihood of the MCMC chain.
.TP
.B \fIprefix\fP\&.prop
.sp
\fB\fIprefix\fP\fP\fB\&.prop\fP
.sp
MCMC updates of the proportion estimates.
.TP
.B \fIprefix\fP\&.hap
.sp
\fB\fIprefix\fP\fP\fB\&.hap\fP
.sp
Haplotypes at the final iteration in plain text file.
.TP
.B \fIprefix\fP\&.vcf
.sp
\fB\fIprefix\fP\fP\fB\&.vcf\fP
.sp
When flag \fB\-vcfOut\fP is turned on, haplotypes are saved at the final iteration in VCF format.
.TP
.B \fIprefix\fP\&.single[i]
When flag \fB\-exportPostProb\fP is turned on, posterior probabilities of the final iteration of strain [i].
.UNINDENT
.SS Example of output interpretion
.sp
\fB\fIprefix\fP\fP\fB\&.single[i]\fP
.sp
When flag \fB\-exportPostProb\fP is turned on, posterior probabilities of the final iteration of strain [i]\&.
.SS DEploid\-IBD
.sp
When flag \fB\-ibd\fP is used, \fBdEploid\fP first learns the number of strains and their proportions with an identity by descent model (\(aqDEploid\-IBD\(aq). It then fixes the number of strains and proportions, and trains the haplotypes using the original DEploid algorithm (\(aqDEploid\-classic\(aq). The staged outputs are labelled with ".ibd" and ".classic" respectively, appended to the prefix.
.SS DEploid\-BEST
.sp
When flag \fB\-best\fP is used, \(aqDEploid\-BEST\(aq executes the deconvolution algorithms in an optimised sequence to best report the number of strains, proportions and haplotypes. The program (\(aqDEploid\-Lasso\(aq) first learns the number of strains with an optimised reference panel; ".chooseK" is appended to the prefix for these outputs (NOTE: likelihood is not tracked in this case). It (\(aqDEploid\-IBD\(aq) then fixes the number of strains and tunes the strain proportions with an identity by descent model; ".ibd" is appended to the prefix for these outputs. Finally, the program (\(aqDEploid\-Lasso\(aq) fixes the number of strains and proportions, and uses the optimised reference panel again to train and report the haplotypes; ".final" is appended to the prefix for these outputs. When \fB\-vcfOut\fP and \fB\-exportPostProb\fP are applied, the corresponding outputs are written only for the final haplotypes.
.SS Example of output interpretation
.SS Example 1. Standard deconvolution output
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-noPanel \-o PG0390\-CNopanel \-seed 1
$ utilities/interpretDEploid.r \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-dEprefix PG0390\-CNopanel \e
\-o PG0390\-CNopanel \-ring
$ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-noPanel \-o PG0390\-CNopanel \-seed 1
$ utilities/interpretDEploid.r \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-dEprefix PG0390\-CNopanel \e
\-o PG0390\-CNopanel \-ring


.ft P
.fi
.UNINDENT
.UNINDENT
.sp
[image: interpretDEploidFigure.1]
[image]

.sp
The top three figures are the same as figures show in data example, with a small addition of inferred WSAF marked in blue, in the top right figure.
The top three figures are the same as figures show in :ref:\fBdata example <sec\-eg>\fP, with a small addition of inferred WSAF marked in blue, in the top right figure.
.INDENT 0.0
.IP \(bu 2
The bottom left figure show the relative proportion change history of the MCMC chain.
@@ -609,8 +623,10 @@ The middle figure show the correlation between the expected and observed allele
.IP \(bu 2
The right figure shows changes in MCMC likelihood .
.UNINDENT
.sp
[image: interpretDEploidFigure.2]
[image]

.sp
This panel figure shows all allele frequencies within sample across all 14 chromosomes. Expected and observed WSAF are marked in blue and red respectively.
.SS Example 2. Haplotype painting from a given panel
@@ -621,31 +637,37 @@ This panel figure shows all allele frequencies within sample across all 14 chrom
.sp
.nf
.ft C
$ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-panel data/exampleData/labStrains.eg.panel.txt \e
\-o PG0390\-CPanel \-seed 1 \-k 3
$ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-panel data/exampleData/labStrains.eg.panel.txt \e
\-o PG0390\-CPanel \e
\-painting PG0390\-CPanel.hap \e
\-initialP 0.8 0 0.2 \-k 3
$ utilities/interpretDEploid.r \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-dEprefix PG0390\-CPanel \e
\-o PG0390\-CPanel \-ring
$ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-panel data/exampleData/labStrains.eg.panel.txt \e
\-o PG0390\-CPanel \-seed 1 \-k 3
$ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-panel data/exampleData/labStrains.eg.panel.txt \e
\-o PG0390\-CPanel \e
\-painting PG0390\-CPanel.hap \e
\-initialP 0.8 0 0.2 \-k 3
$ utilities/interpretDEploid.r \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
\-dEprefix PG0390\-CPanel \e
\-o PG0390\-CPanel \-ring


.ft P
.fi
.UNINDENT
.UNINDENT
.sp
[image: PG0390fwdBwdRing]
[image]

.SS Example 3. Deconvolution followed by IBD painting
.sp
In addition to lab\-mixed samples, here we show an example of \fBdEploid\fP deconvoluting the field sample PD0577\-C.
.sp
[image: PD0577inbreeding]
[image]

.SH PF3K WORKFLOW
.sp
Our main workflow consists of three steps:
@@ -831,7 +853,9 @@ utilities/interpretDEploid.r \-vcf data/exampleData/PG0400\-C.eg.vcf.gz \e

.SS Benchmark
.sp
Please refer to our paper \fI\%Zhu et.al (2017)\fP section \fI\%3 Validation and performance\fP\&.
Please refer to our paper \fI\%Zhu et.al (2017)\fP section \fI\%3 Validation and performance\fP for benchmarking inference results on number of strains, proportions and haplotype quality.
.sp
For the enhanced version \-\- DEploid\-IBD, we compared our results against \fI\%Zhu et.al (2017)\fP, and conducted more experiments and validations \fI\%Zhu et.al (2019)\fP\&.
.SH REPORTING BUGS
.sp
If you encounter any problem when using \fBdEploid\fP, please file a short bug report by using the \fI\%issue tracker\fP
3 changes: 1 addition & 2 deletions docs/index.rst
@@ -15,7 +15,7 @@ DEploid
description
installation
input
output
Output
Pf3k-workflow.md
FAQ
Bug-report
@@ -29,4 +29,3 @@ DEploid
.. * :ref:`genindex`
.. * :ref:`modindex`
.. * :ref:`search`