Document advanced features
dpryan79 committed Jul 13, 2016
1 parent dce8341 commit 5fa189d
Showing 17 changed files with 83 additions and 16 deletions.
1 change: 1 addition & 0 deletions deeptools/heatmapper.py
@@ -70,6 +70,7 @@ def chopRegionsFromMiddle(exonsInput, left=0, right=0):
the center point of the exons.
The steps are as follow:
1) Find the center point of the set of exons (e.g., [(0, 200), (300, 400), (800, 900)] would be centered at 200)
* If a given exon spans the center point then the exon is split
2) The given number of bases at the end of the left-of-center list are extracted
10 changes: 10 additions & 0 deletions docs/content/advanced_features.rst
@@ -0,0 +1,10 @@
Advanced features
=================

Some of the features of deepTools are not self-explanatory. Below, we provide links to longer expositions on these more advanced features:

* :doc:`feature/blacklist`
* :doc:`feature/metagene`
* :doc:`feature/read_extension`
* :doc:`feature/unscaled_regions`
* :doc:`feature/read_offsets`
12 changes: 12 additions & 0 deletions docs/content/feature/blacklist.rst
@@ -0,0 +1,12 @@
Blacklist Regions
=================

There are many sources of bias in ChIPseq experiments. Among the most prevalent of these is signal arising from "blacklist" regions (see `Carroll et al. <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3989762/>`__ and the references therein for historical context). Blacklisted regions show notably enriched signal across many ChIP experiment types (e.g., regardless of what is being IPed or the experimental conditions). Including these regions can lead not only to false-positive peaks, but can also throw off between-sample normalization. An example of this is found below:

.. image:: ../../images/feature-blacklist0.png

The region on chromosome 9 starting around position 3 million marks the start of an annotated satellite repeat. As this region contains vastly more reads than expected, slight differences in enrichment here between samples can cause errors in between-sample scaling, thereby masking signal in non-repetitive regions. This can be seen in the IGV screenshot below, where the blacklisted region is just off the side of the screen.

.. image:: ../../images/feature-blacklist1.png

Note that the signal outside of the blacklisted region is slightly depressed due to the blacklisted region. Such regions can be excluded using the `--blackListFileName` option, which is available throughout deepTools. The subtraction of these regions is accounted for in all normalizations.
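
The effect on between-sample scaling can be illustrated with a small sketch. It assumes a simple total-count scaling, which is only a stand-in and not necessarily the normalization deepTools applies; the function name and the per-bin counts are hypothetical.

```python
def total_count_scale_factor(sample, reference, blacklisted=frozenset()):
    """Factor that scales `sample` so its total count matches `reference`.

    Bins whose index is in `blacklisted` are left out of both totals.
    This is an illustrative total-count scaling, not deepTools' own code.
    """
    keep = [i for i in range(len(sample)) if i not in blacklisted]
    return sum(reference[i] for i in keep) / sum(sample[i] for i in keep)


# Hypothetical per-bin read counts; bin 0 stands in for a satellite repeat.
sample_a = [1000, 10, 12, 8, 10]
sample_b = [1500, 10, 12, 8, 10]

# Including the repeat, sample_b is scaled down although the other bins agree.
naive = total_count_scale_factor(sample_b, sample_a)
# Excluding the repeat, the two samples are recognised as equally deep.
masked = total_count_scale_factor(sample_b, sample_a, blacklisted={0})
```

With these made-up counts, the naive factor shrinks every bin of the second sample, whereas masking the repeat leaves the non-repetitive bins untouched.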
12 changes: 12 additions & 0 deletions docs/content/feature/metagene.rst
@@ -0,0 +1,12 @@
Metagene analyses
=================

By default, `computeMatrix` uses the signal over entire contiguous regions (e.g., transcripts) for computing its output. While this is typically quite useful, in cases such as RNAseq the results are less than ideal. Take, for example, the gene model and coverage profile below:

.. image:: ../../images/feature-metagene0.png

If clustering were done using such blocky coverage then the results would be biased by the number of exons and their positions. It's normally preferable to ignore intronic regions and use only the signal in exons (denoted by blocks in the gene model). This can be accomplished by using the `--metagene` option in `computeMatrix` and supplying a BED12 or GTF file as a set of regions:

.. image:: ../../images/feature-metagene1.png

Note that for GTF files the regions used to define exons can be easily modified. For example, for RiboSeq samples it's preferable to use annotated coding regions, which can be selected by specifying `--exonID CDS`. Likewise, entire genes can be used rather than transcripts by specifying `--transcriptID gene --transcript_id_designator gene_id`.
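
As a rough illustration of the idea (not the `computeMatrix` implementation), the sketch below splices exon signal together from a per-base coverage list. The function name, the half-open ``(start, end)`` exon convention and the coverage values are assumptions made for this example.

```python
def metagene_signal(coverage, exons):
    """Concatenate per-base coverage over exons only, skipping introns.

    `exons` is a sorted list of half-open (start, end) intervals
    (an assumed convention for this sketch).
    """
    signal = []
    for start, end in exons:
        signal.extend(coverage[start:end])
    return signal


# Hypothetical coverage for a three-exon transcript; zeros are introns.
coverage = [5, 5, 5, 0, 0, 0, 0, 7, 7, 0, 0, 9, 9, 9]
exons = [(0, 3), (7, 9), (11, 14)]

# Default behaviour: the whole contiguous region, introns included.
contiguous = coverage[exons[0][0]:exons[-1][1]]
# Metagene behaviour: exonic signal only.
spliced = metagene_signal(coverage, exons)
```

The spliced profile no longer depends on how many introns a transcript has or where they fall, which is what makes it better suited to clustering.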
30 changes: 30 additions & 0 deletions docs/content/feature/read_extension.rst
@@ -0,0 +1,30 @@
Read extension
==============

In the majority of NGS experiments, DNA (or RNA) is fragmented into small stretches and only the ends of these fragments are sequenced. For many applications, it's desirable to quantify coverage of the entire original fragments over the genome. Consequently, there is an `--extendReads` option present throughout deepTools. This works as follows:

Paired-end reads
----------------

1. Regions of the genome are sampled to determine the median fragment/read length.
2. The genome is subdivided into disjoint regions. Each of these regions comprises one or more bins of some desired size (specified by `-bs`).
3. For each region, all alignments overlapping it are gathered. In addition, all alignments within 2000 bases are gathered, as 2000 bases is the maximum allowed fragment size.
4. The resulting alignments are all extended according to their fragment length, which for paired-end reads is indicated in BAM files.

- For singletons, the expected fragment length from step 1 is used.

5. For each of the extended reads, the count in each bin that it overlaps is incremented.
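
Steps 4 and 5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the deepTools code: alignments are hypothetical ``(start, fragment_length)`` pairs on the forward strand, ``None`` marks a singleton, and coordinates are half-open.

```python
def extended_bin_counts(alignments, region_start, region_end, bin_size,
                        default_fragment_length):
    """Count extended fragments per bin within one region.

    `alignments` holds (start, fragment_length) pairs; a fragment_length
    of None marks a singleton, which falls back to the default length.
    """
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for start, fragment_length in alignments:
        if fragment_length is None:
            fragment_length = default_fragment_length
        # Clip the extended fragment to the region being processed.
        lo = max(start, region_start)
        hi = min(start + fragment_length, region_end)
        if lo >= hi:
            continue
        first_bin = (lo - region_start) // bin_size
        last_bin = (hi - 1 - region_start) // bin_size
        for b in range(first_bin, last_bin + 1):
            counts[b] += 1
    return counts


# One proper pair, one singleton, and one alignment that starts upstream
# of the region but extends into it.
alignments = [(10, 120), (160, None), (-30, 60)]
counts = extended_bin_counts(alignments, 0, 200, 50, default_fragment_length=100)
```

The third alignment shows why alignments near a region are gathered as well: it starts outside the region yet still contributes to the first bin once extended.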

Single-end reads
----------------

1. An extension length, L, is specified.
2. The genome is subdivided into disjoint regions. Each of these regions comprises one or more bins of some desired size (specified by `-bs`).
3. For each region, all alignments overlapping it are gathered. In addition, all alignments within 2000 bases are gathered, as 2000 bases is the maximum allowed fragment size.
4. The resulting alignments are all extended to length L.
5. For each of the extended reads, the count in each bin that it overlaps is incremented.
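
A minimal sketch of step 4 is given below. Two things are assumptions of this example rather than statements from the text above: that the extension follows the strand of the read, and that reads already longer than L are left unchanged. The function name is hypothetical and coordinates are half-open.

```python
def extend_single_end(start, end, is_reverse, length):
    """Extend an alignment to `length` bases from its 5' end.

    Assumptions of this sketch: extension is strand-aware, and a read
    that is already longer than `length` keeps its original span.
    """
    length = max(length, end - start)
    if is_reverse:
        # Reverse-strand reads grow towards smaller coordinates.
        return (max(0, end - length), end)
    return (start, start + length)


forward = extend_single_end(100, 150, is_reverse=False, length=200)
reverse = extend_single_end(500, 550, is_reverse=True, length=200)
```

The extended spans would then be counted per bin exactly as for paired-end reads.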

Blacklisted regions
-------------------

The question likely arises as to how alignments originating inside of blacklisted regions are handled. In short, any alignment contained completely within a blacklisted region is ignored, regardless of whether it would extend into a non-blacklisted region or not. Alignments only partially overlapping blacklisted regions are treated as normal, as are pairs of reads that span over a blacklisted region. This is primarily for the sake of performance, as otherwise each extended read would need to be checked to see if it overlaps a blacklisted region.
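
The rule above amounts to a containment check on the unextended alignment. The sketch below is illustrative only; the function name and the half-open interval convention are assumptions.

```python
def is_ignored(aln_start, aln_end, blacklist):
    """Return True if the alignment lies completely inside a blacklisted region.

    `blacklist` is a list of half-open (start, end) intervals. Only the
    alignment itself is tested, never its extended form.
    """
    return any(b_start <= aln_start and aln_end <= b_end
               for b_start, b_end in blacklist)


blacklist = [(1000, 2000)]

contained = is_ignored(1200, 1250, blacklist)  # fully inside: ignored
partial = is_ignored(1980, 2030, blacklist)    # partial overlap: kept
outside = is_ignored(900, 950, blacklist)      # no overlap: kept
```

Because the check happens once per alignment before extension, no extended read needs to be compared against the blacklist afterwards.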
8 changes: 8 additions & 0 deletions docs/content/feature/read_offsets.rst
@@ -0,0 +1,8 @@
Offsetting signal to a given position
=====================================

A growing number of experiment types need to be analyzed by focusing the signal from each alignment at a single point. As an example, RiboSeq alignments are typically offset by around 12 bases so that the pause signal is centered on the translation start site. Alternatively, in GROseq experiments, the pause around the TSS becomes centered by using the 1st base of each read. This can be accomplished within `bamCoverage` using the `--Offset` option. A visual example is below:

.. image:: ../../images/feature-offset0.png

The alignments shown above overlap a transcript, denoted as a blue box, which in this case represents only the coding sequence. If the alignments are from a RiboSeq experiment then the signal from each alignment should be set at the ~12th base of each alignment. The section on the right denotes the resulting signal intensity, with the expected large peak at the translation start site.
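
The idea of collapsing each alignment to one position can be sketched as below. The sketch assumes a 1-based offset counted from the 5' end of the read, strand-aware handling, ungapped alignments and half-open coordinates; the function names and alignment tuples are hypothetical and this is not the `bamCoverage` implementation.

```python
from collections import Counter


def offset_position(start, end, is_reverse, offset):
    """0-based genomic position of the `offset`-th base of an alignment.

    `offset` is 1-based and counted from the 5' end of the read, so it is
    measured from `end` for reverse-strand alignments (an assumption here).
    """
    if is_reverse:
        return end - offset
    return start + offset - 1


def offset_signal(alignments, offset=12):
    """Tally how many alignments place their signal at each position."""
    return Counter(offset_position(start, end, is_reverse, offset)
                   for start, end, is_reverse in alignments)


# Three hypothetical RiboSeq alignments as (start, end, is_reverse).
alignments = [(100, 130, False), (100, 128, False), (90, 123, True)]
signal = offset_signal(alignments)
```

All three alignments collapse onto the same position, giving the kind of sharp peak described above rather than a broad block of coverage.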
7 changes: 7 additions & 0 deletions docs/content/feature/unscaled_regions.rst
@@ -0,0 +1,7 @@
Unscaled regions
================

Some experiments aim to quantify the distribution of pausing of factors, such as PolII, throughout gene or transcript bodies. PolII, and many other factors, show pausing (i.e., accumulation of signal) near the start/end of transcripts. As scaling is normally performed to make all regions the same length, the breadth of the paused region could be scaled differently in each transcript. This would, in turn, cause biases during clustering or other analyses. In such cases, the `--unscaled5prime` and `--unscaled3prime` options in `computeMatrix` can be used. These will exclude regions at one or both ends of transcripts (or other regions) from scaling, thereby allowing raw signal profiles to be compared across transcripts. An example of this from `Ferrari et al. 2013 <http://www.sciencedirect.com/science/article/pii/S2211124713005603>`__ is shown below:

.. image:: ../../images/feature-unscaled0.png
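
A toy version of this behaviour is sketched below. It assumes scaling by averaging the body into a fixed number of bins, which is a simplification and not the `computeMatrix` code; the function names, parameters and signal values are hypothetical.

```python
def scale_to_bins(values, n_bins):
    """Average `values` into `n_bins` roughly equal-sized bins."""
    n = len(values)
    scaled = []
    for i in range(n_bins):
        lo = i * n // n_bins
        hi = max(lo + 1, (i + 1) * n // n_bins)
        chunk = values[lo:hi]
        scaled.append(sum(chunk) / len(chunk))
    return scaled


def profile(signal, n_bins, unscaled_5prime=0, unscaled_3prime=0):
    """Scale the body of a region while leaving its ends at base resolution."""
    body_end = len(signal) - unscaled_3prime
    head = signal[:unscaled_5prime]
    body = signal[unscaled_5prime:body_end]
    tail = signal[body_end:]
    return head + scale_to_bins(body, n_bins) + tail


# Two hypothetical transcripts with the same 2-base pause but different lengths.
short = [9, 9, 1, 1, 1, 1]
long_ = [9, 9] + [1] * 8

# Scaling everything smears the pause differently in each transcript.
scaled_short = profile(short, 4)
scaled_long = profile(long_, 4)

# Leaving the first 2 bases unscaled keeps the pause comparable.
kept_short = profile(short, 2, unscaled_5prime=2)
kept_long = profile(long_, 2, unscaled_5prime=2)
```

In this toy case the fully scaled profiles disagree in the second bin, while the profiles with an unscaled 5' end are identical.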

6 changes: 1 addition & 5 deletions docs/content/list_of_tools.rst
@@ -100,11 +100,7 @@ We offer several ways to filter those BAM files on the fly so that you don't nee

These parameters are optional and available throughout deepTools.

.. note:: In version 2.3 we introduced a sampling method to correct the effect of filtering when normalizing using
``bamCoverage`` or ``bamCompare``. For previous versions, if you know that your files will be strongly affected by
the filtering of duplicates or reads of low quality then consider removing
those reads *before* using ``bamCoverage`` or ``bamCompare``, as the filtering
by deepTools is done *after* the scaling factors are calculated!
.. note:: In version 2.3 we introduced a sampling method to correct the effect of filtering when normalizing using ``bamCoverage`` or ``bamCompare``. For previous versions, if you know that your files will be strongly affected by the filtering of duplicates or reads of low quality then consider removing those reads *before* using ``bamCoverage`` or ``bamCompare``, as the filtering by deepTools is done *after* the scaling factors are calculated!


Tools for BAM and bigWig file processing
3 changes: 1 addition & 2 deletions docs/content/tools/plotHeatmap.rst
@@ -118,5 +118,4 @@ we combine different colormap colors, different scales and the new `--boxAround
.. image:: ../../images/test_plots/ExampleHeatmap4.png

.. tip:: **More examples** can be found in our
`Gallery <http://deeptools.readthedocs.org/en/latest/content/example_gallery.html#normalized-chip-seq-signals-and-peak-regions>`_.
.. tip:: **More examples** can be found in our `Gallery <http://deeptools.readthedocs.org/en/latest/content/example_gallery.html#normalized-chip-seq-signals-and-peak-regions>`_.
Binary file added docs/images/feature-blacklist0.png
Binary file added docs/images/feature-blacklist1.png
Binary file added docs/images/feature-metagene0.png
Binary file added docs/images/feature-metagene1.png
Binary file added docs/images/feature-offset0.png
Binary file added docs/images/feature-unscaled0.png
1 change: 1 addition & 0 deletions docs/index.rst
@@ -29,6 +29,7 @@ Contents:

content/installation
content/list_of_tools
content/advanced_features
content/example_usage
content/changelog
content/help_galaxy_intro
9 changes: 0 additions & 9 deletions docs/source/deeptools.rst
@@ -100,15 +100,6 @@ deeptools.mapReduce module
:undoc-members:
:show-inheritance:


deeptools.readBed module
------------------------

.. automodule:: deeptools.readBed
:members:
:undoc-members:
:show-inheritance:

deeptools.utilities module
--------------------------

