Showing 17 changed files with 83 additions and 16 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
Advanced features | ||
================= | ||
|
||
Some of the features of deepTools are not self-explanatory. Below, we provide links to longer expositions on these more advanced features: | ||
|
||
* :doc:`feature/blacklist` | ||
* :doc:`feature/metagene` | ||
* :doc:`feature/read_extension` | ||
* :doc:`feature/unscaled_regions` | ||
* :doc:`feature/read_offsets` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
Blacklist Regions | ||
================= | ||
|
||
There are many sources of bias in ChIPseq experiments. Among the most prevalent of these is signal arising from "blacklist" regions (see `Carroll et al. <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3989762/>`__ and the references therein for historical context). Blacklisted regions show notably enriched signal across many ChIP experiment types (i.e., regardless of what is being IPed or the experimental conditions). Including these regions can not only lead to false-positive peaks but also throw off between-sample normalization. An example of this is found below: | ||
|
||
.. image:: ../../images/feature-blacklist0.png | ||
|
||
The region on chromosome 9 starting around position 3 million marks the start of an annotated satellite repeat. As this region contains vastly more reads than expected, slight differences in enrichment here between samples can cause errors in between-sample scaling, thereby masking signal in non-repetitive regions. This can be seen in the IGV screenshot below, where the blacklisted region is just off the side of the screen. | ||
|
||
.. image:: ../../images/feature-blacklist1.png | ||
|
||
Note that the signal outside of the blacklisted region is slightly depressed due to the blacklisted region. Such regions can be excluded using the `--blackListFileName` option, which is available throughout deepTools. The subtraction of these regions is accounted for in all normalizations. |
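The effect on between-sample scaling can be illustrated with a small sketch. The per-bin counts below are made up, and simple total-count scaling is assumed purely for illustration; this is not how deepTools computes its normalizations internally.

```python
# Toy per-bin read counts for two samples (hypothetical numbers).
# Bin 0 lies in a blacklisted satellite repeat; bins 1-4 are ordinary.
sample_a = [9000, 100, 100, 100, 100]
sample_b = [5000, 100, 100, 100, 100]
blacklisted_bins = {0}


def scale_factor(reference, other, skip=frozenset()):
    """Factor that scales `other` to the total depth of `reference`,
    ignoring any bin index listed in `skip`."""
    ref_total = sum(c for i, c in enumerate(reference) if i not in skip)
    other_total = sum(c for i, c in enumerate(other) if i not in skip)
    return ref_total / other_total


naive = scale_factor(sample_a, sample_b)
masked = scale_factor(sample_a, sample_b, skip=blacklisted_bins)
print(f"scale factor with blacklisted bin:    {naive:.2f}")
print(f"scale factor without blacklisted bin: {masked:.2f}")
```

In this toy case the two samples are identical outside the repeat, yet the naive factor would inflate the second sample by roughly 1.7-fold; masking the blacklisted bin gives the expected factor of 1.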
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
Metagene analyses | ||
================= | ||
|
||
By default, `computeMatrix` uses the signal over entire contiguous regions (e.g., transcripts) for computing its output. While this is typically quite useful, in cases such as RNAseq the results are less than ideal. Take, for example, the gene model and coverage profile below: | ||
|
||
.. image:: ../../images/feature-metagene0.png | ||
|
||
If clustering were done using such blocky coverage then the results would be biased by the number of exons and their positions. It is normally preferable to ignore intronic regions and use only the signal in exons (denoted by blocks in the gene model). This can be accomplished by using the `--metagene` option in `computeMatrix` and supplying a BED12 or GTF file as a set of regions: | ||
|
||
.. image:: ../../images/feature-metagene1.png | ||
|
||
Note that for GTF files the regions used to define exons can be easily modified. For example, for RiboSeq samples it's preferable to use annotated coding regions, which can be done by specifying `--exonID CDS`. Likewise, entire genes can be used rather than transcripts by specifying `--transcriptID gene --transcript_id_designator gene_id`. |
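Conceptually, a metagene analysis stitches the exons of a transcript together before the signal is summarized. The following is a minimal sketch of that idea only; the function name `exon_signal` and the toy coverage values are hypothetical, and `computeMatrix` itself works on bigWig files rather than Python lists.

```python
def exon_signal(coverage, exons):
    """Concatenate per-base coverage over exons, skipping introns.

    coverage: per-base values indexed by genomic position
    exons: (start, end) half-open intervals, sorted by start
    """
    signal = []
    for start, end in exons:
        signal.extend(coverage[start:end])
    return signal


# Hypothetical transcript: two exons separated by an uncovered intron.
coverage = [5, 5, 5, 0, 0, 0, 0, 7, 7, 7]
exons = [(0, 3), (7, 10)]

# Signal over the whole contiguous region, intron included.
whole = coverage[exons[0][0]:exons[-1][1]]
print(whole)
# Signal over the exons only, as in a metagene analysis.
print(exon_signal(coverage, exons))
```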
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
Read extension | ||
============== | ||
|
||
In the majority of NGS experiments, DNA (or RNA) is fragmented into small stretches and only the ends of these fragments are sequenced. For many applications, it's desirable to quantify coverage of the entire original fragments over the genome. Consequently, there is an `--extendReads` option present throughout deepTools. This works as follows: | ||
|
||
Paired-end reads | ||
---------------- | ||
|
||
1. Regions of the genome are sampled to determine the median fragment/read length. | ||
2. The genome is subdivided into disjoint regions. Each of these regions comprises one or more bins of some desired size (specified by `-bs`). | ||
3. For each region, all alignments overlapping it are gathered. In addition, all alignments within 2000 bases are gathered, as 2000 bases is the maximum allowed fragment size. | ||
4. The resulting alignments are all extended according to their fragment length, which for paired-end reads is recorded in the BAM file. | ||
|
||
- For singletons, the expected fragment length from step 1 is used. | ||
|
||
5. For each of the extended reads, the count in each bin that it overlaps is incremented. | ||
|
||
Single-end reads | ||
---------------- | ||
|
||
1. An extension length, L, is specified. | ||
2. The genome is subdivided into disjoint regions. Each of these regions comprises one or more bins of some desired size (specified by `-bs`). | ||
3. For each region, all alignments overlapping it are gathered. In addition, all alignments within 2000 bases are gathered, as 2000 bases is the maximum allowed fragment size. | ||
4. The resulting alignments are all extended to length L. | ||
5. For each of the extended reads, the count in each bin that it overlaps is incremented. | ||
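The extension and counting steps of both procedures above can be sketched as follows. This is a simplified illustration, not the deepTools implementation: it assumes forward-strand alignments, half-open coordinates, and a single region starting at position 0, and the names `extend` and `count_bins` are hypothetical.

```python
def extend(start, fragment_length=None, default_length=200):
    """Half-open interval covered by a forward-strand alignment after extension.

    fragment_length: length recorded for a paired read, or None for a
    singleton or single-end read, in which case default_length (the
    sampled median, or the user-specified L) is used instead.
    """
    length = fragment_length if fragment_length is not None else default_length
    return start, start + length


def count_bins(intervals, bin_size, n_bins):
    """Increment every bin that each extended read overlaps."""
    counts = [0] * n_bins
    for start, end in intervals:
        first = max(start // bin_size, 0)
        # Clip reads that extend past the last bin of the region.
        last = min((end - 1) // bin_size, n_bins - 1)
        for b in range(first, last + 1):
            counts[b] += 1
    return counts


intervals = [
    extend(10, fragment_length=120),  # paired read, covers 10-130
    extend(60, fragment_length=30),   # paired read, covers 60-90
    extend(160, default_length=100),  # singleton, covers 160-260
]
print(count_bins(intervals, bin_size=50, n_bins=4))
```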
|
||
Blacklisted regions | ||
------------------- | ||
|
||
The question likely arises as to how alignments originating inside of blacklisted regions are handled. In short, any alignment contained completely within a blacklisted region is ignored, regardless of whether it would extend into a non-blacklisted region or not. Alignments only partially overlapping blacklisted regions are treated as normal, as are pairs of reads that span over a blacklisted region. This is primarily for the sake of performance, as otherwise each extended read would need to be checked to see if it overlaps a blacklisted region. |
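The containment rule described above can be expressed as a short check on the unextended alignment coordinates. This is a sketch under the assumption of half-open intervals on a single chromosome; the function name `is_ignored` is hypothetical and the linear scan is for clarity, not speed.

```python
def is_ignored(aln_start, aln_end, blacklist):
    """True if the alignment lies completely within a blacklisted interval.

    The test uses the alignment as it appears in the file, before any
    extension, so a read that would extend out of the region is still ignored.
    """
    return any(b_start <= aln_start and aln_end <= b_end
               for b_start, b_end in blacklist)


blacklist = [(1000, 2000)]
print(is_ignored(1200, 1250, blacklist))  # completely contained
print(is_ignored(1980, 2030, blacklist))  # partial overlap, treated as normal
print(is_ignored(900, 2100, blacklist))   # spans the region, treated as normal
```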
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
Offsetting signal to a given position | ||
===================================== | ||
|
||
A growing number of experiment types need to be analyzed by focusing the signal from each alignment at a single point. As an example, RiboSeq alignments tend to be offset such that the signal pause is centered around the translation start site, an offset of around 12. Alternatively, in GROseq experiments, the pause around the TSS becomes centered by using the 1st base of each read. This can be accomplished within `bamCoverage` using the `--Offset` option. A visual example is below: | ||
|
||
.. image:: ../../images/feature-offset0.png | ||
|
||
The alignments shown above overlap a transcript, denoted as a blue box, which in this case represents only the coding sequence. If the alignments are from a RiboSeq experiment then the signal from each alignment should be set at the ~12th base of each alignment. The section on the right denotes the resulting signal intensity, with the expected large peak at the translation start site. |
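The idea of focusing each alignment's signal at a single base can be sketched as follows. The example assumes forward-strand alignments, 0-based start positions, and a 1-based offset; strand handling is omitted, and the name `offset_signal` and the start positions are hypothetical.

```python
from collections import Counter


def offset_signal(alignment_starts, offset):
    """Place one unit of signal at the `offset`-th base (1-based) of each
    forward-strand alignment instead of spreading it over the whole read."""
    signal = Counter()
    for start in alignment_starts:
        signal[start + offset - 1] += 1
    return signal


# Hypothetical 0-based alignment start positions.
starts = [89, 89, 90, 89, 95]
signal = offset_signal(starts, 12)
print(sorted(signal.items()))
```

With these toy positions, three of the five alignments pile up at position 100, giving a sharp peak instead of a broad plateau across the reads.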
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Unscaled regions | ||
================ | ||
|
||
Some experiments aim to quantify the distribution of pausing of factors, such as PolII, throughout gene or transcript bodies. PolII, and many other factors, show pausing (i.e., accumulation of signal) near the start/end of transcripts. As scaling is normally performed to make all regions the same length, the breadth of the paused region could be scaled differently in each transcript. This would, in turn, cause biases during clustering or other analyses. In such cases, the `--unscaled5prime` and `--unscaled3prime` options in `computeMatrix` can be used. These exclude regions at one or both ends of transcripts (or other regions) from scaling, thereby allowing raw signal profiles to be compared across transcripts. An example of this from `Ferrari et al. 2013 <http://www.sciencedirect.com/science/article/pii/S2211124713005603>`__ is shown below: | ||
|
||
.. image:: ../../images/feature-unscaled0.png | ||
|
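A minimal sketch of the idea follows: the ends of each region are kept base-for-base while only the body is rescaled to a fixed number of bins. Averaging into equal-width bins is assumed here for illustration, and the names `scale_body` and `profile` are hypothetical; this is not the `computeMatrix` implementation.

```python
def scale_body(values, n_bins):
    """Average a non-empty list of values into n_bins roughly equal-width bins."""
    n = len(values)
    out = []
    for i in range(n_bins):
        lo = i * n // n_bins
        hi = max((i + 1) * n // n_bins, lo + 1)
        chunk = values[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


def profile(values, unscaled5, unscaled3, n_bins):
    """Keep the first/last bases unscaled and scale only the body."""
    body_end = len(values) - unscaled3
    head = values[:unscaled5]
    body = values[unscaled5:body_end]
    tail = values[body_end:]
    return head + scale_body(body, n_bins) + tail


# Two hypothetical transcripts of different length with the same
# two-base pause at each end.
short = [9, 9, 1, 1, 1, 1, 4, 4]
long_ = [9, 9, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4]
print(profile(short, 2, 2, 4))
print(profile(long_, 2, 2, 4))
```

Both transcripts yield the same profile, so the width of the paused regions stays comparable even though the bodies differ in length.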