Commit

small change to BSseq segmentation
altuna akalin authored and altuna akalin committed Nov 15, 2020
1 parent 92057d3 commit 08d3028
Showing 3 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion 10-bs-seq-analysis.Rmd
@@ -324,7 +324,7 @@ my.diffMeth3=calculateDiffMeth(sim.methylBase,
The analysis of methylation dynamics is not exclusively restricted to differentially methylated regions across samples. Apart from this there is also an interest in examining the methylation profiles within the same sample. Usually, depressions in methylation profiles pinpoint regulatory regions like gene promoters that co-localize with CG-dense CpG islands. On the other hand, many gene-body regions are extensively methylated and CpG-poor [@Bock2012-oh]. These observations would describe a bimodal model of either hyper- or hypomethylated regions depending on the local density of CpGs [@Lovkvist2016-ky]. However, given the detection of CpG-poor regions with locally reduced levels of methylation (on average 30%) in pluripotent embryonic stem cells and in neuronal progenitors in both mouse and human, a different model also seems reasonable [@Stadler2011-iu]. These low-methylated regions (LMRs) are located distal to promoters, have little overlap with CpG islands, and are associated with enhancer marks such as p300 binding sites and H3K27ac enrichment.

Now we are going to try to segment a portion for the H1 human embryonic stem cell line. MethylKit \index{R Packages!\texttt{methylKit}}uses change-point analysis to segment the methylome. In change-point analysis, the change-points of a genome-wide methylation signal are recorded and the genome is partitioned into regions between consecutive change points. CpGs in each segment are similar to each other more than the following segment.
-After segmentation, methylKit function `methSeg()` identifies segments that are further clustered into segment classes using a mixture modeling approach. This clustering is based on only the average methylation level of the segments and allows the detection of distinct methylome features comparable to unmethylated regions (UMRs), lowly methylated regions (LMRs), and fully methylated regions (FMRs) mentioned in Stadler et al. [@Stadler2011-yv]. The code snippet below reads the methylation data from the H1 cell line as a `GRanges` object, and runs the segmentation with potentially up to classes of segments. Mixture modeling determines the optimal number of segments using a statistic called Bayesian information criterion (BIC). The BIC is a statistic based on model likelihood and helps us select the model that fits the data better. We have set the number of segment classes to try using the `G=1:4` argument. The `minSeg` arguments are related to the minimum number of CpGs in the segments. The function `methSeg()` outputs a diagnostic plot for segmentation. This plot is shown in Figure \@ref(fig:segDiag). It shows methylation values and lengths of segments in each segment class, as well as the BIC for different numbers of segments.
+After segmentation, methylKit function `methSeg()` identifies segments that are further clustered into segment classes using a mixture modeling approach. This clustering is based on only the average methylation level of the segments and allows the detection of distinct methylome features comparable to unmethylated regions (UMRs), lowly methylated regions (LMRs), and fully methylated regions (FMRs) mentioned in Stadler et al. [@Stadler2011-yv]. The code snippet below reads the methylation data from the H1 cell line as a `GRanges` object, and runs the segmentation with potentially up to 4 classes of segments. Mixture modeling determines the optimal number of segments using a statistic called Bayesian information criterion (BIC). The BIC is a statistic based on model likelihood and helps us select the model that fits the data better. We have set the number of segment classes to try using the `G=1:4` argument. The `minSeg` arguments are related to the minimum number of CpGs in the segments. The function `methSeg()` outputs a diagnostic plot for segmentation. This plot is shown in Figure \@ref(fig:segDiag). It shows methylation values and lengths of segments in each segment class, as well as the BIC for different numbers of segments.
```{r segDiag, fig.width=14,fig.height=8,fig.cap="Segmentation characteristics shown in different plots. Top left: Mean methylation values per segment in each segment class. Top middle: Length of each segment as boxplots for each segment class. Top right: Number of segments in each segment class. Bottom left: Distribution of segment methylation values. Bottom right: BIC for different number of segment classes",warning=FALSE,out.width="90%"}
# read methylation data
Binary file removed images/CompGen2019_A3_v2_final.png
Binary file not shown.
2 changes: 1 addition & 1 deletion latex/before_body.tex
@@ -3,7 +3,7 @@
%\cleardoublepage\newpage
\thispagestyle{empty}
\begin{center}
-\includegraphics{images/dedication.pdf}
+\includegraphics{images/dedicationOld.pdf}
\end{center}

\setlength{\abovedisplayskip}{-5pt}