Skip to content

Commit

Permalink
adds notebooks for plotting dynamics and SILAC ratios
Browse files Browse the repository at this point in the history
  • Loading branch information
TomSmithCGAT committed Dec 10, 2019
1 parent b645828 commit 952cfe1
Show file tree
Hide file tree
Showing 18 changed files with 12,661 additions and 16 deletions.
17 changes: 14 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,15 @@
# OOPS_change_in_RNA_binding
Example of how to detect change in RNA binding from parallel Total and RNA-bound protein abundance TMT quantification using the OOPS protocol.
# Visualising OOPS

This repository was used to generate figures for the OOPS Nature Protocols manuscript

The repository structure is:

- raw:
Copies of supplementary data from the original OOPS publication (https://www.nature.com/articles/s41587-018-0001-2)

- notebooks:
R markdown notebooks to analyse data and generate figures

- results/plots:
Plots from notebooks

This notebook is for the Nature Protocols supplementary material
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,6 @@ header-includes:
---


\colorlet{shadecolor}{lightgray!10}

In this notebook, we will identify changes in RNA binding following parallel quantification of the "total" and "RNA-bound" protein abundance using the OOPS protocol. We start with a single protein and use the base function `lm`, then apply this to the complete dataset to identify all proteins with a change in RNA binding. Following this, we use the `limma` package to test for RNA binding changes using a moderated t-statistic.

Below we load the required packages and set a plotting theme.
Expand All @@ -36,7 +34,7 @@ We start by reading in the data. Our input here is the protein-level quantificat

The input data here is identical to supplementary table 5 from the above paper.
```{r}
protein_quant_raw <- readxl::read_excel('./ncbi_30607034_OOPS_NBT_table_S5.xlsx',
protein_quant_raw <- readxl::read_excel('../raw/ncbi_30607034_OOPS_NBT_table_S5.xlsx',
sheet=3, skip=1, n_max=1917) %>%
tibble::column_to_rownames('master_protein')
Expand Down Expand Up @@ -252,31 +250,31 @@ limma_results_treat_sig <- topTreat(limma_results_treat, coef = "timepoint6h:typ
```

And below we reproduce our volcano plot including the 95% confidence interval and highlight those proteins which have < 1% FDR and an absolute fold change significantly greater than 2.
And below we reproduce our volcano plot including the 95% confidence interval and highlight those proteins which have < 1% FDR and an absolute fold change significantly greater than 2. We can see that many of the proteins with a fold change (FC) point estimate > 2 have a 95% confidence interval that overlaps the dashed lines for >2-fold change. TREAT also takes the multiple testing into account so it's even more conservative than just using the 95% CI shown below.

```{r, fig.height=6, fid.width=6, out.width = '60%'}
limma_results$sig <- ifelse(limma_results$P.Value<=0.01, "<1% FDR", ">1% FDR") # add "sig" column
limma_results$sig <- ifelse(limma_results$P.Value<=0.01, "Sig. change (FC > 1; <1% FDR)", ">1% FDR") # add "sig" column
limma_results$sig[
rownames(limma_results) %in% rownames(limma_results_treat_sig)] <- "<1% FDR & FC > 2 (TREAT)"
rownames(limma_results) %in% rownames(limma_results_treat_sig)] <- "Sig. change (FC > 2; <1% FDR)"
limma_results$SE <- sqrt(limma_fit_results$s2.post) * limma_fit_results$stdev.unscaled[,1]
p <- limma_results %>%
ggplot(aes(logFC, -log10(P.Value), colour=sig)) +
geom_point(size=1) +
scale_colour_manual(values=c(cbPalette[c(6,2)], "grey20"), name="") + # manually adjust colours
geom_point(size=0.5) +
geom_errorbarh(aes(xmin=CI.L, xmax=CI.R), alpha=0.3, size=0.5) +
scale_colour_manual(values=c("grey20", cbPalette[c(2,6)]), name="") + # manually adjust colours
geom_vline(xintercept=1, linetype=2, colour="grey70") +
geom_vline(xintercept=-1, linetype=2, colour="grey70") +
theme(legend.position="top", legend.direction=2)
theme(legend.position="top", legend.direction=2) +
xlab('Fold change (log2)') +
ylab('-log10(P-value)')
print(p)
ggsave('../results/plots/rna_binding_changes_volcano.png')
```

We can also add error bars to the above Below, we can see that many of the proteins with a fold change (FC) point estimate > 2 have a 95% confidence interval that overlaps the dashed lines for >2-fold change. TREAT also takes the multiple testing into account so it's even more conservative than just using the 95% CI shown below.
```{r, fig.height=6, fid.width=6, out.width = '60%'}
print(p + geom_errorbarh(aes(xmin=CI.L, xmax=CI.R)))
```

Of course, the threshold for the fold changes you are interested in depends entirely on your prior expectations. A 2-fold threshold for "biologically" relevant changes will likely be overly conservative in the great majority of cases.

Expand Down
2,256 changes: 2,256 additions & 0 deletions notebooks/Identify_changes_in_RNA_binding.nb.html

Large diffs are not rendered by default.

Binary file not shown.
96 changes: 96 additions & 0 deletions notebooks/OOPS_CL_RNase_control_ratios.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
---
title: "OOPS CL:NC and RNase +:- ratios"
output:
pdf_document: default
html_notebook: default
html_document:
df_print: paged
header-includes:
- \usepackage{xcolor}
- \usepackage{framed}
---

Here we want to visualise the CL vs NC (non-crosslinked) and RNase vs Ctrl SILAC experiments.

Below we load the required packages and set a plotting theme.
```{r, message=FALSE, warning=FALSE}
# load packages
library(tidyverse)
library(ggbeeswarm)
# set up standardised plotting scheme
theme_set(theme_bw(base_size = 20) +
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
aspect.ratio=1))
cbPalette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
"#0072B2", "#D55E00", "#CC79A7", "#999999")
```


We start by reading in the data. Our input here is the protein-level quantification for the CL vs NC and RNase + vs - experiments conducted for the OOPS NBT paper (https://www.nature.com/articles/s41587-018-0001-2). The peptide-level abundances have been aggregated to protein level abundance and center-median normalised. Proteins with missing values have been removed. Only proteins quantified in both "total" and "OOPS" samples are included.

The input data here is identical to supplementary table 1 from the above paper. Below we read in the data and extract the ratio between CL vs NC or RNase vs Ctrl from the respective datasheets. Where the protein was not quantified in the NC/Ctrl samples, the ratio is NA and the value in the CL/RNase sample is used instead. This represents a "pseudo value" for the ratio which could not be quantified.
```{r}
glycoproteins <- read.delim('../raw/glycoproteins.tsv') %>% pull(protein)
cl_nc_protein_quant_raw <- readxl::read_excel('../raw/ncbi_30607034_OOPS_NBT_table_S1.xlsx',
sheet=1, skip=1, n_max=2655, na='NA',
col_types = c("text", "numeric", "text", "numeric", "numeric", "numeric")) %>%
filter(step==3, !master_protein %in% glycoproteins) %>%
filter(is.finite(CL)) %>%
mutate(pseudo_CL_NC_Ratio=ifelse(is.na(CL_NC_Ratio), CL, CL_NC_Ratio)) %>%
select(master_protein, ratio=pseudo_CL_NC_Ratio) %>%
mutate(exp='CL:NC')
RNase_ctrl_protein_quant_raw <- readxl::read_excel('../raw/ncbi_30607034_OOPS_NBT_table_S1.xlsx',
sheet=3, skip=1, n_max=4307, na='NA',
col_types = c("text", "text", "text", "numeric", "numeric", "numeric", "numeric",
"numeric", "numeric", "numeric", "text")) %>%
filter(!master_protein %in% glycoproteins, cell_line=='U2OS', Phase=='Org') %>%
filter(is.finite(RNAse)) %>%
mutate(pseudo_RNAse_NC_Ratio=ifelse(is.na(RNAse_NC_Ratio), RNAse, RNAse_NC_Ratio)) %>%
select(master_protein, ratio=pseudo_RNAse_NC_Ratio) %>%
mutate(exp='RNase:Ctrl')
```

Combine the two experiments
```{r}
combined_cl_rnase <- rbind(RNase_ctrl_protein_quant_raw, cl_nc_protein_quant_raw)
```

Plot
```{r}
ratios_p1 <- combined_cl_rnase %>%
mutate(exp=factor(exp, levels=c('CL:NC', 'RNase:Ctrl'))) %>%
ggplot(aes(ratio, fill=exp)) +
geom_histogram(bins=60) +
facet_grid(exp~., scales='free_y') +
xlab('Ratio (log2)') +
ylab('Proteins') +
geom_vline(xintercept=0, linetype=2, colour='grey50') +
scale_fill_manual(values=cbPalette[2:3], guide=FALSE) +
theme(panel.spacing = unit(0.25, "lines"))
print(ratios_p1)
ggsave('../results/plots/ratios_p1.png')
ratios_p2 <- combined_cl_rnase %>%
mutate(exp=factor(exp, levels=c('RNase:Ctrl', 'CL:NC'))) %>%
ggplot(aes(exp, ratio, colour=exp)) +
geom_quasirandom(size=0.5, bandwidth=0.25) +
coord_flip() +
xlab('') +
ylab('Ratio (log2)') +
geom_hline(yintercept=0, linetype=2, colour='grey50') +
scale_colour_manual(values=cbPalette[c(3,2)], guide=FALSE)
print(ratios_p2)
ggsave('../results/plots/ratios_p2.png')
```

1,756 changes: 1,756 additions & 0 deletions notebooks/OOPS_CL_RNase_control_ratios.html

Large diffs are not rendered by default.

1,923 changes: 1,923 additions & 0 deletions notebooks/OOPS_CL_RNase_control_ratios.nb.html

Large diffs are not rendered by default.

Binary file added notebooks/OOPS_CL_RNase_control_ratios.pdf
Binary file not shown.
98 changes: 98 additions & 0 deletions notebooks/Visualise_dynamics.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
---
title: "Visualising RNA binding dynamics"
output:
pdf_document: default
html_notebook: default
header-includes:
- \usepackage{xcolor}
- \usepackage{framed}
---


Here we will visualise total and RNA-bound protein abundances across conditions.

Below we load the required packages and set a plotting theme.
```{r, message=FALSE, warning=FALSE}
# load packages
library(tidyverse)
library(UniProt.ws)
# set up standardised plotting scheme
theme_set(theme_bw(base_size = 15) +
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
aspect.ratio=1))
cbPalette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
"#0072B2", "#D55E00", "#CC79A7", "#999999")
```


We start by reading in the data. Our input here is the protein-level quantification for the Nocodazole arrest/release experiment conducted for the OOPS NBT paper (https://www.nature.com/articles/s41587-018-0001-2). In this experiment, we wanted to assess changes in RNA binding in arrested/released cells. To do this, we quantified "total" protein abundance and RNA-bound (extracted by OOPS) protein abundance. The peptide-level abundances have been aggregated to protein level abundance and center-median normalised. Proteins with missing values have been removed. Only proteins quantified in both "total" and "OOPS" samples are included.

The input data here is identical to supplementary table 5 from the above paper.
```{r}
protein_quant_raw <- readxl::read_excel('../raw/ncbi_30607034_OOPS_NBT_table_S5.xlsx',
sheet=3, skip=1, n_max=1917)
```

In order to plot a functional subset of proteins, we will use the UniProt pathway annotations.

Warning: This cell will take a few minutes to run the query on the Uniprot database...
```{r}
humanUP <- UniProt.ws(taxId=9606) # H.sapiens
protein_ids <- protein_quant_raw$master_protein
hsapiens.annot <- AnnotationDbi::select(
humanUP,
keys = protein_ids,
columns = c("PATHWAY", "PROTEIN-NAMES"),
keystyle = "UNIPROTKB")
hsapiens.pathway <- hsapiens.annot %>% data.frame() %>%
separate_rows(PATHWAY, sep="; ") %>% dplyr::select(UNIPROTKB, PROTEIN.NAMES, PATHWAY)
```

Identify the glycolysis proteins
```{r}
glycolysis_proteins <- hsapiens.pathway %>% filter(PATHWAY=='glycolysis')
glycolysis_proteins$cleaned_protein_name <- sapply(strsplit(glycolysis_proteins$PROTEIN.NAMES, split='\\('), '[[', 1)
```

Restructure the data and subset to the glycolysis proteins
```{r}
glycolysis_intensities <- protein_quant_raw %>%
gather(key='sample', value='intensity', -master_protein) %>%
merge(glycolysis_proteins, by.x='master_protein', by.y='UNIPROTKB') %>%
separate(sample, into=c('timepoint', 'replicate', 'type'), remove=FALSE) %>%
mutate(type=factor(type, levels=c('total', 'OOPS'))) %>%
mutate(timepoint=factor(timepoint, levels=c('0h', '6h', '23h')))
glycolysis_intensities$type <- recode(glycolysis_intensities$type, 'OOPS'='RNA-bound', 'total'='Total')
```

Plot the glycolysis proteins
```{r, fig.width=10}
protein_order <- glycolysis_intensities %>%
group_by(cleaned_protein_name) %>% summarise(max_intensity=max(intensity)) %>%
arrange(max_intensity) %>% pull(cleaned_protein_name)
p <- glycolysis_intensities %>%
mutate(cleaned_protein_name=factor(cleaned_protein_name, levels=protein_order)) %>%
ggplot(aes(interaction(replicate, timepoint), cleaned_protein_name, fill=intensity)) +
geom_tile(colour='grey80', lwd=0.1) +
facet_grid(.~type) +
ylab('') + xlab('') +
scale_x_discrete(labels=c('', '0h', '', '', '6h', '', '', '23h', '')) +
geom_vline(xintercept=3.5) +
geom_vline(xintercept=6.5) +
scale_fill_gradient(low=cbPalette[1], high=cbPalette[5], name='Protein abundance\n(centre-median normalised)') +
theme(axis.text.y=element_text(size=10))
print(p)
ggsave('../results/plots/rna_binding_changes_heatmap.png')
```

1,935 changes: 1,935 additions & 0 deletions notebooks/Visualise_dynamics.nb.html

Large diffs are not rendered by default.

Binary file added notebooks/Visualise_dynamics.pdf
Binary file not shown.
Loading

0 comments on commit 952cfe1

Please sign in to comment.