adds notebooks for plotting dynamics and SILAC ratios
TomSmithCGAT committed Dec 10, 2019
1 parent b645828, commit 952cfe1
Showing 18 changed files with 12,661 additions and 16 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,15 @@ | ||
# OOPS_change_in_RNA_binding | ||
Example of how to detect change in RNA binding from parallel Total and RNA-bound protein abundance TMT quantification using the OOPS protocol. | ||
# Visualising OOPS | ||
|
||
This repository was used to generate figures for the OOPS Nature Protocols manuscript | ||
|
||
The repository structure is: | ||
|
||
- raw: | ||
Copies of supplementary data from the original OOPS publication (https://www.nature.com/articles/s41587-018-0001-2) | ||
|
||
- notebooks: | ||
R markdown notebooks to analyse data and generate figures | ||
|
||
- results/plots: | ||
Plots from notebooks | ||
|
||
This notebook is for the Nature Protocols supplementary material |
notebooks/Identify_changes_in_RNA_binding.nb.html: 2,256 additions & 0 deletions
Large diffs are not rendered by default.
Binary file renamed
BIN
+735 KB
Identify_changes_in_RNA_binding.pdf → ...books/Identify_changes_in_RNA_binding.pdf
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
--- | ||
title: "OOPS CL:NC and RNase +:- ratios" | ||
output: | ||
pdf_document: default | ||
html_notebook: default | ||
html_document: | ||
df_print: paged | ||
header-includes: | ||
- \usepackage{xcolor} | ||
- \usepackage{framed} | ||
--- | ||
|
||
Here we visualise the ratios from two SILAC experiments: CL (crosslinked) vs NC (non-crosslinked) and RNase vs Ctrl. | ||
|
||
Below we load the required packages and set a plotting theme. | ||
```{r, message=FALSE, warning=FALSE} | ||
# load packages | ||
library(tidyverse) | ||
library(ggbeeswarm) | ||
# set up standardised plotting scheme | ||
theme_set(theme_bw(base_size = 20) + | ||
theme(panel.grid.major=element_blank(), | ||
panel.grid.minor=element_blank(), | ||
aspect.ratio=1)) | ||
cbPalette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", | ||
"#0072B2", "#D55E00", "#CC79A7", "#999999") | ||
``` | ||
|
||
|
||
We start by reading in the data. Our input here is the protein-level quantification for the CL vs NC and RNase + vs - experiments conducted for the OOPS NBT paper (https://www.nature.com/articles/s41587-018-0001-2). The peptide-level abundances have been aggregated to protein-level abundances and centre-median normalised. Proteins with missing values have been removed. Only proteins quantified in both "total" and "OOPS" samples are included. | ||
|
||
The input data here are identical to supplementary table 1 from the above paper. Below we read in the data and extract the CL:NC or RNase:Ctrl ratio from the respective sheets. Where a protein was not quantified in the NC/Ctrl sample, the ratio is NA, so we use the value from the CL/RNase sample instead. This serves as a "pseudo value" standing in for a ratio that could not be quantified. | ||
```{r} | ||
glycoproteins <- read.delim('../raw/glycoproteins.tsv') %>% pull(protein) | ||
cl_nc_protein_quant_raw <- readxl::read_excel('../raw/ncbi_30607034_OOPS_NBT_table_S1.xlsx', | ||
sheet=1, skip=1, n_max=2655, na='NA', | ||
col_types = c("text", "numeric", "text", "numeric", "numeric", "numeric")) %>% | ||
filter(step==3, !master_protein %in% glycoproteins) %>% | ||
filter(is.finite(CL)) %>% | ||
mutate(pseudo_CL_NC_Ratio=ifelse(is.na(CL_NC_Ratio), CL, CL_NC_Ratio)) %>% | ||
select(master_protein, ratio=pseudo_CL_NC_Ratio) %>% | ||
mutate(exp='CL:NC') | ||
RNase_ctrl_protein_quant_raw <- readxl::read_excel('../raw/ncbi_30607034_OOPS_NBT_table_S1.xlsx', | ||
sheet=3, skip=1, n_max=4307, na='NA', | ||
col_types = c("text", "text", "text", "numeric", "numeric", "numeric", "numeric", | ||
"numeric", "numeric", "numeric", "text")) %>% | ||
filter(!master_protein %in% glycoproteins, cell_line=='U2OS', Phase=='Org') %>% | ||
filter(is.finite(RNAse)) %>% | ||
mutate(pseudo_RNAse_NC_Ratio=ifelse(is.na(RNAse_NC_Ratio), RNAse, RNAse_NC_Ratio)) %>% | ||
select(master_protein, ratio=pseudo_RNAse_NC_Ratio) %>% | ||
mutate(exp='RNase:Ctrl') | ||
``` | ||
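
The pseudo-value fallback described above can be illustrated on a toy table. This is only a minimal sketch: the values and the `is_pseudo` flag column are hypothetical and are not part of the original notebook or of table S1. The flag is added purely so that substituted values stay distinguishable from measured ratios.

```r
library(dplyr)

# Toy data with hypothetical values (NOT taken from supplementary table 1)
toy_quant <- tibble(
  master_protein = c("P1", "P2", "P3"),
  CL             = c(1.5, 2.0, NA),   # abundance in the CL sample
  CL_NC_Ratio    = c(0.8, NA, NA)     # NA where NC was not quantified
)

toy_ratios <- toy_quant %>%
  # keep only proteins quantified in the CL sample
  filter(is.finite(CL)) %>%
  # fall back to the CL value when the ratio is missing;
  # is_pseudo is an illustrative extra column, not in the notebook
  mutate(is_pseudo = is.na(CL_NC_Ratio),
         ratio     = ifelse(is_pseudo, CL, CL_NC_Ratio)) %>%
  select(master_protein, ratio, is_pseudo)

print(toy_ratios)
```

Keeping a flag like this would make it possible to colour or exclude the pseudo values when plotting, should that be useful.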
|
||
Combine the two experiments into a single data frame. | ||
```{r} | ||
combined_cl_rnase <- rbind(RNase_ctrl_protein_quant_raw, cl_nc_protein_quant_raw) | ||
``` | ||
|
||
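Plot the distribution of ratios for each experiment, first as histograms and then as a beeswarm-style plot. | ||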
```{r} | ||
ratios_p1 <- combined_cl_rnase %>% | ||
mutate(exp=factor(exp, levels=c('CL:NC', 'RNase:Ctrl'))) %>% | ||
ggplot(aes(ratio, fill=exp)) + | ||
geom_histogram(bins=60) + | ||
facet_grid(exp~., scales='free_y') + | ||
xlab('Ratio (log2)') + | ||
ylab('Proteins') + | ||
geom_vline(xintercept=0, linetype=2, colour='grey50') + | ||
scale_fill_manual(values=cbPalette[2:3], guide='none') + | ||
theme(panel.spacing = unit(0.25, "lines")) | ||
print(ratios_p1) | ||
ggsave('../results/plots/ratios_p1.png') | ||
ratios_p2 <- combined_cl_rnase %>% | ||
mutate(exp=factor(exp, levels=c('RNase:Ctrl', 'CL:NC'))) %>% | ||
ggplot(aes(exp, ratio, colour=exp)) + | ||
geom_quasirandom(size=0.5, bandwidth=0.25) + | ||
coord_flip() + | ||
xlab('') + | ||
ylab('Ratio (log2)') + | ||
geom_hline(yintercept=0, linetype=2, colour='grey50') + | ||
scale_colour_manual(values=cbPalette[c(3,2)], guide='none') | ||
print(ratios_p2) | ||
ggsave('../results/plots/ratios_p2.png') | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
--- | ||
title: "Visualising RNA binding dynamics" | ||
output: | ||
pdf_document: default | ||
html_notebook: default | ||
header-includes: | ||
- \usepackage{xcolor} | ||
- \usepackage{framed} | ||
--- | ||
|
||
|
||
Here we will visualise total and RNA-bound protein abundances across conditions. | ||
|
||
Below we load the required packages and set a plotting theme. | ||
```{r, message=FALSE, warning=FALSE} | ||
# load packages | ||
library(tidyverse) | ||
library(UniProt.ws) | ||
# set up standardised plotting scheme | ||
theme_set(theme_bw(base_size = 15) + | ||
theme(panel.grid.major=element_blank(), | ||
panel.grid.minor=element_blank(), | ||
aspect.ratio=1)) | ||
cbPalette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", | ||
"#0072B2", "#D55E00", "#CC79A7", "#999999") | ||
``` | ||
|
||
|
||
We start by reading in the data. Our input here is the protein-level quantification for the Nocodazole arrest/release experiment conducted for the OOPS NBT paper (https://www.nature.com/articles/s41587-018-0001-2). In this experiment, we wanted to assess changes in RNA binding in arrested/released cells. To do this, we quantified "total" protein abundance and RNA-bound (extracted by OOPS) protein abundance. The peptide-level abundances have been aggregated to protein-level abundances and centre-median normalised. Proteins with missing values have been removed. Only proteins quantified in both "total" and "OOPS" samples are included. | ||
|
||
The input data here is identical to supplementary table 5 from the above paper. | ||
```{r} | ||
protein_quant_raw <- readxl::read_excel('../raw/ncbi_30607034_OOPS_NBT_table_S5.xlsx', | ||
sheet=3, skip=1, n_max=1917) | ||
``` | ||
|
||
In order to plot a functional subset of proteins, we will use the UniProt pathway annotations. | ||
|
||
Warning: This cell will take a few minutes to run, as it queries the UniProt database. | ||
```{r} | ||
humanUP <- UniProt.ws(taxId=9606) # H.sapiens | ||
protein_ids <- protein_quant_raw$master_protein | ||
hsapiens.annot <- AnnotationDbi::select( | ||
humanUP, | ||
keys = protein_ids, | ||
columns = c("PATHWAY", "PROTEIN-NAMES"), | ||
keytype = "UNIPROTKB") | ||
hsapiens.pathway <- hsapiens.annot %>% data.frame() %>% | ||
separate_rows(PATHWAY, sep="; ") %>% dplyr::select(UNIPROTKB, PROTEIN.NAMES, PATHWAY) | ||
``` | ||
|
||
Identify the glycolysis proteins | ||
```{r} | ||
glycolysis_proteins <- hsapiens.pathway %>% filter(PATHWAY=='glycolysis') | ||
glycolysis_proteins$cleaned_protein_name <- sapply(strsplit(glycolysis_proteins$PROTEIN.NAMES, split='\\('), '[[', 1) | ||
``` | ||
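
The pathway filtering and name cleaning above can be tried out without querying UniProt. The sketch below uses a toy annotation table. The accessions, protein names and the exact layout of the `PATHWAY` string are assumptions made for illustration. It also swaps the `strsplit`/`sapply` step for `sub()` plus `trimws()`, which additionally strips the trailing space left before the first bracket.

```r
library(dplyr)
library(tidyr)

# Toy annotation table; accessions, names and pathway strings are hypothetical
toy_annot <- tibble(
  UNIPROTKB     = c("P00001", "P00002"),
  PROTEIN.NAMES = c("Enzyme A (EC 1.1.1.1) (Alias A)", "Protein B"),
  PATHWAY       = c("Carbohydrate degradation; glycolysis; step 1",
                    "Lipid metabolism")
)

toy_glycolysis <- toy_annot %>%
  # one row per "; "-separated pathway term
  separate_rows(PATHWAY, sep = "; ") %>%
  filter(PATHWAY == "glycolysis") %>%
  # drop everything from the first "(" onwards, then trim whitespace
  mutate(cleaned_protein_name = trimws(sub("\\(.*$", "", PROTEIN.NAMES)))

print(toy_glycolysis)
```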
|
||
Restructure the data and subset to the glycolysis proteins | ||
```{r} | ||
glycolysis_intensities <- protein_quant_raw %>% | ||
gather(key='sample', value='intensity', -master_protein) %>% | ||
merge(glycolysis_proteins, by.x='master_protein', by.y='UNIPROTKB') %>% | ||
separate(sample, into=c('timepoint', 'replicate', 'type'), remove=FALSE) %>% | ||
mutate(type=factor(type, levels=c('total', 'OOPS'))) %>% | ||
mutate(timepoint=factor(timepoint, levels=c('0h', '6h', '23h'))) | ||
glycolysis_intensities$type <- recode(glycolysis_intensities$type, 'OOPS'='RNA-bound', 'total'='Total') | ||
``` | ||
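
The reshaping step can be sketched on a toy wide table. The sample column names below (e.g. `0h_1_total`) are an assumption about how table S5 is laid out, inferred only from the three fields passed to `separate()`. The sketch also uses `pivot_longer()`, the newer tidyr replacement for `gather()`, rather than the call used in the notebook.

```r
library(dplyr)
library(tidyr)

# Toy wide table; column names and values are hypothetical
toy_wide <- tibble(
  master_protein = c("P1", "P2"),
  `0h_1_total`   = c(10, 20),
  `0h_1_OOPS`    = c(1, 2),
  `6h_1_total`   = c(11, 21),
  `6h_1_OOPS`    = c(3, 4)
)

toy_long <- toy_wide %>%
  # one row per protein and sample
  pivot_longer(-master_protein, names_to = "sample", values_to = "intensity") %>%
  # split the sample name into its three fields, keeping the original column
  separate(sample, into = c("timepoint", "replicate", "type"),
           sep = "_", remove = FALSE) %>%
  # fix the factor order so facets and axes appear in a sensible order
  mutate(type      = factor(type, levels = c("total", "OOPS")),
         timepoint = factor(timepoint, levels = c("0h", "6h", "23h")))

print(toy_long)
```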
|
||
Plot the glycolysis proteins | ||
```{r, fig.width=10} | ||
protein_order <- glycolysis_intensities %>% | ||
group_by(cleaned_protein_name) %>% summarise(max_intensity=max(intensity)) %>% | ||
arrange(max_intensity) %>% pull(cleaned_protein_name) | ||
p <- glycolysis_intensities %>% | ||
mutate(cleaned_protein_name=factor(cleaned_protein_name, levels=protein_order)) %>% | ||
ggplot(aes(interaction(replicate, timepoint), cleaned_protein_name, fill=intensity)) + | ||
geom_tile(colour='grey80', lwd=0.1) + | ||
facet_grid(.~type) + | ||
ylab('') + xlab('') + | ||
scale_x_discrete(labels=c('', '0h', '', '', '6h', '', '', '23h', '')) + | ||
geom_vline(xintercept=3.5) + | ||
geom_vline(xintercept=6.5) + | ||
scale_fill_gradient(low=cbPalette[1], high=cbPalette[5], name='Protein abundance\n(centre-median normalised)') + | ||
theme(axis.text.y=element_text(size=10)) | ||
print(p) | ||
ggsave('../results/plots/rna_binding_changes_heatmap.png') | ||
``` | ||
|