-
Notifications
You must be signed in to change notification settings - Fork 40
/
Copy path17-DIY.Rmd
127 lines (103 loc) · 5.51 KB
/
17-DIY.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
---
title: "Lab 17"
output: html_document
---
# DIY Lab
```{r, warning=FALSE, message=FALSE}
library(Seurat)
library(dplyr)
library(Matrix)
library(gdata)
library(utils)
library(liger)
library(SingleCellExperiment)
library(destiny)
library(scater)
library(clusterExperiment)
library(gam)
library(corrplot)
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
library(cowplot)
library(RColorBrewer)
```
## DIY Lab
Hopefully now you have a "feel" for what scRNA-seq analysis entails. Today we will work in groups to analyze a publicly available data set: IDH-mutated gliomas.
Your data includes:
IDH_A_processed_data_portal.txt - the TPM matrix, already in log scale
IDH_A_cell_type_assignment_portal.txt - a classification of cells to malignant and non-malignant groups, and to tumors
You also have files for different signatures.
What would you like to focus on today?
If it's clustering and identifying cell populations we reccomend you use all the data and try to distinguish the different cell types. If it's combining a few datasets and work on batch correction, we reccomend you focus on all malignant cells and work on the batch effects among tumors and identifying cell populations shared among them. If it's identifying subtle differenced in clustering and trying also to identify lineages we reccomend you use one tumor.
Most importantly, feel free to explore what ever interest you!
```{r read data_diy}
# read the single cell RNA data
sc.data.dirname <- "data/lab20_DIY/"
# if your rstudio crashes and you would like to use a smaller dataset
counts <- read.table(file = paste0(sc.data.dirname,"/IDH_A_processed_data_portal_filtered.txt"), sep="\t", header = TRUE, row.names=1)
classification <- read.table(file = paste0(sc.data.dirname,"/IDH_A_cell_type_assignment_portal_filtered.txt"), sep="\t", header = TRUE)
```
Create a seurat object filtering out the very extreme cases.
```{r create seurat object_diy}
seurat<-CreateSeuratObject(counts = counts, min.cells = 3, min.features = 350, project = "Astrocytomas")
```
```{r add_meta_data_diy}
vec.cell.type <- classification$type
names(vec.cell.type) <- classification$cell_name
seurat <- AddMetaData(object = seurat, metadata = vec.cell.type, col.name = "cell_type")
vec.tumor.name <- classification$tumor_name
names(vec.tumor.name) <- classification$cell_name
seurat <- AddMetaData(object = seurat, metadata = vec.tumor.name, col.name = "tumor_name")
```
```{r look_into_technical_features_diy}
#Notice: Unfortunetlay this dataset does not provide mitochondrial genes so we cannot calculate percent.mito
# load resources
resources.dirname <- "data/resources/"
# Load the the list of house keeping genes
hkgenes <- read.table(paste0(resources.dirname,"/tirosh_house_keeping.txt"), skip = 2)
hkgenes <- as.vector(hkgenes$V1)
# remove hkgenes that were not found
hkgenes.found <- which(toupper(rownames(seurat@assays$RNA)) %in% hkgenes)
n.expressed.hkgenes <- Matrix::colSums(seurat@assays$RNA[hkgenes.found, ] > 0)
seurat <- AddMetaData(object = seurat, metadata = n.expressed.hkgenes, col.name = "n.exp.hkgenes")
VlnPlot(object = seurat, features = c("nFeature_RNA", "nCount_RNA","n.exp.hkgenes"), ncol = 3)
```
You can start by filtering extreme outliers if you would like (just replace the '?')
```{r filterdata_diy, eval = FALSE}
seurat <- subset(seurat, subset = nFeature_RNA > ? & nFeature_RNA < ? & n.exp.hkgenes > ?)
```
And normalize:
```{r normalize_diy}
seurat <- NormalizeData(object = seurat, normalization.method = "LogNormalize", scale.factor = 1e4)
```
```{r calculate_cell_cycle_diy}
# Read in a list of cell cycle markers, from Tirosh et al, 2015.
# We can segregate this list into markers of G2/M phase and markers of S phase.
s.genes <- Seurat::cc.genes$s.genes
g2m.genes <- Seurat::cc.genes$g2m.genes
seurat <- CellCycleScoring(object = seurat, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)
```
```{r add_known_signatures_diy}
seurat <- AddModuleScore(object = seurat, features = list(read.table(paste0(resources.dirname,"astro_genes.txt"))$V1), name = "astro_signature")
seurat <- AddModuleScore(object = seurat, features = list(read.table(paste0(resources.dirname,"oligo_genes.txt"))$V1), name = "oligo_signature")
seurat <- AddModuleScore(object = seurat, features = list(read.table(paste0(resources.dirname,"stemness_genes.txt"))$V1), name = "stemness_signature")
```
NOTICE: THESE FOLLOWING COMMANDS WILL ONLY WORK AFTER CREATEING UMAP/tSNE
Here are useful markers that can be used to explore the clusters after you will cluster the data
```{r plot_immune_cells_markers_diy, eval = FALSE}
FeaturePlot(seurat, features = c("CD14","CSF1R","FCER1G","FCGR3A")) # markers for macrophages
FeaturePlot(seurat, features = c("MBP","MAG","MOG")) # markers for oligodendrocytes
```
And a few useful commands, that can be used after clustering:
```{r plot_signatures_diy, eval = FALSE}
# notice that the name of the signature is the name you assigned to it, plus "1" that seurat is adding to it
FeaturePlot(seurat, features = c("oligo_signature1","astro_signature1","stemness_signature1"))
```
If you want to focus on just malignant cells, or just one tumor at a time
```{r subset_seurat_object_diy, eval = FALSE}
seurat.malignant<- subset(seurat, subset = cell_type == "malignant") #keep only malignant cells
seurat.malignant.MGH107<- subset(seurat.malignant, subset = tumor_name == "MGH107neg") #keep only malignant MGH107 cells
seurat.malignant.MGH44<- subset(seurat.malignant, subset = tumor_name == "MGH44") #keep only malignant MGH44 cells
```
Ok, you are ready! Discuss what you wish to explore and go for it!