-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathS0b_Pan_neuro_ND75KD.Rmd
executable file
·158 lines (118 loc) · 4.65 KB
/
S0b_Pan_neuro_ND75KD.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
---
title: "Neuro_Droso_ND75KD - Generating non-integrated PanNeuroND75KD Seurat object"
author: "Vincent Gardeux"
date_created: "2023-05-12"
date_modified: "2024-09-25"
output:
html_document:
df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = "/data/gardeux/Neuro_Droso_ND75KD/")
```
## Introduction
In this script, I will analyze the Pan Neuro ND75-KD single-cell RNA-seq dataset using Seurat.
## Libraries
```{r}
suppressPackageStartupMessages(library(Seurat, lib.loc = "/home/gardeux/.R/lib/")) # Seurat v4.4.0, library for single-cell analysis
suppressPackageStartupMessages(library(SeuratObject, lib.loc = "/home/gardeux/.R/lib/")) # For compatibility purpose
suppressPackageStartupMessages(library(data.table)) # For reading text files
suppressPackageStartupMessages(library(crayon)) # Just for bolding the console output :D
cat(bold("Seurat"), "version", as.character(packageVersion("Seurat")), "\n")
cat(bold("SeuratObject"), "version", as.character(packageVersion("Seurat")), "\n")
cat(bold("data.table"), "version", as.character(packageVersion("data.table")), "\n")
```
## Parameters
```{r}
input_10x_path <- "./data/Pan_neuro_ND75KD.h5"
input_gene_mapping_path <- "./data/features.tsv"
output_seurat_path <- "./data/Pan_neuro_ND75KD.rds"
output_project_name <- tools::file_path_sans_ext(basename(output_seurat_path))
```
## I. Parsing
First step is to read 10X files and create a Seurat object
```{r}
# Here, put the path of the 10x count matrix (output of cellranger)
data.seurat <- CreateSeuratObject(Read10X_h5(input_10x_path, use.names = F), project=output_project_name)
message(ncol(data.seurat), " cells were loaded")
message(nrow(data.seurat), " genes were loaded")
```
## II. Filtering cells/genes
# Find Mitochondrial genes
```{r}
# Read gene mapping
data.gene.mapping <- fread(input_gene_mapping_path, header = F, sep = "\t", data.table = F)
colnames(data.gene.mapping) <- c("Ensembl", "Name", "Type")
rownames(data.gene.mapping) <- data.gene.mapping$Ensembl
# I know that Mitochondrial genes start with mt: in the Drosophila assembly that I'm using
mt.genes <- rownames(data.gene.mapping)[startsWith(data.gene.mapping$Name, "mt:")]
data.seurat[["percent.mt"]] <- PercentageFeatureSet(data.seurat, features = mt.genes)
```
# Visualize QC metrics as a violin plot
```{r}
VlnPlot(data.seurat, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
```
# Filtering cells
We see in the plots that one cell is an outlier (nCount_RNA), and a few others are a bit off compared to the main mass of cells. I will filter the "low quality cells" and outlier cells
```{r}
data.seurat <- subset(data.seurat, subset = nCount_RNA < 20000 & percent.mt < 10)
message(ncol(data.seurat), " cells remain after filtering")
```
# Visualize QC metrics as a violin plot
Visualizing again the QC, without the filtered cells.
```{r}
VlnPlot(data.seurat, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
```
## III. Preprocessing the data
# Normalization
```{r}
# Default params
data.seurat <- NormalizeData(data.seurat, normalization.method = "LogNormalize", scale.factor = 10000)
```
# Highly variable genes
```{r}
data.seurat <- FindVariableFeatures(data.seurat, selection.method = "vst", nfeatures = 2000)
VariableFeaturePlot(data.seurat)
```
# Scaling
```{r}
data.seurat <- ScaleData(data.seurat, features = rownames(data.seurat))
```
# PCA
```{r}
data.seurat <- RunPCA(data.seurat, features = VariableFeatures(data.seurat), verbose = F, npcs = 100)
DimPlot(data.seurat, reduction = "pca")
```
Note: Seurat computes 50PCs by default, and Jackstraw plot says all of them are significant. So, I decided to compute up to 100 PCs.
# JackStraw
```{r}
data.seurat <- JackStraw(data.seurat, num.replicate = 100, dims = 100)
data.seurat <- ScoreJackStraw(data.seurat, dims = 1:100)
data.seurat@reductions$pca@jackstraw$overall.p.values
```
The p-value distribution show that ~90 PCs are good (similar to Pan_neuro_control), so I keep 90 PCs for the rest of the pipeline.
I've plotted the tail of the JackStraw plot (80 to 100 PCs) to see that indeed the sweet spot is around there.
```{r}
JackStrawPlot(data.seurat, dims = 80:100)
npcs <- 90
```
# Clustering
```{r}
data.seurat <- FindNeighbors(data.seurat, dims = 1:npcs)
data.seurat <- FindClusters(data.seurat, resolution = 4)
```
# UMAP (visualization)
```{r}
data.seurat <- RunUMAP(data.seurat, dims = 1:npcs)
DimPlot(data.seurat, reduction = "umap")
```
# t-SNE (visualization)
```{r}
data.seurat <- RunTSNE(data.seurat, dims = 1:npcs)
DimPlot(data.seurat, reduction = "tsne")
```
# Saving the Seurat object
```{r}
saveRDS(data.seurat, file = output_seurat_path)
```