---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "90%"
)
```

# targidcn <img src="man/figures/logo.png" align="right" height="139" />

<!-- badges: start -->
<!-- badges: end -->

Target identification is an essential first step in drug discovery.
This package provides convenience functions for target identification
on gene expression data using weighted correlation network analysis (WGCNA).
("cn" in the package name stands for "correlation network".)

## Authors

- Chen Liang <https://github.com/dzyim>

## Installation

You can install the development version of `targidcn` from
[GitHub](https://github.com/GHDDI-AILab/target-id-by-WGCNA).

```{r, eval = FALSE}
# Bioconductor dependencies
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("GO.db", "preprocessCore", "impute")) # Prerequisites for the WGCNA package
BiocManager::install(c("AnnotationDbi", "org.Hs.eg.db"))

# targidcn itself, from GitHub
if (!require("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("GHDDI-AILab/target-id-by-WGCNA")
```

## Input data formats

The method can be applied to gene expression data generated
by RNA-seq or mass spectrometry (MS).

### RNA-seq data

Preprocessing of RNA-seq data is not yet supported.

### Labelled MS data

Protein expression levels are stored in the columns
with the prefixes `Ratio H/L` and `Ratio H/L normalized`.

### Label-free MS data

Protein expression levels are stored in the columns
with the prefix `LFQ intensity` or a similar one (the tutorial below uses `Intensity`).

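
As a rough illustration of the prefix convention, the base R snippet below selects expression columns by prefix from a small, made-up protein table. The table, its column names, and its values are hypothetical, and this is not how `ReadProteinGroups()` is implemented; it only shows the kind of column selection that its `col` argument presumably controls.

```r
# Hypothetical protein table with prefixed intensity columns
# (names and values are made up for illustration)
proteins = data.frame(
  `Protein IDs` = c("P1", "P2", "P3"),
  `LFQ intensity S1` = c(1200, 0, 560),
  `LFQ intensity S2` = c(1500, 300, 0),
  Peptides = c(5L, 2L, 3L),
  check.names = FALSE
)

# Keep only the columns whose names start with the chosen prefix
prefix = "LFQ intensity "
expr_cols = names(proteins)[startsWith(names(proteins), prefix)]
expr = as.matrix(proteins[, expr_cols])
rownames(expr) = proteins[["Protein IDs"]]

# Strip the prefix so that the column names become sample names
colnames(expr) = sub(prefix, "", expr_cols, fixed = TRUE)
expr
```

Switching `prefix` to, for example, `"Ratio H/L normalized "` would select the labelled-MS columns instead, assuming the same naming pattern.
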
## Details

R package `targidcn` contains:

- Functions for loading raw data:
  - `ReadExperimentalDesign()`: takes a raw data path and returns an object of class `ExperimentInfo`
  - `ReadPhenotypeTable()`: reads a phenotype table and returns an object of class `ExperimentInfo`
  - `ReadProteinGroups()`: takes a raw data path and returns an object of class `ProteinGroups`
- S3 classes:
  - `ExperimentList-class`
  - `ExperimentInfo-class`: stores sample information
  - `ExpAssayTable-class`: stores raw or QCed expression data, with rows corresponding to genes
  - `ProteinGroups-class`: stores raw or QCed proteomics data
  - `ExpAssayFrame-class`: stores QCed, scaled, and normalized expression data, with rows corresponding to samples
  - `CorrelationNetwork-class`: stores expression data together with one or more correlation networks
- S3 methods:
  - `AddPhenotype()`
  - `Subset()`
  - `Tidy()`
  - `QC()`
  - `Reshape()`: converts an `ExpAssayTable` object to an `ExpAssayFrame` object
  - `LogTransform()`
  - `Normalize()`
  - `Histogram()`
  - `SampleTree()`
  - `PickThreshold()`
  - `AddNetwork()`
  - `ModulePlot()`
  - `AddConnectivity()`
  - `GetConnectivity()`
  - `GetHubGenes()`
  - `ModuleSignificance()`
  - `BindModuleSignificance()`
  - `ModuleTraitHeatmap()`
  - `GeneSignificance()`
  - `ModuleMembership()`
  - `GetRelatedHubGenes()`
  - `GetSignificantGenes()`

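
`GeneSignificance()` and `ModuleMembership()` use WGCNA terminology. In the WGCNA literature, a module eigengene is the first principal component of a module's standardized expression profiles, module membership is the correlation between a gene's profile and that eigengene, and gene significance is often taken as the absolute correlation between a gene's profile and a trait. The base R sketch below computes these quantities on simulated data. It assumes a samples-by-genes matrix (the orientation of `ExpAssayFrame`) and a single module, and it is not a reproduction of `targidcn`'s internals.

```r
set.seed(1)
n_samples = 20
# Simulated samples-by-genes matrix: 5 genes driven by one latent factor
latent = rnorm(n_samples)
expr = sapply(1:5, function(i) latent + rnorm(n_samples, sd = 0.5))
colnames(expr) = paste0("gene", 1:5)
trait = latent + rnorm(n_samples, sd = 1)  # hypothetical quantitative trait

# Module eigengene: first principal component of the standardized profiles
pc = prcomp(scale(expr))
eigengene = pc$x[, 1]
# Orient the eigengene so that it correlates positively with the average profile
if (cor(eigengene, rowMeans(scale(expr))) < 0) eigengene = -eigengene

# Module membership (kME): correlation of each gene with the eigengene
module_membership = cor(expr, eigengene)[, 1]

# Gene significance: absolute correlation of each gene with the trait
gene_significance = abs(cor(expr, trait)[, 1])

round(cbind(module_membership, gene_significance), 2)
```

Genes that score high on both columns are the kind of "related hub genes" that the last few methods in the list above are named after.
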

## Tutorial

```{r}
library(data.table)
library(magrittr)
suppressMessages(library(targidcn))

# Example label-free MS data (MaxQuant output) shipped with the package
datadir = system.file("extdata", "MS_label-free", "MaxQuantOutput_50", package = "targidcn")
pheno = ReadPhenotypeTable(file.path(datadir, "phenotype.txt"))
assay = ReadProteinGroups(datadir, col = "Intensity")
assay
```
```{r}
cn = assay %>%
AddPhenotype(pheno) %>%
Tidy() %>%
QC() %>%
Reshape() %>%
LogTransform() %>%
Normalize() %>%
AddNetwork(power = 3) %>%
AddConnectivity()
```
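
The `power` argument of `AddNetwork()` is presumably the WGCNA soft-thresholding power (the value `PickThreshold()` is meant to help choose). To show what such a power does, the base R sketch below builds the unsigned weighted adjacency described in the WGCNA papers, `a_ij = |cor(x_i, x_j)|^beta`, and sums each gene's adjacencies into its connectivity `k_i`; the most connected genes are hub-gene candidates. The data are random, and the package may instead use signed networks, topological overlap, or intramodular connectivity.

```r
set.seed(42)
# Hypothetical samples-by-genes matrix (20 samples, 6 genes)
expr = matrix(rnorm(20 * 6), nrow = 20,
              dimnames = list(NULL, paste0("gene", 1:6)))

beta = 3  # soft-thresholding power, mirroring power = 3 above

# Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)|^beta, no self-loops
adjacency = abs(cor(expr))^beta
diag(adjacency) = 0

# Whole-network connectivity: k_i = sum over j of a_ij
connectivity = rowSums(adjacency)

# Genes ranked by connectivity; the top ones are hub-gene candidates
sort(connectivity, decreasing = TRUE)
```

Raising correlations to a power shrinks weak correlations much faster than strong ones, which is why a larger `beta` gives a sparser, more scale-free-like network.
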
```{r}
cn
```
```{r, fig.dim = c(8, 8), out.width = "60%"}
cn %>% Histogram(preview = TRUE)
```
```{r, fig.dim = c(20, 12)}
cn %>% SampleTree(preview = TRUE)
```
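
`SampleTree()` presumably draws a sample dendrogram, a standard WGCNA step for spotting outlier samples before building the network. A minimal base R counterpart, assuming a samples-by-genes matrix and the average-linkage clustering on Euclidean distances used in the WGCNA tutorials, looks like this (the data and the outlier are simulated):

```r
set.seed(7)
# Hypothetical samples-by-genes matrix: 8 samples, 50 genes
expr = matrix(rnorm(8 * 50), nrow = 8,
              dimnames = list(paste0("sample", 1:8), NULL))
expr["sample8", ] = expr["sample8", ] + 5  # simulated outlier sample

# Average-linkage hierarchical clustering on Euclidean distances between samples
tree = hclust(dist(expr), method = "average")

# Cutting the tree into two groups separates the outlier from the rest
groups = cutree(tree, k = 2)
groups
plot(tree, main = "Sample clustering", xlab = "", sub = "")
```
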
```{r, fig.dim = c(12.5, 10)}
cn %>% ModulePlot(preview = TRUE)
```
```{r}
cn %>% GetHubGenes()
```
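```{r}
# Samples whose Remission status is missing or 0
samples1 = pheno$table[is.na(Remission) | Remission == 0, Experiment]
# Samples with a recorded Remission status
samples2 = pheno$table[!is.na(Remission), Experiment]

# Module significance for each trait on its own sample subset, then combine the results
mt1 = ModuleSignificance(cn, samples = samples1, traits = "Illness", prefix = "UC")
mt2 = ModuleSignificance(cn, samples = samples2, traits = "Remission", prefix = "UC")
mt = BindModuleSignificance(mt1, mt2)
```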
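
In the WGCNA tutorials, module-trait relationships are quantified by correlating each module eigengene with each trait and attaching a Student asymptotic p-value, `p = 2 * P(T_{n-2} > |t|)` with `t = r * sqrt(n - 2) / sqrt(1 - r^2)` (the formula behind WGCNA's `corPvalueStudent()`). The sketch below applies it to simulated eigengenes and a binary trait. The `MEblue`/`MEbrown` names and the data are hypothetical, and the sketch does not claim to mirror how `ModuleSignificance()` is implemented.

```r
set.seed(3)
n = 24
# Hypothetical binary trait (e.g. 0 = control, 1 = case) and two module eigengenes
trait = rep(c(0, 1), each = n / 2)
eigengenes = cbind(
  MEblue  = trait + rnorm(n, sd = 0.7),  # module associated with the trait
  MEbrown = rnorm(n)                     # module unrelated to the trait
)

# Module-trait correlations (one entry per module)
r = cor(eigengenes, trait)[, 1]

# Student asymptotic p-values for Pearson correlations
t_stat = r * sqrt(n - 2) / sqrt(1 - r^2)
p_value = 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

round(cbind(correlation = r, p_value = p_value), 4)
```

These p-values match those of `cor.test()` with its default Pearson method; computing them in vectorized form is convenient when many modules and traits are tested at once, as in a module-trait heatmap.
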
```{r, fig.dim = c(10, 10), out.width = "60%"}
mt %>% ModuleTraitHeatmap(preview = TRUE)
```
```{r}
mt %>% GetRelatedHubGenes()
```

## References

- **Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target.**
  Horvath S, Zhang B, Carlson M, et al.
  *PNAS*. 2006;103(46):17402-17407. doi:10.1073/pnas.0608396103
- **WGCNA: an R package for weighted correlation network analysis.**
  Langfelder P, Horvath S.
  *BMC Bioinformatics*. 2008;9:559. doi:10.1186/1471-2105-9-559
- **Structural weakening of the colonic mucus barrier is an early event in ulcerative colitis pathogenesis.**
  van der Post S, Jabbar KS, Birchenough G, et al.
  *Gut*. 2019;68(12):2142-2151. doi:10.1136/gutjnl-2018-317571

<!--
You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date.
`devtools::build_readme()` is handy for this. You could also use GitHub Actions to
re-render `README.Rmd` every time you push. An example workflow can be found here:
<https://github.com/r-lib/actions/tree/v1/examples>.
--->