---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "90%"
  )
```

# targidcn <img src="man/figures/logo.png" align="right" height="139" />

<!-- badges: start -->
<!-- badges: end -->

Target identification is an essential first step in drug discovery. 
This package provides convenience functions for target identification 
on gene expression data using weighted correlation network analysis (WGCNA). 
(The "cn" in the package name stands for "correlation network".)

## Authors
- Chen Liang <https://github.com/dzyim>

## Installation

You can install the development version of `targidcn` from 
[GitHub](https://github.com/GHDDI-AILab/target-id-by-WGCNA).

```{r, eval = FALSE}
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install(c("GO.db", "preprocessCore", "impute"))  # Prerequisites for the WGCNA package
BiocManager::install(c("AnnotationDbi", "org.Hs.eg.db"))

if (!require("remotes", quietly = TRUE))
  install.packages("remotes")

remotes::install_github("GHDDI-AILab/target-id-by-WGCNA")
```

## Input data formats
The method can be applied to gene expression data generated 
by RNA-seq or mass spectrometry (MS).

### RNA-seq data
Preprocessing of RNA-seq data is not available yet.

### Labelled MS data
The expression levels of proteins are stored in the columns 
with the prefixes `Ratio H/L` and `Ratio H/L normalized`.

### Label-free MS data
The expression levels of proteins are stored in the columns 
with the prefixes `LFQ intensity` or similar.
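
As an illustration of the column-prefix convention described above, the sketch below picks out the expression columns from a toy table by prefix using base R only. The table `pg`, its values, and the sample names `S1` and `S2` are made up for this example; this is not how `ReadProteinGroups()` is implemented, and recoding `0` as missing is an assumption of the sketch.

```{r}
# Toy stand-in for a MaxQuant proteinGroups.txt table (all values are made up).
pg <- data.frame(
  "Protein IDs" = c("P1", "P2", "P3"),
  "Intensity S1" = c(1.0e6, 2.0e6, 0),
  "LFQ intensity S1" = c(1.2e6, 0, 3.1e6),
  "LFQ intensity S2" = c(1.1e6, 2.2e6, 0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only the columns whose names start with the chosen prefix.
prefix <- "LFQ intensity"
lfq_cols <- names(pg)[startsWith(names(pg), prefix)]

# Build a protein-by-sample matrix and strip the prefix from the sample names.
expr <- as.matrix(pg[, lfq_cols])
rownames(expr) <- pg[["Protein IDs"]]
colnames(expr) <- trimws(sub(prefix, "", lfq_cols, fixed = TRUE))

# Assumption of this sketch: an intensity of 0 means "not quantified".
expr[expr == 0] <- NA
expr
```

The same pattern applies to labelled data by setting `prefix` to `"Ratio H/L normalized"`.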

## Details

The R package `targidcn` contains:

- Functions for loading raw data:
  - `ReadExperimentalDesign()`: reads from the raw data path and returns an object of class `ExperimentInfo`
  - `ReadPhenotypeTable()`: reads a phenotype table and returns an object of class `ExperimentInfo`
  - `ReadProteinGroups()`: reads from the raw data path and returns an object of class `ProteinGroups`

- S3 classes:
  - `ExperimentList-class`
    - `ExperimentInfo-class`: for storing sample information
    - `ExpAssayTable-class`: for storing raw or QCed expression data, whose rows correspond to genes
      - `ProteinGroups-class`: for storing raw or QCed proteomics data
    - `ExpAssayFrame-class`: for storing QCed, scaled, normalized expression data, whose rows correspond to samples
      - `CorrelationNetwork-class`: for storing expression data with correlation network(s)

- S3 methods:
  - `AddPhenotype()`
  - `Subset()`
  - `Tidy()`
  - `QC()`
  - `Reshape()`: convert an `ExpAssayTable` object to an `ExpAssayFrame` object
  - `LogTransform()`
  - `Normalize()`
  - `Histogram()`
  - `SampleTree()`
  - `PickThreshold()`
  - `AddNetwork()`
  - `ModulePlot()`
  - `AddConnectivity()`
  - `GetConnectivity()`
  - `GetHubGenes()`
  - `ModuleSignificance()`
  - `BindModuleSignificance()`
  - `ModuleTraitHeatmap()`
  - `GeneSignificance()`
  - `ModuleMembership()`
  - `GetRelatedHubGenes()`
  - `GetSignificantGenes()`
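
For readers unfamiliar with S3, the sketch below shows how a nested class hierarchy like the one listed above can share and extend methods. All names here (`new_toy_assay()`, `Describe()`, the `Toy*` classes) are hypothetical and exist only for this example; they are not part of `targidcn`.

```{r}
# Hypothetical constructor: a list tagged with a class vector,
# ordered from the most specific class to the most general one.
new_toy_assay <- function(values) {
  structure(
    list(values = values),
    class = c("ToyProteinGroups", "ToyAssayTable", "ToyExperimentList")
  )
}

# Generic function: UseMethod() walks the class vector to find a method.
Describe <- function(x, ...) UseMethod("Describe")

# Method defined on the parent class.
Describe.ToyAssayTable <- function(x, ...) {
  sprintf("table with %d genes", nrow(x$values))
}

# More specific method that extends the parent method through NextMethod().
Describe.ToyProteinGroups <- function(x, ...) {
  paste("proteomics", NextMethod())
}

obj <- new_toy_assay(matrix(1:6, nrow = 3))
Describe(obj)
inherits(obj, "ToyAssayTable")
```

Because dispatch follows the class vector, a method written once for a parent class is available to every class that inherits from it.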

## Tutorial

```{r}
library(data.table)
library(magrittr)
suppressMessages(library(targidcn))

datadir = system.file("extdata", "MS_label-free", "MaxQuantOutput_50", package = "targidcn")
pheno = ReadPhenotypeTable(file.path(datadir, "phenotype.txt"))
assay = ReadProteinGroups(datadir, col = "Intensity")
assay
```

```{r}
cn = assay %>% 
  AddPhenotype(pheno) %>% 
  Tidy() %>% 
  QC() %>% 
  Reshape() %>% 
  LogTransform() %>% 
  Normalize() %>% 
  AddNetwork(power = 3) %>% 
  AddConnectivity()

```
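
To give an intuition for the `power` argument and for connectivity, the sketch below builds a soft-thresholded adjacency matrix in base R on simulated data. It assumes the unsigned WGCNA definition (absolute Pearson correlation raised to the power); whether `AddNetwork()` and `AddConnectivity()` use exactly this variant is an assumption, and the matrix `expr` is made up for the example.

```{r}
set.seed(1)
# Toy expression matrix: 20 samples (rows) x 5 genes (columns), simulated values.
expr <- matrix(rnorm(100), nrow = 20,
               dimnames = list(NULL, paste0("gene", 1:5)))
# Make gene2 track gene1 so that one strong edge exists in the network.
expr[, "gene2"] <- expr[, "gene1"] + rnorm(20, sd = 0.1)

power <- 3
# Unsigned adjacency: absolute Pearson correlation raised to the chosen power.
# Raising to a power shrinks weak correlations much more than strong ones.
adjacency <- abs(cor(expr))^power

# Connectivity of a gene: sum of its adjacencies to all other genes.
connectivity <- rowSums(adjacency) - diag(adjacency)
sort(connectivity, decreasing = TRUE)
```

A higher power suppresses weak correlations more strongly, which is why the choice of power affects which genes appear as highly connected.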

```{r}
cn
```

```{r, fig.dim = c(8, 8), out.width = "60%"}
cn %>% Histogram(preview = TRUE)
```

```{r, fig.dim = c(20, 12)}
cn %>% SampleTree(preview = TRUE)
```

```{r, fig.dim = c(12.5, 10)}
cn %>% ModulePlot(preview = TRUE)
```

```{r}
cn %>% GetHubGenes()
```

```{r}
samples1 = pheno$table[is.na(Remission) | Remission == 0, Experiment]
samples2 = pheno$table[!is.na(Remission), Experiment]
mt1 = ModuleSignificance(cn, samples = samples1, traits = "Illness", prefix = "UC")
mt2 = ModuleSignificance(cn, samples = samples2, traits = "Remission", prefix = "UC")
mt = BindModuleSignificance(mt1, mt2)
```
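
In the standard WGCNA workflow, a module is related to a trait by correlating the module eigengene (the first principal component of the module's expression) with the trait. The sketch below reproduces that idea in base R on simulated data; the variables `trait` and `module_expr` are made up, and it is an assumption that `ModuleSignificance()` follows the same approach.

```{r}
set.seed(2)
n <- 12
# Hypothetical binary trait: 6 controls (0) followed by 6 cases (1).
trait <- rep(c(0, 1), each = n / 2)
# Toy module: 4 genes whose expression shifts with the trait, plus noise.
module_expr <- sapply(1:4, function(i) trait + rnorm(n, sd = 0.3))

# Module eigengene: first principal component of the scaled module expression.
# The sign of a principal component is arbitrary, so only the magnitude
# of the correlation below is meaningful.
eigengene <- prcomp(module_expr, scale. = TRUE)$x[, 1]

# Module-trait association: Pearson correlation and its p-value.
test <- cor.test(eigengene, trait)
c(correlation = unname(test$estimate), p.value = test$p.value)
```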

```{r, fig.dim = c(10, 10), out.width = "60%"}
mt %>% ModuleTraitHeatmap(preview = TRUE)
```

```{r}
mt %>% GetRelatedHubGenes()
```

## References

**Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target.**  
Horvath S, Zhang B, Carlson M, et al.  
PNAS. 2006;103(46):17402-17407. doi:10.1073/pnas.0608396103  

**WGCNA: an R package for weighted correlation network analysis.**  
Langfelder P, Horvath S.  
BMC Bioinformatics. 2008;9:559. doi:10.1186/1471-2105-9-559  

**Structural weakening of the colonic mucus barrier is an early event in ulcerative colitis pathogenesis.**  
van der Post S, Jabbar KS, Birchenough G, et al.  
Gut. 2019;68(12):2142-2151. doi:10.1136/gutjnl-2018-317571  


<!-- 
You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. 
`devtools::build_readme()` is handy for this. You could also use GitHub Actions to 
re-render `README.Rmd` every time you push. An example workflow can be found here: 
<https://github.com/r-lib/actions/tree/v1/examples>.
--->