bebeklab/nSEA

nSEA

n-Node Subnetwork Enumeration Algorithm (nSEA) is an unsupervised, bottom-up approach that facilitates the identification of significantly dysregulated functional subnetworks, which are then used to identify disease phenotypes.

Setup

  1. In the working directory, create a folder named “TCGA”. Inside “TCGA”, create a folder named “TCGA_UCSC_XXX”, where “XXX” is the cancer code. For example, the cancer code for low grade glioma is “LGG”, so the folder name is “TCGA_UCSC_LGG”. This project directory (“TCGA/TCGA_UCSC_LGG/”) stores all data related to the LGG project, and all plots and generated data files are exported to it as well.
  2. Download the gene expression data from the UCSC cancer browser and place it in the project directory. Rename the file to “genomicMatrix”, with no file extension.
  3. Download the clinical data and put it in the project directory.
  4. Additional annotation files can be put in the project directory as well. See the “Data Processing” section.
  5. Move all the scripts except “Data_Additional_Processing_XXX.R” into the working directory (not the project directory). For the additional data processing script, see the “Data Processing” section.
  6. Read “combined_pre_run.r” and install all the packages that nSEA needs. Then run this script to set up the environment. Always run this script before running any other script.
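The directory layout from step 1 can be sketched in R as follows. This is a minimal illustration, not part of the nSEA scripts: `project_dir` and `setup_project` are hypothetical helper names, and the example runs under a temporary directory rather than a real working directory.

```r
# Hypothetical helper: build the project directory path for a cancer code.
project_dir <- function(cancer_code, working_dir = getwd()) {
  file.path(working_dir, "TCGA", paste0("TCGA_UCSC_", cancer_code))
}

# Hypothetical helper: create the directory tree and report whether the
# renamed expression file ("genomicMatrix") is already in place.
setup_project <- function(cancer_code, working_dir = getwd()) {
  dir_path <- project_dir(cancer_code, working_dir)
  dir.create(dir_path, recursive = TRUE, showWarnings = FALSE)
  expr_file <- file.path(dir_path, "genomicMatrix")
  if (!file.exists(expr_file)) {
    message("genomicMatrix not found in ", dir_path)
  }
  invisible(dir_path)
}

# Example: prepare the LGG project under a temporary working directory.
lgg_dir <- setup_project("LGG", working_dir = tempdir())
print(basename(lgg_dir))
```

In a real run, `working_dir` would be the directory that holds the nSEA scripts, and “combined_pre_run.r” would be sourced from there before anything else.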

Data Processing

  1. Open “TCGA_Data_Processing.R” and define the parameters for gene expression data processing.
  2. Modify the script so that it loads your clinical data; there is no universal format for clinical data. Use our script and clinical data as an example.
  3. If you have additional annotations, such as subtypes and mutations, write a separate script that adds them to the “save.data” object. Name it “Data_Additional_Processing_” plus the cancer code and put it into the project directory, so that the data processing script calls it automatically. See “Data_Additional_Processing_LGG.R” for an example.
  4. Run the data processing script. It generates two files: “TCGA_XXX_Data.rda” contains all the data for the nSEA pipeline, and “TCGA_XXX_Data_par.rda” (“TCGA_LGG_Data_par.rda” for LGG) contains all the parameters.
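To check what the data processing step wrote, an `.rda` file can be loaded into a separate environment and its contents listed. The sketch below uses a stand-in file with placeholder content; `load_rda` is a hypothetical helper, and the objects actually stored in “TCGA_XXX_Data.rda” are not documented here, so inspect the real file rather than relying on the names shown.

```r
# Hypothetical helper: load an .rda file into its own environment so the
# loaded objects do not overwrite anything in the global workspace.
load_rda <- function(path) {
  if (!file.exists(path)) stop("File not found: ", path)
  env <- new.env()
  load(path, envir = env)
  env
}

# Stand-in file for demonstration only. The real file would be the
# "TCGA_XXX_Data.rda" generated in the project directory.
demo_file <- file.path(tempdir(), "TCGA_DEMO_Data.rda")
save.data <- list(note = "placeholder, not real TCGA data")
save(save.data, file = demo_file)

demo_env <- load_rda(demo_file)
loaded_names <- ls(demo_env)
print(loaded_names)
```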

NSEA Pipeline

Open “NSEA_Pipeline2.R” and define the parameters for the nSEA pipeline. The script is currently set up for 4-node subnetworks, but it can be changed to support subnetworks with more than 4 nodes. The script automatically loads the processed data and saves the result to “TCGA_NSEA_XXX_Data.rda”. You do not need to run “TCGA_Data_Processing.R” every time before running this script, but the data it generates must be present in the project directory. For a large network, generating all possible subnetworks may take several hours, so we strongly recommend an efficiency test on a small part of the network before running the pipeline on the entire network.
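An efficiency test can be as simple as timing the enumeration on a small network first. The sketch below is a brute-force illustration and not the enumeration used in “NSEA_Pipeline2.R”: it tests every n-node combination of a toy ring network for connectivity and times the run. The function names and the toy network are assumptions made for this example.

```r
# Check whether the subgraph induced by `nodes` is connected, using a
# breadth-first search restricted to those nodes.
is_connected <- function(adj, nodes) {
  visited <- nodes[1]
  frontier <- nodes[1]
  while (length(frontier) > 0) {
    # Nodes in the candidate set that touch at least one frontier node.
    neighbours <- nodes[colSums(adj[frontier, nodes, drop = FALSE]) > 0]
    frontier <- setdiff(neighbours, visited)
    visited <- c(visited, frontier)
  }
  length(visited) == length(nodes)
}

# Brute force: keep every n-node combination whose induced subgraph is connected.
enumerate_subnetworks <- function(adj, n = 4) {
  candidates <- combn(nrow(adj), n, simplify = FALSE)
  Filter(function(nodes) is_connected(adj, nodes), candidates)
}

# Toy network (hypothetical): a ring of 6 nodes, 1-2-3-4-5-6-1.
ring <- matrix(0, 6, 6)
for (i in 1:6) {
  j <- i %% 6 + 1
  ring[i, j] <- 1
  ring[j, i] <- 1
}

timing <- system.time(subnets <- enumerate_subnetworks(ring, n = 4))
n_subnets <- length(subnets)
print(n_subnets)            # 6 connected 4-node subnetworks in the ring
print(timing[["elapsed"]])  # elapsed seconds for this toy run
```

The number of candidate combinations is `choose(N, n)`, which grows quickly with the network size `N`; timing a few small subsets of the real network gives a rough idea of how long the full run will take.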

Result Analysis

  1. Run the “load data” part of “NSEA_Grouping_Plotting3.R”.
  2. For consensus clustering, you can define your own parameters, for example by choosing a different clustering algorithm.
  3. For survival analysis, you can choose specific groups whose survival curves are plotted.
  4. In the example script, heatmap generation requires the subtype annotation. You can remove it or add other annotations. These annotations are expected to be loaded by “Data_Additional_Processing_XXX.R”.
  5. The script can also plot all the subnetwork states by their gene expression values.
  6. The GSEA script outputs a table with detailed GSEA results and a bar plot that visualizes them. You can also define the GSEA parameters, including the algorithm and the test statistic. See the “topGO” package for details.
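As an illustration of step 3, the sketch below plots survival curves for two selected groups. It assumes the `survival` package, which this README does not name as the one used in “NSEA_Grouping_Plotting3.R”, and the data frame is made-up toy data rather than TCGA clinical data; the column names are hypothetical.

```r
library(survival)

# Toy data, made up for illustration: follow-up time, event flag (1 = event),
# and a cluster label such as the one consensus clustering would assign.
toy <- data.frame(
  time   = c(5, 8, 12, 20, 25, 3, 6, 9, 14, 30, 4, 7, 11, 18, 22),
  status = c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0),
  group  = factor(rep(c("A", "B", "C"), each = 5))
)

# Keep only the groups of interest and drop the unused factor level.
selected <- droplevels(subset(toy, group %in% c("A", "C")))

# Kaplan-Meier curves per group, plus a log-rank test between the groups.
fit <- survfit(Surv(time, status) ~ group, data = selected)
logrank <- survdiff(Surv(time, status) ~ group, data = selected)

plot(fit, col = c("blue", "red"), xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(selected$group), col = c("blue", "red"), lty = 1)
```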

For a complete walkthrough, follow “nsea_complete.rmd” under nSEA/codes. An HTML version is also available there.
