Commit
* initializing README
* edit language in README
* add relative link to gestalt
* update relative links in README
* modify README for correct links and instructions
* update README instructions
* add ignore file
* add functionality scripts
  these scripts are copied over from https://github.com/greenelab/tad_pathways and are only slightly modified
* add example files and example results
* add conda environment file
* add example pipelines
* add repo initialization script
* add R sessionInfo
* update gestalt path in readme
* set error to exit bash scripts
* add info to preprint
* add correct bmd evidence files
Showing 37 changed files with 20,146 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
.Rhistory | ||
data/ | ||
tad_pathways_data.tar.gz |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,155 @@ | ||
# TAD_Pathways | ||
|
||
## Leveraging TADs to identify candidate genes at GWAS signals | ||
|
||
**Gregory P. Way and Casey S. Greene - 2017** | ||
|
||
### Summary | ||
|
||
The repository contains data and instructions to implement a "TAD_Pathways" | ||
analysis for over 300 different trait/disease GWAS or custom SNP lists. | ||
|
||
TAD_Pathways uses the principles of topologically associating domains (TADs) to | ||
define where an association signal (typically a GWAS signal) can most likely | ||
impact gene function. We use TAD boundaries as defined by | ||
[Dixon et al. 2012](https://doi.org/10.1038/nature11082) and | ||
[hg19 Gencode genes](ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/) | ||
to identify which genes may be implicated. We then input this list into a | ||
[WebGestalt Pathways Analysis](http://webgestalt.org/) to output | ||
significantly associated pathways implicated by the input TAD-defined geneset. | ||
|
||
For more specific details about the method, refer to our | ||
[preprint](https://doi.org/10.1101/087718 "Determining causal genes from GWAS signals using topologically associating domains"). | ||
|
||
### Setup | ||
|
||
Before you begin, download the necessary TAD-based index files and GWAS | ||
curation files, and set up the Python environment: | ||
|
||
```bash | ||
bash initialize.sh | ||
|
||
source activate tad_pathways | ||
``` | ||
|
||
### Examples | ||
|
||
We provide three different examples for a TAD pathways analysis pipeline. To run | ||
each of the analyses: | ||
|
||
```bash | ||
# Example using Bone Mineral Density GWAS | ||
bash example_pipeline_bmd.sh | ||
|
||
# Example using Type 2 Diabetes GWAS | ||
bash example_pipeline_t2d.sh | ||
|
||
# Example using custom input SNPs | ||
bash example_pipeline_custom.sh | ||
``` | ||
|
||
### General Usage | ||
|
||
There are two ways to implement a TAD_Pathways analysis: | ||
|
||
1. GWAS | ||
2. Custom | ||
|
||
#### GWAS | ||
|
||
Browse the `data/gwas_tad_genes/` directory to select a GWAS file. Each file in | ||
this directory is a tab separated text file that includes information regarding | ||
each gene located within a signal TAD. The column `gene_name` is the | ||
comprehensive list of all implicated genes. For complete information on how | ||
these lists were constructed, refer to | ||
https://github.com/greenelab/tad_pathways. | ||
|
||
Input this gene list directly into a | ||
[WebGestalt Pathway Analysis](http://webgestalt.org/) and skip to the | ||
[WebGestalt step](#webgestalt-pathway-analysis). | ||
|
||
#### Custom | ||
|
||
Create a comma-separated file in which the first row of each column names the | ||
group of SNPs listed in the rows below it. The file can contain many columns, | ||
and the columns may differ in length. | ||
|
||
E.g.: `custom_example.csv` | ||
|
||
| Group 1 | Group 2 | | ||
| ------- | ------- | | ||
| rs12345 | rs67891 | | ||
| rs19876 | rs54321 | | ||
| ... | ... | | ||
|
||
Then, perform the following steps: | ||
|
||
```bash | ||
# Map custom SNPs to genomic locations | ||
Rscript --vanilla scripts/build_snp_list.R \ | ||
--snp_file "custom_example.csv" \ | ||
--output_file "mapped_results.tsv" | ||
|
||
# Build TAD based genelists for each group | ||
python scripts/build_custom_TAD_genelist.py \ | ||
--snp_data_file "mapped_results.tsv" \ | ||
--output_file "custom_tad_genelist.tsv" | ||
``` | ||
|
||
Skip now to the [WebGestalt step](#webgestalt-pathway-analysis). | ||
|
||
### WebGestalt Pathway Analysis | ||
|
||
Insert either the GWAS curated genelist or a column from the custom genelist | ||
with the following parameters: | ||
|
||
| Parameter | Input | | ||
| --------- | ----- | | ||
| Select gene ID type | *hsapiens__gene_symbol* | | ||
| Enrichment Analysis | *GO Analysis* | | ||
| GO Slim Classification | *Yes* | | ||
| Reference Set | *hsapiens__genome* | | ||
| Statistical Method | *Hypergeometric* | | ||
| Multiple Test Adjustment | *BH* | | ||
| Significance Level | *Top10* | | ||
| Minimum Number of Genes for a Category | *4* | | ||
|
||
Once the analysis is complete, click `Export TSV Only` and save the file as | ||
`gestalt/<INSERT_TRAIT_HERE>_gestalt.tsv`. | ||
|
||
### Curation | ||
|
||
Clean and tidy the output files and summarize into convenient lists of | ||
candidate genes. These genes may or may not be the nearest gene to the GWAS | ||
signal and will require experimental validation. | ||
|
||
```bash | ||
# An example for Bone Mineral Density (see `example_pipeline_bmd.sh` as well) | ||
|
||
# Process WebGestalt Output saved in `data/gestalt/bmd_gestalt.tsv` | ||
python scripts/parse_gestalt.py --trait 'bmd' --process | ||
|
||
# Output evidence tables | ||
python scripts/construct_evidence.py \ | ||
--trait 'bmd' \ | ||
--genelist 'data/gwas_catalog/Bone_mineral_density_hg19.tsv' \ | ||
--pathway 'skeletal system development' | ||
|
||
# Summarize evidence | ||
python scripts/assign_evidence_to_TADs.py \ | ||
--evidence 'results/bmd_gene_evidence.csv' \ | ||
--snps 'data/gwas_tad_genes/Bone_mineral_density_hg19_SNPs.tsv' \ | ||
--output_file 'results/BMD_evidence_summary.tsv' | ||
|
||
# Output venn diagram | ||
R --no-save --args 'results/bmd_gene_evidence.csv' \ | ||
'BMD' < scripts/integrative_summary.R | ||
``` | ||
|
||
### Contact | ||
|
||
For questions and bug reports, please file a | ||
[GitHub issue](https://github.com/greenelab/tad_pathways/issues). | ||
|
||
For all other questions contact Casey Greene at [email protected] or | ||
Struan Grant at [email protected] |
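The README above describes the core idea: find the TAD that contains an association signal, then collect the genes located in that TAD. The following is only a minimal sketch of that lookup, not the repository's implementation. It assumes half-open intervals, TADs sorted by start position within each chromosome, and that a gene counts as "in" a TAD when it overlaps by at least one base. The names `Tad`, `Gene`, `find_tad`, and `genes_in_tad` are hypothetical, and the coordinates are toy values.

```python
from bisect import bisect_right
from collections import namedtuple

# Hypothetical record types; the repository's real file layouts may differ.
Tad = namedtuple("Tad", ["chrom", "start", "end"])
Gene = namedtuple("Gene", ["name", "chrom", "start", "end"])


def find_tad(tads_by_chrom, chrom, position):
    """Return the TAD containing a position, or None if none contains it.

    Assumes each chromosome's TAD list is sorted by start coordinate and
    that intervals are half-open: start <= position < end.
    """
    tads = tads_by_chrom.get(chrom, [])
    starts = [tad.start for tad in tads]
    index = bisect_right(starts, position) - 1
    if index >= 0 and tads[index].start <= position < tads[index].end:
        return tads[index]
    return None


def genes_in_tad(genes, tad):
    """Return names of genes that overlap the TAD by at least one base."""
    return [
        gene.name
        for gene in genes
        if gene.chrom == tad.chrom
        and gene.start < tad.end
        and gene.end > tad.start
    ]


# Toy coordinates for illustration only, not real annotations.
tads_by_chrom = {"chr1": [Tad("chr1", 0, 1000), Tad("chr1", 1000, 2500)]}
genes = [
    Gene("GENE_A", "chr1", 100, 400),
    Gene("GENE_B", "chr1", 900, 1200),
    Gene("GENE_C", "chr1", 2000, 2400),
]

signal_tad = find_tad(tads_by_chrom, "chr1", 1500)
candidate_genes = genes_in_tad(genes, signal_tad)
print(candidate_genes)
```

In this toy case the signal at position 1500 falls in the second TAD, so the candidate list contains the two genes overlapping it, including one that straddles the boundary. Whether boundary-spanning genes should count is a design choice this sketch makes for simplicity.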
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
prostate_cancer | ||
rs10009409 | ||
rs1016343 | ||
rs10187424 | ||
rs103294 | ||
rs1041449 | ||
rs10486567 | ||
rs10875943 | ||
rs10896449 | ||
rs10934853 | ||
rs10936632 | ||
rs10993994 | ||
rs11135910 | ||
rs11214775 | ||
rs11228565 | ||
rs115306967 | ||
rs115457135 | ||
rs11568818 | ||
rs11650494 | ||
rs11902236 | ||
rs12051443 | ||
rs12155172 | ||
rs1218582 | ||
rs12480328 | ||
rs12500426 | ||
rs12621278 | ||
rs12653946 | ||
rs1270884 | ||
rs130067 | ||
rs1327301 | ||
rs13385191 | ||
rs1447295 | ||
rs1456315 | ||
rs1465618 | ||
rs1512268 | ||
rs16901979 | ||
rs17021918 | ||
rs17181170 | ||
rs17599629 | ||
rs17694493 | ||
rs1775148 | ||
rs1859962 | ||
rs188140481 | ||
rs1894292 | ||
rs1933488 | ||
rs1983891 | ||
rs2121875 | ||
rs2238776 | ||
rs2242652 | ||
rs2273669 | ||
rs2405942 | ||
rs2427345 | ||
rs2660753 | ||
rs2735839 | ||
rs2807031 | ||
rs3096702 | ||
rs3123078 | ||
rs339331 | ||
rs3771570 | ||
rs3850699 | ||
rs4242382 | ||
rs4245739 | ||
rs4430796 | ||
rs4713266 | ||
rs4844289 | ||
rs4962416 | ||
rs56232506 | ||
rs5759167 | ||
rs5919432 | ||
rs5945572 | ||
rs6062509 | ||
rs636291 | ||
rs6465657 | ||
rs651164 | ||
rs6545977 | ||
rs6625711 | ||
rs6763931 | ||
rs684232 | ||
rs6869841 | ||
rs6983267 | ||
rs7127900 | ||
rs7130881 | ||
rs7141529 | ||
rs7153648 | ||
rs7210100 | ||
rs721048 | ||
rs7241993 | ||
rs7501939 | ||
rs7584330 | ||
rs7611694 | ||
rs76934034 | ||
rs7837688 | ||
rs7931342 | ||
rs8008270 | ||
rs80130819 | ||
rs8014671 | ||
rs8102476 | ||
rs817826 | ||
rs9284813 | ||
rs9287719 | ||
rs9364554 | ||
rs9443189 | ||
rs9600079 |
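The file above follows the custom input layout from the README: a header row naming each group, with SNP IDs in the rows beneath. As a rough illustration of how such a file could be read when columns differ in length, here is a sketch using only the Python standard library. The function name `read_snp_groups` is hypothetical, and this is not how `scripts/build_snp_list.R` processes the file.

```python
import csv
import io


def read_snp_groups(handle):
    """Map each column header to the non-empty SNP IDs listed beneath it."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    groups = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            cell = cell.strip()
            if cell:  # shorter columns leave empty cells in later rows
                groups[name].append(cell)
    return groups


# Inline stand-in for a file on disk, using the IDs from the README example.
example = io.StringIO(
    "Group 1,Group 2\n"
    "rs12345,rs67891\n"
    "rs19876,\n"
)
snp_groups = read_snp_groups(example)
print(snp_groups)
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("custom_example.csv", newline="")`.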
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
name: tad_pathways | ||
dependencies: | ||
- python=3.5.2 | ||
- pandas=0.18.0 | ||
- numexpr=2.5.2 | ||
- numpy=1.11.1 | ||
- scipy=0.17.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
#!/bin/bash | ||
|
||
set -o errexit | ||
|
||
# Example of a TAD_Pathways Analysis applied to Bone Mineral Density GWAS | ||
|
||
# After saving WebGestalt tsv file, parse its contents | ||
python scripts/parse_gestalt.py --trait 'bmd' | ||
|
||
# Construct an evidence file - Nearest gene to gwas or not | ||
python scripts/construct_evidence.py \ | ||
--trait 'bmd'\ | ||
--gwas 'data/gwas_catalog/Bone_mineral_density_hg19.tsv'\ | ||
--pathway 'skeletal system development' | ||
|
||
# Summarize the evidence file | ||
python scripts/summarize_evidence.py \ | ||
--evidence 'results/bmd_gene_evidence.csv' \ | ||
--snps 'data/gwas_tad_snps/Bone_mineral_density_hg19_SNPs.tsv' \ | ||
--output_file 'results/bmd_gene_evidence_summary.tsv' | ||
|
||
# Visualize overlap in TAD pathways curation | ||
R --no-save --args 'results/bmd_gene_evidence.csv' \ | ||
< scripts/integrative_summary.R |
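The WebGestalt export parsed in the first step of this script comes from an enrichment run that the README configures with a hypergeometric test and BH (Benjamini-Hochberg) adjustment. The sketch below shows what those two calculations look like in general. It is an illustration only, not WebGestalt's code, and the function names are hypothetical. It uses `math.comb`, which requires Python 3.8 or later, newer than the version pinned in the conda environment file.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k).

    N: genes in the reference set, K: reference genes in the pathway,
    n: genes in the input list, k: input genes found in the pathway.
    """
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers for illustration only.
p_single = hypergeom_pvalue(k=1, n=1, K=1, N=2)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(p_single, adjusted)
```

The adjusted values keep the input order, so they can be attached back to the pathways they belong to without re-sorting.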
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
#!/bin/bash | ||
|
||
set -o errexit | ||
|
||
# Example of a TAD_Pathways Analysis applied to a Custom SNP list | ||
# For this example, the custom SNP list is the GWAS findings for | ||
# Prostate Cancer. The data is used as a custom input. | ||
|
||
# Map SNPs to genomic location | ||
Rscript --vanilla scripts/build_snp_list.R \ | ||
--snp_file 'custom_example.csv' \ | ||
--output_file 'results/custom_example_location.tsv' | ||
|
||
# Build a customized genelist to input into WebGestalt | ||
python scripts/build_custom_tad_genelist.py \ | ||
--snp_data_file 'results/custom_example_location.tsv' \ | ||
--output_file 'results/custom_example_tad_results.tsv' | ||
|
||
# After saving WebGestalt tsv file, parse its contents | ||
python scripts/parse_gestalt.py --trait 'custom' | ||
|
||
# Construct an evidence file - Nearest gene to gwas or not | ||
python scripts/construct_evidence.py \ | ||
--trait 'custom'\ | ||
--gwas 'results/custom_example_tad_results_nearest_gene.tsv'\ | ||
--pathway 'epidermis development,antigen processing and presentation' | ||
|
||
# Summarize the evidence file | ||
python scripts/summarize_evidence.py \ | ||
--evidence 'results/custom_gene_evidence.csv' \ | ||
--snps 'results/custom_example_tad_results.tsv' \ | ||
--output_file 'results/custom_gene_evidence_summary.tsv' | ||
|
||
# Visualize overlap in TAD pathways curation | ||
R --no-save --args 'results/custom_gene_evidence.csv' \ | ||
< scripts/integrative_summary.R | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
#!/bin/bash | ||
|
||
set -o errexit | ||
|
||
# Example of a TAD_Pathways Analysis applied to Type 2 Diabetes GWAS | ||
|
||
# After saving WebGestalt tsv file, parse its contents | ||
python scripts/parse_gestalt.py --trait 't2d' | ||
|
||
# Construct an evidence file - Nearest gene to gwas or not | ||
python scripts/construct_evidence.py \ | ||
--trait 't2d'\ | ||
--gwas 'data/gwas_catalog/Type_2_diabetes_hg19.tsv'\ | ||
--pathway 'peptide hormone secretion' | ||
|
||
# Summarize the evidence file | ||
python scripts/summarize_evidence.py \ | ||
--evidence 'results/t2d_gene_evidence.csv' \ | ||
--snps 'data/gwas_tad_snps/Type_2_diabetes_hg19_SNPs.tsv' \ | ||
--output_file 'results/t2d_gene_evidence_summary.tsv' | ||
|
||
# Visualize overlap in TAD pathways curation | ||
R --no-save --args 'results/t2d_gene_evidence.csv' \ | ||
< scripts/integrative_summary.R |
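Each example pipeline sets `set -o errexit`, so the script stops at the first step that exits with a non-zero status rather than running later steps on missing or partial output. For readers who drive the steps from Python instead, a comparable behavior can be sketched with `subprocess.run(..., check=True)`. The `run_steps` helper and the placeholder commands are assumptions for illustration and are not part of the repository.

```python
import subprocess
import sys


def run_steps(steps):
    """Run commands in order; raise at the first non-zero exit status."""
    completed = []
    for step in steps:
        subprocess.run(step, check=True)  # raises CalledProcessError on failure
        completed.append(step)
    return completed


# Placeholder commands for illustration; swap in the real pipeline steps.
steps = [
    [sys.executable, "-c", "print('first step ok')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "print('never reached')"],
]

failed_code = None
try:
    run_steps(steps)
except subprocess.CalledProcessError as error:
    failed_code = error.returncode
    print("step failed with exit status", failed_code)
```

The third command never runs because the second one fails, which mirrors what `errexit` does in the bash scripts.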