Repository Initialization (#1)
* initializing README

* edit language in README

* add relative link to gestalt

* update relative links in README

* modify README for correct links and instructions

* update README instructions

* add ignore file

* add functionality scripts

these scripts are copied over from https://github.com/greenelab/tad_pathways and are only slightly modified

* add example files and example results

* add conda environment file

* add example pipelines

* add repo initialization script

* add R sessionInfo

* update gestalt path in readme

* set error to exit bash scripts

* add info to preprint

* add correct bmd evidence files
gwaybio authored Jan 6, 2017
1 parent 86a678e commit e69df96
Showing 37 changed files with 20,146 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
.Rhistory
data/
tad_pathways_data.tar.gz
155 changes: 155 additions & 0 deletions README.md
@@ -0,0 +1,155 @@
# TAD_Pathways

## Leveraging TADs to identify candidate genes at GWAS signals

**Gregory P. Way and Casey S. Greene - 2017**

### Summary

The repository contains data and instructions to implement a "TAD_Pathways"
analysis for over 300 different trait/disease GWAS or custom SNP lists.

TAD_Pathways uses the principles of topologically associating domains (TADs) to
define where an association signal (typically a GWAS signal) can most likely
impact gene function. We use TAD boundaries as defined by
[Dixon et al. 2012](https://doi.org/10.1038/nature11082) and
[hg19 Gencode genes](ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/)
to identify which genes may be implicated. We then input this list into a
[WebGestalt Pathways Analysis](http://webgestalt.org/) to output
significantly associated pathways implicated by the input TAD-defined geneset.
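
As a rough illustration of the idea, the sketch below assigns a signal position
to the TAD that contains it and collects every gene overlapping that TAD. The
coordinates, gene names, and function names are hypothetical and are not taken
from the repository's scripts or index files.

```python
from collections import namedtuple

# Hypothetical toy records; the real pipeline reads TAD and Gencode index files.
Tad = namedtuple("Tad", ["chrom", "start", "end"])
Gene = namedtuple("Gene", ["name", "chrom", "start", "end"])


def find_tad(tads, chrom, position):
    """Return the TAD containing the position, or None if no TAD contains it."""
    for tad in tads:
        if tad.chrom == chrom and tad.start <= position < tad.end:
            return tad
    return None


def genes_in_tad(genes, tad):
    """Return names of genes that overlap the TAD by at least one base."""
    return [
        g.name for g in genes
        if g.chrom == tad.chrom and g.start < tad.end and g.end > tad.start
    ]


tads = [Tad("chr1", 0, 1000), Tad("chr1", 1000, 2500)]
genes = [
    Gene("GENE_A", "chr1", 100, 400),
    Gene("GENE_B", "chr1", 1200, 1800),
    Gene("GENE_C", "chr1", 2400, 2600),
]

# A signal at chr1:1500 falls in the second TAD.
signal_tad = find_tad(tads, "chr1", 1500)
candidate_genes = genes_in_tad(genes, signal_tad)
print(candidate_genes)
```

Note that the candidate list is defined by the TAD rather than by distance to
the signal, so it can include genes that are not the nearest gene.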

For more specific details about the method, refer to our
[preprint](https://doi.org/10.1101/087718 "Determining causal genes from GWAS signals using topologically associating domains").

### Setup

Before you begin, download the necessary TAD-based index files and GWAS
curation files, and set up the Python environment:

```bash
bash initialize.sh

source activate tad_pathways
```

### Examples

We provide three different examples for a TAD pathways analysis pipeline. To run
each of the analyses:

```bash
# Example using Bone Mineral Density GWAS
bash example_pipeline_bmd.sh

# Example using Type 2 Diabetes GWAS
bash example_pipeline_t2d.sh

# Example using custom input SNPs
bash example_pipeline_custom.sh
```

### General Usage

There are two ways to implement a TAD_Pathways analysis:

1. GWAS
2. Custom

#### GWAS

Browse the `data/gwas_tad_genes/` directory to select a GWAS file. Each file in
this directory is a tab-separated text file that describes each gene located
within a signal TAD. The `gene_name` column holds the comprehensive list of all
implicated genes. For complete information on how these lists were constructed,
refer to
https://github.com/greenelab/tad_pathways.

Input this gene list directly into a
[WebGestalt Pathway Analysis](http://webgestalt.org/) and skip to the
[WebGestalt step](#webgestalt-pathway-analysis).
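
One possible way to pull the gene list out of such a file is sketched below
using only the Python standard library. The in-memory example data and the
`unique_gene_names` helper are hypothetical; only the `gene_name` column is
documented above, and the second column is purely illustrative.

```python
import csv
import io

# Hypothetical stand-in for a file in data/gwas_tad_genes/.
example_tsv = "gene_name\tchromosome\nGENE_A\t1\nGENE_B\t2\nGENE_A\t1\n"


def unique_gene_names(handle):
    """Collect unique, non-empty gene_name values in first-seen order."""
    reader = csv.DictReader(handle, delimiter="\t")
    seen = []
    for row in reader:
        name = row["gene_name"].strip()
        if name and name not in seen:
            seen.append(name)
    return seen


gene_list = unique_gene_names(io.StringIO(example_tsv))
# One gene symbol per line is convenient for pasting into a web form.
print("\n".join(gene_list))
```

To read a real file, replace the `io.StringIO` object with an open file handle.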

#### Custom

Create a comma-separated file in which the first row names each column (one SNP
group per column) and the subsequent rows list the SNPs in that group. The file
can have many columns, and the columns may differ in length.

E.g.: `custom_example.csv`

| Group 1 | Group 2 |
| ------- | ------- |
| rs12345 | rs67891 |
| rs19876 | rs54321 |
| ... | ... |
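
A file with this layout could be generated along the following lines. This is
a minimal sketch: the group names and rsIDs are made-up placeholders, the
`write_snp_csv` helper is hypothetical, and padding shorter columns with empty
cells is an assumption about what the downstream scripts accept.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical SNP groups with made-up placeholder rsIDs.
groups = {
    "Group 1": ["rs12345", "rs19876"],
    "Group 2": ["rs67891", "rs54321", "rs11111"],
}


def write_snp_csv(groups, handle):
    """Write one column per group, padding shorter columns with empty cells."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(list(groups.keys()))
    for row in zip_longest(*groups.values(), fillvalue=""):
        writer.writerow(row)


buffer = io.StringIO()
write_snp_csv(groups, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a handle from `open("custom_example.csv", "w", newline="")`.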

Then, perform the following steps:

```bash
# Map custom SNPs to genomic locations
Rscript --vanilla scripts/build_snp_list.R \
--snp_file "custom_example.csv" \
--output_file "mapped_results.tsv"

# Build TAD based genelists for each group
python scripts/build_custom_TAD_genelist.py \
--snp_data_file "mapped_results.tsv" \
--output_file "custom_tad_genelist.tsv"
```

Skip now to the [WebGestalt step](#webgestalt-pathway-analysis).

### WebGestalt Pathway Analysis

Paste either the GWAS-curated gene list or one column of the custom gene list
into WebGestalt and set the following parameters:

| Parameter | Input |
| --------- | ----- |
| Select gene ID type | *hsapiens__gene_symbol* |
| Enrichment Analysis | *GO Analysis* |
| GO Slim Classification | *Yes* |
| Reference Set | *hsapiens__genome* |
| Statistical Method | *Hypergeometric* |
| Multiple Test Adjustment | *BH* |
| Significance Level | *Top10* |
| Minimum Number of Genes for a Category | *4* |

Once the analysis is complete, click `Export TSV Only` and save the file as
`gestalt/<INSERT_TRAIT_HERE>_gestalt.tsv`.
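
For readers unfamiliar with the *Hypergeometric* and *BH* settings, the sketch
below shows the two calculations in their generic textbook form. It is not
WebGestalt's implementation; the function names and toy numbers are
hypothetical, and `math.comb` requires Python 3.8 or later, which is newer
than the version pinned in `environment.yml`.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K are in the category."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 20 reference genes, 5 in a category, 4 input genes, 2 overlap.
p = hypergeom_pvalue(k=2, n=4, K=5, N=20)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p, adjusted)
```

The adjustment is applied across all tested categories, so the adjusted value
for one pathway depends on how many pathways were tested.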

### Curation

Clean and tidy the output files and summarize into convenient lists of
candidate genes. These genes may or may not be the nearest gene to the GWAS
signal and will require experimental validation.

```bash
# An example for Bone Mineral Density (see `example_pipeline_bmd.sh` as well)

# Process WebGestalt Output saved in `data/gestalt/bmd_gestalt.tsv`
python scripts/parse_gestalt.py --trait 'bmd' --process

# Output evidence tables
python scripts/construct_evidence.py \
--trait 'bmd' \
--genelist 'data/gwas_catalog/Bone_mineral_density_hg19.tsv' \
--pathway 'skeletal system development'

# Summarize evidence
python scripts/assign_evidence_to_TADs.py \
--evidence 'results/bmd_gene_evidence.csv' \
--snps 'data/gwas_tad_genes/Bone_mineral_density_hg19_SNPs.tsv' \
--output_file 'results/BMD_evidence_summary.tsv'

# Output venn diagram
R --no-save --args 'results/bmd_gene_evidence.csv' \
'BMD' < scripts/integrative_summary.R
```

### Contact

For questions and bug reports, please file a
[GitHub issue](https://github.com/greenelab/tad_pathways/issues).

For all other questions, contact Casey Greene at [email protected] or
Struan Grant at [email protected].
103 changes: 103 additions & 0 deletions custom_example.csv
@@ -0,0 +1,103 @@
prostate_cancer
rs10009409
rs1016343
rs10187424
rs103294
rs1041449
rs10486567
rs10875943
rs10896449
rs10934853
rs10936632
rs10993994
rs11135910
rs11214775
rs11228565
rs115306967
rs115457135
rs11568818
rs11650494
rs11902236
rs12051443
rs12155172
rs1218582
rs12480328
rs12500426
rs12621278
rs12653946
rs1270884
rs130067
rs1327301
rs13385191
rs1447295
rs1456315
rs1465618
rs1512268
rs16901979
rs17021918
rs17181170
rs17599629
rs17694493
rs1775148
rs1859962
rs188140481
rs1894292
rs1933488
rs1983891
rs2121875
rs2238776
rs2242652
rs2273669
rs2405942
rs2427345
rs2660753
rs2735839
rs2807031
rs3096702
rs3123078
rs339331
rs3771570
rs3850699
rs4242382
rs4245739
rs4430796
rs4713266
rs4844289
rs4962416
rs56232506
rs5759167
rs5919432
rs5945572
rs6062509
rs636291
rs6465657
rs651164
rs6545977
rs6625711
rs6763931
rs684232
rs6869841
rs6983267
rs7127900
rs7130881
rs7141529
rs7153648
rs7210100
rs721048
rs7241993
rs7501939
rs7584330
rs7611694
rs76934034
rs7837688
rs7931342
rs8008270
rs80130819
rs8014671
rs8102476
rs817826
rs9284813
rs9287719
rs9364554
rs9443189
rs9600079
7 changes: 7 additions & 0 deletions environment.yml
@@ -0,0 +1,7 @@
name: tad_pathways
dependencies:
- python=3.5.2
- pandas=0.18.0
- numexpr=2.5.2
- numpy=1.11.1
- scipy=0.17.1
24 changes: 24 additions & 0 deletions example_pipeline_bmd.sh
@@ -0,0 +1,24 @@
#!/bin/bash

set -o errexit

# Example of a TAD_Pathways Analysis applied to Bone Mineral Density GWAS

# After saving WebGestalt tsv file, parse its contents
python scripts/parse_gestalt.py --trait 'bmd'

# Construct an evidence file - Nearest gene to gwas or not
python scripts/construct_evidence.py \
--trait 'bmd' \
--gwas 'data/gwas_catalog/Bone_mineral_density_hg19.tsv' \
--pathway 'skeletal system development'

# Summarize the evidence file
python scripts/summarize_evidence.py \
--evidence 'results/bmd_gene_evidence.csv' \
--snps 'data/gwas_tad_snps/Bone_mineral_density_hg19_SNPs.tsv' \
--output_file 'results/bmd_gene_evidence_summary.tsv'

# Visualize overlap in TAD pathways curation
R --no-save --args 'results/bmd_gene_evidence.csv' \
< scripts/integrative_summary.R
37 changes: 37 additions & 0 deletions example_pipeline_custom.sh
@@ -0,0 +1,37 @@
#!/bin/bash

set -o errexit

# Example of a TAD_Pathways Analysis applied to a Custom SNP list
# For this example, the custom SNP list is the GWAS findings for
# Prostate Cancer. The data is used as a custom input.

# Map SNPs to genomic location
Rscript --vanilla scripts/build_snp_list.R \
--snp_file 'custom_example.csv' \
--output_file 'results/custom_example_location.tsv'

# Build a customized genelist to input into WebGestalt
python scripts/build_custom_tad_genelist.py \
--snp_data_file 'results/custom_example_location.tsv' \
--output_file 'results/custom_example_tad_results.tsv'

# After saving WebGestalt tsv file, parse its contents
python scripts/parse_gestalt.py --trait 'custom'

# Construct an evidence file - Nearest gene to gwas or not
python scripts/construct_evidence.py \
--trait 'custom' \
--gwas 'results/custom_example_tad_results_nearest_gene.tsv' \
--pathway 'epidermis development,antigen processing and presentation'

# Summarize the evidence file
python scripts/summarize_evidence.py \
--evidence 'results/custom_gene_evidence.csv' \
--snps 'results/custom_example_tad_results.tsv' \
--output_file 'results/custom_gene_evidence_summary.tsv'

# Visualize overlap in TAD pathways curation
R --no-save --args 'results/custom_gene_evidence.csv' \
< scripts/integrative_summary.R

24 changes: 24 additions & 0 deletions example_pipeline_t2d.sh
@@ -0,0 +1,24 @@
#!/bin/bash

set -o errexit

# Example of a TAD_Pathways Analysis applied to Type 2 Diabetes GWAS

# After saving WebGestalt tsv file, parse its contents
python scripts/parse_gestalt.py --trait 't2d'

# Construct an evidence file - Nearest gene to gwas or not
python scripts/construct_evidence.py \
--trait 't2d' \
--gwas 'data/gwas_catalog/Type_2_diabetes_hg19.tsv' \
--pathway 'peptide hormone secretion'

# Summarize the evidence file
python scripts/summarize_evidence.py \
--evidence 'results/t2d_gene_evidence.csv' \
--snps 'data/gwas_tad_snps/Type_2_diabetes_hg19_SNPs.tsv' \
--output_file 'results/t2d_gene_evidence_summary.tsv'

# Visualize overlap in TAD pathways curation
R --no-save --args 'results/t2d_gene_evidence.csv' \
< scripts/integrative_summary.R