modified: HISTORY.md
	modified:   README.md
	new file:   cre.gnomad_scores.R
	modified:   cre.hgmd2csv.sql
	new file:   data/gnomad_scores.csv
naumenko-sa committed Nov 8, 2018
1 parent c968e6c commit eb2169b
Showing 5 changed files with 19,692 additions and 3 deletions.
3 changes: 2 additions & 1 deletion HISTORY.md
@@ -1,4 +1,5 @@
-- 2017-11-02: added back Gerp_score, updated OMIM
+- 2018-11-07: new gnomad observed/expected scores instead of pLi and exac_missense. cre.gnomad_scores.R
+- 2018-11-02: added back Gerp_score, updated OMIM
- 2017-09-22: added Info_refseq and Maf_exac to the database report
- 2017-09-14: improved cre.database.sh: it creates databases for cre.R
- 2017-07-13: added cre.package.sh. It packages reports to send.
3 changes: 1 addition & 2 deletions README.md
@@ -32,7 +32,7 @@ Excel variant report generator and scripts to process WES data (cram/bam/fastq -
```
Orphanet provides descriptions for ~3600 genes. By default, CRE uses [orphanet.txt](../master/data/orphanet.txt)

-6. (Optional) Install Gene-level Exac scores.
+6. (Optional) Update Gnomad gene constraint scores.

By default using [~/cre/data/exac_scores.txt](../master/data/exac_scores.txt)

@@ -42,7 +42,6 @@ Excel variant report generator and scripts to process WES data (cram/bam/fastq -
8. (Optional) Install HGMD pro database
Install HGMD pro and dump information with [~/cre/cre.hgmd2csv.sql](../master/cre.hgmd2csv.sql).


# 1. Creating bcbio project - grch37

* Prepare input files: family_sample_1.fq.gz, family_sample_2.fq.gz, or family_sample.bam and place them into family/input folder.
43 changes: 43 additions & 0 deletions cre.gnomad_scores.R
@@ -0,0 +1,43 @@
# get gnomad gene constraint scores
# - Gnomad_oe_lof_score
# - Gnomad_oe_mis_score
# output: gnomad_scores.csv
# run: Rscript ~/cre/cre.gnomad_scores.R
# https://macarthurlab.org/2018/10/17/gnomad-v2-1/

# install.packages("R.utils")
# bash:
# cd
# git clone https://github.com/naumenko-sa/bioscripts

source("~/bioscripts/genes.R")
library("R.utils")

gnomad_scores_url = "https://storage.googleapis.com/gnomad-public/release/2.1/ht/constraint/constraint.txt.bgz"
download.file(gnomad_scores_url,"gnomad_scores.txt.bgz")
gunzip("gnomad_scores.txt.bgz","gnomad_scores.txt")
gnomad_scores = read.delim("gnomad_scores.txt", stringsAsFactors=F)
gnomad_scores = gnomad_scores[,c("gene","transcript","canonical","oe_lof","oe_mis")]
gnomad_scores = gnomad_scores[gnomad_scores$canonical == "true",]

#still has a few duplicates
#gnomad_scores[duplicated(gnomad_scores$gene),]
gnomad_scores = gnomad_scores[!duplicated(gnomad_scores$gene),]

mart = init_mart_human()
get_protein_coding_genes(mart)

genes_transcripts = read.csv("genes.transcripts.csv", stringsAsFactors = F)
gnomad_scores = merge(gnomad_scores,genes_transcripts,by.x="transcript",by.y="Ensembl_transcript_id",all.x=T,all.y=F)

#some genes are absent in grch37
#gnomad_scores[is.na(gnomad_scores$Ensembl_gene_id),]

gnomad_scores = gnomad_scores[!is.na(gnomad_scores$Ensembl_gene_id),]
gnomad_scores = gnomad_scores[,c("Ensembl_gene_id","oe_lof","oe_mis")]

colnames(gnomad_scores) = c("Ensembl_gene_id","Gnomad_oe_lof_score","Gnomad_oe_mis_score")
write.csv(gnomad_scores,"gnomad_scores.csv",row.names = F)

file.remove("genes.transcripts.csv")
file.remove("gnomad_scores.txt")
1 change: 1 addition & 0 deletions cre.hgmd2csv.sql
@@ -1,3 +1,4 @@
+-- mysql -u root -p hgmd_pro < ~/cre/cre.hgmd2csv.sql
select
v.chrom, v.pos, v.id, v.ref, v.alt,
a.gene, a.tag, a.author, a.allname, a.vol, a.page, a.year, a.pmid,a.dbsnp