Commit

Merge pull request #348 from bmeg/dgidb-filter
Remove duplicate associations from DGIdb
adamstruck authored Oct 9, 2019
2 parents f2f7b30 + fc52c3e commit 74bc09f
Showing 5 changed files with 40 additions and 32 deletions.
20 changes: 10 additions & 10 deletions outputs.bmeg_manifest.dvc
@@ -1,4 +1,4 @@
-md5: a49937fe0fba16dd03ef1412485bb952
+md5: d63ee84d4ff01e9aee9df8863e26ac5b
 cmd: echo generating file manifest...
 deps:
 - md5: de92fd37fca1bed73a64aa545e422d3a
@@ -65,19 +65,19 @@ deps:
   path: outputs/celllines/Case_SameAs_Case.Edge.json.gz
 - md5: 8e8e11059b5cbd3c1ae85573b30dabb0
   path: outputs/compound/normalized.Case_Compounds_Compound.Edge.json.gz
-- md5: ef8e005a6a56f34904dd4bfd7948355d
+- md5: bb25790bff416d59028c2358b69b03c8
   path: outputs/compound/normalized.Compound.Vertex.json.gz
 - md5: 488eee20c9e4f0c99070eb6dfbb3fdb7
   path: outputs/compound/normalized.Compound_Cases_Case.Edge.json.gz
 - md5: 65e90f12066d8eba49f2a7c6401573d9
   path: outputs/compound/normalized.Compound_DrugResponses_DrugResponse.Edge.json.gz
-- md5: 3a44e58080ed69ba3ac95861a9f6fa67
+- md5: 710e732ae8a92c48204bd5fd33fbcb7c
   path: outputs/compound/normalized.Compound_G2PAssociations_G2PAssociation.Edge.json.gz
 - md5: 8fe1e1cb0721e8d8f8efb46e99ed7514
   path: outputs/compound/normalized.Compound_Projects_Project.Edge.json.gz
 - md5: 9a9fae017188f7f50f06ea922ef085d1
   path: outputs/compound/normalized.DrugResponse_Compounds_Compound.Edge.json.gz
-- md5: 155190bbebf30deaaa01f11c72d3eb22
+- md5: 61bd9cea68e641c69a96269c623c8d73
   path: outputs/compound/normalized.G2PAssociation_Compounds_Compound.Edge.json.gz
 - md5: 5a0a0aa5f5795c2960172004f2aad976
   path: outputs/compound/normalized.Project_Compounds_Compound.Edge.json.gz
@@ -113,15 +113,15 @@ deps:
   path: outputs/ctrp/drug_response.DrugResponse.Vertex.json.gz
 - md5: 50804cc666bd19c2d88baf263890abdd
   path: outputs/ctrp/drug_response.DrugResponse_Aliquot_Aliquot.Edge.json.gz
-- md5: 83f55521629daf5d4e26bcb7ddb330d8
+- md5: 3fa57b7c369e2c8dd2dc0d0b6c44766e
   path: outputs/dgidb/G2PAssociation.Vertex.json.gz
-- md5: 4845214655fae54a07b80dc1afc0a460
+- md5: 166177ce31138f0ee32dc339fdfdd987
   path: outputs/dgidb/G2PAssociation_Genes_Gene.Edge.json.gz
-- md5: 4491baeade2f848aa19eebe5934bb892
+- md5: 4f99308481017576d6cc208b6f514eab
   path: outputs/dgidb/G2PAssociation_Publications_Publication.Edge.json.gz
-- md5: a6875d57805787affd94295283674917
+- md5: 97282e6b782ab81889884ba6d9ba368e
   path: outputs/dgidb/Gene_G2PAssociations_G2PAssociation.Edge.json.gz
-- md5: 28b8818ec139029fb607e6b8eb815999
+- md5: 03f13f3513d5282b8a1f309429ae81f0
   path: outputs/dgidb/Publication_G2PAssociations_G2PAssociation.Edge.json.gz
 - md5: 3f89bd640983b7ea76b5dbbbcd846941
   path: outputs/ensembl/Exon.Vertex.json.gz
@@ -353,7 +353,7 @@ deps:
   path: outputs/phenotype/normalized.Phenotype_Samples_Sample.Edge.json.gz
 - md5: 2068dedca47d3a14bb26df905023e716
   path: outputs/phenotype/normalized.Sample_Phenotypes_Phenotype.Edge.json.gz
-- md5: 05da1b15f56e10fb29e885f93609df49
+- md5: 6d0f476db147dd592b4b507f68272f0e
   path: outputs/publication/stub.Publication.Vertex.json.gz
 - md5: 6bdd6ef5f03bc16023a4c59bfcec95db
   path: outputs/pubmed/baseline/pubmed19n0001.Publication.Vertex.json.gz
14 changes: 7 additions & 7 deletions outputs/compound/normalized.compounds.dvc
@@ -1,4 +1,4 @@
-md5: 312d5c40ae17c4c38621f5e278172bf8
+md5: eb52dd28df4a5f37fbafad6164f37584
 cmd: python3 transform/compound/transform.py
 wdir: ../..
 deps:
@@ -18,7 +18,7 @@ deps:
   path: outputs/ccle/drug_response.Compound.Vertex.json.gz
 - md5: d0d1f80ac811d6b0bc7b757b02830252
   path: outputs/gdsc/drug_response.Compound.Vertex.json.gz
-- md5: f3e13d6fdf7ad86bd80e642d65d613d0
+- md5: f7a2bbfb7436d76e09b9e13bd3d0c9ff
   path: outputs/dgidb/Compound.Vertex.json.gz
 - md5: 848566e573744e2303b49f6c0986a413
   path: outputs/ccle/drug_response.DrugResponse_Compounds_Compound.Edge.json.gz
@@ -56,12 +56,12 @@ deps:
   path: outputs/gdsc/drug_response.Project_Compounds_Compound.Edge.json.gz
 - md5: 991c580d8354294781105155bf773d88
   path: outputs/gdsc/drug_response.Compound_Projects_Project.Edge.json.gz
-- md5: cfbb384f11ad1d8c089371715caf4348
+- md5: 9c4627785660215fa9adbf7c9f6cf172
   path: outputs/dgidb/G2PAssociation_Compounds_Compound.Edge.json.gz
-- md5: 7a4375e805a71b0b83708ab993fa2071
+- md5: fa0ae005deb28af2d935cc818c1394b4
   path: outputs/dgidb/Compound_G2PAssociations_G2PAssociation.Edge.json.gz
 outs:
-- md5: ef8e005a6a56f34904dd4bfd7948355d
+- md5: bb25790bff416d59028c2358b69b03c8
   path: outputs/compound/normalized.Compound.Vertex.json.gz
   cache: true
   metric: false
@@ -96,12 +96,12 @@ outs:
   cache: true
   metric: false
   persist: false
-- md5: 155190bbebf30deaaa01f11c72d3eb22
+- md5: 61bd9cea68e641c69a96269c623c8d73
   path: outputs/compound/normalized.G2PAssociation_Compounds_Compound.Edge.json.gz
   cache: true
   metric: false
   persist: false
-- md5: 3a44e58080ed69ba3ac95861a9f6fa67
+- md5: 710e732ae8a92c48204bd5fd33fbcb7c
   path: outputs/compound/normalized.Compound_G2PAssociations_G2PAssociation.Edge.json.gz
   cache: true
   metric: false
20 changes: 10 additions & 10 deletions outputs/dgidb/dgidb.dvc
@@ -1,4 +1,4 @@
-md5: 927fb62cd04cdd5d1fdd69c3b8961aa7
+md5: e720598cc16fd7bb5918bd09f0397589
 cmd: python3 transform/dgidb/transform.py
 wdir: ../..
 deps:
@@ -8,45 +8,45 @@ deps:
   path: source/drug_enricher/drug_alias.tsv
 - md5: 64e7a82c87e7151a7c49846469157547
   path: src/bmeg/enrichers/drug_enricher.py
-- md5: e738a1f112c6a31342d98f6313a09f60
+- md5: 936873de1ea7a748293263ff126a2967
   path: transform/dgidb/transform.py
 outs:
-- md5: 83f55521629daf5d4e26bcb7ddb330d8
+- md5: 3fa57b7c369e2c8dd2dc0d0b6c44766e
   path: outputs/dgidb/G2PAssociation.Vertex.json.gz
   cache: true
   metric: false
   persist: false
-- md5: f3e13d6fdf7ad86bd80e642d65d613d0
+- md5: f7a2bbfb7436d76e09b9e13bd3d0c9ff
   path: outputs/dgidb/Compound.Vertex.json.gz
   cache: true
   metric: false
   persist: false
-- md5: 4845214655fae54a07b80dc1afc0a460
+- md5: 166177ce31138f0ee32dc339fdfdd987
   path: outputs/dgidb/G2PAssociation_Genes_Gene.Edge.json.gz
   cache: true
   metric: false
   persist: false
-- md5: 4491baeade2f848aa19eebe5934bb892
+- md5: 4f99308481017576d6cc208b6f514eab
   path: outputs/dgidb/G2PAssociation_Publications_Publication.Edge.json.gz
   cache: true
   metric: false
   persist: false
-- md5: cfbb384f11ad1d8c089371715caf4348
+- md5: 9c4627785660215fa9adbf7c9f6cf172
   path: outputs/dgidb/G2PAssociation_Compounds_Compound.Edge.json.gz
   cache: true
   metric: false
   persist: false
-- md5: 28b8818ec139029fb607e6b8eb815999
+- md5: 03f13f3513d5282b8a1f309429ae81f0
   path: outputs/dgidb/Publication_G2PAssociations_G2PAssociation.Edge.json.gz
   cache: true
   metric: false
   persist: false
-- md5: a6875d57805787affd94295283674917
+- md5: 97282e6b782ab81889884ba6d9ba368e
   path: outputs/dgidb/Gene_G2PAssociations_G2PAssociation.Edge.json.gz
   cache: true
   metric: false
   persist: false
-- md5: 7a4375e805a71b0b83708ab993fa2071
+- md5: fa0ae005deb28af2d935cc818c1394b4
   path: outputs/dgidb/Compound_G2PAssociations_G2PAssociation.Edge.json.gz
   cache: true
   metric: false
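The md5 values recorded for each output can be checked against files on disk. The following is a minimal sketch, not part of this commit: `file_md5` and `matches_recorded` are hypothetical helper names, and the sketch assumes the recorded value for a single binary output such as a `.json.gz` file is the plain MD5 digest of its raw bytes. DVC may hash text files differently, and directory entries (those with a `.dir` suffix) are not covered.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path, recorded_md5):
    """Compare a file's digest with an md5 string copied from a .dvc file.

    Assumption: the recorded value is the digest of the raw file contents.
    """
    return file_md5(path) == recorded_md5.strip().lower()
```

For example, `matches_recorded("outputs/dgidb/Compound.Vertex.json.gz", "f7a2bbfb7436d76e09b9e13bd3d0c9ff")` would report whether a local copy corresponds to the hash recorded after this change, under the assumption stated above.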
12 changes: 8 additions & 4 deletions outputs/publication/stub_publications.dvc
@@ -1,4 +1,4 @@
-md5: e01759a412af6ddf090ed7a0106d746d
+md5: 443f917e8b79b3bc6b071f4a9474cacd
 cmd: python3 transform/publication/transform.py
 wdir: ../..
 deps:
@@ -8,18 +8,22 @@ deps:
   path: outputs/g2p/G2PAssociation_Publications_Publication.Edge.json.gz
 - md5: 78e4c7cdbfad334978b576eeaeee2e86
   path: outputs/g2p/Publication_G2PAssociations_G2PAssociation.Edge.json.gz
+- md5: 4f99308481017576d6cc208b6f514eab
+  path: outputs/dgidb/G2PAssociation_Publications_Publication.Edge.json.gz
+- md5: 03f13f3513d5282b8a1f309429ae81f0
+  path: outputs/dgidb/Publication_G2PAssociations_G2PAssociation.Edge.json.gz
 - md5: 2a69a3d9fe9acd1c5465afc03be20cd2.dir
   path: outputs/pubmed/baseline
 - md5: 7feb8063bd265b0933cf9d1f7aa7923c
   path: outputs/msigdb/Publication_GeneSets_GeneSet.Edge.json.gz
 - md5: c7b863f20541385d1c46b46d3a885e96
   path: outputs/msigdb/GeneSet_Publications_Publication.Edge.json.gz
-- md5: 6381ab2738b55712095a487b26d3785d
+- md5: 34ce068f620964c170cc5a297a1f5803
   path: outputs/pathway_commons/Publication_Interactions_Interaction.Edge.json.gz
-- md5: a0341597a0aa8a96bcce46d8513ed109
+- md5: 737bc587e7faef80a008f30b9f310643
   path: outputs/pathway_commons/Interaction_Publications_Publication.Edge.json.gz
 outs:
-- md5: 05da1b15f56e10fb29e885f93609df49
+- md5: 6d0f476db147dd592b4b507f68272f0e
   path: outputs/publication/stub.Publication.Vertex.json.gz
   cache: true
   metric: false
6 changes: 5 additions & 1 deletion transform/dgidb/transform.py
@@ -21,8 +21,12 @@ def transform(interactions_file="source/dgidb/interactions.tsv",
     interactions = read_tsv(interactions_file)
     # gene_name gene_claim_name entrez_id interaction_claim_source interaction_types drug_claim_name drug_claim_primary_name drug_name drug_chembl_id PMIDs
     for line in interactions:
+        source = line["interaction_claim_source"]
+        # remove associations already brought in by VICC G2P
+        if source in ["CGI", "CIViC", "OncoKB", "CKB"]:
+            continue
         assoc_params = {
-            "source": line["interaction_claim_source"],
+            "source": source,
             "source_document": json.dumps(line),
             "description": "{} {} {}".format(line["drug_chembl_id"], line["interaction_types"], line["gene_name"]),
             "evidence_label": None,
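The filter added above can be tried in isolation. This is a minimal sketch rather than the repository's code: `filter_interactions`, `G2P_SOURCES`, and the sample rows are hypothetical, and only the `interaction_claim_source` column name and the four excluded source names come from the diff. The standard library `csv` module stands in for the project's `read_tsv` helper, whose behavior is not shown in this commit.

```python
import csv
import io

# Sources the diff treats as already imported through VICC G2P.
G2P_SOURCES = {"CGI", "CIViC", "OncoKB", "CKB"}


def filter_interactions(rows, excluded=G2P_SOURCES):
    """Yield only rows whose interaction_claim_source is not excluded.

    The comparison is case-sensitive, as in the diff.
    """
    for row in rows:
        if row["interaction_claim_source"] in excluded:
            continue
        yield row


# Made-up sample rows using a subset of the columns named in the transform.
SAMPLE_TSV = (
    "gene_name\tinteraction_claim_source\tdrug_name\n"
    "EGFR\tCIViC\tERLOTINIB\n"
    "EGFR\tDrugBank\tERLOTINIB\n"
    "BRAF\tOncoKB\tVEMURAFENIB\n"
    "BRAF\tTTD\tVEMURAFENIB\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
kept = list(filter_interactions(rows))
kept_sources = [row["interaction_claim_source"] for row in kept]
```

Because the match is exact, a source spelled differently in the input (for example `civic`) would pass through the filter; normalizing case before the comparison would be a separate design decision.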