- Paper
  - ID: 208
  - Chaoqi Yang, Cao Xiao, Fenglong Ma, Lucas Glass, and Jimeng Sun. 2021. SafeDrug: Dual Molecular Graph Encoders for Recommending Effective and Safe Drug Combinations. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). International Joint Conferences on Artificial Intelligence Organization, 3735-3741. DOI: https://doi.org/10.24963/ijcai.2021/514
  - Original Code Repository: https://github.com/ycq091044/SafeDrug
- Group
  - ID: 24
  - Members:
    - Anushree Dhople ([email protected])
    - Gazi Muhammad Samiul Hoque ([email protected])
- `data/`
  - `processing.py`: the data preprocessing script.
  - `input/`
    - `PRESCRIPTIONS.csv`: the prescription file from the MIMIC-III raw dataset.
    - `DIAGNOSES_ICD.csv`: the diagnosis file from the MIMIC-III raw dataset.
    - `PROCEDURES_ICD.csv`: the procedure file from the MIMIC-III raw dataset.
    - `RXCUI2atc4.csv`: an NDC-RXCUI-ATC4 mapping file; we only need the RXCUI to ATC4 mapping. This file is obtained from https://github.com/ycq091044/SafeDrug.
    - `drug-atc.csv`: a CID-ATC file, which gives the mapping from CID code to detailed ATC code (we use the prefix of the ATC code later for aggregation). This file is obtained from https://github.com/ycq091044/SafeDrug.
    - `rxnorm2RXCUI.txt`: rxnorm to RXCUI mapping file. This file is obtained from https://github.com/ycq091044/SafeDrug.
    - `drugbank_drugs_info.csv`: drug information table downloaded from DrugBank (https://www.dropbox.com/s/angoirabxurjljh/drugbank_drugs_info.csv?dl=0), which is used to map drug names to drug SMILES strings.
    - `drug-DDI.csv`: a large file containing the drug-drug interaction (DDI) information, coded by CID. The file can be downloaded from https://drive.google.com/file/d/1mnPc0O0ztz0fkv3HF-dpmBb8PLWsEoDz/view?usp=sharing
  - `output/`
    - `atc3toSMILES.pkl`: dictionary from drug ID (we use the ATC-3 level code as the drug ID) to drug SMILES string.
    - `ddi_A_final.pkl`: DDI adjacency matrix.
    - `ddi_matrix_H.pkl`: H mask structure (this file is created by `ddi_mask_H.py`).
    - `ehr_adj_final.pkl`: used in the GAMENet baseline (if two drugs appear in one set, then they are connected).
    - `records_final.pkl`: the final diagnosis-procedure-medication EHR records of each patient, used for the train/val/test split.
    - `voc_final.pkl`: diagnosis/procedure/medication index-to-code dictionary.
- `src/`
  - `SafeDrug.py`: our model
  - baseline models:
    - `GAMENet.py`
    - `DMNC.py`
    - `Leap.py`
    - `Retain.py`
    - `ECC.py`
    - `LR.py`
  - setting files:
    - `model.py`
    - `util.py`
    - `layer.py`
  - analysis file:
    - `Result-Analysis.ipynb`
- `dependency.sh`
- `requirements.txt`
- `README.md`
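
The rule given for `ehr_adj_final.pkl` (two drugs are connected if they appear in the same set) can be illustrated on toy data. This is only a minimal sketch, not the repository's code: the function name `build_ehr_adjacency` and the nested record layout (patient, then visit, then `[diagnoses, procedures, medications]` as integer indices) are assumptions made for the example.

```python
from itertools import combinations


def build_ehr_adjacency(records, num_meds):
    """Return a symmetric 0/1 matrix: adj[i][j] == 1 if meds i and j share a visit."""
    adj = [[0] * num_meds for _ in range(num_meds)]
    for patient in records:
        for visit in patient:
            meds = visit[2]  # assumed layout: [diagnoses, procedures, medications]
            for i, j in combinations(sorted(set(meds)), 2):
                adj[i][j] = 1
                adj[j][i] = 1
    return adj


# Toy data (hypothetical): two patients, medication vocabulary of size 4.
toy_records = [
    [[[0, 1], [0], [0, 1]], [[2], [1], [1, 2]]],  # patient with two visits
    [[[1], [0], [3]]],                            # patient with one visit
]
adj = build_ehr_adjacency(toy_records, num_meds=4)
```

Here medications 0 and 1 co-occur in one visit and medications 1 and 2 in another, so only those two pairs are connected; medication 3 is always prescribed alone and stays isolated.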
After processing is complete, we get the following statistics:

    # patients 6350
    # clinical events 15032
    # diagnosis 1958
    # med 112
    # procedure 1430
    # avg of diagnoses 10.5089143161256
    # avg of medicines 11.647751463544438
    # avg of procedures 3.8436668440659925
    # avg of vists 2.367244094488189
    # max of diagnoses 128
    # max of medicines 64
    # max of procedures 50
    # max of visit 29
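
Statistics of this kind can be derived from nested EHR records in a few lines. The sketch below is illustrative only: `summarize` is a hypothetical helper, the record layout (patient, then visit, then `[diagnoses, procedures, medications]`) is an assumption, and the averages are assumed to be per visit. As a consistency check on the reported numbers, 15032 clinical events over 6350 patients gives about 2.367 visits per patient, which agrees with the reported average.

```python
def summarize(records):
    """Compute simple cohort statistics from nested EHR records."""
    visits = [visit for patient in records for visit in patient]
    n_patients = len(records)
    n_visits = len(visits)
    return {
        "patients": n_patients,
        "clinical_events": n_visits,
        # Averages are taken per visit (an assumption about the reported numbers).
        "avg_diagnoses": sum(len(v[0]) for v in visits) / n_visits,
        "avg_procedures": sum(len(v[1]) for v in visits) / n_visits,
        "avg_medicines": sum(len(v[2]) for v in visits) / n_visits,
        "avg_visits": n_visits / n_patients,
        "max_visits": max(len(patient) for patient in records),
    }


# Toy data (hypothetical): two patients with two visits and one visit.
toy_records = [
    [[[0, 1], [0], [0, 1]], [[2], [1], [1, 2]]],
    [[[1], [0], [3]]],
]
stats = summarize(toy_records)

# Consistency check on the reported statistics: clinical events / patients.
reported_avg_visits = 15032 / 6350
```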
- First, install the `rdkit` (RDKit: Open-Source Cheminformatics Software) conda environment:

      conda create -c conda-forge -n SafeDrug rdkit
      conda activate SafeDrug
- Clone this repository in your preferred location. We assume that you clone it in your home directory.

      cd ~
      git clone [email protected]:samhq/cs598dl4h-project.git
- In the `SafeDrug` environment, run the following commands to install the required Python packages (according to your GPU support):

      cd ~/cs598dl4h-project
      # if you don't have a GPU
      ./dependency.sh
      # if you have a GPU
      ./dependency.sh 1
- Go to https://physionet.org/content/mimiciii/1.4/ to download the MIMIC-III dataset (you may need to obtain the certificate first):

      wget -r -N -c -np --user [account] --ask-password https://physionet.org/files/mimiciii/1.4/
- Go into the folder, unzip the three required files, and copy them to the `~/cs598dl4h-project/data/input/` folder:

      cd ~/physionet.org/files/mimiciii/1.4
      gzip -d PROCEDURES_ICD.csv.gz   # procedure information
      gzip -d PRESCRIPTIONS.csv.gz    # prescription information
      gzip -d DIAGNOSES_ICD.csv.gz    # diagnosis information
      cp PROCEDURES_ICD.csv PRESCRIPTIONS.csv DIAGNOSES_ICD.csv ~/cs598dl4h-project/data/input/
- Download the additional files into the `~/cs598dl4h-project/data/input/` folder:

      cd ~/cs598dl4h-project/data/input/
      ./get_additional_files.sh
- Process the data to get a complete `records_final.pkl`:

      cd ~/cs598dl4h-project/data
      python processing.py
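
Before running `processing.py`, it can help to confirm that every input file named in the `data/input/` listing is in place. The following is a minimal sketch, not part of the repository: `missing_inputs` and `REQUIRED_INPUTS` are hypothetical names, and the file list simply mirrors the listing in this README.

```python
from pathlib import Path

# File names taken from the data/input/ listing in this README.
REQUIRED_INPUTS = [
    "PRESCRIPTIONS.csv",
    "DIAGNOSES_ICD.csv",
    "PROCEDURES_ICD.csv",
    "RXCUI2atc4.csv",
    "drug-atc.csv",
    "rxnorm2RXCUI.txt",
    "drugbank_drugs_info.csv",
    "drug-DDI.csv",
]


def missing_inputs(input_dir):
    """Return the required file names that are not present in input_dir."""
    base = Path(input_dir).expanduser()
    return [name for name in REQUIRED_INPUTS if not (base / name).is_file()]


if __name__ == "__main__":
    # Path assumes the repository was cloned into the home directory.
    missing = missing_inputs("~/cs598dl4h-project/data/input")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```

If the check reports missing files, repeating the copy step or the additional-files download step is the likely fix.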
To run the `SafeDrug` model, run the following:

    cd ~/cs598dl4h-project/src
    python SafeDrug.py

The script accepts the following arguments:

    usage: SafeDrug.py [-h] [--Test] [--model_name=MODEL_NAME]
                       [--resume_path=RESUME_PATH] [--lr=LR]
                       [--target_ddi=TARGET_DDI] [--kp=KP] [--dim=DIM]
                       [--epoch=EPOCH]

    optional arguments:
      -h, --help                 show this help message and exit
      --Test                     test mode
      --model_name MODEL_NAME    model name
      --resume_path RESUME_PATH  resume path
      --lr LR                    learning rate
      --target_ddi TARGET_DDI    target ddi
      --kp KP                    coefficient of P signal
      --dim DIM                  dimension
      --epoch EPOCH              how many epoch
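
For readers who want to see how such an interface is typically declared, here is a minimal `argparse` sketch that mirrors the flags listed above. It assumes the script uses `argparse`; the argument types are guesses, and the defaults are deliberately left as `None` because the actual default values are not stated here.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags in the usage text (types are assumed)."""
    parser = argparse.ArgumentParser(prog="SafeDrug.py")
    parser.add_argument("--Test", action="store_true", help="test mode")
    parser.add_argument("--model_name", type=str, default=None, help="model name")
    parser.add_argument("--resume_path", type=str, default=None, help="resume path")
    parser.add_argument("--lr", type=float, default=None, help="learning rate")
    parser.add_argument("--target_ddi", type=float, default=None, help="target ddi")
    parser.add_argument("--kp", type=float, default=None, help="coefficient of P signal")
    parser.add_argument("--dim", type=int, default=None, help="dimension")
    parser.add_argument("--epoch", type=int, default=None, help="how many epoch")
    return parser


# Example: parse a hypothetical command line instead of sys.argv.
args = build_parser().parse_args(["--Test", "--lr", "0.0005", "--epoch", "10"])
```

With this example command line, `args.Test` is `True`, `args.lr` and `args.epoch` hold the parsed numbers, and every flag that was not passed stays `None`.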
If you want to run all models consecutively, run:

    cd ~/cs598dl4h-project/src
    ./run_models.sh [NUMBER_OF_EPOCHS]
Please check the Jupyter Notebook `Result-Analysis.ipynb`.
Model | DDI | Jaccard | F1-score | PRAUC | Avg. # of Drugs |
---|---|---|---|---|---|
LR | 0.0775 | 0.4900 | 0.6470 | 0.7553 | - |
ECC | 0.0806 | 0.4868 | 0.6428 | 0.7602 | - |
RETAIN | 0.0851 ± 0.0028 | 0.4711 ± 0.0140 | 0.6337 ± 0.0129 | 0.7512 ± 0.0126 | 17.9925 ± 0.8751 |
LEAP | 0.0689 ± 0.0028 | 0.4369 ± 0.0117 | 0.6002 ± 0.0116 | 0.6467 ± 0.0068 | 19.1096 ± 0.1240 |
GAMENet | 0.0836 ± 0.0067 | 0.4790 ± 0.0260 | 0.6382 ± 0.0240 | 0.7393 ± 0.0247 | 25.1478 ± 1.1325 |
SafeDrug | 0.0627 ± 0.0023 | 0.5051 ± 0.0150 | 0.6624 ± 0.0134 | 0.7604 ± 0.0117 | 19.3245 ± 0.5557 |
SafeDrug* | 0.0589 ± 0.0005 | 0.5213 ± 0.0030 | 0.6768 ± 0.0027 | 0.7647 ± 0.0025 | 19.9178 ± 0.1604 |
\* values from the original SafeDrug model paper
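
To make the table columns concrete, the sketch below computes Jaccard, F1-score, and a DDI rate for a single recommended drug set using the standard set-based definitions. It is an illustration under assumptions, not the evaluation code that produced the numbers above: the function names are hypothetical, and how the per-visit values are aggregated across visits and patients is not shown.

```python
from itertools import combinations


def jaccard(true_set, pred_set):
    """Size of the intersection divided by size of the union."""
    union = true_set | pred_set
    return len(true_set & pred_set) / len(union) if union else 0.0


def f1_score(true_set, pred_set):
    """Harmonic mean of precision and recall over drug sets."""
    overlap = len(true_set & pred_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(true_set)
    return 2 * precision * recall / (precision + recall)


def ddi_rate(pred_set, ddi_adj):
    """Fraction of drug pairs in pred_set that are marked as interacting."""
    pairs = list(combinations(sorted(pred_set), 2))
    if not pairs:
        return 0.0
    return sum(ddi_adj[i][j] for i, j in pairs) / len(pairs)


# Toy example (hypothetical): 4 drugs, one known interaction between 1 and 3.
toy_ddi = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 1, 0, 0]]
truth, predicted = {0, 1, 2}, {1, 2, 3}
scores = (jaccard(truth, predicted), f1_score(truth, predicted),
          ddi_rate(predicted, toy_ddi))
```

In this toy case two of the three predicted drugs are correct, giving a Jaccard of 2/4 and an F1-score of 2/3, while one of the three predicted pairs interacts, giving a DDI rate of 1/3.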
Further analysis can be found in the Jupyter Notebook `Result-Analysis.ipynb`.
Our work builds on the original code at https://github.com/ycq091044/SafeDrug.