Reproducing `SafeDrug` For CS598: Deep Learning for Healthcare (Spring '22)

Paper and Group Details

Paper
- ID: 208
- Chaoqi Yang, Cao Xiao, Fenglong Ma, Lucas Glass, and Jimeng Sun. 2021. SafeDrug: Dual Molecular Graph Encoders for Recommending Effective and Safe Drug Combinations. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). International Joint Conferences on Artificial Intelligence Organization, 3735-3741. DOI: https://doi.org/10.24963/ijcai.2021/514
- Original Code Repository: https://github.com/ycq091044/SafeDrug
Group
- ID: 24
- Members:
  - Anushree Dhople
  - Gazi Muhammad Samiul Hoque

Project Structure

data/
- processing.py: The data preprocessing file.
- input/
  - PRESCRIPTIONS.csv: the prescription file from MIMIC-III raw dataset
  - DIAGNOSES_ICD.csv: the diagnosis file from MIMIC-III raw dataset
  - PROCEDURES_ICD.csv: the procedure file from MIMIC-III raw dataset
  - RXCUI2atc4.csv: an NDC-RXCUI-ATC4 mapping file; we only need the RXCUI-to-ATC4 mapping. This file is obtained from https://github.com/ycq091044/SafeDrug.
  - drug-atc.csv: a CID-ATC file, which maps each CID code to a detailed ATC code (we use the prefix of the ATC code later for aggregation). This file is obtained from https://github.com/ycq091044/SafeDrug.
  - rxnorm2RXCUI.txt: rxnorm to RXCUI mapping file. This file is obtained from https://github.com/ycq091044/SafeDrug.
  - drugbank_drugs_info.csv: drug information table from DrugBank, downloadable from https://www.dropbox.com/s/angoirabxurjljh/drugbank_drugs_info.csv?dl=0, used to map drug names to SMILES strings.
  - drug-DDI.csv: a large file containing the drug-drug interaction (DDI) information, coded by CID. It can be downloaded from https://drive.google.com/file/d/1mnPc0O0ztz0fkv3HF-dpmBb8PLWsEoDz/view?usp=sharing
- output/
  - atc3toSMILES.pkl: dict mapping each drug ID (we use the ATC-3 level code as the drug ID) to its SMILES strings
  - ddi_A_final.pkl: DDI adjacency matrix
  - ddi_matrix_H.pkl: H mask structure (created by ddi_mask_H.py)
  - ehr_adj_final.pkl: EHR co-occurrence adjacency matrix used by the GAMENet baseline (two drugs are connected if they appear in the same medication set)
  - records_final.pkl: the final diagnosis-procedure-medication EHR records of each patient, used for the train/val/test split
  - voc_final.pkl: index-to-code dictionaries for diagnoses, procedures, and medications
src/
- SafeDrug.py: the SafeDrug model
- baseline models:
  - GAMENet.py
  - DMNC.py
  - Leap.py
  - Retain.py
  - ECC.py
  - LR.py
- setting files
  - model.py
  - util.py
  - layer.py
- analysis file
  - Result-Analysis.ipynb
dependency.sh
requirements.txt
README.md
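The output files above are derived from the mapped inputs in a few simple steps. The sketch below illustrates, under our own assumptions, how the ATC-3 aggregation, an `ehr_adj_final.pkl`-style co-occurrence matrix, a `ddi_A_final.pkl`-style DDI matrix, and a `ddi_matrix_H.pkl`-style substructure mask could be built. All function names are hypothetical, we assume the ATC-3 level is the first four characters of an ATC code (e.g. `N02B`), inputs are assumed to be already mapped to ATC codes and vocabulary indices, and fragment sets are assumed to be precomputed. `processing.py` and `ddi_mask_H.py` may differ in details; plain nested lists are used here only for readability.

```python
from itertools import combinations


def atc3(code: str) -> str:
    """Truncate a full ATC code to its level-3 prefix, e.g. 'N02BE01' -> 'N02B'."""
    return code[:4]


def ehr_adjacency(visit_med_sets, num_meds):
    """Co-occurrence matrix: drugs i and j are connected if they were
    prescribed together in at least one visit (sets hold vocabulary indices)."""
    adj = [[0] * num_meds for _ in range(num_meds)]
    for meds in visit_med_sets:
        for i, j in combinations(sorted(set(meds)), 2):
            adj[i][j] = adj[j][i] = 1
    return adj


def ddi_adjacency(ddi_pairs, med_index):
    """Symmetric DDI matrix from interacting (ATC, ATC) pairs.

    Full ATC codes are truncated to ATC-3 before lookup in med_index
    (ATC-3 code -> index). Pairs outside the vocabulary and self-pairs are skipped.
    """
    n = len(med_index)
    adj = [[0] * n for _ in range(n)]
    for a, b in ddi_pairs:
        a, b = atc3(a), atc3(b)
        if a != b and a in med_index and b in med_index:
            i, j = med_index[a], med_index[b]
            adj[i][j] = adj[j][i] = 1
    return adj


def substructure_mask(med_fragments):
    """Mask H with H[i][k] = 1 if fragment k occurs in drug i.

    med_fragments[i] is the (assumed precomputed) set of fragment SMILES
    for drug i, e.g. obtained with a molecular decomposition method.
    Returns the mask and the sorted fragment vocabulary.
    """
    vocab = sorted(set().union(*med_fragments))
    position = {frag: k for k, frag in enumerate(vocab)}
    H = [[0] * len(vocab) for _ in med_fragments]
    for i, frags in enumerate(med_fragments):
        for frag in frags:
            H[i][position[frag]] = 1
    return H, vocab
```

For example, with a three-drug vocabulary, `ehr_adjacency([[0, 1], [1, 2]], 3)` connects drugs 0-1 and 1-2 but leaves 0-2 unconnected, since those two were never prescribed in the same visit.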

After processing, we obtain the following statistics:

```
# patients          6350
# clinical events   15032
# diagnosis         1958
# med               112
# procedure         1430
# avg of diagnoses  10.5089143161256
# avg of medicines  11.647751463544438
# avg of procedures 3.8436668440659925
# avg of visits     2.367244094488189
# max of diagnoses  128
# max of medicines  64
# max of procedures 50
# max of visits     29
```
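Statistics of this kind can be recomputed from `records_final.pkl`. The sketch below assumes the record layout used by the original SafeDrug code as we understand it (a list of patients, each a list of visits, each visit a `[diagnoses, procedures, medications]` triple of code indices) and computes per-visit averages and maxima. The function name is hypothetical, and `processing.py` may define some of these quantities differently (for example, over each patient's pooled codes), so the numbers are not guaranteed to match the table exactly.

```python
def record_stats(records):
    """Summary statistics for a non-empty list of patients.

    Assumed layout: records[p][v] == [diag_ids, proc_ids, med_ids].
    Averages and maxima of code counts are taken per visit.
    """
    visits = [visit for patient in records for visit in patient]
    stats = {
        "patients": len(records),
        "clinical events": len(visits),
        "avg of visits": len(visits) / len(records),
        "max of visits": max(len(patient) for patient in records),
    }
    for k, name in enumerate(["diagnoses", "procedures", "medicines"]):
        sizes = [len(visit[k]) for visit in visits]
        # number of distinct codes of this type that actually occur
        stats[f"unique {name}"] = len({code for visit in visits for code in visit[k]})
        stats[f"avg of {name}"] = sum(sizes) / len(sizes)
        stats[f"max of {name}"] = max(sizes)
    return stats
```

Since "clinical events" counts visits, the average number of visits is simply events divided by patients (15032 / 6350 ≈ 2.367 in the table).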

Execution

Step 1: Environment Setup

First, create a conda environment named SafeDrug with RDKit (Open-Source Cheminformatics Software) installed:
```
conda create -c conda-forge -n SafeDrug rdkit
conda activate SafeDrug
```
Clone this repository to your preferred location. The commands below assume that you clone it into your home directory.
```
cd ~
git clone git@github.com:samhq/cs598dl4h-project.git
```
In the SafeDrug environment, run the following commands to install the required Python packages, depending on whether you have a GPU:
```
cd ~/cs598dl4h-project

# if you don't have GPU
./dependency.sh

# if you have GPU
./dependency.sh 1
```

Step 2: Obtaining Data and Processing

Go to https://physionet.org/content/mimiciii/1.4/ to download the MIMIC-III dataset (access is restricted: you need a credentialed PhysioNet account, which requires completing the required training certificate):
```
wget -r -N -c -np --user [account] --ask-password https://physionet.org/files/mimiciii/1.4/
```

Go into the download folder, decompress the three required files, and copy them to the ~/cs598dl4h-project/data/input/ folder:
```
cd ~/physionet.org/files/mimiciii/1.4
gzip -d PROCEDURES_ICD.csv.gz # procedure information
gzip -d PRESCRIPTIONS.csv.gz  # prescription information
gzip -d DIAGNOSES_ICD.csv.gz  # diagnosis information
cp PROCEDURES_ICD.csv PRESCRIPTIONS.csv DIAGNOSES_ICD.csv ~/cs598dl4h-project/data/input/
```

Download the additional input files into the ~/cs598dl4h-project/data/input/ folder:
```
cd ~/cs598dl4h-project/data/input/
./get_additional_files.sh
```
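Before running `processing.py`, it can help to confirm that every input file listed under Project Structure is in place. The sketch below is a minimal check; the file list is copied from the `data/input/` entries above, while the function name and the default path are assumptions based on the clone location used in this guide.

```python
from pathlib import Path

# Input files expected in data/input/, as listed under Project Structure.
REQUIRED_INPUTS = [
    "PRESCRIPTIONS.csv",
    "DIAGNOSES_ICD.csv",
    "PROCEDURES_ICD.csv",
    "RXCUI2atc4.csv",
    "drug-atc.csv",
    "rxnorm2RXCUI.txt",
    "drugbank_drugs_info.csv",
    "drug-DDI.csv",
]


def missing_inputs(input_dir):
    """Return the required input files that are not present in input_dir."""
    directory = Path(input_dir).expanduser()
    return [name for name in REQUIRED_INPUTS if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("~/cs598dl4h-project/data/input")
    print("all inputs present" if not missing else f"missing: {missing}")
```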
Process the data to generate the complete records_final.pkl:
```
cd ~/cs598dl4h-project/data
python processing.py
```
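The model scripts read `records_final.pkl` and split it by patient. The sketch below shows one way to load the file and split it 2/3 : 1/6 : 1/6 into train, test, and validation parts in file order, which is our understanding of the split used in the original SafeDrug code. Treat the ratios and ordering as assumptions and check `SafeDrug.py` before relying on them; the function names are hypothetical, and we assume the pickle holds plain nested Python lists.

```python
import pickle


def load_records(path):
    """Load records_final.pkl (assumed to contain plain nested lists)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def split_records(records):
    """Patient-level split: the first 2/3 of patients for training; the
    remaining patients are halved into a test part followed by a validation part."""
    split_point = int(len(records) * 2 / 3)
    rest = records[split_point:]
    eval_len = len(rest) // 2
    train = records[:split_point]
    test = rest[:eval_len]
    val = rest[eval_len:]
    return train, val, test
```

Splitting by patient (rather than by visit) keeps all visits of one patient in the same part, so no patient history leaks from training into evaluation.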

Step 3: Run Model(s)

To run the SafeDrug model:

```
cd ~/cs598dl4h-project/src
python SafeDrug.py
```

The script accepts the following arguments:

```
usage: SafeDrug.py [-h] [--Test] [--model_name=MODEL_NAME]
               [--resume_path=RESUME_PATH] [--lr=LR]
               [--target_ddi=TARGET_DDI] [--kp=KP] [--dim=DIM]
               [--epoch=EPOCH]

optional arguments:
  -h, --help                  show this help message and exit
  --Test                      test mode
  --model_name MODEL_NAME     model name
  --resume_path RESUME_PATH   resume path
  --lr LR                     learning rate
  --target_ddi TARGET_DDI     target ddi
  --kp KP                     coefficient of P signal
  --dim DIM                   dimension
  --epoch EPOCH               how many epoch
```
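`--target_ddi` and `--kp` control SafeDrug's DDI-aware loss. As we read the SafeDrug paper, the training loss mixes the prediction losses with the DDI loss using a weight beta that stays at 1 while the current DDI rate is at or below the target and decreases linearly (at a rate set by Kp) once the target is exceeded, clipped at 0. The sketch below implements that description in plain Python; the function names and the 0.95/0.05 weighting of the BCE and multi-label losses are assumptions, so check `SafeDrug.py` for the exact formula used in this repository.

```python
def ddi_loss_weight(current_ddi: float, target_ddi: float, kp: float) -> float:
    """Weight beta of the prediction losses (1 - beta goes to the DDI loss)."""
    if current_ddi <= target_ddi:
        return 1.0
    # Larger kp means a gentler shift toward the DDI loss for the same overshoot.
    return max(0.0, 1.0 - (current_ddi - target_ddi) / kp)


def combined_loss(loss_bce, loss_multi, loss_ddi,
                  current_ddi, target_ddi, kp, alpha=0.95):
    """beta * (alpha * BCE + (1 - alpha) * multi-label) + (1 - beta) * DDI."""
    beta = ddi_loss_weight(current_ddi, target_ddi, kp)
    return beta * (alpha * loss_bce + (1 - alpha) * loss_multi) + (1 - beta) * loss_ddi
```

With a target of 0.06 and Kp = 0.05, a DDI rate of 0.08 gives beta = 1 - 0.02 / 0.05 = 0.6, so 40% of the loss weight moves to the DDI term.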

To run all models consecutively, run:

```
cd ~/cs598dl4h-project/src
./run_models.sh [NUMBER_OF_EPOCHS]
```

Step 4: Analysis of the results

Please check the Jupyter notebook src/Result-Analysis.ipynb.

Results

| Model | DDI | Jaccard | F1-score | PRAUC | Avg. # of Drugs |
|---|---|---|---|---|---|
| LR | 0.0775 | 0.4900 | 0.6470 | 0.7553 | - |
| ECC | 0.0806 | 0.4868 | 0.6428 | 0.7602 | - |
| RETAIN | 0.0851 ± 0.0028 | 0.4711 ± 0.0140 | 0.6337 ± 0.0129 | 0.7512 ± 0.0126 | 17.9925 ± 0.8751 |
| LEAP | 0.0689 ± 0.0028 | 0.4369 ± 0.0117 | 0.6002 ± 0.0116 | 0.6467 ± 0.0068 | 19.1096 ± 0.1240 |
| GAMENet | 0.0836 ± 0.0067 | 0.4790 ± 0.0260 | 0.6382 ± 0.0240 | 0.7393 ± 0.0247 | 25.1478 ± 1.1325 |
| SafeDrug | 0.0627 ± 0.0023 | 0.5051 ± 0.0150 | 0.6624 ± 0.0134 | 0.7604 ± 0.0117 | 19.3245 ± 0.5557 |
| SafeDrug\* | 0.0589 ± 0.0005 | 0.5213 ± 0.0030 | 0.6768 ± 0.0027 | 0.7647 ± 0.0025 | 19.9178 ± 0.1604 |

\* Values reported in the original SafeDrug paper.
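For reference, the per-visit set metrics in the table can be sketched as follows. Jaccard and F1 compare a visit's predicted and true medication sets, and the DDI rate is the share of drug pairs within the predicted sets that are marked as interacting in the DDI adjacency matrix (lower is better). This is an illustration under our own assumptions with hypothetical function names, not the repository's evaluation code: the model scripts average these values over visits and patients, and PRAUC, which needs predicted probabilities, is omitted.

```python
from itertools import combinations


def jaccard(y_true, y_pred):
    """|true ∩ pred| / |true ∪ pred| for one visit (0.0 if both are empty)."""
    t, p = set(y_true), set(y_pred)
    union = t | p
    return len(t & p) / len(union) if union else 0.0


def f1(y_true, y_pred):
    """Harmonic mean of set precision and recall for one visit."""
    t, p = set(y_true), set(y_pred)
    tp = len(t & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(t)
    return 2 * precision * recall / (precision + recall)


def ddi_rate(med_sets, ddi_adj):
    """Fraction of unordered drug pairs inside each set that interact."""
    total = interacting = 0
    for meds in med_sets:
        for i, j in combinations(sorted(set(meds)), 2):
            total += 1
            if ddi_adj[i][j] or ddi_adj[j][i]:
                interacting += 1
    return interacting / total if total else 0.0
```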

Further analysis can be found in the Jupyter notebook src/Result-Analysis.ipynb.

Credits

Our work follows the original code at https://github.com/ycq091044/SafeDrug.
