GitHub - ZhiGroup/pytorch_ehr: source codes based on PyTorch to analyze EHR

This branch is dedicated to clinicians. Please feel free to report an issue to help make this repo more user-friendly.

Steps to run:

Environment Preparation:

Log in to https://colab.research.google.com
Select the GitHub option, enter this repo's link (https://github.com/ZhiGroup/pytorch_ehr), and select the "Clinician" branch.
Click the arrow to open the Prepare_env notebook.
Run the notebook. It will display some messages; select "Run anyway" and follow the instructions to add the authorization code.
Once it completes successfully, you will see the pytorch_ehr folder created under your MyDrive.

Now you are ready to enjoy the tutorial :)

Data Preparation:

This year, we have two options:

Use the N3C SynthPuf data (demo only; for code sharing on the N3C enclave, kindly email the presenters).
Use SQLite to prepare our cohorts, extracted from the 100K COVID-19 patient synthetic data set (Synthea(TM) COVID-19 data: Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007. https://doi.org/10.1016/j.ibmed.2020.100007).

a. Download SQLite browser from https://sqlitebrowser.org/dl/

b. In the Data_Prep folder, download dataprep.sql, which contains the cohort definition and the data extraction from covid100k.db, adapted from the https://synthea.mitre.org/ COVID-19 100K data set.

c. You will also find the DataPrep.ipynb notebook, which will guide you through the data preprocessing.
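
If you prefer to run SQL from Python rather than from the SQLite browser, the standard-library sqlite3 module can execute a multi-statement script. The following is a minimal sketch, not the contents of dataprep.sql: the table names, columns, and code values are made up for illustration, and an in-memory database stands in for covid100k.db.

```python
import sqlite3

def run_sql_script(conn, script):
    """Execute a multi-statement SQL script on an open SQLite connection."""
    conn.executescript(script)
    conn.commit()

# In-memory stand-in for covid100k.db (hypothetical schema and values).
conn = sqlite3.connect(":memory:")
toy_script = """
CREATE TABLE conditions (patient TEXT, code TEXT, start TEXT);
INSERT INTO conditions VALUES ('p1', 'COVID_CODE', '2020-03-01');
INSERT INTO conditions VALUES ('p2', 'OTHER_CODE', '2020-03-02');
INSERT INTO conditions VALUES ('p1', 'OTHER_CODE', '2020-03-05');
-- Toy cohort definition: every patient with the (made-up) COVID code.
CREATE TABLE cohort AS
    SELECT DISTINCT patient FROM conditions WHERE code = 'COVID_CODE';
"""
run_sql_script(conn, toy_script)

cohort = [row[0] for row in conn.execute("SELECT patient FROM cohort ORDER BY patient")]
print(cohort)
```

With the real files, you would presumably connect with `sqlite3.connect("covid100k.db")` and pass the text read from dataprep.sql, assuming both sit in the working directory.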

Model Training:

Go to https://drive.google.com/ and navigate to the Model_Training folder. You will find the Model_Training.ipynb notebook, which will guide you through the RNN model training.
Model_explanation.ipynb will be used for the model explanation demo.

Predictive Modeling on Electronic Health Records (EHR) using PyTorch

Overview

Although there are plenty of repos on vision and NLP models, we could find very few that apply deep learning to EHR data. Here we open-source our repo, which implements data preprocessing, data loading, and a zoo of common RNN models. The main goal is to lower the barrier to entry into this field for researchers. We do not claim state-of-the-art performance, though our models are quite competitive.

Based on existing work (e.g., Dr. AI and RETAIN), we represent electronic health records (EHRs) as a pickled list of lists of lists, which contains the histories of patients' diagnoses, medications, and various other events. We integrate all relevant information from a patient's history, allowing easy subsetting.

Currently, this repo includes the following predictive models: Vanilla RNN, GRU, LSTM, Bidirectional RNN, Bidirectional GRU, Bidirectional LSTM, Dilated RNN, Dilated GRU, Dilated LSTM, QRNN, and T-LSTM, used to analyze and predict clinical outcomes. Additionally, we have tutorials comparing their performance to plain logistic regression (LR) and Random Forest.

Pipeline

Data Structure

We followed the data structure used in RETAIN. Encounters may include pharmacy, clinical and microbiology laboratory, admission, and billing information from affiliated patient care locations. All admissions, medication orders and dispensing, laboratory orders, and specimens are date- and time-stamped, providing a temporal relationship between treatment patterns and clinical information. These clinical data are mapped to the most common standards; for example, diagnoses and procedures are mapped to International Classification of Diseases (ICD) codes, and laboratory tests are linked to their LOINC codes.
Our processed pickle data are multi-level lists. From the outermost level inward (assuming we have loaded them as X):
- Outermost level: patient level, e.g. X[0] holds the records for the patient at index 0
- 2nd level: patient information. X[0][0], X[0][1], and X[0][2] are the patient id, the outcome label or disease status (1: disease, 0: no disease; for survival analysis it is [disease status, time_to_disease]), and the visit records
- 3rd level: a list whose length is the total number of visits. Each element is a pair of lists (as described in the 4th level)
- 4th level: for each visit in the 3rd-level list:
  - 1st element, e.g. X[0][2][0][0], is a list holding the visit_time (time since the previous visit)
  - 2nd element, e.g. X[0][2][0][1], is a list of codes corresponding to a single visit
- 5th level: either a visit_time or a single code
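
The levels listed above can be made concrete with a small toy example. This is only a sketch: all ids, labels, times, and codes are invented, the first visit_time of 0 is an assumption, and `codes_per_patient` is a hypothetical helper that is not part of the repo.

```python
# Toy records in the nested-list layout described above (all values made up).
X = [
    ["patient_0", 1, [          # [patient id, outcome label, visits]
        [[0], [12, 45, 7]],     # visit 0: [visit_time], [codes]
        [[30], [45, 101]],      # visit 1: 30 time units since the last visit
    ]],
    ["patient_1", 0, [
        [[0], [7]],
    ]],
]

# For survival outcomes the label slot holds [disease status, time_to_disease].
X_survival = [["patient_0", [1, 120], X[0][2]]]

patient_id, label, visits = X[0]
first_visit_time = X[0][2][0][0]    # list holding the visit_time
first_visit_codes = X[0][2][0][1]   # list of codes for that visit
single_code = X[0][2][0][1][0]      # 5th level: one code

def codes_per_patient(records):
    """Return {patient id: flat list of all codes across that patient's visits}."""
    return {pid: [c for _time, codes in visits for c in codes]
            for pid, _label, visits in records}

print(patient_id, label, len(visits))
print(first_visit_time, first_visit_codes, single_code)
print(codes_per_patient(X))
```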
An illustration of the data structure is shown below:

In the implementation, the medical codes are tokenized with a unified dictionary for all patients.
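
One way to picture a unified dictionary is a single code-to-integer mapping built over all patients and then applied to every visit. The sketch below is an illustration under assumptions, not the repo's preprocessing code: `build_vocab` and `tokenize` are hypothetical names, the raw codes are invented, and reserving id 0 for padding is a choice made here.

```python
def build_vocab(records):
    """Map every distinct raw code to an integer id shared by all patients."""
    vocab = {}
    for _pid, _label, visits in records:
        for _time, codes in visits:
            for code in codes:
                if code not in vocab:
                    vocab[code] = len(vocab) + 1  # id 0 kept free for padding (assumption)
    return vocab

def tokenize(records, vocab):
    """Replace raw codes with their integer ids, keeping the nested layout."""
    return [[pid, label, [[time, [vocab[c] for c in codes]] for time, codes in visits]]
            for pid, label, visits in records]

# Made-up raw records: [patient id, label, [[visit_time], [raw codes]] ...]
raw = [
    ["patient_0", 1, [[[0], ["DX_A", "RX_B"]], [[30], ["RX_B", "LAB_C"]]]],
    ["patient_1", 0, [[[0], ["LAB_C"]]]],
]

vocab = build_vocab(raw)
X = tokenize(raw, vocab)
print(vocab)
print(X)
```

Because the mapping is shared, the same raw code receives the same id for every patient, which is what lets a single embedding layer serve the whole cohort.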

Note: as long as your data are in this multi-level list format, you can use our EHRdataloader to generate batches and feed them to your model.
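
To give a feel for what batching such lists involves, here is a minimal pure-Python sketch that round-trips a toy list through pickle, splits it into mini-batches, and pads visits and codes to a common shape. It does not use or reproduce the EHRdataloader API; `iter_batches` and `pad_batch` are hypothetical helpers, and padding with 0 is an assumption.

```python
import pickle

def iter_batches(records, batch_size):
    """Yield consecutive slices of at most batch_size patients."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

def pad_batch(batch, pad_id=0):
    """Pad codes to [patients][max_visits][max_codes]; also return the labels."""
    max_visits = max(len(p[2]) for p in batch)
    max_codes = max(len(v[1]) for p in batch for v in p[2])
    padded, labels = [], []
    for _pid, label, visits in batch:
        rows = [codes + [pad_id] * (max_codes - len(codes)) for _time, codes in visits]
        rows += [[pad_id] * max_codes for _ in range(max_visits - len(visits))]
        padded.append(rows)
        labels.append(label)
    return padded, labels

# Toy tokenized records (all values made up).
X = [
    ["p0", 1, [[[0], [1, 2, 3]], [[30], [2, 4]]]],
    ["p1", 0, [[[0], [5]]]],
    ["p2", 0, [[[0], [1, 5]], [[10], [3]], [[5], [2]]]],
]

# Pickle round trip, done in memory so the sketch needs no files.
X_loaded = pickle.loads(pickle.dumps(X))

batches = [pad_batch(b) for b in iter_batches(X_loaded, batch_size=2)]
for codes, labels in batches:
    print(labels, codes)
```

A real loader would typically also carry the visit times and convert the padded lists to tensors; this sketch leaves both out to stay dependency-free.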
How it works

Paper Reference

Since we started the pytorch_ehr project, a number of papers have been published. For the latest version, kindly cite: Rasmy L, Nigo M, Kannadath BS, Xie Z, Mao B, Patel K, Zhou Y, Zhang W, Ross A, Xu H, Zhi D. Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data. The Lancet Digital Health. 2022 Jun 1;4(6):e415-25.

Versions

This is working towards Version 0.3; more details will be in the release notes.

Dependencies

PyTorch 0.4.0 (http://pytorch.org). All models except the QRNN and T-LSTM are compatible with the latest PyTorch version (verified).
Torchqrnn (https://github.com/salesforce/pytorch-qrnn)
Pynvrtc
sklearn
Matplotlib (for visualizations)
tqdm
Python: 3.6+

License

This repo is for research purposes. Use it at your own risk.
This repo is under the GPL-v3 license.

Acknowledgements Hat-tip to:
