Skip to content
This repository has been archived by the owner on Dec 20, 2022. It is now read-only.

Files

Latest commit

author
João F Silva
Oct 19, 2020
53fbd42 · Oct 19, 2020

History

History
This branch is up to date with ieeta-pt/PatientFM:master.

src

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
Oct 16, 2020
Oct 16, 2020
Oct 16, 2020
Oct 16, 2020
Oct 16, 2020
Feb 20, 2020
Oct 16, 2020
Oct 16, 2020
Jul 8, 2020
Jul 8, 2020
Jul 8, 2020
Jul 6, 2020
Oct 19, 2020
Jul 6, 2020
Oct 16, 2020
Oct 16, 2020

Using PatientFM

PatientFM is an end-to-end system for extracting family history information from clinical notes. The system is implemented so that the user can select between the hybrid solution (end-to-end extraction of entities and relations), only the rule-based engine (only extracts family member entities) and only the deep-learning module (only extracts disease observation entities).

To run the system simply execute the python main.py command supplied with the desired flags. The system supports the following list of flags:

Settings Flag Description
-s SETTINGS_FILE, --settings SETTINGS_FILE The system settings file (default: ../settings.ini)
Execution Mode Flags Description
-t1, --first In this mode, the script will execute the first subtask of the challenge (default: False)
-t2, --second In this mode, the script will execute the second subtask of the challenge (default: False)
Configuration Flags Description
-p, --showprints When active, some parts of the processing are shown during execution (default: False)
-c, --cleaning When active, clinical reports are processed and cleaned (default: False)'
-m, --method The method used to process clinical records
-ds, --dataset The dataset used to process
-r, --read Read annotations from file (Useful for task 2)
-sf, --submission If activated, the output will be in the submission file format

While the system provides the possibility to define the desired method and dataset through the previous system flags, the user can also define them directly in the settings.ini file. By configuring settings.ini accordingly, it is not necessary to supply the method and dataset flags to main.py.

Settings file

The PatientFM tool uses a settings file following the standard configuration file format to define the system variables. This file is divided in the following 9 sections (a description of each section is provided further below):

[vocabulary]
anatomy_file=../vocabularies/n2c2_ANAT.tsv
disease_file=../vocabularies/n2c2_DISO_extended.tsv

[methods]
deeplearning=False
rulebased=True

[embeddings]
vocabulary_path=../vocabularies/vocabulary.txt
sentences_path=../vocabularies/sentences.pickle
wordvec_path=/backup/data/models/embeddings/BioSentVec/BioWordVec_PubMed_MIMICIII_d200.bin
biowordvec_original=embeddings/biowordvec_vocab_orig.npy
biowordvec_normalized=embeddings/biowordvec_vocab_norm.npy
wordvec_size=200
train_embeddings_pickle=../dataset/Train/train_embeddings.pickle
test_embeddings_pickle=../dataset/Test/test_embeddings.pickle

[datasets]
train_files=../dataset/Train/bioc-FH-training-updated-627/
test_files=../dataset/Test/testRelease-0805/
goldstandard_st1=../dataset/Train/train_subtask1_2.tsv
goldstandard_st2=../dataset/Train/train_subtask2_2.tsv
nltk_sources=../dataset/nltk_data

[neji]
use_neji_annotations=True
neji_train_pickle_biowordvec=../dataset/Train/neji_train_classes_embedmodel_BIO.pickle
neji_test_pickle_biowordvec=../dataset/Test/neji_test_classes_embedmodel_BIO.pickle
neji_train_pickle_albert=../dataset/Train/neji_train_classes_albertmodel_BIO.pickle
neji_test_pickle_albert=../dataset/Test/neji_test_classes_albertmodel_BIO.pickle
neji_train_pickle_clinicalbert=../dataset/Train/neji_train_classes_clinicalbertmodel_BIO.pickle
neji_test_pickle_clinicalbert=../dataset/Test/neji_test_classes_clinicalbertmodel_BIO.pickle

[results]
task1=../results/task1.tsv
task2=../results/task2.tsv
family_members=../results/task1.tsv
observations=../results/task1_js.tsv

[DLmodel]
#model=biowordvec_bilstm
#model=albert_bilstm
model=clinicalbert_bilstm
#model=clinicalbert_linear

[DLmodelparams]
entity_prediction=True
hiddensize=256
batchsize=32
iterationsperepoch=100
numlayers=2
learningrate=1e-3
patience=5
epochs=100
EMBEDDINGS_FREEZE_AFTER_EPOCH=2

[ALBERT]
model=albert-base-v2
#model=albert-large-v2
#model=albert-xlarge-v2
#model=albert-xxlarge-v2
add_special_tokens=True
Field Description
[vocabulary] Section used to define vocabulary files. It is necessary to create a new entry for each new vocabulary file. These files should respect the format of the following example: ''UMLS:C0006629:T017:Anatomy cadaver|cadavers|corpse|corpses''.
[methods] Section used to define the desired method to process clinical records. The user can select the Deep-Learning component for observation extraction, the Rule-Based component only, or set both to True for the Hybrid solution.
[embeddings] Section used to define paths and parameters for word embeddings files. Not necessary if deeplearningis set to False.
[datasets] Section used to define paths for the datasets.
[neji] Section used to define the paths for Neji annotation files for the Deep Learning solution. This is used to avoid the process of creating Neji annotations whenever the system is run (annotations are created once and saved).
[results] Section used to define the paths where result files will be saved.
[DLmodel] Section used to select the Deep Learning model to use. The system currently supports 4 different models. Uncomment only the desired model.
[DLmodelparams] Section used to set a list of Deep Learning model parameters.
[ALBERT] Section used to select the size of the ALBERT model, if albert_bilstm is selected in [DLmodel].