src

João F Silva

Oct 19, 2020

53fbd42 · Oct 19, 2020

This branch is up to date with ieeta-pt/PatientFM:master.

Name	Name	Last commit message	Last commit date
parent directory ..
embeddings	embeddings	Code cleaning	Oct 16, 2020
models	models	Code cleaning	Oct 16, 2020
Entity.py	Entity.py	Code cleaning	Oct 16, 2020
NejiAnnotator.py	NejiAnnotator.py	Code cleaning	Oct 16, 2020
Orchestrator.py	Orchestrator.py	Code cleaning	Oct 16, 2020
Preprocessing.py	Preprocessing.py	Added Processing.py for sentence splitting and tokenization. Entity.p…	Feb 20, 2020
ProcessorAux.py	ProcessorAux.py	Fixing naming	Oct 16, 2020
README.md	README.md	Update README.md	Oct 16, 2020
Reader.py	Reader.py	Feature to detect the living status	Jul 8, 2020
RuleBased.py	RuleBased.py	Feature to detect the living status	Jul 8, 2020
Writer.py	Writer.py	Feature to detect the living status	Jul 8, 2020
eval.py	eval.py	Add more details in the evaluation method	Jul 6, 2020
main.py	main.py	Adding saved model files	Oct 19, 2020
rules.py	rules.py	Join the Sergio's ideas with my rule-based syste (I think)	Jul 6, 2020
settings.ini	settings.ini	Fixing naming	Oct 16, 2020
stopWords.py	stopWords.py	Code Cleaning	Oct 16, 2020

README.md

Using PatientFM

PatientFM is an end-to-end system for extracting family history information from clinical notes. The system is implemented so that the user can select between the hybrid solution (end-to-end extraction of entities and relations), only the rule-based engine (only extracts family member entities) and only the deep-learning module (only extracts disease observation entities).

To run the system simply execute the python main.py command supplied with the desired flags. The system supports the following list of flags:

Settings Flag	Description
-s SETTINGS_FILE, --settings SETTINGS_FILE	The system settings file (default: ../settings.ini)

Execution Mode Flags	Description
-t1, --first	In this mode, the script will execute the first subtask of the challenge (default: False)
-t2, --second	In this mode, the script will execute the second subtask of the challenge (default: False)

Configuration Flags	Description
-p, --showprints	When active, some parts of the processing are shown during execution (default: False)
-c, --cleaning	When active, clinical reports are processed and cleaned (default: False)'
-m, --method	The method used to process clinical records
-ds, --dataset	The dataset used to process
-r, --read	Read annotations from file (Useful for task 2)
-sf, --submission	If activated, the output will be in the submission file format

While the system provides the possibility to define the desired method and dataset through the previous system flags, the user can also define them directly in the settings.ini file. By configuring settings.ini accordingly, it is not necessary to supply the method and dataset flags to main.py.

Settings file

The PatientFM tool uses a settings file following the standard configuration file format to define the system variables. This file is divided in the following 9 sections (a description of each section is provided further below):

[vocabulary]
anatomy_file=../vocabularies/n2c2_ANAT.tsv
disease_file=../vocabularies/n2c2_DISO_extended.tsv

[methods]
deeplearning=False
rulebased=True

[embeddings]
vocabulary_path=../vocabularies/vocabulary.txt
sentences_path=../vocabularies/sentences.pickle
wordvec_path=/backup/data/models/embeddings/BioSentVec/BioWordVec_PubMed_MIMICIII_d200.bin
biowordvec_original=embeddings/biowordvec_vocab_orig.npy
biowordvec_normalized=embeddings/biowordvec_vocab_norm.npy
wordvec_size=200
train_embeddings_pickle=../dataset/Train/train_embeddings.pickle
test_embeddings_pickle=../dataset/Test/test_embeddings.pickle

[datasets]
train_files=../dataset/Train/bioc-FH-training-updated-627/
test_files=../dataset/Test/testRelease-0805/
goldstandard_st1=../dataset/Train/train_subtask1_2.tsv
goldstandard_st2=../dataset/Train/train_subtask2_2.tsv
nltk_sources=../dataset/nltk_data

[neji]
use_neji_annotations=True
neji_train_pickle_biowordvec=../dataset/Train/neji_train_classes_embedmodel_BIO.pickle
neji_test_pickle_biowordvec=../dataset/Test/neji_test_classes_embedmodel_BIO.pickle
neji_train_pickle_albert=../dataset/Train/neji_train_classes_albertmodel_BIO.pickle
neji_test_pickle_albert=../dataset/Test/neji_test_classes_albertmodel_BIO.pickle
neji_train_pickle_clinicalbert=../dataset/Train/neji_train_classes_clinicalbertmodel_BIO.pickle
neji_test_pickle_clinicalbert=../dataset/Test/neji_test_classes_clinicalbertmodel_BIO.pickle

[results]
task1=../results/task1.tsv
task2=../results/task2.tsv
family_members=../results/task1.tsv
observations=../results/task1_js.tsv

[DLmodel]
#model=biowordvec_bilstm
#model=albert_bilstm
model=clinicalbert_bilstm
#model=clinicalbert_linear

[DLmodelparams]
entity_prediction=True
hiddensize=256
batchsize=32
iterationsperepoch=100
numlayers=2
learningrate=1e-3
patience=5
epochs=100
EMBEDDINGS_FREEZE_AFTER_EPOCH=2

[ALBERT]
model=albert-base-v2
#model=albert-large-v2
#model=albert-xlarge-v2
#model=albert-xxlarge-v2
add_special_tokens=True

Field	Description
`[vocabulary]`	Section used to define vocabulary files. It is necessary to create a new entry for each new vocabulary file. These files should respect the format of the following example: ''UMLS:C0006629:T017:Anatomy cadaver\|cadavers\|corpse\|corpses''.
`[methods]`	Section used to define the desired method to process clinical records. The user can select the Deep-Learning component for observation extraction, the Rule-Based component only, or set both to `True` for the Hybrid solution.
`[embeddings]`	Section used to define paths and parameters for word embeddings files. Not necessary if `deeplearning`is set to `False`.
`[datasets]`	Section used to define paths for the datasets.
`[neji]`	Section used to define the paths for Neji annotation files for the Deep Learning solution. This is used to avoid the process of creating Neji annotations whenever the system is run (annotations are created once and saved).
`[results]`	Section used to define the paths where result files will be saved.
`[DLmodel]`	Section used to select the Deep Learning model to use. The system currently supports 4 different models. Uncomment only the desired model.
`[DLmodelparams]`	Section used to set a list of Deep Learning model parameters.
`[ALBERT]`	Section used to select the size of the ALBERT model, if `albert_bilstm` is selected in `[DLmodel]`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

src

src

README.md

Using PatientFM

Settings file

Files

src

Directory actions

More options

Directory actions

More options

Latest commit

History

src

Folders and files

parent directory

README.md

Using PatientFM

Settings file