
NER with distant supervision (resources and code)


IldikoPilan/dist_sup


Distant supervision for medical entity recognition

Medical named entity recognition (NER) for Norwegian with distant supervision methods. The underlying lexical resource is NorMedTerm, a categorized list of Norwegian medical terms. The resource is included in this repository.

Dependencies

ufal.udpipe

Lexical baseline

Baseline tagging that relies on lexical matches with terms from NorMedTerm. It optionally pre-processes the input and computes statistics over the tagged entities. When several terms match, the longest possible match is taken. NER tags are added as an additional column at the end of each line. The 'examples' folder includes some parsed (.conllu) and tagged (.ner) files of medical texts from Legemiddelhåndboka.
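The longest-match strategy can be illustrated with a minimal sketch. It is not the code in lex_baseline.py: the function name, the toy lexicon, the lowercased matching, the `max_len` limit and the B-/I- tag prefixes are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def longest_match_tag(tokens: List[str],
                      lexicon: Dict[Tuple[str, ...], str],
                      max_len: int = 5) -> List[str]:
    """Tag tokens by greedy longest match against a term lexicon.

    `lexicon` maps a lowercased token tuple to a category (hypothetical
    format; the actual NorMedTerm layout may differ).
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            category = lexicon.get(tuple(lowered[i:i + n]))
            if category is not None:
                tags[i] = "B-" + category
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + category
                i += n  # continue after the matched span
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Toy entries for illustration only, not taken from NorMedTerm.
toy_lexicon = {
    ("diabetes",): "CONDITION",
    ("diabetes", "mellitus"): "CONDITION",
    ("insulin",): "SUBSTANCE",
}
tokens = ["Pasienten", "har", "diabetes", "mellitus", "og", "bruker", "insulin", "."]
tags = longest_match_tag(tokens, toy_lexicon)
for token, tag in zip(tokens, tags):
    print(token, tag, sep="\t")
```

In this sketch "diabetes mellitus" receives one two-token CONDITION span rather than a single-token match on "diabetes", which is the effect of preferring the longest match.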

Example runs

tagging and statistics for already tokenised files

  • python lex_baseline.py -i <input_folder> -a ts

processing, tagging and statistics with a subset of entity categories

  • python lex_baseline.py -i <input_folder> -a pts -m <path_to_udpipe_model> -e CONDITION,PROCEDURE,SUBSTANCE

help

  • python lex_baseline.py -h
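As a rough sketch of two steps mentioned above, the snippet below restricts a lexicon to a comma-separated category subset (the value format passed to -e) and appends a tag as a final tab-separated column to CoNLL-U token lines. The helper names and the lexicon format are hypothetical, and the sketch assumes one tag per token line with no multiword-token ranges; the actual script may handle these differently.

```python
from typing import Dict, List, Set, Tuple


def parse_categories(option: str) -> Set[str]:
    """Split a value such as 'CONDITION,PROCEDURE,SUBSTANCE' into a set."""
    return {c.strip() for c in option.split(",") if c.strip()}


def filter_lexicon(lexicon: Dict[Tuple[str, ...], str],
                   keep: Set[str]) -> Dict[Tuple[str, ...], str]:
    """Keep only the terms whose category is in `keep`."""
    return {term: cat for term, cat in lexicon.items() if cat in keep}


def append_tag_column(conllu_lines: List[str], tags: List[str]) -> List[str]:
    """Append one tag per token line; comments and blank lines pass through."""
    tag_iter = iter(tags)
    out = []
    for line in conllu_lines:
        if not line.strip() or line.startswith("#"):
            out.append(line)
        else:
            out.append(line + "\t" + next(tag_iter))
    return out


# Toy data for illustration only.
toy_lexicon = {("insulin",): "SUBSTANCE", ("lever",): "ANATOMY"}
keep = parse_categories("CONDITION,PROCEDURE,SUBSTANCE")
subset = filter_lexicon(toy_lexicon, keep)

lines = [
    "# text = Bruker insulin",
    "1\tBruker\tbruke\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tinsulin\tinsulin\tNOUN\t_\t_\t1\tobj\t_\t_",
    "",
]
tagged = append_tag_column(lines, ["O", "B-SUBSTANCE"])
print("\n".join(tagged))
```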

Distantly supervised neural model

Coming soon.

Acknowledgements

Developed within the BigMed project.

Terms of use

Distributed under the CC BY 4.0 licence.
