Medical named entity recognition (NER) for Norwegian using distant supervision methods. The underlying lexical resource is NorMedTerm, a categorized list of Norwegian medical entities, which is included in this repository.
A baseline tagger that relies on lexical matching against terms from NorMedTerm. It optionally pre-processes the input and computes statistics over the tagged entities. When several terms could match, the longest possible match is taken. NER tags are added as an additional column at the end of each line. The 'examples' folder includes some parsed (.conllu) and tagged (.ner) files of medical texts from Legemiddelhåndboka.
python lex_baseline.py -i <input_folder> -a ts
python lex_baseline.py -i <input_folder> -a pts -m <path_to_udpipe_model> -e CONDITION,PROCEDURE,SUBSTANCE
python lex_baseline.py -h
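The longest-match lookup used by the baseline can be illustrated with a minimal sketch. This is not the code in lex_baseline.py: the toy lexicon, the function names `longest_match_tags` and `append_tag_column`, the lowercased token matching, and the BIO tag scheme are all assumptions made for illustration, and the actual script may differ on each point.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (not NorMedTerm itself):
# tuple of lowercased tokens -> entity category.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("diabetes",): "CONDITION",
    ("diabetes", "mellitus"): "CONDITION",
    ("insulin",): "SUBSTANCE",
}


def longest_match_tags(tokens: List[str],
                       lexicon: Dict[Tuple[str, ...], str]) -> List[str]:
    """Tag tokens by greedy longest lexical match (BIO scheme assumed)."""
    max_len = max((len(term) for term in lexicon), default=0)
    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = lexicon.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no term starts at this token
    return tags


def append_tag_column(token_lines: List[str], tags: List[str]) -> List[str]:
    """Append each tag as an extra tab-separated column to its token line."""
    return [f"{line}\t{tag}" for line, tag in zip(token_lines, tags)]


tokens = ["Pasienten", "har", "diabetes", "mellitus", "og", "får", "insulin"]
print(longest_match_tags(tokens, LEXICON))
# ['O', 'O', 'B-CONDITION', 'I-CONDITION', 'O', 'O', 'B-SUBSTANCE']
```

Because longer spans are tried first, "diabetes mellitus" is tagged as one entity rather than stopping at the single-token entry "diabetes".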
Coming soon.
Developed within the BigMed project.
Distributed under the CC BY 4.0 licence.