Implementation of NER with character- and word-level embeddings
For each word, its word embedding, character-level embedding, and POS-tag embedding are concatenated and fed into a bidirectional LSTM. Note that a CNN is used to embed characters; its kernel sizes and numbers of channels are defined in ./utils/constants.py.
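To make the architecture concrete, here is a minimal sketch in PyTorch (assuming that is the framework). The dimensions, kernel sizes, and channel counts below are placeholders; the actual values live in ./utils/constants.py.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Embeds each word from its characters via 1-D convolutions + max-pooling."""
    def __init__(self, n_chars, char_dim=30, kernel_sizes=(2, 3, 4), n_channels=32):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_channels, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, chars):                     # (batch, seq_len, max_word_len)
        b, s, w = chars.shape
        x = self.embed(chars.reshape(b * s, w))   # (b*s, w, char_dim)
        x = x.transpose(1, 2)                     # (b*s, char_dim, w)
        feats = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1).reshape(b, s, -1)

class NERModel(nn.Module):
    def __init__(self, n_words, n_chars, n_pos, n_labels,
                 word_dim=100, pos_dim=16, hidden=128):
        super().__init__()
        self.word_embed = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.pos_embed = nn.Embedding(n_pos, pos_dim, padding_idx=0)
        self.char_cnn = CharCNN(n_chars)
        in_dim = word_dim + pos_dim + 3 * 32      # word + POS + concatenated CNN features
        self.lstm = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, words, pos, chars):
        x = torch.cat([self.word_embed(words), self.pos_embed(pos),
                       self.char_cnn(chars)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                        # per-token label logits
```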
This implementation targets Korean, so the Okt tokenizer from konlpy.tag is used.
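For reference, Okt's pos() method returns (morpheme, POS-tag) pairs, which supply both the tokens and the POS tags described above:

```python
from konlpy.tag import Okt

okt = Okt()
# pos() yields (morpheme, tag) pairs,
# e.g. [('아버지', 'Noun'), ('가', 'Josa'), ...]
print(okt.pos('아버지가 방에 들어가신다'))
```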
The data can be downloaded from here.
run vocab.py
- builds vocabs for words, characters, and POS tags (a rough sketch follows this list)
- tokenizes the data and converts both tokens and labels into indices
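As a rough illustration of what the vocab-building step does (the function and variable names here are hypothetical; see vocab.py for the actual ones):

```python
from collections import Counter

PAD, UNK = '<pad>', '<unk>'

def build_vocab(items, min_count=1):
    """Maps each item (word, character, or POS tag) to an integer index."""
    counts = Counter(items)
    vocab = {PAD: 0, UNK: 1}
    for item, c in counts.most_common():
        if c >= min_count:
            vocab[item] = len(vocab)
    return vocab

# e.g. word_vocab = build_vocab(w for sent in sentences for w in sent)
# Tokens are then converted to index sequences with:
#   [word_vocab.get(w, word_vocab[UNK]) for w in sent]
```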
run train.py
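The internals of train.py aren't shown here, but a typical training step for this kind of tagger looks like the following (hypothetical sketch; model and train_loader are assumed, and the actual hyperparameters and loss handling live in the script):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=0)    # 0 assumed to be the pad label
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for words, pos, chars, labels in train_loader:     # index batches from vocab.py
    optimizer.zero_grad()
    logits = model(words, pos, chars)              # (batch, seq_len, n_labels)
    loss = criterion(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
    loss.backward()
    optimizer.step()
```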
run inference.ipynb
- make sure you set the path to your trained model
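A hedged sketch of loading the trained weights inside the notebook (the checkpoint path is a placeholder, and NERModel and the vocab names refer to the sketches above):

```python
import torch

MODEL_PATH = './checkpoints/ner_model.pt'   # hypothetical path; point to your checkpoint

# Rebuild the model with the same vocab sizes used during training,
# then load the saved weights for evaluation.
model = NERModel(n_words=len(word_vocab), n_chars=len(char_vocab),
                 n_pos=len(pos_vocab), n_labels=len(label_vocab))
model.load_state_dict(torch.load(MODEL_PATH, map_location='cpu'))
model.eval()
```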