Implementation of NER with character- and word-level embeddings
For each word, its word embedding, character-level embedding, and POS-tag embedding are concatenated and fed into a bidirectional LSTM. Note that a CNN is used to embed characters; its kernel sizes and numbers of channels are defined in ./utils/constants.py.
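To make the architecture concrete, here is a minimal sketch in PyTorch (assuming that is the framework). The dimensions, kernel sizes, and channel counts below are placeholders; the actual values live in ./utils/constants.py.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Embeds each word from its characters via 1-D convolutions + max-pooling."""
    def __init__(self, n_chars, char_dim=30, kernel_sizes=(2, 3, 4), n_channels=32):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_channels, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, chars):                     # (batch, seq_len, max_word_len)
        b, s, w = chars.shape
        x = self.embed(chars.reshape(b * s, w))   # (b*s, w, char_dim)
        x = x.transpose(1, 2)                     # (b*s, char_dim, w)
        feats = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1).reshape(b, s, -1)

class NERModel(nn.Module):
    def __init__(self, n_words, n_chars, n_pos, n_labels,
                 word_dim=100, pos_dim=16, hidden=128):
        super().__init__()
        self.word_embed = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.pos_embed = nn.Embedding(n_pos, pos_dim, padding_idx=0)
        self.char_cnn = CharCNN(n_chars)
        in_dim = word_dim + pos_dim + 3 * 32      # word + POS + concatenated CNN features
        self.lstm = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, words, pos, chars):
        x = torch.cat([self.word_embed(words), self.pos_embed(pos),
                       self.char_cnn(chars)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                        # per-token label logits
```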
This implementation targets Korean, so the Okt tokenizer from konlpy.tag is used.
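For reference, Okt's pos() method returns (morpheme, POS-tag) pairs, which supply both the tokens and the POS tags described above:

```python
from konlpy.tag import Okt

okt = Okt()
# pos() yields (morpheme, tag) pairs,
# e.g. [('아버지', 'Noun'), ('가', 'Josa'), ...]
print(okt.pos('아버지가 방에 들어가신다'))
```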
The data can be downloaded from here.
run vocab.py
- builds vocabs for words, characters, and POS tags (a rough sketch follows this list)
- tokenizes the data and converts both tokens and labels into indices
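As a rough illustration of what the vocab-building step does (the function and variable names here are hypothetical; see vocab.py for the actual ones):

```python
from collections import Counter

PAD, UNK = '<pad>', '<unk>'

def build_vocab(items, min_count=1):
    """Maps each item (word, character, or POS tag) to an integer index."""
    counts = Counter(items)
    vocab = {PAD: 0, UNK: 1}
    for item, c in counts.most_common():
        if c >= min_count:
            vocab[item] = len(vocab)
    return vocab

# e.g. word_vocab = build_vocab(w for sent in sentences for w in sent)
# Tokens are then converted to index sequences with:
#   [word_vocab.get(w, word_vocab[UNK]) for w in sent]
```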
run train.py
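The internals of train.py aren't shown here, but a typical training step for this kind of tagger looks like the following (hypothetical sketch; model and train_loader are assumed, and the actual hyperparameters and loss handling live in the script):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=0)    # 0 assumed to be the pad label
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for words, pos, chars, labels in train_loader:     # index batches from vocab.py
    optimizer.zero_grad()
    logits = model(words, pos, chars)              # (batch, seq_len, n_labels)
    loss = criterion(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
    loss.backward()
    optimizer.step()
```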
run inference.ipynb
- make sure you set the path to your trained model
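A hedged sketch of loading the trained weights inside the notebook (the checkpoint path is a placeholder, and NERModel and the vocab names refer to the sketches above):

```python
import torch

MODEL_PATH = './checkpoints/ner_model.pt'   # hypothetical path; point to your checkpoint

# Rebuild the model with the same vocab sizes used during training,
# then load the saved weights for evaluation.
model = NERModel(n_words=len(word_vocab), n_chars=len(char_vocab),
                 n_pos=len(pos_vocab), n_labels=len(label_vocab))
model.load_state_dict(torch.load(MODEL_PATH, map_location='cpu'))
model.eval()
```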