This paper considers the problem of answering factoid questions in an open-domain setting using Wikipedia as the unique knowledge source. Having a single knowledge source forces the model to be very precise while searching for an answer.
In order to answer any question, the system must first retrieve the relevant articles and then scan them to identify the answer.
The Document Retriever is an efficient (non-machine-learning) retrieval system that first narrows the search space and focuses on relevant articles. It uses a simple inverted index lookup followed by term-vector-model scoring.
Articles and questions are compared as TF-IDF (term frequency-inverse document frequency) weighted bag-of-words vectors. Retrieval is further improved by taking local word order into account with n-gram features (bigrams work best).
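As an illustration, here is a minimal sketch of this kind of retrieval using scikit-learn's TF-IDF vectorizer with unigram + bigram features. The actual Document Retriever uses its own hashed-bigram, sparse-matrix implementation, so treat this only as a toy version of the idea (the corpus and question below are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "Wikipedia" corpus; in the paper each document is a full article.
articles = [
    "Warsaw is the capital and largest city of Poland.",
    "The mitochondrion is the powerhouse of the cell.",
    "Python is a widely used high-level programming language.",
]

# TF-IDF weighted bag-of-words vectors with bigram features (ngram_range=(1, 2)).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
doc_matrix = vectorizer.fit_transform(articles)

def retrieve(question, k=2):
    """Return the indices of the k articles closest to the question."""
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_matrix).ravel()
    return scores.argsort()[::-1][:k]

print(retrieve("What is the capital of Poland?"))  # article 0 ranks first
```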
Given a question $q$ consisting of $l$ tokens and a document of $n$ paragraphs, where a single paragraph $p$ consists of $m$ tokens, an RNN model is developed which is applied to each paragraph and whose outputs are finally aggregated to predict the answer.
The tokens in a paragraph are represented as a sequence of feature vectors $\tilde{\mathbf{p}}_i \in \mathbb{R}^d$, which are passed as input to a multi-layer bidirectional long short-term memory (LSTM) network: $\{\mathbf{p}_1, \dots, \mathbf{p}_m\} = \text{RNN}(\{\tilde{\mathbf{p}}_1, \dots, \tilde{\mathbf{p}}_m\})$, taking $\mathbf{p}_i$ as the concatenation of each layer's hidden units in the end.
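A minimal PyTorch sketch of this paragraph encoding (PyTorch, the class name, and the sizes are assumptions for illustration; the paper reports 3-layer BiLSTMs with 128 hidden units):

```python
import torch
import torch.nn as nn

class StackedBiLSTMEncoder(nn.Module):
    """Multi-layer BiLSTM whose per-token output is the concatenation of the
    hidden units of every layer (both directions)."""

    def __init__(self, input_dim, hidden_size=128, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            in_dim = input_dim if i == 0 else 2 * hidden_size
            self.layers.append(
                nn.LSTM(in_dim, hidden_size, bidirectional=True, batch_first=True)
            )

    def forward(self, x):                      # x: (batch, seq_len, input_dim)
        outputs = []
        for lstm in self.layers:
            x, _ = lstm(x)                     # (batch, seq_len, 2 * hidden_size)
            outputs.append(x)
        return torch.cat(outputs, dim=-1)      # (batch, seq_len, num_layers * 2 * hidden_size)

# Example: encode 2 paragraphs of 30 tokens with 300-d feature vectors.
encoder = StackedBiLSTMEncoder(input_dim=300)
p = encoder(torch.randn(2, 30, 300))
print(p.shape)  # torch.Size([2, 30, 768])
```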
The feature vector $\tilde{\mathbf{p}}_i$ comprises:
- Word Embeddings: $f_{emb}(p_i) = \mathbf{E}(p_i)$, using the 300-dimensional GloVe embeddings. Most embeddings are kept fixed; only the 1000 most frequent question words (e.g. what, how, which) are fine-tuned, as such key words can be crucial to QA systems.
- Exact Match: three simple binary features indicating whether $p_i$ can be exactly matched to one question word in $q$, in its original, lowercase, or lemma form.
- Token Features: manual features reflecting properties of the token, namely its part-of-speech (POS) tag, named-entity-recognition (NER) tag, and (normalized) term frequency (TF).
- Aligned Question Embedding: $f_{align}(p_i) = \sum_j a_{i,j} \mathbf{E}(q_j)$, where the attention score $a_{i,j}$ captures the similarity between $p_i$ and each question word $q_j$, computed from dot products of nonlinear mappings of the word embeddings (see the sketch after this list).
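A minimal sketch of the aligned question embedding, assuming the nonlinear mapping is a single dense layer with ReLU and the dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignedQuestionEmbedding(nn.Module):
    """f_align(p_i) = sum_j a_ij * E(q_j), with a_ij a softmax over
    dot products of nonlinear mappings of the word embeddings."""

    def __init__(self, embed_dim=300, proj_dim=128):
        super().__init__()
        self.alpha = nn.Sequential(nn.Linear(embed_dim, proj_dim), nn.ReLU())

    def forward(self, p_emb, q_emb):
        # p_emb: (batch, m, embed_dim) paragraph word embeddings E(p_i)
        # q_emb: (batch, l, embed_dim) question word embeddings  E(q_j)
        scores = self.alpha(p_emb) @ self.alpha(q_emb).transpose(1, 2)  # (batch, m, l)
        a = F.softmax(scores, dim=-1)   # attention over question words
        return a @ q_emb                # (batch, m, embed_dim)

# Example: a paragraph of 20 tokens, a question of 8 tokens, 300-d embeddings.
f_align = AlignedQuestionEmbedding()
out = f_align(torch.randn(1, 20, 300), torch.randn(1, 8, 300))
print(out.shape)  # torch.Size([1, 20, 300])
```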
Another RNN is applied on the word embeddings of the question tokens, and the resulting hidden units $\{\mathbf{q}_1, \dots, \mathbf{q}_l\}$ are combined into one single vector $\mathbf{q} = \sum_j b_j \mathbf{q}_j$, where $b_j = \frac{\exp(\mathbf{w} \cdot \mathbf{q}_j)}{\sum_{j'} \exp(\mathbf{w} \cdot \mathbf{q}_{j'})}$ encodes the importance of each question word ($\mathbf{w}$ is a learned weight vector).
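A minimal sketch of this question pooling, assuming the importance weights $b_j$ come from a softmax over dot products of a learned vector $\mathbf{w}$ with each hidden state (hidden size is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentivePooling(nn.Module):
    """Collapse question hidden states {q_1..q_l} into a single vector
    q = sum_j b_j q_j, with b_j = softmax_j(w . q_j)."""

    def __init__(self, hidden_dim=768):
        super().__init__()
        self.w = nn.Linear(hidden_dim, 1, bias=False)   # learned weight vector w

    def forward(self, q_hidden):                         # (batch, l, hidden_dim)
        b = F.softmax(self.w(q_hidden).squeeze(-1), dim=-1)   # (batch, l)
        return torch.einsum("bl,bld->bd", b, q_hidden)        # (batch, hidden_dim)

pool = SelfAttentivePooling()
q = pool(torch.randn(2, 8, 768))
print(q.shape)  # torch.Size([2, 768])
```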
At the paragraph level, the goal is to predict the span of tokens that is most likely the correct answer.
Two classifiers are trained independently over the paragraph vectors $\{\mathbf{p}_1, \dots, \mathbf{p}_m\}$ and the question vector $\mathbf{q}$ to predict the two ends of the span.
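A minimal sketch of the two span classifiers as bilinear scorers, $P_{start}(i) \propto \exp(\mathbf{p}_i \mathbf{W}_s \mathbf{q})$ and $P_{end}(i) \propto \exp(\mathbf{p}_i \mathbf{W}_e \mathbf{q})$; training, masking, and batching details are omitted and the dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanPredictor(nn.Module):
    """Two independent bilinear classifiers:
    P_start(i) ~ exp(p_i W_s q),  P_end(i) ~ exp(p_i W_e q)."""

    def __init__(self, p_dim=768, q_dim=768):
        super().__init__()
        self.W_s = nn.Linear(q_dim, p_dim, bias=False)   # start-of-span weights
        self.W_e = nn.Linear(q_dim, p_dim, bias=False)   # end-of-span weights

    def forward(self, p_vecs, q_vec):
        # p_vecs: (batch, m, p_dim) paragraph token vectors {p_1..p_m}
        # q_vec:  (batch, q_dim)    pooled question vector q
        start_scores = torch.bmm(p_vecs, self.W_s(q_vec).unsqueeze(-1)).squeeze(-1)
        end_scores = torch.bmm(p_vecs, self.W_e(q_vec).unsqueeze(-1)).squeeze(-1)
        return F.softmax(start_scores, dim=-1), F.softmax(end_scores, dim=-1)

predictor = SpanPredictor()
p_start, p_end = predictor(torch.randn(2, 30, 768), torch.randn(2, 768))
print(p_start.shape, p_end.shape)  # torch.Size([2, 30]) torch.Size([2, 30])
```

During prediction, the best span is the pair $(i, i')$ with $i \le i' \le i + 15$ that maximizes $P_{start}(i) \times P_{end}(i')$.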
- Wikipedia (Knowledge Source) - Uses the 2016-12-21 dump of English Wikipedia as the knowledge source.
- SQuAD (The Stanford Question Answering Dataset) - Uses SQuAD for training and evaluating the Document Reader.