In a Nutshell: Qanary Question Answering Components

The Qanary Framework is dedicated to creating Question Answering (QA) systems. Building a QA system draws on several research fields, which leads to expensive and time-consuming engineering work that can block research. Typical problems and use cases that occur while developing a Question Answering system are:

  • an algorithm requires analyzing textual questions and annotating the found entities, relations, classes, etc.
    • it is time-consuming as many services/algorithms/tools need to be compared
  • your QA process needs to be improved
    • following traditional development approaches requires additional efforts for testing and debugging of code to uncover possible flaws
  • the quality of components dedicated to a particular task needs to be analyzed
    • it is expensive to integrate all of the particular components due to a missing generalized interface

This repository contains the components of the Qanary framework. All components are implemented in Java and provide a Docker container for lightweight maintenance.

Build and run a minimal set of components

To demonstrate the Qanary methodology and its functionality, a tiny template-based Question Answering system was designed. It can answer questions about the real name of a superhero, such as "What is the real name of Captain America?". For this purpose, just two components are used:

  • Qanary DBpedia Spotlight component: finds superhero names and links them to the DBpedia knowledge base (such a process is called Named Entity Recognition and Disambiguation).
  • Qanary Query Builder for Superhero Names: creates SPARQL SELECT queries to be executed on DBpedia (such a component is typically called a Query Builder) if the given question follows the template "What is the real name of <superheroname>".

Hence, given a question following the described pattern, the result is a SPARQL query that can be executed to retrieve the real name of the superhero from DBpedia.
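The template matching described above can be sketched in a few lines of Java. This is a minimal illustration, not the code of the actual Qanary Query Builder: the class name `SuperheroTemplateSketch`, the regular expression, and in particular the property URI `http://dbpedia.org/property/alterEgo` are assumptions chosen for the example. In a real pipeline, the entity URI would come from the annotation written by the NED component.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the code of the actual Qanary component. */
class SuperheroTemplateSketch {

    // Template from the text: "What is the real name of <superheroname>"
    private static final Pattern TEMPLATE = Pattern.compile(
            "^What is the real name of (.+?)\\??$", Pattern.CASE_INSENSITIVE);

    // ASSUMPTION: hypothetical property URI, chosen only for illustration.
    static final String ASSUMED_PROPERTY = "http://dbpedia.org/property/alterEgo";

    /** Returns the superhero name if the question follows the template. */
    static Optional<String> extractSuperheroName(String question) {
        Matcher m = TEMPLATE.matcher(question.trim());
        return m.matches() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    /** Builds a SPARQL SELECT query for an entity URI (e.g. one found by a NED component). */
    static String buildQuery(String entityUri) {
        return "SELECT ?realName WHERE { <" + entityUri + "> <"
                + ASSUMED_PROPERTY + "> ?realName . }";
    }

    public static void main(String[] args) {
        String question = "What is the real name of Captain America?";
        extractSuperheroName(question).ifPresent(name -> {
            // ASSUMPTION: the URI is derived from the name here only to keep
            // the sketch self-contained; normally the NED annotation supplies it.
            String uri = "http://dbpedia.org/resource/" + name.replace(' ', '_');
            System.out.println(buildQuery(uri));
        });
    }
}
```

A question that does not follow the template yields an empty `Optional`, so no query is produced for it.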

Run a minimalistic Question Answering system

  1. Install the Qanary core components
  2. Clone the current repository:
git clone https://github.com/WDAqua/Qanary-question-answering-components.git
  3. Switch to the folder Qanary-question-answering-components:
cd Qanary-question-answering-components
  4. Build the minimal set of components using the Maven profile "tinytutorial" (here we skip creating the corresponding Docker images by adding the parameter -Ddockerfile.skip=true to the Maven command):
mvn clean package -Ddockerfile.skip=true -P tinytutorial
    • The output should look like the following, indicating that the components `qa.NED-DBpedia-Spotlight` and `qanary_component-QB-SimpleRealNameOfSuperHero` were created:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] qa.NED-DBpedia-Spotlight 2.1.0 ..................... SUCCESS [  3.717 s]
[INFO] qanary_component-QB-SimpleRealNameOfSuperHero 2.0.0  SUCCESS [  1.083 s]
[INFO] mvn.reactor 0.1.1-SNAPSHOT ......................... SUCCESS [  0.073 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
  5. Now, both components can be started using the JAR files:
java -jar qanary_component-NED-DBpedia-Spotlight/target/qa.NED-DBpedia-Spotlight-X.Y.Z.jar
java -jar qanary_component-QB-SimpleRealNameOfSuperHero/target/qanary_component-QB-SimpleRealNameOfSuperHero-X.Y.Z.jar
  6. Build and start a Qanary pipeline

  7. Once the Qanary components and the Qanary pipeline are installed using the standard configuration, you can access a trivial Question Answering frontend at http://localhost:8080/startquestionansweringwithtextquestion

    • Use the question "What is the real name of Captain America?".
    • The question can be answered using the given two components.
    • Thereafter, the triplestore holds a SPARQL query created by the QueryBuilder component SimpleRealNameOfSuperHero (for DBpedia). It can be used to retrieve the actual answer from DBpedia. The UI shows the ID of the graph where the computed information was stored.
      • Retrieve the SPARQL query from your Qanary triplestore using:
PREFIX oa: <http://www.w3.org/ns/openannotation/core/>
PREFIX qa: <http://www.wdaqua.eu/qa#> 

SELECT *
FROM <ADD-YOUR-GRAPH-ID-HERE>
WHERE {
    ?s a qa:AnnotationOfAnswerSPARQL.
    ?s oa:hasBody ?sparqlQueryOnDBpedia .
    ?s oa:annotatedBy ?annotatingService .
}
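If the query above is issued from Java code, the graph ID shown by the UI has to be inserted in place of the placeholder. The following is a minimal sketch of that step only. The class and method names are hypothetical, and the graph ID `urn:graph:example` is a made-up value. How the resulting string is sent to the triplestore depends on your setup and is not shown.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only; names are hypothetical. */
class AnswerQuerySketch {

    // Characters that would break out of the <...> IRI reference in the FROM clause
    private static final Pattern UNSAFE = Pattern.compile("[<>\\s]");

    /** Returns the retrieval query from the text with the given graph ID inserted. */
    static String answerSparqlQuery(String graphId) {
        if (graphId == null || graphId.trim().isEmpty() || UNSAFE.matcher(graphId).find()) {
            throw new IllegalArgumentException("invalid graph ID: " + graphId);
        }
        return String.join("\n",
                "PREFIX oa: <http://www.w3.org/ns/openannotation/core/>",
                "PREFIX qa: <http://www.wdaqua.eu/qa#>",
                "",
                "SELECT *",
                "FROM <" + graphId + ">",
                "WHERE {",
                "    ?s a qa:AnnotationOfAnswerSPARQL .",
                "    ?s oa:hasBody ?sparqlQueryOnDBpedia .",
                "    ?s oa:annotatedBy ?annotatingService .",
                "}");
    }

    public static void main(String[] args) {
        // ASSUMPTION: made-up graph ID; use the one shown in the Qanary UI.
        System.out.println(answerSparqlQuery("urn:graph:example"));
    }
}
```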

Big Picture

  • Qanary provides the methodology for a knowledge-driven, vocabulary-based approach. Our long-term agenda is to create a knowledge-driven ecosystem for the field of Question Answering. It is part of the WDAqua project where Question Answering systems are researched and developed.
  • Qanary Framework provides the core framework for creating Question Answering systems following the Qanary methodology. You might consider the Qanary Framework a reference implementation of the Qanary methodology as a microservice-based component architecture.
  • Qanary components covers the QA components compatible with the Qanary framework.
  • Frankenstein is a supporting framework that establishes a toolset for rapid orchestration and benchmarking of Qanary components. For example, it provides the tools to create 380 QA systems from 29 components.

For questions, ideas, or any feedback related to Qanary, please do not hesitate to contact the core developers. If you would like to see a QA system built using the Qanary framework, one of our core developers has built a complete end-to-end QA system that allows you to query several RDF data stores: http://wdaqua.eu/qa.

Please go to the GitHub Wiki page of the Qanary repository for more insights on how to use this framework, how to add new components, etc.

How to Cite

Introducing a Vocabulary for Knowledge-driven Question Answering Processes

Kuldeep Singh, Andreas Both, Dennis Diefenbach, Saeedeh Shekarpour: Towards a Message-Driven Vocabulary for Promoting the Interoperability of Question Answering Systems. ICSC 2016: 386-389 DOI 10.1109/ICSC.2016.59

Introducing the Qanary Framework

Andreas Both, Dennis Diefenbach, Kuldeep Singh, Saeedeh Shekarpour, Didier Cherix, Christoph Lange: Qanary - A Methodology for Vocabulary-Driven Open Question Answering Systems. ESWC 2016: 625-641 DOI 10.1007/978-3-319-34129-3_38

Analytics of NER/NED Components

Dennis Diefenbach, Kuldeep Singh, Andreas Both, Didier Cherix, Christoph Lange, Sören Auer: The Qanary Ecosystem: Getting New Insights by Composing Question Answering Pipelines. ICWE 2017: 171-189 DOI 10.1007/978-3-319-60131-1_10

For further publications please see the following wiki page.


Qanary Components

The following components are contained in this repository:

Question Answering Named Entity Recognition (NER) and Disambiguation (NED) Components

Entity Classifier 2 (NER)

It uses a rule-based grammar to extract entities in a text.

Stanford NLP Tool (NER)

The Stanford Named Entity Recognizer is an open-source tool that uses Gibbs sampling for information extraction to spot entities in a text.

Babelfy

It is a multilingual, graph-based approach that uses random walks and the densest subgraph algorithm to identify and disambiguate entities present in a text.

AGDISTIS (NED)

It is a graph-based disambiguation tool that couples the HITS algorithm with label expansion strategies and string similarity measures to disambiguate entities in a given text.
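For readers unfamiliar with HITS, the following minimal sketch shows the generic algorithm on a small directed graph. It is not AGDISTIS code, and it omits the label expansion and string similarity steps mentioned above. The class name, the adjacency-matrix representation, and the fixed iteration count are assumptions made for the example.

```java
import java.util.Arrays;

/** Generic HITS sketch; not the AGDISTIS implementation. */
class HitsSketch {

    /**
     * Runs HITS on a directed graph given as an adjacency matrix
     * (adj[from][to] == true means there is an edge from -> to).
     * Returns {hubScores, authorityScores}, each L2-normalized.
     */
    static double[][] hits(boolean[][] adj, int iterations) {
        int n = adj.length;
        double[] hub = new double[n];
        double[] auth = new double[n];
        Arrays.fill(hub, 1.0);
        Arrays.fill(auth, 1.0);

        for (int it = 0; it < iterations; it++) {
            // Authority update: sum of hub scores of all nodes pointing to the node
            double[] newAuth = new double[n];
            for (int from = 0; from < n; from++) {
                for (int to = 0; to < n; to++) {
                    if (adj[from][to]) newAuth[to] += hub[from];
                }
            }
            normalize(newAuth);

            // Hub update: sum of authority scores of all nodes the node points to
            double[] newHub = new double[n];
            for (int from = 0; from < n; from++) {
                for (int to = 0; to < n; to++) {
                    if (adj[from][to]) newHub[from] += newAuth[to];
                }
            }
            normalize(newHub);

            auth = newAuth;
            hub = newHub;
        }
        return new double[][] {hub, auth};
    }

    /** Scales the vector to unit Euclidean length (no-op for the zero vector). */
    private static void normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) sumOfSquares += x * x;
        double norm = Math.sqrt(sumOfSquares);
        if (norm > 0) {
            for (int i = 0; i < v.length; i++) v[i] /= norm;
        }
    }

    public static void main(String[] args) {
        // Nodes 0 and 1 both point to node 2, so node 2 gets the top authority score.
        boolean[][] adj = new boolean[3][3];
        adj[0][2] = true;
        adj[1][2] = true;
        double[][] scores = hits(adj, 20);
        System.out.println("hubs:        " + Arrays.toString(scores[0]));
        System.out.println("authorities: " + Arrays.toString(scores[1]));
    }
}
```

In a disambiguation setting, the nodes would be candidate entities and their neighbors in the knowledge graph, and high-scoring candidates would be preferred.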

DBpedia Spotlight

It is a web service that uses a vector-space representation of entities and cosine similarity to recognize and disambiguate entities.
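As a rough illustration of the cosine-similarity idea, the sketch below ranks candidate entity vectors against a context vector. It is not DBpedia Spotlight's implementation. The class name, the dense `double[]` representation, and the example vectors are assumptions made for the example.

```java
/** Cosine-similarity sketch; not the DBpedia Spotlight implementation. */
class CosineSimilaritySketch {

    /** Cosine of the angle between two vectors; 0.0 if either is the zero vector. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Index of the candidate vector most similar to the context vector, or -1 if none. */
    static int bestCandidate(double[] context, double[][] candidates) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            double score = cosine(context, candidates[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // ASSUMPTION: toy term-weight vectors; real vectors are much larger and sparse.
        double[] context = {1.0, 2.0, 0.0};
        double[][] candidates = {
            {0.0, 0.0, 3.0},  // shares no terms with the context
            {2.0, 4.0, 0.0}   // same direction as the context
        };
        System.out.println("best candidate index: " + bestCandidate(context, candidates));
    }
}
```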

Tag Me

It matches terms in a given text with Wikipedia, i.e., it links text to Wikipedia to recognize named entities. Furthermore, it uses the in-link graph and the page dataset to disambiguate recognized entities to their Wikipedia URIs.

Other NER and NED Tools

Question Answering Relation Linking (RL) Components

ReMatch

  • It maps natural language relations to knowledge graph properties by using dependency parsing characteristics with adjustment rules. It then carries out a match against knowledge base properties, enhanced with the lexicon WordNet, via a set of similarity measures. It is an open-source tool.
  • Qanary ReMatch for RL

RelationLinker2 (RelationMatch)

  • It builds a semantic-index-based representation of PATTY (Nakashole et al., EMNLP 2012), a knowledge corpus of linguistic patterns and their associated properties in DBpedia, and a search mechanism over this index in order to improve the relation linking task.
  • Qanary RelationLinker2 for RL

OKBQA DiambiguationProperty (ReLMatch)

  • The disambiguation module (DM) of the OKBQA framework provides disambiguation of entities, classes, and relations present in a natural language question.
  • Qanary DiambiguationProperty for RL

RelNliodRel (RNLIWOD)

Spot Property (AnnotationofSpotProperty)

Question Answering Class Linking (CL) Components

ClsNliodCls (NLIWOD CLS)

  • The NLIWOD Class Identifier is one of several tools provided by the NLIWOD community for reuse. The code for the class identifier is available on GitHub.
  • Qanary ClsNliodCls for CL

AnnotationofSpotClass (OKBQA Class linker)

Question Answering Query Builder (QB) Components

QueryBuilder (NLIWOD Template-based QB)

  • Template-based query builders are widely used in the QA community for SPARQL query construction. This component is similar to the existing template-based components.
  • Qanary QueryBuilder for QB

SINA (QB)

  • SINA is a keyword and natural language query search engine that is based on Hidden Markov Models for choosing the correct dataset to query. We decoupled the original implementation to get a query builder.
  • Qanary SINA for QB