This tool provides a web service to convert particular question answering datasets (represented in JSON format) into RDF Turtle. It uses the Question Answering Dataset Ontology (QADO) to represent the data in RDF.
This service needs a running instance of QADO RML Applicator.
The host of an instance (e.g., http://localhost:8000) has to be provided by setting the environment variable RML_APPLICATOR_HOST.
You can also run the service in a Docker container by pulling the prepared image:
docker pull wseresearch/qado-rdfizer:latest
Alternatively, you can build the Docker image from source. To start a Docker container, use the following command:
docker run -d --env RML_APPLICATOR_HOST="YOUR RML APPLICATOR HOST" -p "$EXTERNAL_PORT:8080" wseresearch/qado-rdfizer:latest
This service provides a basic UI at $HOST:$PORT/ where you can transform a dataset and view the results directly in the web browser.
To transform a JSON file to RDF, send a POST request to $HOST:$PORT/json2rdf with a JSON payload of the following structure:
{
  "filePath": "URL of the JSON file",
  "format": "Mapping file name",
  "label": "Name for the generated RDF triples",
  "homepage": "URL of the data publisher",
  "language": "Language tag of the questions (required only for 'compositional_wikidata' format)"
}
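The payload can also be assembled programmatically. The following is a minimal Python sketch, not part of the service: the helper name `build_payload` is hypothetical, and the client-side check only mirrors the note above that `language` is required for the `compositional_wikidata` format. Leaving `language` out for other formats is an assumption.

```python
import json


def build_payload(file_path, fmt, label, homepage, language=None):
    """Assemble the request body for /json2rdf (hypothetical client-side helper)."""
    payload = {
        "filePath": file_path,
        "format": fmt,
        "label": label,
        "homepage": homepage,
    }
    # "language" is documented as required only for 'compositional_wikidata'.
    if fmt == "compositional_wikidata" and not language:
        raise ValueError("'language' is required for the compositional_wikidata format")
    # Assumption: the key is simply omitted when no language tag is given.
    if language:
        payload["language"] = language
    return payload


if __name__ == "__main__":
    body = build_payload(
        "https://github.com/ag-sc/QALD/raw/master/6/data/qald-6-train-multilingual-raw.json",
        "qald",
        "QALD 6 train multilingual raw",
        "https://github.com/ag-sc/QALD",
    )
    # Print the JSON document that would be sent as the request body.
    print(json.dumps(body, indent=2))
```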
By default, the following datasets/formats are supported (using these RML mappings):
- QALD: Question Answering over Linked Data
  - format identifier: qald
  - supported versions: 5, 6, 8, 9, 9-plus, and 10
- LC-QuAD: Largescale Complex Question Answering Dataset
  - supported versions:
    - 1, format identifier: lc-quad
    - 2, format identifier: lc-quad-2
- RuBQ: A Russian Knowledge Base Question Answering and Machine Reading Comprehension Data Set
  - format identifier: rubq
  - supported versions: 1 and 2
- Mintaka: A complex, natural, and multilingual dataset for end-to-end question answering
  - format identifier: mintaka
- format identifier: cwq
- (beta) Compositional Wikidata Questions
  - format identifier: compositional_wikidata
The following cURL command can be used to convert a JSON file of the QALD benchmark into RDF using Turtle as the output format.
curl --location --request POST 'http://$HOST:$PORT/json2rdf' \
--header 'Content-Type: application/json' \
--data-raw '{
"filePath": "https://github.com/ag-sc/QALD/raw/master/6/data/qald-6-train-multilingual-raw.json",
"format": "qald",
"label": "QALD 6 train multilingual raw",
"homepage": "https://github.com/ag-sc/QALD"
}'
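The same request can be issued from Python using only the standard library. This is a sketch under assumptions: the helper names `make_request` and `convert` are hypothetical, the base URL `http://localhost:8080` is only an example value, and the service is assumed to return the generated Turtle in the response body.

```python
import json
import urllib.request


def make_request(base_url, payload):
    """Build a POST request for the /json2rdf endpoint without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/json2rdf",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def convert(base_url, payload, timeout=300):
    """Send the request and return the response body as text.

    Assumption: the service answers with the generated RDF (Turtle) in the body.
    """
    request = make_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Payload taken from the cURL example above.
QALD6_PAYLOAD = {
    "filePath": "https://github.com/ag-sc/QALD/raw/master/6/data/qald-6-train-multilingual-raw.json",
    "format": "qald",
    "label": "QALD 6 train multilingual raw",
    "homepage": "https://github.com/ag-sc/QALD",
}

# Usage (requires a running service; the base URL is an example value):
#   turtle = convert("http://localhost:8080", QALD6_PAYLOAD)
#   print(turtle)
```

Separating `make_request` from `convert` keeps the request construction inspectable without network access.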
To add new mapping rules, add a new mapping file NAME.ttl to app/mappings, where NAME has to be in all caps.
The mapping language is RML.
To use the file within the web service, use the base file name as the format parameter.
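The naming rule can be checked before a new mapping is deployed. The sketch below is a hypothetical helper, not part of the service: it assumes that "base file name" means the file name without the .ttl extension and that "all caps" means the name contains no lowercase letters. MYDATASET.ttl is an invented example name. Whether the service matches the format value case-insensitively is not stated here; the default identifiers listed above are lowercase.

```python
from pathlib import Path


def format_identifier(mapping_file):
    """Return the base file name of a mapping file, or raise ValueError.

    Assumption: the base file name is the file name without the .ttl suffix.
    """
    path = Path(mapping_file)
    if path.suffix != ".ttl":
        raise ValueError(f"mapping files must end in .ttl: {path.name}")
    name = path.stem
    # str.isupper() is True only if the string has at least one cased
    # character and none of its cased characters is lowercase.
    if not name.isupper():
        raise ValueError(f"mapping file name must be in all caps: {path.name}")
    return name


if __name__ == "__main__":
    # Invented example path following the app/mappings convention.
    print(format_identifier("app/mappings/MYDATASET.ttl"))
```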
A script for creating statistics about the generated datasets is also available here.
See scripts/statistics for more details.