SPARQL dataset statistics

This subproject (part of the QADO question answering dataset RDFizer) addresses the need for a good understanding of the characteristics of Knowledge Graph Question Answering (KGQA) benchmark datasets. For this purpose, it provides a tool that creates CSV files containing statistics about the RDFized data in a triplestore.

Requirements

Docker needs to be installed on your system.
Your triplestore needs to be accessible via a SPARQL endpoint (see the Configuration section).
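
Before running the tool, it can help to confirm that the endpoint answers SPARQL requests at all. The following is a minimal sketch and not part of this project: it sends a standard SPARQL `ASK` query over HTTP GET, as defined by the SPARQL 1.1 Protocol. The function names and the endpoint URL in the usage note are hypothetical; use the URL of your own triplestore.

```python
import json
import urllib.parse
import urllib.request


def build_ask_request(endpoint_url):
    """Build an HTTP GET request that asks whether the store holds any triple."""
    query = "ASK { ?s ?p ?o }"
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    # Request the standard JSON result format for SPARQL queries.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def endpoint_has_data(endpoint_url, timeout=10):
    """Return True if the endpoint responds and contains at least one triple."""
    try:
        request = build_ask_request(endpoint_url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response).get("boolean") is True
    except (OSError, ValueError):
        # Network errors, HTTP errors, and malformed responses all count as "no".
        return False
```

For example, `endpoint_has_data("http://localhost:7200/repositories/qado")` could be called with a placeholder URL like this one replaced by your real endpoint. Note that an empty but reachable store also yields `False`.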

Usage

./createSpreadsheet.sh

This script creates new CSV files with all statistics and stores them following the pattern QADO-statistics-${statistic_name}-${current_date}.csv in the folder /tmp/QADO-statistics/ (by default). To change the output directory, you can provide a parameter to the script, e.g.:

./createSpreadsheet.sh myOutputDirectory
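
The file name pattern described above can be sketched in Python as follows. This is an illustration only, not the project's actual code: the function name `statistics_csv_path` is hypothetical, and the ISO date format (`YYYY-MM-DD`) is an assumption, since the exact format of `${current_date}` is not stated here.

```python
from datetime import date
from pathlib import Path


def statistics_csv_path(statistic_name, output_dir="/tmp/QADO-statistics", today=None):
    """Compose the output path for one statistic following the documented pattern."""
    today = today or date.today()
    # Assumption: the date is written as YYYY-MM-DD; the real script may differ.
    file_name = f"QADO-statistics-{statistic_name}-{today.isoformat()}.csv"
    return Path(output_dir) / file_name
```

Passing a different `output_dir` corresponds to passing the output directory as a parameter to `createSpreadsheet.sh`.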

Configuration

Script configuration

Edit config.py to select the triplestore where your data is stored.

Statistics

A number of statistics are already implemented. To create your own statistics, store your SPARQL queries in the queries directory. The tool creates a new CSV file for each SPARQL query file.
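
The "one CSV file per query file" behavior can be pictured with the sketch below. It is a simplified illustration under stated assumptions, not the code in `createSpreadsheet.py`: the function `write_statistics` and its `run_query` callback are hypothetical, query files are assumed to use the `.sparql` extension, and the date suffix of the real output file names is omitted for brevity.

```python
import csv
from pathlib import Path


def write_statistics(queries_dir, output_dir, run_query):
    """Write one CSV file per *.sparql file found in queries_dir.

    run_query is a caller-supplied function (hypothetical here) that takes the
    query text and returns a tuple (header, rows) of column names and result rows.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for query_file in sorted(Path(queries_dir).glob("*.sparql")):
        header, rows = run_query(query_file.read_text(encoding="utf-8"))
        # The statistic name is taken from the query file name without extension.
        target = output_dir / f"{query_file.stem}.csv"
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        written.append(target)
    return written
```

Keeping the query execution behind a callback means the same loop works with any SPARQL client, and adding a statistic only requires dropping a new query file into the directory.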

Boxplot data

If you want to generate a boxplot chart, the file name of your query must end with _boxplot.sparql. Additionally, the query must select the data for the boxplot chart via (GROUP_CONCAT(?YOUR_VARIABLE ; SEPARATOR = ",") AS ?concat).
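
Because `GROUP_CONCAT` returns all values of a group as one comma-separated string, a consumer of the CSV has to split the `concat` column before plotting. The sketch below shows one way to turn such a string into the five numbers a boxplot needs. It is an illustration, not the project's code: the function name `boxplot_summary` is hypothetical, and the quartile method (`inclusive`) is an assumption, since chart tools differ in how they compute quartiles.

```python
import statistics


def boxplot_summary(concat_value):
    """Turn a comma-separated string of numbers into boxplot statistics."""
    values = sorted(float(v) for v in concat_value.split(",") if v.strip())
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # it raises StatisticsError if there are too few values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": values[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": values[-1],
    }
```

Each row of a `_boxplot` CSV file would yield one such summary, for example one box per benchmark.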

Default statistics

By default, the tool generates the following statistics for RDFized Question Answering benchmarks:

Query length per benchmark
Query modifiers per benchmark
Question length per
- answer type and benchmark (boxplot)
- benchmark (boxplot)
- language and benchmark (boxplot)
Questions per
- answer type
- benchmark
- language
- language and benchmark
Question type per benchmark
Statistics of used resources inside the SPARQL queries per benchmark
