QLever performance evaluation and comparison to other SPARQL engines

Hannah Bast edited this page Apr 9, 2024 · 13 revisions

Here are the results of a simple performance evaluation and comparison of QLever, Virtuoso, Blazegraph, GraphDB, Stardog, Apache Jena, and Oxigraph on a single moderately sized dataset. More engines and more datasets will be added in the future. However, since all of the metrics below scale essentially linearly with the size of the dataset (at least for QLever), the results on this one dataset are already quite informative.

All evaluations (of all engines) were run on an AMD Ryzen 9 7950X with 16 cores, 128 GB of RAM, and 7.1 TB of NVMe SSD. This is high-quality but affordable consumer hardware (as opposed to typical server hardware), with a total cost of around 2500 €.

Evaluation and comparison on the DBLP dataset (390 M triples)

The dataset used was the RDF dump of DBLP, version 02.04.2024 (1.8 GB compressed, 390 million triples, 68 predicates, see this SPARQL endpoint).

The following table compares loading time (in seconds), loading speed (in million triples per second), and index size (in gigabytes). The second-to-last column shows the average query time for the small benchmark detailed below. The last column provides a subjective assessment of how easy it was to build the index and run queries: Blazegraph requires explicit chunking to load larger datasets, GraphDB's normal load takes forever, Virtuoso is old and error-prone with unusual interfaces, and the setup for Stardog was by far the most complicated of all (see the section "Command lines ..." below).

| SPARQL engine | Code | Loading time | Loading speed | Index size | Avg. query time | Usability |
|---|---|---|---|---|---|---|
| Oxigraph | Rust | 640s | 0.6 M/s | 67 GB | 93s | very good |
| Apache Jena | Java | 2392s | 0.2 M/s | 42 GB | 69s | very good |
| Stardog | Java | 724s | 0.5 M/s | 28 GB | 17s | complicated |
| GraphDB | Java | 1066s | 0.4 M/s | 28 GB | 16s | good |
| Blazegraph | Java | 6326s | <0.1 M/s | 67 GB | 4.3s | good |
| Virtuoso | C | 561s | 0.7 M/s | 13 GB | 2.2s | messy |
| QLever | C++ | 231s | 1.7 M/s | 8 GB | 0.7s | very good |
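
The loading-speed column follows from the loading time and the dataset size of 390 million triples. Below is a minimal sketch of that arithmetic, assuming `awk` is available. The function name `loading_speed` is made up for this illustration and is not part of any engine's tooling.

```shell
# loading_speed TRIPLES_IN_MILLIONS SECONDS
# Prints the loading speed in million triples per second, rounded to one
# decimal. Hypothetical helper for illustration only.
loading_speed() {
  # LC_ALL=C keeps the decimal point independent of the user's locale.
  LC_ALL=C awk -v triples="$1" -v seconds="$2" \
    'BEGIN { printf "%.1f\n", triples / seconds }'
}

loading_speed 390 231    # QLever:   prints 1.7
loading_speed 390 561    # Virtuoso: prints 0.7
```

The same call with the other loading times reproduces the remaining entries of the column. For Blazegraph, 390 / 6326 is about 0.06, which the table reports as <0.1 M/s.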

The following table compares query processing times on six queries from the "Examples" of https://qlever.cs.uni-freiburg.de/dblp. The queries were selected for their variety (see the "Comment" column), not to make a particular engine look particularly good or bad. For each engine, the query times were measured after emptying the disk cache with sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches" and starting the respective server from scratch. For QLever, its internal cache was cleared after each query (this makes it harder for QLever). For the other engines, no such precautions were taken. There was no significant (IO-heavy or CPU-heavy) activity on the machine during the evaluation. The > in one of the table cells below indicates that Virtuoso, due to an internal limitation, downloaded only 1,048,576 of the around 7M results for the respective query, so the time shown is a lower bound.

| Query | Result shape | Oxigraph | Apache Jena | Stardog | GraphDB | Blazegraph | Virtuoso | QLever | Comment |
|---|---|---|---|---|---|---|---|---|---|
| All papers published in SIGIR | 6264 x 3 | 1.6s | 0.3s | 0.52s | 0.17s | 0.47s | 0.54s | 0.02s | Two simple joins, nothing special |
| Number of papers by venue | 19954 x 2 | 2.6s | 28s | 2.0s | 3.1s | 1.2s | 1.0s | 0.02s | Scan of a single predicate with GROUP BY and ORDER BY |
| Author names matching REGEX | 513 x 3 | 5.6s | 4.8s | 0.61s | 0.29s | 0.27s | 0.98s | 0.05s | Joins, GROUP BY, ORDER BY, FILTER REGEX |
| All papers in DBLP until 1940 | 70 x 4 | 313s | 50s | 16s | 0.04s | 5.9s | 0.08s | 0.11s | Three joins, a FILTER, and an ORDER BY |
| All papers with their title | 7167122 x 2 | 132s | 54s | 44s | 20s | 18s | >9.1s | 4.2s | Simple, but must materialize large result (problematic for many SPARQL engines) |
| All predicates ordered by size | 68 x 3 | 106s | 279s | 37s | 72s | 0.05s | 1.48s | 0.01s | Conceptually requires a scan over all triples, but huge optimization potential |
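
The "Avg. query time" column in the first table is consistent with the arithmetic mean of the six per-query times in the table above. Below is a minimal sketch of that check, assuming `awk` is available. `mean_time` is a hypothetical helper name used only for this illustration.

```shell
# mean_time T1 T2 ...
# Prints the arithmetic mean of the given times (in seconds), rounded to
# one decimal. Hypothetical helper for illustration only.
mean_time() {
  printf '%s\n' "$@" |
    LC_ALL=C awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

mean_time 0.02 0.02 0.05 0.11 4.2 0.01    # QLever:     prints 0.7
mean_time 0.47 1.2 0.27 5.9 18 0.05       # Blazegraph: prints 4.3
```

Note how a single slow query dominates the mean: for QLever, the 4.2 s for the large result accounts for almost all of the 0.7 s average.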

Command lines for producing the results above (loading and queries)

For each engine, we created a folder with only the input file dblp.ttl.gz and a file queries.tsv obtained via curl -s https://qlever.cs.uni-freiburg.de/api/examples/dblp | sed -n '3p;4p;5p;6p;10p;15p' > queries.tsv (see below for the contents). For Virtuoso, there was also the config file virtuoso.ini (with generous settings regarding memory consumption). For QLever, there was the config file Qleverfile (with standard settings).
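
The sed expression in the curl pipeline above keeps only lines 3, 4, 5, 6, 10, and 15 of the downloaded list of example queries and drops everything else. The following offline illustration applies the same expression to synthetic input: the output of `seq` stands in for the real list, and `pick_lines` is a name made up for this sketch.

```shell
# pick_lines: print only lines 3, 4, 5, 6, 10, and 15 of standard input.
# -n suppresses sed's default output; each "Np" prints line N explicitly.
pick_lines() {
  sed -n '3p;4p;5p;6p;10p;15p'
}

# Each input line contains its own line number, so the output shows
# directly which lines were kept: 3, 4, 5, 6, 10, 15 (one per line).
seq 1 20 | pick_lines
```

An equivalent, slightly shorter expression would be `sed -n '3,6p;10p;15p'`, which uses a line range for the first four queries.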

Oxigraph

oxigraph load -f dblp.ttl.gz -l .
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
oxigraph serve-read-only -l . -b localhost:8015
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:8015/query

Apache Jena

apache-jena-5.0.0/bin/tdb2.xloader --loc data dblp.ttl.gz
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
java -jar apache-jena-fuseki-5.0.0/fuseki-server.jar --port 8015 --loc data /dblp
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:8015/dblp

Stardog

sed -i 's/UseParallelOldGC/UseParallelGC/' opt/stardog/bin/helpers.sh
export STARDOG_SERVER_JAVA_ARGS="-Xms20g -Xmx20g"
export STARDOG_PROPERTIES=$(pwd) && echo "memory.mode = bulk_load" > stardog.properties
stardog-admin server start
stardog-admin db create -n dblp dblp.ttl.gz
stardog-admin server stop
rm -f stardog.properties
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
stardog-admin server start --disable-security
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:5820/dblp/query

GraphDB

graphdb-10.6.2/bin/console
> create graphdb   [ID = dblp, rest = default]
> quit
graphdb-10.6.2/bin/importrdf preload -f -i dblp dblp.ttl.gz
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
graphdb-10.6.2/bin/graphdb
curl -s localhost:7200/repositories/dblp --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1'   [minimal warmup]
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:7200/repositories/dblp

Blazegraph

java -server -Xmx20g -jar blazegraph.jar &
docker run -it --rm -v $(pwd):/data stain/jena riot --output=NT /data/dblp.ttl.gz | split -a 3 --numeric-suffixes=1 --additional-suffix=.nt -l 1000000  --filter='gzip > $FILE.gz' - dblp-
for CHUNK in dblp-???.nt.gz; do curl -s indus:9999/blazegraph/namespace/kb/sparql --data-binary update="LOAD <file://$(pwd)/${CHUNK}>"; done
kill %1
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
java -server -Xmx20g -jar blazegraph.jar &
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:9999/blazegraph/namespace/kb/sparql
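
The `riot ... | split ...` pipeline above converts the dump to N-Triples (one triple per line) and cuts it into gzip-compressed chunks of 1,000,000 lines each, which the loop then loads one by one. The sketch below runs the same `split` invocation on 25 synthetic lines with a chunk size of 10, so that the naming scheme is easy to see. It assumes GNU coreutils (for `--filter`, `--numeric-suffixes`, and `--additional-suffix`). The prefix `demo-` and the function name `chunk_lines` are made up for this illustration.

```shell
# chunk_lines LINES_PER_CHUNK DIR
# Splits standard input into gzip-compressed chunks of LINES_PER_CHUNK lines
# each, written to DIR as demo-001.nt.gz, demo-002.nt.gz, ...
# The body runs in a subshell (parentheses) so the cd does not leak out.
chunk_lines() (
  cd "$2" || exit 1
  # split sets $FILE to the name of the current chunk for the --filter command.
  split -a 3 --numeric-suffixes=1 --additional-suffix=.nt -l "$1" \
    --filter='gzip > $FILE.gz' - demo-
)

workdir=$(mktemp -d)
seq 1 25 | chunk_lines 10 "$workdir"
ls "$workdir"    # lists demo-001.nt.gz, demo-002.nt.gz, demo-003.nt.gz
```

The last chunk holds the remaining 5 lines. Because N-Triples is line-based, cutting at line boundaries yields chunks that are valid files on their own, which is why the conversion step comes before the split.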

Virtuoso

isql-vt 8888
SQL> ld_dir('/local/data/qlever/qlever-indices/virtuoso-playground.ssd', 'dblp.ttl.gz', '');
SQL> DB.DBA.rdf_loader_run();
SQL> checkpoint;
SQL> exit;
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
/usr/bin/virtuoso-t -f &
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:8890/sparql

QLever

qlever index
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
qlever start
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:7015

Contents of queries.tsv

All papers published in SIGIR	PREFIX dblp: <https://dblp.org/rdf/schema#> SELECT ?paper ?title ?year WHERE { ?paper dblp:title ?title . ?paper dblp:publishedIn "SIGIR" . ?paper dblp:yearOfPublication ?year } ORDER BY DESC(?year)
Number of papers by venue	PREFIX dblp: <https://dblp.org/rdf/schema#> SELECT ?venue (COUNT(?paper) as ?count) WHERE { ?paper dblp:publishedIn ?venue } GROUP BY ?venue ORDER BY DESC(?count)
Author names matching REGEX	PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dblp: <https://dblp.org/rdf/schema#> SELECT ?author ?author_label ?count WHERE { { SELECT ?author ?author_label (COUNT(?paper) as ?count) WHERE { ?paper dblp:authoredBy ?author . ?paper dblp:publishedIn "SIGIR" . ?author rdfs:label ?author_label } GROUP BY ?author ?author_label } FILTER REGEX(STR(?author_label), "M.*D.*", "i") } ORDER BY DESC(?count)
All papers in DBLP until 1940	PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dblp: <https://dblp.org/rdf/schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> SELECT ?title ?author ?author_label ?year WHERE { ?paper dblp:title ?title . ?paper dblp:authoredBy ?author . ?paper dblp:yearOfPublication ?year . ?author rdfs:label ?author_label . FILTER (?year <= "1940"^^xsd:gYear) } ORDER BY ASC(?year) ASC(?title)
All papers with their title (large result)	PREFIX dblp: <https://dblp.org/rdf/schema#> SELECT ?paper ?title WHERE { ?paper dblp:title ?title }
All predicates, ordered by number of subjects	SELECT ?predicate (COUNT(?subject) as ?count) WHERE { ?subject ?predicate ?object } GROUP BY ?predicate ORDER BY DESC(?count)
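
Each line of queries.tsv holds a query name and the SPARQL query, separated by a tab. In the evaluation above, `qlever example-queries` does the actual measurement. For a quick manual spot check without that script, a loop like the following could be used. It is only a sketch under assumptions: `run_queries` is a hypothetical helper, the endpoint is assumed to accept the query as a form-encoded `query` parameter (as in the GraphDB warmup call above) and to support TSV results, and curl's `time_total` is a rough stand-in that need not match the timings reported in the tables.

```shell
# run_queries ENDPOINT TSV_FILE
# For each line "name<TAB>query" in TSV_FILE, sends the query to ENDPOINT and
# prints "name<TAB>seconds", where seconds is curl's total request time
# (including the download of the result). Hypothetical helper, not part of
# the qlever script. The body runs in a subshell so its variables stay local.
run_queries() (
  endpoint=$1
  tsv=$2
  tab=$(printf '\t')
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS="$tab" read -r name query || [ -n "$name" ]; do
    # Discard the result body and keep only the timing; stdin is redirected
    # so that curl cannot consume the remaining lines of the TSV file.
    seconds=$(curl -s -o /dev/null -w '%{time_total}' \
      -H 'Accept: text/tab-separated-values' \
      --data-urlencode "query=$query" "$endpoint" < /dev/null)
    printf '%s\t%s\n' "$name" "$seconds"
  done < "$tsv"
)
```

For example, `run_queries localhost:7015 queries.tsv` would target the QLever server started above. To compare engines fairly, the cache-clearing and server-restart steps from the command lines above would still have to be carried out before each run.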