QLever performance evaluation and comparison to other SPARQL engines

Here are the results of a simple performance evaluation and comparison of QLever, Virtuoso, Blazegraph, GraphDB, Stardog, Apache Jena, and Oxigraph on a single moderately-sized dataset. More engines and more datasets will be added in the future. However, since all of the metrics below scale essentially linearly with the size of the dataset (at least for QLever), the results on this one dataset are already quite informative.

All evaluations (of all engines) were run on an AMD Ryzen 9 7950X with 16 cores, 128 GB of RAM, and 7.1 TB of NVMe SSD storage. This is high-quality but affordable consumer hardware (as opposed to typical server hardware), with a total cost of around 2500 €.

Evaluation and comparison on the DBLP dataset (390 M triples)

The dataset used was the RDF dump of DBLP, version 02.04.2024 (1.8 GB compressed, 390 million triples, 68 predicates, see this SPARQL endpoint).

The following table compares loading time (in seconds), loading speed (in million triples per second), and index size (in gigabytes). The second-to-last column shows the average query time for the small benchmark detailed in the next section. The last column provides a subjective assessment of how easy it was to build the index and run queries: Blazegraph requires explicit chunking to load larger datasets, GraphDB's normal load is very slow, Virtuoso is old and error-prone with unusual interfaces, and the setup for Stardog was by far the most complicated of all (see the section "Command lines ..." below).

SPARQL engine	Code	Loading time	Loading speed	Index size	Avg. query time	Usability
Oxigraph	Rust	640s	0.6 M/s	67 GB	93s	very good
Apache Jena	Java	2392s	0.2 M/s	42 GB	69s	very good
Stardog	Java	724s	0.5 M/s	28 GB	17s	complicated
GraphDB	Java	1066s	0.4 M/s	28 GB	16s	good
Blazegraph	Java	6326s	<0.1 M/s	67 GB	4.3s	good
Virtuoso	C	561s	0.7 M/s	13 GB	2.2s	messy
QLever	C++	231s	1.7 M/s	8 GB	0.7s	very good
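
The loading speed column is the number of triples divided by the loading time. As a quick sanity check, the following Python sketch recomputes it from the loading times in the table, assuming the rounded figure of 390 million triples stated above. The names `TRIPLES_MILLIONS`, `LOADING_TIME_S`, and `loading_speed` are illustrative and not part of any engine's tooling.

```python
# Number of triples in the DBLP dump, in millions (rounded figure from the text).
TRIPLES_MILLIONS = 390

# Loading times in seconds, copied from the table above.
LOADING_TIME_S = {
    "Oxigraph": 640,
    "Apache Jena": 2392,
    "Stardog": 724,
    "GraphDB": 1066,
    "Blazegraph": 6326,
    "Virtuoso": 561,
    "QLever": 231,
}


def loading_speed(engine: str) -> float:
    """Return the loading speed in million triples per second."""
    return TRIPLES_MILLIONS / LOADING_TIME_S[engine]


if __name__ == "__main__":
    for engine in LOADING_TIME_S:
        print(f"{engine:12s} {loading_speed(engine):.2f} M/s")
```

Rounded to one decimal, these values agree with the "Loading speed" column; Blazegraph comes out below 0.1 M/s.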

The following table compares query processing times on six queries from the "Examples" of https://qlever.cs.uni-freiburg.de/dblp. The queries were selected for their variety (see the "Comment" column), not to make a particular engine look particularly good or bad. For each engine, the query times were measured after emptying the disk cache with `sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"` and starting the respective server from scratch. For QLever, the internal cache was also cleared after each query, which puts QLever at a disadvantage. For the other engines, no such precautions were taken. There was no significant (IO-heavy or CPU-heavy) activity on the machine during the evaluation. The `>` in one of the table cells below indicates that Virtuoso, due to an internal limitation, downloaded only 1,048,576 of the roughly 7 million results for the respective query.

Query	Result shape	Oxigraph	Apache Jena	Stardog	GraphDB	Blazegraph	Virtuoso	QLever	Comment
All papers published in SIGIR	6264 x 3	1.6s	0.3s	0.52s	0.17s	0.47s	0.54s	0.02s	Two simple joins, nothing special
Number of papers by venue	19954 x 2	2.6s	28s	2.0s	3.1s	1.2s	1.0s	0.02s	Scan of a single predicate with GROUP BY and ORDER BY
Author names matching REGEX	513 x 3	5.6s	4.8s	0.61s	0.29s	0.27s	0.98s	0.05s	Joins, GROUP BY, ORDER BY, FILTER REGEX
All papers in DBLP until 1940	70 x 4	313s	50s	16s	0.04s	5.9s	0.08s	0.11s	Three joins, a FILTER, and an ORDER BY
All papers with their title	7167122 x 2	132s	54s	44s	20s	18s	>9.1s	4.2s	Simple, but must materialize large result (problematic for many SPARQL engines)
All predicates ordered by size	68 x 3	106s	279s	37s	72s	0.05s	1.48s	0.01s	Conceptually requires a scan over all triples, but huge optimization potential
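
The "Avg. query time" column of the first table appears to be the arithmetic mean of the six per-query times in the table above. The following Python sketch recomputes it under that assumption; the names `QUERY_TIMES_S` and `average_query_time` are illustrative. Note that Virtuoso's time for the large-result query is only a lower bound, so its average is a lower bound as well.

```python
from statistics import mean

# Query times in seconds, copied from the table above, in row order:
# SIGIR papers, papers by venue, REGEX, until 1940, titles, predicates.
QUERY_TIMES_S = {
    "Oxigraph": [1.6, 2.6, 5.6, 313, 132, 106],
    "Apache Jena": [0.3, 28, 4.8, 50, 54, 279],
    "Stardog": [0.52, 2.0, 0.61, 16, 44, 37],
    "GraphDB": [0.17, 3.1, 0.29, 0.04, 20, 72],
    "Blazegraph": [0.47, 1.2, 0.27, 5.9, 18, 0.05],
    # The fifth value is the ">9.1s" cell, i.e. a lower bound.
    "Virtuoso": [0.54, 1.0, 0.98, 0.08, 9.1, 1.48],
    "QLever": [0.02, 0.02, 0.05, 0.11, 4.2, 0.01],
}


def average_query_time(engine: str) -> float:
    """Return the mean query time in seconds over the six benchmark queries."""
    return mean(QUERY_TIMES_S[engine])


if __name__ == "__main__":
    for engine in QUERY_TIMES_S:
        print(f"{engine:12s} {average_query_time(engine):6.2f} s")
```

The averages are dominated by the slowest one or two queries of each engine, so the per-query times remain the more informative view.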

Command lines for producing the results above (loading and queries)

For each engine, we created a folder containing only the input file `dblp.ttl.gz` and a file `queries.tsv` obtained via `curl -s https://qlever.cs.uni-freiburg.de/api/examples/dblp | sed -n '3p;4p;5p;6p;10p;15p' > queries.tsv` (see below for the contents). For Virtuoso, there was also the config file `virtuoso.ini` (with generous settings regarding memory consumption). For QLever, there was the config file `Qleverfile` (with standard settings).
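
The `sed -n '3p;4p;5p;6p;10p;15p'` part of the pipeline above keeps only lines 3, 4, 5, 6, 10, and 15 (counting from 1) of the downloaded examples. Each kept line holds a query name and the query text, separated by a tab. A minimal Python sketch of the same selection and parsing follows; it runs on made-up sample lines rather than the real API response, and the function names `select_lines` and `parse_queries` are hypothetical.

```python
def select_lines(text: str, line_numbers: set[int]) -> list[str]:
    """Keep only the given 1-based lines, like `sed -n '3p;4p;...'`."""
    return [
        line
        for number, line in enumerate(text.splitlines(), start=1)
        if number in line_numbers
    ]


def parse_queries(lines: list[str]) -> list[tuple[str, ...]]:
    """Split each TSV line at the first tab into (query name, SPARQL query)."""
    return [tuple(line.split("\t", 1)) for line in lines]


# Made-up stand-in for the downloaded examples: 15 lines of "name<TAB>query".
sample = "\n".join(
    f"Query {i}\tSELECT * WHERE {{ ?s ?p ?o }} LIMIT {i}" for i in range(1, 16)
)
queries = parse_queries(select_lines(sample, {3, 4, 5, 6, 10, 15}))

if __name__ == "__main__":
    for name, query in queries:
        print(f"{name}: {query}")
```

Selecting by line number is brittle: if the list of examples on the server changes, the same `sed` expression may pick different queries, which is one reason the exact contents of `queries.tsv` are reproduced at the end of this page.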

Oxigraph

oxigraph load -f dblp.ttl.gz -l .
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
oxigraph serve-read-only -l . -b localhost:8015
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:8015/query

Apache Jena

apache-jena-5.0.0/bin/tdb2.xloader --loc data dblp.ttl.gz
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
java -jar apache-jena-fuseki-5.0.0/fuseki-server.jar --port 8015 --loc data /dblp
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:8015/dblp

Stardog

sed -i 's/UseParallelOldGC/UseParallelGC/' opt/stardog/bin/helpers.sh
export STARDOG_SERVER_JAVA_ARGS="-Xms20g -Xmx20g"
export STARDOG_PROPERTIES=$(pwd) && echo "memory.mode = bulk_load" > stardog.properties
stardog-admin server start
stardog-admin db create -n dblp dblp.ttl.gz
stardog-admin server stop
rm -f stardog.properties
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
stardog-admin server start --disable-security
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:5820/dblp/query

GraphDB

graphdb-10.6.2/bin/console
> create graphdb   [ID = dblp, rest = default]
> quit
graphdb-10.6.2/bin/importrdf preload -f -i dblp dblp.ttl.gz
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
graphdb-10.6.2/bin/graphdb
curl -s localhost:7200/repositories/dblp --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1'   [minimal warmup]
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:7200/repositories/dblp

Blazegraph

java -server -Xmx20g -jar blazegraph.jar &
docker run -it --rm -v $(pwd):/data stain/jena riot --output=NT /data/dblp.ttl.gz | split -a 3 --numeric-suffixes=1 --additional-suffix=.nt -l 1000000  --filter='gzip > $FILE.gz' - dblp-
for CHUNK in dblp-???.nt.gz; do curl -s localhost:9999/blazegraph/namespace/kb/sparql --data-binary update="LOAD <file://$(pwd)/${CHUNK}>"; done
kill %1
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
java -server -Xmx20g -jar blazegraph.jar &
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:9999/blazegraph/namespace/kb/sparql
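
The `split` step in the Blazegraph commands above cuts the N-Triples stream into gzip-compressed chunks of 1,000,000 lines each, named `dblp-001.nt.gz`, `dblp-002.nt.gz`, and so on. The following Python sketch does the same for any iterable of lines. The function name `write_chunks` and its parameters are hypothetical, and the sketch is meant as an illustration of the chunking, not as the procedure used for the measurements above.

```python
import gzip
import tempfile
from itertools import islice
from pathlib import Path
from typing import Iterable


def write_chunks(
    lines: Iterable[str],
    out_dir: Path,
    lines_per_chunk: int = 1_000_000,
    prefix: str = "dblp-",
) -> list[Path]:
    """Write `lines` (each ending in a newline) to gzip files of at most
    `lines_per_chunk` lines each, numbered 001, 002, ... like `split -a 3`."""
    iterator = iter(lines)
    paths = []
    chunk_number = 1
    while True:
        # Take the next block of lines; an empty block means we are done.
        chunk = list(islice(iterator, lines_per_chunk))
        if not chunk:
            break
        path = out_dir / f"{prefix}{chunk_number:03d}.nt.gz"
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.writelines(chunk)
        paths.append(path)
        chunk_number += 1
    return paths


if __name__ == "__main__":
    # Tiny demonstration with five made-up triples and two lines per chunk.
    triples = [f"<s{i}> <p> <o> .\n" for i in range(5)]
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_chunks(triples, Path(tmp), lines_per_chunk=2):
            print(path.name)
```

N-Triples is a line-based format with one triple per line, which is why the input is first converted from Turtle with `riot --output=NT`: cutting at line boundaries then always yields valid files.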

Virtuoso

isql-vt 8888
SQL> ld_dir('/local/data/qlever/qlever-indices/virtuoso-playground.ssd', 'dblp.ttl.gz', '');
SQL> DB.DBA.rdf_loader_run();
SQL> checkpoint;
SQL> exit;
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
/usr/bin/virtuoso-t -f &
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:8890/sparql

QLever

qlever index
sudo bash -c "sync; sleep 5; echo 3 > /proc/sys/vm/drop_caches"
qlever start
qlever example-queries --get-queries-cmd "cat queries.tsv" --download-or-count download --sparql-endpoint localhost:7015

Contents of `queries.tsv`

All papers published in SIGIR	PREFIX dblp: <https://dblp.org/rdf/schema#> SELECT ?paper ?title ?year WHERE { ?paper dblp:title ?title . ?paper dblp:publishedIn "SIGIR" . ?paper dblp:yearOfPublication ?year } ORDER BY DESC(?year)
Number of papers by venue	PREFIX dblp: <https://dblp.org/rdf/schema#> SELECT ?venue (COUNT(?paper) as ?count) WHERE { ?paper dblp:publishedIn ?venue } GROUP BY ?venue ORDER BY DESC(?count)
Author names matching REGEX	PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dblp: <https://dblp.org/rdf/schema#> SELECT ?author ?author_label ?count WHERE { { SELECT ?author ?author_label (COUNT(?paper) as ?count) WHERE { ?paper dblp:authoredBy ?author . ?paper dblp:publishedIn "SIGIR" . ?author rdfs:label ?author_label } GROUP BY ?author ?author_label } FILTER REGEX(STR(?author_label), "M.*D.*", "i") } ORDER BY DESC(?count)
All papers in DBLP until 1940	PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dblp: <https://dblp.org/rdf/schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> SELECT ?title ?author ?author_label ?year WHERE { ?paper dblp:title ?title . ?paper dblp:authoredBy ?author . ?paper dblp:yearOfPublication ?year . ?author rdfs:label ?author_label . FILTER (?year <= "1940"^^xsd:gYear) } ORDER BY ASC(?year) ASC(?title)
All papers with their title (large result)	PREFIX dblp: <https://dblp.org/rdf/schema#> SELECT ?paper ?title WHERE { ?paper dblp:title ?title }
All predicates, ordered by number of subjects	SELECT ?predicate (COUNT(?subject) as ?count) WHERE { ?subject ?predicate ?object } GROUP BY ?predicate ORDER BY DESC(?count)
