SPO Balance
Tim L edited this page Jun 17, 2013 · 30 revisions
This page describes the concept and implementation of a technique to summarize arbitrary RDF graphs. We'll summarize the named graphs in http://ieeevis.tw.rpi.edu/sparql as a running example.
vsr-spo-balance.sh wraps the Java invocation using the situate shell paths pattern.
In a separate Prizms node, we set up the dataset "sparql" with source "ieeevis-tw-rpi-edu" at directory data/source/ieeevis-tw-rpi-edu/sparql.
The following Turtle sketches how a summary is described. The implementation's actual output differs slightly.
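@prefix sio:  <http://semanticscience.org/resource/> .
@prefix sd:   <http://www.w3.org/ns/sparql-service-description#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix void: <http://rdfs.org/ns/void#> .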
# We analyzed a graph with name http://xmlns.com/foaf/0.1 that was
# provided by a SPARQL endpoint healthdata.tw.rpi.edu/sparql
<http://healthdata.tw.rpi.edu/sparql?query=PREFIX+sd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fsparql-service-description%23%3E+CONSTRUCT+%7B+%3Fendpoints_named_graph+%3Fp+%3Fo+%7D+WHERE+%7B+GRAPH+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E+%7B+%5B%5D+sd%3Aurl+%3Chttp%3A%2F%2Fhealthdata.tw.rpi.edu%2Fsparql%3E%3B+sd%3AdefaultDatasetDescription+%5B+sd%3AnamedGraph+%3Fendpoints_named_graph+%5D+.+%3Fendpoints_named_graph+sd%3Aname+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E%3B+%3Fp+%3Fo+.+%7D+%7D>
a sd:NamedGraph;
sd:name <http://xmlns.com/foaf/0.1>;
prov:hadLocation <http://healthdata.tw.rpi.edu/sparql>;
.
# We derived a few datasets during our analysis.
<spo_balance_for_foaf_graph>
a void:Dataset, vsr:SPOBalanceSet;
void:subset <subjects>, <predicates>, <objects>;
prov:wasDerivedFrom <http://healthdata.tw.rpi.edu/sparql?query=PREFIX+sd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fsparql-service-description%23%3E+CONSTRUCT+%7B+%3Fendpoints_named_graph+%3Fp+%3Fo+%7D+WHERE+%7B+GRAPH+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E+%7B+%5B%5D+sd%3Aurl+%3Chttp%3A%2F%2Fhealthdata.tw.rpi.edu%2Fsparql%3E%3B+sd%3AdefaultDatasetDescription+%5B+sd%3AnamedGraph+%3Fendpoints_named_graph+%5D+.+%3Fendpoints_named_graph+sd%3Aname+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E%3B+%3Fp+%3Fo+.+%7D+%7D>;
.
<resources> # This needs to be split up into S and O...
a void:Dataset, vsr:ResourceSet;
sio:count 99;
sio:has-member <http://xmlns.com/foaf/0.1/workplaceHomepage>,
<http://xmlns.com/foaf/0.1/maker>,
<http://purl.org/dc/elements/1.1/description>,
<http://www.w3.org/2002/07/owl#Class>,
<http://xmlns.com/foaf/0.1/page>,
<http://xmlns.com/foaf/0.1/birthday>,
... 93 more ...
.
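To make the idea behind the sketch concrete, here is a minimal Python illustration of one reading of it: collect the distinct resources that appear in the subject, predicate, and object positions of a graph, then count them. This is not how RepositorySummarizer.java works internally; the `Triple` type, the `spo_balance` function, and the three toy triples are hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical in-memory triple; a real run would read from a Sesame repository.
Triple = namedtuple("Triple", ["s", "p", "o"])


def spo_balance(triples):
    """Return the distinct subjects, predicates, objects, and their union."""
    subjects = {t.s for t in triples}
    predicates = {t.p for t in triples}
    objects = {t.o for t in triples}
    return {
        "subjects": subjects,
        "predicates": predicates,
        "objects": objects,
        # Every resource that occurs in any position.
        "resources": subjects | predicates | objects,
    }


FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# Toy triples (an assumption for this example, not the real FOAF graph contents).
graph = [
    Triple(FOAF + "Person", RDF_TYPE, OWL_CLASS),
    Triple(FOAF + "Document", RDF_TYPE, OWL_CLASS),
    Triple(FOAF + "maker", RDFS_RANGE, FOAF + "Agent"),
]

balance = spo_balance(graph)
counts = {role: len(members) for role, members in balance.items()}
print(counts)
```

Each count plays the role that sio:count plays in the sketch, and each set's elements correspond to the sio:has-member values.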
src/spo-balance.sh wraps the call to RepositorySummarizer.java
We use a Sesame Repository, which can be started by running Tomcat: apache-tomcat-7.0.34/bin/startup.sh
log.rtf contains implementation details.
src/spo-balance.sh --help
RepositorySummarizer version: 2013-Jan-14
usage: RepositorySummarizer { -(sysin) [reportURI | .] |
-r(emote) serverURL repositoryID <reportURI | .> [context-to-summarize ...] |
-d(irectory) path/to/sesame-native-dir/ [context-to-summarize ...] |
-f(ile) path/to/a.rdf <reportURI | .> }
where:
-(sysin): Summarize the RDF on standard in; print summary report to standard out.
If reportURI or . are provided, print TRiG instead of RDF/XML.
-r(remote): Summarize listed specimenContexts in repositoryID at serverURL.
If no specimenContexts listed, summarize all contexts in repository.
-d(irectory): Summarize listed specimenContexts in sesame native directory.
If no specimenContexts listed, summarize all contexts in directory.
-f(ile): Summarize the RDF in file; print summary report to standard out.
( version: 2013-Jan-14 )
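The three argument layouts that take explicit inputs can be assembled as follows. This is only a sketch that mirrors the usage text above: the helper functions are hypothetical, and the server URL and repository ID in the example are placeholder assumptions, not values from this project.

```python
import shlex

# Wrapper script named on this page.
SCRIPT = "src/spo-balance.sh"


def file_mode(rdf_path, report_uri="."):
    """-f(ile) path/to/a.rdf <reportURI | .>"""
    return [SCRIPT, "-f", rdf_path, report_uri]


def remote_mode(server_url, repository_id, report_uri=".", contexts=()):
    """-r(emote) serverURL repositoryID <reportURI | .> [context-to-summarize ...]"""
    return [SCRIPT, "-r", server_url, repository_id, report_uri, *contexts]


def directory_mode(native_dir, contexts=()):
    """-d(irectory) path/to/sesame-native-dir/ [context-to-summarize ...]"""
    return [SCRIPT, "-d", native_dir, *contexts]


# Placeholder server URL and repository ID (assumptions for illustration only).
cmd = remote_mode(
    "http://localhost:8080/openrdf-sesame",
    "example-repo",
    contexts=["http://xmlns.com/foaf/0.1"],
)

# Print a copy-pasteable command line; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Passing no contexts to `remote_mode` or `directory_mode` corresponds to summarizing every context, as the help text describes.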