Tim L edited this page Jun 17, 2013 · 30 revisions

What is next

What we will cover

This page describes the concept and implementation of a technique to summarize arbitrary RDF graphs. We'll summarize the named graphs in http://ieeevis.tw.rpi.edu/sparql as a running example.

Let's get to it!

Invoking the summarizer

vsr-spo-balance.sh

Summary description

The following Turtle is a sketch of the summary description. The implementation's actual output differs slightly.

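@prefix sio:  <http://semanticscience.org/resource/> .
@prefix sd:   <http://www.w3.org/ns/sparql-service-description#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix void: <http://rdfs.org/ns/void#> .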

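# We analyzed the graph named http://xmlns.com/foaf/0.1, which was
# provided by the SPARQL endpoint http://healthdata.tw.rpi.edu/sparql.
# The URI below identifies that named graph: it is a URL-encoded CONSTRUCT
# query that asks the endpoint for its service description of the graph.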

<http://healthdata.tw.rpi.edu/sparql?query=PREFIX+sd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fsparql-service-description%23%3E+CONSTRUCT+%7B+%3Fendpoints_named_graph+%3Fp+%3Fo+%7D+WHERE+%7B+GRAPH+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E+%7B+%5B%5D+sd%3Aurl+%3Chttp%3A%2F%2Fhealthdata.tw.rpi.edu%2Fsparql%3E%3B+sd%3AdefaultDatasetDescription+%5B+sd%3AnamedGraph+%3Fendpoints_named_graph+%5D+.+%3Fendpoints_named_graph+sd%3Aname+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E%3B+%3Fp+%3Fo+.+%7D+%7D>
   a sd:NamedGraph;
   sd:name <http://xmlns.com/foaf/0.1>;
   prov:hadLocation <http://healthdata.tw.rpi.edu/sparql>;
.

# We derived a few datasets during our analysis.

<spo_balance_for_foaf_graph>
   a void:Dataset, vsr:SPOBalanceSet;
   void:subset <subjects>, <predicates>, <objects>;
   prov:wasDerivedFrom <http://healthdata.tw.rpi.edu/sparql?query=PREFIX+sd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fsparql-service-description%23%3E+CONSTRUCT+%7B+%3Fendpoints_named_graph+%3Fp+%3Fo+%7D+WHERE+%7B+GRAPH+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E+%7B+%5B%5D+sd%3Aurl+%3Chttp%3A%2F%2Fhealthdata.tw.rpi.edu%2Fsparql%3E%3B+sd%3AdefaultDatasetDescription+%5B+sd%3AnamedGraph+%3Fendpoints_named_graph+%5D+.+%3Fendpoints_named_graph+sd%3Aname+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E%3B+%3Fp+%3Fo+.+%7D+%7D>;
.
<resources> # This needs to be split up into S and O...
   a void:Dataset, vsr:ResourceSet;
   sio:count 99;
   sio:has-member <http://xmlns.com/foaf/0.1/workplaceHomepage>,
                  <http://xmlns.com/foaf/0.1/maker>,
                  <http://purl.org/dc/elements/1.1/description>,
                  <http://www.w3.org/2002/07/owl#Class>,
                  <http://xmlns.com/foaf/0.1/page>,
                  <http://xmlns.com/foaf/0.1/birthday>,
                  ... 93 more ...
.
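The long URI that names the graph in the sketch above is easier to read once decoded. The following is a minimal Python sketch, using only the standard library, that splits that URI into the endpoint and the SPARQL query it carries, and checks that re-encoding the query reproduces the same URI. The function names `decode_query` and `encode_query` are hypothetical and are not part of the summarizer.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit

# The named-graph URI from the sketch above, split across lines only for readability.
NAMED_GRAPH_URI = (
    "http://healthdata.tw.rpi.edu/sparql?query="
    "PREFIX+sd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fsparql-service-description%23%3E+"
    "CONSTRUCT+%7B+%3Fendpoints_named_graph+%3Fp+%3Fo+%7D+"
    "WHERE+%7B+GRAPH+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E+%7B+"
    "%5B%5D+sd%3Aurl+%3Chttp%3A%2F%2Fhealthdata.tw.rpi.edu%2Fsparql%3E%3B+"
    "sd%3AdefaultDatasetDescription+%5B+sd%3AnamedGraph+%3Fendpoints_named_graph+%5D+.+"
    "%3Fendpoints_named_graph+sd%3Aname+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E%3B+"
    "%3Fp+%3Fo+.+%7D+%7D"
)


def decode_query(uri):
    """Return (endpoint, sparql_query) for a URI of the form endpoint?query=..."""
    parts = urlsplit(uri)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs turns '+' into spaces and resolves the %XX escapes.
    query = parse_qs(parts.query)["query"][0]
    return endpoint, query


def encode_query(endpoint, query):
    """Build the endpoint?query=... URI for a SPARQL query (hypothetical helper)."""
    return f"{endpoint}?query={quote_plus(query)}"


endpoint, query = decode_query(NAMED_GRAPH_URI)
print(endpoint)
print(query)
```

Decoded, the query is a CONSTRUCT over the endpoint's service description that selects the `sd:NamedGraph` whose `sd:name` is `http://xmlns.com/foaf/0.1`.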

src/spo-balance.sh wraps the call to RepositorySummarizer.java

We use a Sesame repository, which can be started by running Tomcat: apache-tomcat-7.0.34/bin/startup.sh

log.rtf contains implementation details.
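To make the subjects, predicates, and objects subsets of the summary description more concrete, here is a minimal Python sketch of one way to collect them from a list of triples. It is not the logic of RepositorySummarizer.java, which this page does not show. The function name `spo_balance`, the `ex:` terms, and the layout of the returned dictionary are assumptions made for illustration, and the sketch does not distinguish literals from resources.

```python
def spo_balance(triples):
    """Collect the distinct terms used in each triple position.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    Returns a dict with one set per position, the union of all three sets,
    and the size of each set.
    """
    triples = list(triples)
    subjects = {s for s, _, _ in triples}
    predicates = {p for _, p, _ in triples}
    objects = {o for _, _, o in triples}
    resources = subjects | predicates | objects
    return {
        "subjects": subjects,
        "predicates": predicates,
        "objects": objects,
        "resources": resources,
        "counts": {
            "subjects": len(subjects),
            "predicates": len(predicates),
            "objects": len(objects),
            "resources": len(resources),
        },
    }


# Made-up sample triples; they are not taken from the FOAF graph.
sample = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:name", "ex:c"),
]

summary = spo_balance(sample)
for position, count in summary["counts"].items():
    print(position, count)
```

In this sketch the `resources` count plays the role of `sio:count` on the `<resources>` dataset, and the set members correspond to its `sio:has-member` values.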

src/spo-balance.sh --help

RepositorySummarizer version: 2013-Jan-14
usage: RepositorySummarizer { -(sysin) [reportURI | .] |
                              -r(emote) serverURL repositoryID <reportURI | .> [context-to-summarize ...] |
                              -d(irectory) path/to/sesame-native-dir/ [context-to-summarize ...] |
                              -f(ile) path/to/a.rdf <reportURI | .> }
where:
   -(sysin):     Summarize the RDF on standard in; print summary report to standard out.
                 If reportURI or . are provided, print TRiG instead of RDF/XML.
   -r(remote):   Summarize listed specimenContexts in repositoryID at serverURL. 
                 If no specimenContexts listed, summarize all contexts in repository.
   -d(irectory): Summarize listed specimenContexts in sesame native directory. 
                 If no specimenContexts listed, summarize all contexts in directory.
   -f(ile):      Summarize the RDF in file; print summary report to standard out.

( version: 2013-Jan-14 )
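As a reading aid for the usage text above, the following Python sketch assembles argument lists for the remote and file modes. It reads `-r(emote)` and `-f(ile)` as the flags `-r` and `-f`, which is an assumption, as are the helper names `remote_args` and `file_args`, the server URL, the repository ID `my-repo`, and the file path. Nothing here is run against the real script.

```python
import shlex

SCRIPT = "src/spo-balance.sh"


def remote_args(server_url, repository_id, report_uri=".", contexts=()):
    """Arguments for: -r serverURL repositoryID <reportURI | .> [context ...]"""
    return [SCRIPT, "-r", server_url, repository_id, report_uri, *contexts]


def file_args(rdf_path, report_uri="."):
    """Arguments for: -f path/to/a.rdf <reportURI | .>"""
    return [SCRIPT, "-f", rdf_path, report_uri]


# Hypothetical values: a local Sesame server and a repository called my-repo.
remote = remote_args(
    "http://localhost:8080/openrdf-sesame",
    "my-repo",
    contexts=["http://xmlns.com/foaf/0.1"],
)
local = file_args("path/to/a.rdf")

# Print the commands as they would be typed in a shell.
print(shlex.join(remote))
print(shlex.join(local))
```

Passing no contexts to `remote_args` matches the case where the usage text says all contexts in the repository are summarized. The lists can be handed to `subprocess.run` once the script and repository are available.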

What is next