SPO Balance
- General Strategies for Visualizing RDF Graphs
- Naming sparql service description's sd:NamedGraph
- SPO Balance summaries can be used to derive Centrifuge-able graph layouts.
This page describes the concept and implementation of a technique to summarize arbitrary RDF graphs. We'll summarize the named graphs in http://ieeevis.tw.rpi.edu/sparql as a running example.
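The following Python sketch shows roughly what kind of counts an SPO balance reports. It tallies the distinct subjects, predicates, and objects in a graph, and it counts how many triples use each predicate. It assumes the graph is already available as (subject, predicate, object) string tuples. The `spo_balance` function and the `SPOBalance` type are hypothetical names. RepositorySummarizer itself reads from Sesame and produces a richer RDF report.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple

Triple = Tuple[str, str, str]


class SPOBalance(NamedTuple):
    """Hypothetical container for a minimal SPO balance."""
    subjects: int                   # distinct terms in subject position
    predicates: int                 # distinct terms in predicate position
    objects: int                    # distinct terms in object position
    predicate_occurrences: Counter  # predicate IRI -> number of triples using it


def spo_balance(triples: Iterable[Triple]) -> SPOBalance:
    """Count how many distinct terms fill each position of the triples."""
    subjects, objects = set(), set()
    occurrences: Counter = Counter()
    for s, p, o in triples:
        subjects.add(s)
        objects.add(o)
        occurrences[p] += 1
    return SPOBalance(len(subjects), len(occurrences), len(objects), occurrences)


FOAF = "http://xmlns.com/foaf/0.1/"
example = [
    ("ex:alice", FOAF + "name", '"Alice"'),
    ("ex:alice", FOAF + "knows", "ex:bob"),
    ("ex:bob", FOAF + "name", '"Bob"'),
]
balance = spo_balance(example)
print(balance[:3])  # (2, 2, 3)
```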
vsr-spo-balance.sh wraps the Java invocation using the situate shell paths pattern.
In a separate Prizms node, we set up the dataset "sparql" with source "ieeevis-tw-rpi-edu" at directory data/source/ieeevis-tw-rpi-edu/sparql.
bash-3.2$ vsr-spo-balance.sh
usage: RepositorySummarizer { -(sysin) [reportURI | .] |
-r(emote) serverURL repositoryID <reportURI | .> [context-to-summarize ...] |
-d(irectory) path/to/sesame-native-dir/ [context-to-summarize ...] |
-f(ile) path/to/a.rdf <reportURI | .> }
where:
-(sysin): Summarize the RDF on standard in; print summary report to standard out.
If reportURI or . are provided, print TRiG instead of RDF/XML.
-r(remote): Summarize listed specimenContexts in repositoryID at serverURL.
If no specimenContexts listed, summarize all contexts in repository.
-d(irectory): Summarize listed specimenContexts in sesame native directory.
If no specimenContexts listed, summarize all contexts in directory.
-f(ile): Summarize the RDF in file; print summary report to standard out.
( version: 2013-Apr-03 )
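The usage text defines a fixed order of positional arguments for each mode. The sketch below builds argument lists for the `-r`, `-d`, and `-f` modes, and these lists can then be passed to Python's `subprocess.run`. It assumes that the script accepts the single-letter flags, as the `-r(emote)` notation suggests. The helper names, the server URL, and the repository ID are hypothetical.

```python
import shlex
from typing import List, Sequence

SCRIPT = "vsr-spo-balance.sh"


def remote_args(server_url: str, repository_id: str, report_uri: str = ".",
                contexts: Sequence[str] = ()) -> List[str]:
    # -r serverURL repositoryID <reportURI | .> [context-to-summarize ...]
    return [SCRIPT, "-r", server_url, repository_id, report_uri, *contexts]


def directory_args(native_dir: str, contexts: Sequence[str] = ()) -> List[str]:
    # -d path/to/sesame-native-dir/ [context-to-summarize ...]
    return [SCRIPT, "-d", native_dir, *contexts]


def file_args(rdf_path: str, report_uri: str = ".") -> List[str]:
    # -f path/to/a.rdf <reportURI | .>
    return [SCRIPT, "-f", rdf_path, report_uri]


# Hypothetical Sesame server URL and repository ID.
argv = remote_args("http://localhost:8080/openrdf-sesame", "my-repo",
                   contexts=["http://xmlns.com/foaf/0.1"])
print(shlex.join(argv))
# vsr-spo-balance.sh -r http://localhost:8080/openrdf-sesame my-repo . http://xmlns.com/foaf/0.1
# subprocess.run(argv, check=True) would run the summarizer (not executed here).
```

Keeping each mode in its own function means the required positional `<reportURI | .>` argument cannot be accidentally omitted for `-r` or `-f`.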
The following Turtle sketches the summary description. The implementation structures it slightly differently.
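@prefix sio: <http://semanticscience.org/resource/> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix vsr: <http://purl.org/twc/vocab/vsr#> .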
# We analyzed a graph named http://xmlns.com/foaf/0.1 that was
# provided by the SPARQL endpoint http://healthdata.tw.rpi.edu/sparql
<http://healthdata.tw.rpi.edu/sparql?query=PREFIX+sd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fsparql-service-description%23%3E+CONSTRUCT+%7B+%3Fendpoints_named_graph+%3Fp+%3Fo+%7D+WHERE+%7B+GRAPH+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E+%7B+%5B%5D+sd%3Aurl+%3Chttp%3A%2F%2Fhealthdata.tw.rpi.edu%2Fsparql%3E%3B+sd%3AdefaultDatasetDescription+%5B+sd%3AnamedGraph+%3Fendpoints_named_graph+%5D+.+%3Fendpoints_named_graph+sd%3Aname+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E%3B+%3Fp+%3Fo+.+%7D+%7D>
a sd:NamedGraph;
sd:name <http://xmlns.com/foaf/0.1>;
prov:hadLocation <http://healthdata.tw.rpi.edu/sparql>;
.
# We derived a few datasets during our analysis.
<spo_balance_for_foaf_graph>
a void:Dataset, vsr:SPOBalanceSet;
void:subset <subjects>, <predicates>, <objects>;
prov:wasDerivedFrom <http://healthdata.tw.rpi.edu/sparql?query=PREFIX+sd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fsparql-service-description%23%3E+CONSTRUCT+%7B+%3Fendpoints_named_graph+%3Fp+%3Fo+%7D+WHERE+%7B+GRAPH+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E+%7B+%5B%5D+sd%3Aurl+%3Chttp%3A%2F%2Fhealthdata.tw.rpi.edu%2Fsparql%3E%3B+sd%3AdefaultDatasetDescription+%5B+sd%3AnamedGraph+%3Fendpoints_named_graph+%5D+.+%3Fendpoints_named_graph+sd%3Aname+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%3E%3B+%3Fp+%3Fo+.+%7D+%7D>;
.
<resources> # This needs to be split up into S and O...
a void:Dataset, vsr:ResourceSet;
sio:count 99;
sio:has-member <http://xmlns.com/foaf/0.1/workplaceHomepage>,
<http://xmlns.com/foaf/0.1/maker>,
<http://purl.org/dc/elements/1.1/description>,
<http://www.w3.org/2002/07/owl#Class>,
<http://xmlns.com/foaf/0.1/page>,
<http://xmlns.com/foaf/0.1/birthday>,
... 93 more ...
.
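The `<resources>` set above still mixes subjects and objects. The sketch below shows one way to split it. It partitions the non-literal terms into three groups: subject-only, object-only, and both. It assumes that literals can be recognized by a leading quote character. The function name and the three group names are hypothetical and are not vocabulary from the summarizer.

```python
from typing import Dict, Iterable, Set, Tuple


def split_resources(triples: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Partition the resources of a graph by the positions they appear in."""
    subjects: Set[str] = set()
    objects: Set[str] = set()
    for s, _p, o in triples:
        subjects.add(s)
        if not o.startswith('"'):  # assumption: literals are written with a leading quote
            objects.add(o)
    return {
        "subjects_only": subjects - objects,
        "objects_only": objects - subjects,
        "both": subjects & objects,
    }


parts = split_resources([
    ("ex:a", "ex:p", "ex:b"),
    ("ex:b", "ex:p", "ex:c"),
    ("ex:a", "ex:q", '"a literal"'),
])
print({name: sorted(terms) for name, terms in parts.items()})
# {'subjects_only': ['ex:a'], 'objects_only': ['ex:c'], 'both': ['ex:b']}
print(sum(len(terms) for terms in parts.values()))  # 3, the analogue of sio:count
```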
src/spo-balance.sh wraps the call to RepositorySummarizer.java.
We use a Sesame repository hosted in Tomcat, which can be started by running apache-tomcat-7.0.34/bin/startup.sh.
log.rtf contains implementation details.
src/spo-balance.sh --help
RepositorySummarizer version: 2013-Jan-14
usage: RepositorySummarizer { -(sysin) [reportURI | .] |
-r(emote) serverURL repositoryID <reportURI | .> [context-to-summarize ...] |
-d(irectory) path/to/sesame-native-dir/ [context-to-summarize ...] |
-f(ile) path/to/a.rdf <reportURI | .> }
where:
-(sysin): Summarize the RDF on standard in; print summary report to standard out.
If reportURI or . are provided, print TRiG instead of RDF/XML.
-r(remote): Summarize listed specimenContexts in repositoryID at serverURL.
If no specimenContexts listed, summarize all contexts in repository.
-d(irectory): Summarize listed specimenContexts in sesame native directory.
If no specimenContexts listed, summarize all contexts in directory.
-f(ile): Summarize the RDF in file; print summary report to standard out.
( version: 2013-Jan-14 )
Color by the predicates that occur in a given curated list of vocabulary namespaces.
Decorate the SPO balance with a "word cloud" of prefixes for the [preferred] namespaces that the graph uses. This aggregated information should be derivable from the SPO repository summary RDF description.
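Both ideas come down to two steps: map each predicate to its namespace, then sum the occurrence counts per namespace. The sketch below does this using a common heuristic, which cuts the IRI after its last `#` or `/`. When a namespace is in a curated list, the sketch labels it with its prefix; all other namespaces stay uncolored. The curated dictionary, the function names, and the `example.org` predicate are hypothetical. The doap and dcterms counts are the frequencies listed further down this page.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def namespace_of(iri: str) -> str:
    """Heuristic: the namespace is everything up to the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def color_key(predicate: str, curated: Dict[str, str]) -> Optional[str]:
    """Prefix to color a predicate by, or None if its namespace is not curated."""
    return curated.get(namespace_of(predicate))


def word_cloud_weights(counts: Iterable[Tuple[str, int]],
                       curated: Dict[str, str]) -> Counter:
    """Total predicate occurrences per curated prefix (the word-cloud weights)."""
    weights: Counter = Counter()
    for predicate, count in counts:
        prefix = color_key(predicate, curated)
        if prefix is not None:
            weights[prefix] += count
    return weights


curated = {"http://usefulinc.com/ns/doap#": "doap", "http://purl.org/dc/terms/": "dcterms"}
counts = [
    ("http://usefulinc.com/ns/doap#anon-root", 1),
    ("http://usefulinc.com/ns/doap#audience", 1),
    ("http://usefulinc.com/ns/doap#browse", 2),
    ("http://purl.org/dc/terms/author", 1),
    ("http://purl.org/dc/terms/contributor", 6),
    ("http://purl.org/dc/terms/created", 8),
    ("http://example.org/unlisted#p", 3),  # hypothetical, not in the curated list
]
print(dict(word_cloud_weights(counts, curated)))  # {'doap': 4, 'dcterms': 15}
```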
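The following query, run against http://opendap.tw.rpi.edu/sparql, lists the predicates and counts recorded in one SPO summary. For each predicate, it also returns the vocabulary that defines it (via rdfs:isDefinedBy) when one is declared: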
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX vsr: <http://purl.org/twc/vocab/vsr#>
select ?vocabulary ?predicate ?count
where {
<http://purl.org/twc/vocab/vsr#RepositorySummarizer_2014-Jan-15_15-44_1389800671057_ms/spo>
a vsr:SPODataset;
void:subset [
a vsr:PredicatesDataset;
void:subset [
a vsr:PredicateOccurrenceDataset;
owl:hasValue ?predicate;
sio:count ?count
];
] .
optional { ?predicate rdfs:isDefinedBy ?vocabulary }
}
order by ?vocabulary ?predicate ?count
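A query like the one above can be sent over the SPARQL 1.1 Protocol to use its results outside a browser. The sketch below sends an HTTP GET with the query in the `query` parameter and requests the SPARQL 1.1 JSON results format. It then groups the rows by vocabulary on the client side. Predicates with no `rdfs:isDefinedBy` are filed under a placeholder key. The sketch assumes that the endpoint supports JSON results. The function names and the `(undeclared)` label are hypothetical, and the endpoint may no longer be online.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict
from typing import Dict, List, Tuple


def rows_from_json(results: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results; variables left unbound by OPTIONAL are absent."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in results["results"]["bindings"]]


def sparql_select(endpoint: str, query: str) -> List[Dict[str, str]]:
    """Run a SELECT query with an HTTP GET, as described by the SPARQL 1.1 Protocol."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return rows_from_json(json.load(response))


def by_vocabulary(rows: List[Dict[str, str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Group (predicate, count) pairs by the vocabulary that defines the predicate."""
    grouped: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for row in rows:
        vocabulary = row.get("vocabulary", "(undeclared)")
        grouped[vocabulary].append((row["predicate"], int(row["count"])))
    return dict(grouped)


# Offline example of the JSON shape an endpoint returns (values are illustrative).
sample = {
    "head": {"vars": ["vocabulary", "predicate", "count"]},
    "results": {"bindings": [
        {"predicate": {"type": "uri", "value": "http://purl.org/dc/terms/created"},
         "count": {"type": "literal", "value": "8",
                   "datatype": "http://www.w3.org/2001/XMLSchema#int"}},
    ]},
}
print(by_vocabulary(rows_from_json(sample)))
# {'(undeclared)': [('http://purl.org/dc/terms/created', 8)]}
```

Against a live endpoint, the call would be `by_vocabulary(sparql_select("http://opendap.tw.rpi.edu/sparql", query_text))`, where `query_text` holds the query above.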
The following query lists the distinct predicates and their counts for the SPO summary whose IRI contains 1389898600145:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX vsr: <http://purl.org/twc/vocab/vsr#>
select distinct ?predicate ?count
where {
?spo
a vsr:SPODataset;
void:subset [ # </spo/p>
a vsr:PredicatesDataset;
void:subset [ # </spo/p/bin/1>, </spo/p/bin/2>, ...
a vsr:PredicateOccurrenceDataset;
owl:onProperty rdf:predicate;
owl:hasValue ?predicate;
sio:count ?count
];
] .
filter(regex(str(?spo),'1389898600145'))
}
order by ?predicate ?count
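Both queries select one summary run by a 13-digit number embedded in its IRI. This sketch assumes that the number is a Unix timestamp in milliseconds. That reading agrees with the `2014-Jan-15_15-44` date in the same IRI when it is interpreted as UTC. The sketch extracts the number and converts it to a date, which may help when choosing the value for the `regex` filter. The `_<digits>_ms` pattern and the function names are assumptions based on that single IRI.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: summary IRIs embed epoch milliseconds as "_<13 digits>_ms".
MS_STAMP = re.compile(r"_(\d{13})_ms")


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime without float rounding."""
    return EPOCH + timedelta(milliseconds=ms)


def summary_time(summary_iri: str) -> Optional[datetime]:
    """Return the run time encoded in a summary IRI, or None if there is none."""
    match = MS_STAMP.search(summary_iri)
    return ms_to_utc(int(match.group(1))) if match else None


iri = ("http://purl.org/twc/vocab/vsr#"
       "RepositorySummarizer_2014-Jan-15_15-44_1389800671057_ms/spo")
print(summary_time(iri).isoformat())         # 2014-01-15T15:44:31.057000+00:00
print(ms_to_utc(1389898600145).isoformat())  # 2014-01-16T18:56:40.145000+00:00
```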
If a dataset uses the following properties with the frequencies shown, we can model its per-vocabulary breakdown as the RDF below, using void:vocabulary and void:propertyPartition:
- http://usefulinc.com/ns/doap#anon-root 1
- http://usefulinc.com/ns/doap#audience 1
- http://usefulinc.com/ns/doap#browse 2
- http://purl.org/dc/terms/author 1
- http://purl.org/dc/terms/contributor 6
- http://purl.org/dc/terms/created 8
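@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sio: <http://semanticscience.org/resource/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix vsr: <http://purl.org/twc/vocab/vsr#> .
# This was already provided by the SPO summary calculation: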
<spo/p/bin/1>
a vsr:Bin, vsr:Dataset, vsr:PredicateOccurrenceDataset;
owl:onProperty rdf:predicate;
owl:hasValue <http://usefulinc.com/ns/doap#anon-root>;
sio:count "1"^^xsd:int;
.
<spo/p/bin/2>
a vsr:Bin, vsr:Dataset, vsr:PredicateOccurrenceDataset;
owl:onProperty rdf:predicate;
owl:hasValue <http://usefulinc.com/ns/doap#audience>;
sio:count "1"^^xsd:int;
.
<spo/p/bin/3>
a vsr:Bin, vsr:Dataset, vsr:PredicateOccurrenceDataset;
owl:onProperty rdf:predicate;
owl:hasValue <http://usefulinc.com/ns/doap#browse>;
sio:count "2"^^xsd:int;
.
<spo/p/bin/4>
a vsr:Bin, vsr:Dataset, vsr:PredicateOccurrenceDataset;
owl:onProperty rdf:predicate;
owl:hasValue <http://purl.org/dc/terms/author>;
sio:count "1"^^xsd:int;
.
<spo/p/bin/5>
a vsr:Bin, vsr:Dataset, vsr:PredicateOccurrenceDataset;
owl:onProperty rdf:predicate;
owl:hasValue <http://purl.org/dc/terms/contributor>;
sio:count "6"^^xsd:int;
.
<spo/p/bin/6>
a vsr:Bin, vsr:Dataset, vsr:PredicateOccurrenceDataset;
owl:onProperty rdf:predicate;
owl:hasValue <http://purl.org/dc/terms/created>;
sio:count "8"^^xsd:int;
.
<spo/p/ns/doap> # Start a new branch, named by the namespace's prefix when we have one, or by a hash of the namespace otherwise.
owl:onProperty rdfs:isDefinedBy;
owl:hasValue <http://usefulinc.com/ns/doap#>;
sio:count 4;
a void:Dataset;
void:vocabulary <http://usefulinc.com/ns/doap#>;
void:propertyPartition <spo/p/bin/1>, # These predicate bins are already defined above.
<spo/p/bin/2>,
<spo/p/bin/3>;
.
<spo/p/ns/dcterms>
owl:onProperty rdfs:isDefinedBy;
owl:hasValue <http://purl.org/dc/terms/>;
sio:count 15;
a void:Dataset;
void:vocabulary <http://purl.org/dc/terms/>;
void:propertyPartition <spo/p/bin/4>, # These predicate bins are already defined above.
<spo/p/bin/5>,
<spo/p/bin/6>;
.
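The per-namespace branches can be derived mechanically from the predicate bins. The sketch below numbers the bins in input order and totals the counts for each namespace. It then emits Turtle in the shape shown above. Each branch is named by its prefix when one is known and otherwise by a hash of the namespace, as the comment on `<spo/p/ns/doap>` describes. Several details are assumptions rather than the summarizer's behavior: the use of MD5 truncated to 8 hex digits, the namespace-splitting heuristic, and the function names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

PREFIXES = """@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sio: <http://semanticscience.org/resource/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix vsr: <http://purl.org/twc/vocab/vsr#> .
"""


def namespace_of(iri: str) -> str:
    """Heuristic: the namespace is everything up to the last '#' or '/'."""
    return iri[: max(iri.rfind("#"), iri.rfind("/")) + 1]


def partition_turtle(counts: Sequence[Tuple[str, int]], prefixes: Dict[str, str]) -> str:
    """Emit predicate bins plus one partition per namespace, as Turtle text."""
    lines: List[str] = [PREFIXES]
    bins: Dict[str, List[str]] = defaultdict(list)  # namespace -> bin IRIs
    totals: Dict[str, int] = defaultdict(int)       # namespace -> summed count
    for i, (predicate, count) in enumerate(counts, start=1):
        bin_iri = f"<spo/p/bin/{i}>"
        lines += [bin_iri,
                  "  a vsr:Bin, vsr:Dataset, vsr:PredicateOccurrenceDataset;",
                  "  owl:onProperty rdf:predicate;",
                  f"  owl:hasValue <{predicate}>;",
                  f'  sio:count "{count}"^^xsd:int;',
                  "."]
        ns = namespace_of(predicate)
        bins[ns].append(bin_iri)
        totals[ns] += count
    for ns, ns_bins in bins.items():
        # Prefer a known prefix; otherwise fall back to a short hash of the namespace.
        label = prefixes.get(ns) or hashlib.md5(ns.encode("utf-8")).hexdigest()[:8]
        lines += [f"<spo/p/ns/{label}>",
                  "  a void:Dataset;",
                  "  owl:onProperty rdfs:isDefinedBy;",
                  f"  owl:hasValue <{ns}>;",
                  f"  sio:count {totals[ns]};",
                  f"  void:vocabulary <{ns}>;",
                  f"  void:propertyPartition {', '.join(ns_bins)};",
                  "."]
    return "\n".join(lines)


known = {"http://usefulinc.com/ns/doap#": "doap", "http://purl.org/dc/terms/": "dcterms"}
counts = [
    ("http://usefulinc.com/ns/doap#anon-root", 1),
    ("http://usefulinc.com/ns/doap#audience", 1),
    ("http://usefulinc.com/ns/doap#browse", 2),
    ("http://purl.org/dc/terms/author", 1),
    ("http://purl.org/dc/terms/contributor", 6),
    ("http://purl.org/dc/terms/created", 8),
]
print(partition_turtle(counts, known))
```

With these inputs, the totals match the hand-written RDF: 1 + 1 + 2 = 4 for doap and 1 + 6 + 8 = 15 for dcterms.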
The node
- Deriving a VoID Linkset view of an SPO Balance
- Centrifuge is a spanning-tree based graph layout algorithm for void:Linkset-type graphs. It can be generalized to arbitrary graphs, but we'll get there...