General Strategies for Visualizing RDF Graphs

What is first

Rubber, meet road

What we will cover

This page outlines the general patterns that have emerged while trying to visualize different types of RDF graphs.

Let's get to it!

Domain forms, Visual forms, and Visual Strategies

We distinguish between Domain forms and Visual forms. Although both are semiotic symbols and represent content, Domain forms are what is traditionally considered the "data" while Visual forms are expressed in terms of visual characteristics. So, a visualization is a mapping from a Domain form to a Visual form. We call the specification of the mapping from Domain forms to Visual forms a Visual strategy and strive to ensure that Visual strategies can be declarative in nature.

In our case, the Domain form is expressed in RDF/OWL (e.g. an instance of sioc:Post that prov:wasAttributedTo instances of foaf:Person), while the Visual form can be expressed in a variety of ways, such as GraphML (e.g. node), SVG (e.g. svg:Rect), PNG (e.g. row,col,red,green,blue), etc. If the concepts in each of these representation systems (GraphML, SVG, PNG) were modeled in OWL, we envision that Visual strategies could be constructed simply by specifying intersections of Domain forms and Visual forms. For example, to say that male persons should be rendered as blue rectangles, the visual strategy would assert "blue svg:Rects are a subclass of male foaf:Persons".

The most common visualization "pipeline" begins with a selection whose results march down a relatively fixed procedure before producing a visual artifact. We propose to reframe the visualization paradigm by assuming that everything should be shown unless otherwise excluded. By explicitly excluding everything that does not appear in the final visual result, both the visualization designer and her audience can more rigorously interrogate the visual's claims. Visualization, after all, is about understanding, communicating, and convincing oneself or others. Further, the requirement to visualize everything raises new challenges for visualization designers that can expose many hidden assumptions about their work. Facing these assumptions can further the progress of visualization as a scientific endeavor.

Excluding domain forms

Visualization is inherently lossy, and requires the right simplifying choices to be effective. We believe that what is not being shown is just as important as, if not more important than, what is being shown. So, we outline a variety of ways to declare that certain portions of the domain are not of interest for the goals of the current Visual strategy.

Blacklisted {subjects, predicates, objects}[namespaces] and classes

If a predicate ?p is blacklisted, then no triple using the predicate will be depicted.

if ?s ?p ?o and ?p a vsr:Blacklisted then notexists(?v) such that ?v ov:depicts [ :s ?s; :p ?p; :o ?o ]

For example, if we believe that our audience need not know the author of a blog post, then we can blacklist the predicate dcterms:author. While this can decrease the clutter in a visual, it may also fail to fulfill an audience's need to know a blog's author. By using a blacklist (instead of just not selecting it in some query), we have the ability to answer why authors were not shown. Further, a visualization designer could justify why he chose to blacklist the author predicate (e.g., "the audience doesn't need to know it", "the audience can look it up quickly enough somewhere else", "it wouldn't fit within the mobile device screen", or "it was more important to list the title.").
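
As a rough illustration of the predicate blacklist rule, the following Python sketch treats triples as plain tuples and keeps a justification next to each blacklisted predicate. The data layout, the function name `split_by_predicate_blacklist`, and the sample triples are assumptions made for this example; they are not part of vsr.

```python
# Hypothetical sample data: triples as (subject, predicate, object) tuples.
triples = [
    ("ex:post1", "dcterms:title", "Visualizing RDF"),
    ("ex:post1", "dcterms:author", "ex:alice"),
    ("ex:post2", "dcterms:title", "More on RDF"),
]

# Each blacklisted predicate carries the designer's justification.
predicate_blacklist = {
    "dcterms:author": "the audience can look it up quickly enough somewhere else",
}

def split_by_predicate_blacklist(triples, blacklist):
    """Return (depicted, excluded); excluded pairs each triple with its reason."""
    depicted, excluded = [], []
    for s, p, o in triples:
        if p in blacklist:
            excluded.append(((s, p, o), blacklist[p]))
        else:
            depicted.append((s, p, o))
    return depicted, excluded

depicted, excluded = split_by_predicate_blacklist(triples, predicate_blacklist)
for triple, reason in excluded:
    print("not shown:", triple, "because", reason)
```

Keeping the excluded triples together with their reasons, rather than discarding them, is what lets the strategy answer why authors were not shown.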

If a resource ?o is object-blacklisted, then no triple using the resource as an object will be depicted.

if ?s ?p ?o and ?o a vsr:Blacklisted then notexists(?v) such that ?v ov:depicts [ :s ?s; :p ?p; :o ?o ]

For example, rdf:nil never really needs to be seen in a visual, since it is used as a list terminator and the list elements and their ordering are the important part. So, object-blacklisting rdf:nil would prevent the terminator that follows any last element from appearing. Beyond the nuances of data structures in RDF, one could imagine a resource that should never be shown. For example, if it is clear enough from other context, the school "RPI" should not appear in a visualization showing professors that teach there. Again, these kinds of justifications can be provided for each object-blacklisted resource, so that audiences know why it is part of the visual strategy.
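
The object-blacklist rule can be sketched in the same way. This is a minimal Python illustration under assumed names (`object_blacklist`, `depictable`) and made-up sample triples; it only drops triples whose object is blacklisted, so a blacklisted resource could still appear as a subject.

```python
RDF_NIL = "rdf:nil"

# Hypothetical sample data: a two-element RDF list plus one teaching triple.
triples = [
    ("_:l1", "rdf:first", "ex:a"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "ex:b"),
    ("_:l2", "rdf:rest", RDF_NIL),
    ("ex:prof1", "ex:teachesAt", "ex:RPI"),
]

# Object-blacklisted resources, each with a justification.
object_blacklist = {
    RDF_NIL: "list terminator; the elements and their order are what matter",
    "ex:RPI": "clear enough from other context",
}

def depictable(triples, object_blacklist):
    """Keep only triples whose object is not object-blacklisted."""
    return [(s, p, o) for s, p, o in triples if o not in object_blacklist]

shown = depictable(triples, object_blacklist)
```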

Visual mapping

Predicates as Notes

If the values of a predicate are long, then showing them all at all times can lead to clutter. If the properties and values are important, but not important enough to occupy the view at all times, placing the value in a Note could be one strategy to satisfy the tradeoff.

if ?s ?p ?o and ?p a vsr:Note then ?v ov:depicts [ :s ?s; :p ?p; :o ?o ]; a vsr:Note
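
A small Python sketch of the Note rule, assuming triples are tuples and that the visual kinds are called "note" and "always-visible" (both labels are invented for this example):

```python
# Hypothetical sample data.
triples = [
    ("ex:post1", "dcterms:title", "Visualizing RDF"),
    ("ex:post1", "sioc:content", "A long body of text that would clutter the view ..."),
]

# Predicates whose values should be placed in a Note.
note_predicates = {"sioc:content"}

def visual_kind(triple, note_predicates):
    """Decide which kind of visual element depicts the triple."""
    _, p, _ = triple
    return "note" if p in note_predicates else "always-visible"

kinds = {t: visual_kind(t, note_predicates) for t in triples}
```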

Predicates as Labels

If a predicate is a label, then its value should be rendered within the visual element that depicts the resource that it is labeling.

Predicates are in description

Often it is convenient to combine many literals of a subgraph into a single span of textual description. This avoids the need to show the entire subgraph. If this is done, then the properties that are used to create the textual description can be essentially blacklisted.
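
One way to picture this, as a simplified Python sketch: the literals are assumed to hang directly off a single subject, the function name `describe` is hypothetical, and only one value per predicate is kept.

```python
# Hypothetical sample data.
triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:title", "Professor"),
    ("ex:alice", "foaf:knows", "ex:bob"),
]

# Predicates folded into the description, in the order they should read.
description_predicates = ["foaf:title", "foaf:name"]

def describe(subject, triples, description_predicates):
    """Return (text, remaining): the description and the triples still to depict."""
    # If a predicate has several values, this sketch keeps only the last one.
    values = {p: o for s, p, o in triples
              if s == subject and p in description_predicates}
    text = " ".join(values[p] for p in description_predicates if p in values)
    # The predicates used for the description are treated as blacklisted.
    remaining = [(s, p, o) for s, p, o in triples
                 if not (s == subject and p in description_predicates)]
    return text, remaining

text, remaining = describe("ex:alice", triples, description_predicates)
print(text)  # Professor Alice
```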

Anonymous instance classes

If a resource is an instance of an Anonymous instance class, then its serialized URI need not be displayed in V(resource). Instead, the serialization of the instance's type can be used. This can be useful when the class is of secondary importance and serves instead to relate two or more primary classes. It is also useful when the URIs of the instances are very long (e.g. those created using the named graph convention).
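
A minimal Python sketch of choosing the label, assuming a hypothetical `types` dictionary from resource to its classes and made-up URIs:

```python
# Hypothetical sample data: resource URI -> list of its classes.
types = {
    "http://example.org/graph/2012/01/attribution/3f9a": ["prov:Attribution"],
    "http://example.org/alice": ["foaf:Person"],
}

# Classes declared as Anonymous instance classes.
anonymous_classes = {"prov:Attribution"}

def visual_label(resource, types, anonymous_classes):
    """Label an instance by its type when that type is an Anonymous instance class."""
    for cls in types.get(resource, []):
        if cls in anonymous_classes:
            return cls
    return resource

labels = {r: visual_label(r, types, anonymous_classes) for r in types}
```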

Connection versus Containment

swap-directionality-predicates

Visual constructs can use the directionality of a visual edge to influence the arrangement of the visual forms. By default, the direction of the visual edge should align with the directionality of the RDF triple. However, in some situations it can be helpful to reverse this directionality for a set of RDF predicates.
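
As a sketch in Python, with triples as tuples and a hypothetical set `swapped_predicates`, the reversal only affects the direction of the visual edge, not the underlying triple:

```python
# Hypothetical sample data.
triples = [
    ("ex:post1", "prov:wasAttributedTo", "ex:alice"),
    ("ex:post1", "dcterms:title", "Visualizing RDF"),
]

# Predicates whose visual edges should point from object to subject.
swapped_predicates = {"prov:wasAttributedTo"}

def visual_edges(triples, swapped_predicates):
    """Return (source, predicate, target) edges, reversed for swapped predicates."""
    edges = []
    for s, p, o in triples:
        if p in swapped_predicates:
            edges.append((o, p, s))  # edge runs against the triple's direction
        else:
            edges.append((s, p, o))  # default: edge follows the triple
    return edges

edges = visual_edges(triples, swapped_predicates)
```

Here the attribution edge would run from ex:alice to ex:post1, which a layout algorithm could use to place the person above the post.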

Coloring by URI namespace

An important part of visualization is showing differences. Often, it is useful to distinguish among resources by their naming authority (i.e., the web domain name that hosts the URI). This applies particularly when inspecting data from multiple sources across the web.
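
One simple way to derive such a color, sketched in Python: hash the host part of the URI and use the first six hex digits as an RGB color. This hashing scheme is an assumption chosen for illustration; it is not the algorithm used by vsr or by CoLD, and distinct authorities may occasionally receive similar colors.

```python
import hashlib
from urllib.parse import urlsplit

def naming_authority(uri):
    """Return the host portion of a URI, e.g. 'dbpedia.org'."""
    return urlsplit(uri).netloc

def color_for_authority(uri):
    """Map a URI to a '#rrggbb' color that depends only on its naming authority."""
    digest = hashlib.md5(naming_authority(uri).encode("utf-8")).hexdigest()
    return "#" + digest[:6]

# Two resources from the same authority share a color.
a = color_for_authority("http://dbpedia.org/resource/Troy,_New_York")
b = color_for_authority("http://dbpedia.org/resource/Albany,_New_York")
```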

Coloring by resource class

An important part of visualization is showing differences. When showing a graph that has many different types of resources, it can be useful to color the resources by their type. This is related to blacklisting the class resource, since every instance will reference it using rdf:type.
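
A Python sketch of the idea, assuming a hypothetical `palette` from class to color. Once a class is conveyed by color, the corresponding rdf:type triples no longer need their own edges, which is the connection to blacklisting the class resource.

```python
RDF_TYPE = "rdf:type"

# Hypothetical sample data.
triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:post1", RDF_TYPE, "sioc:Post"),
    ("ex:post1", "prov:wasAttributedTo", "ex:alice"),
]

# Designer-chosen color for each class.
palette = {"foaf:Person": "blue", "sioc:Post": "orange"}

def color_by_class(triples, palette):
    """Return (colors, remaining): resource colors and the triples still to depict."""
    colors, remaining = {}, []
    for s, p, o in triples:
        if p == RDF_TYPE and o in palette:
            colors.setdefault(s, palette[o])  # first colored type wins
        else:
            remaining.append((s, p, o))
    return colors, remaining

colors, remaining = color_by_class(triples, palette)
```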

Relaxing a resource

If a resource is referenced more than once, then a visualization designer must decide whether to use more than one visual form to depict the resource. This, again, is a tradeoff. If only one visual element is used to depict all references to the resource, then there are likely more visual connections to it and thus more visual clutter. At the other extreme, if a new visual element is used for every reference to the resource, then it will not be obvious or easy to see how else the resource relates to others.

If a user "knows much more" about a resource, then it can be safer to relax a resource. But if they do not know as much about it, and the information about the resource is in the visual, then it would be better to use the same visual element for all occurrences of the resource.
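
The tradeoff can be made concrete with a short Python sketch. Relaxed resources get a fresh visual element for every reference, while all other resources share one element. The identifier scheme (`resource#n`) and the function name are assumptions made for this example.

```python
from itertools import count

# Hypothetical sample data.
triples = [
    ("ex:prof1", "ex:teachesAt", "ex:RPI"),
    ("ex:prof2", "ex:teachesAt", "ex:RPI"),
    ("ex:prof1", "foaf:knows", "ex:prof2"),
]

def visual_edges(triples, relaxed):
    """Return edges between visual element ids, duplicating relaxed resources."""
    fresh = count(1)

    def element(resource):
        if resource in relaxed:
            return f"{resource}#{next(fresh)}"  # new element per reference
        return resource                          # one shared element

    return [(element(s), p, element(o)) for s, p, o in triples]

edges = visual_edges(triples, relaxed={"ex:RPI"})
```

With ex:RPI relaxed, each professor points at a separate copy of the school, so the two teaching edges no longer converge on one crowded node.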

Relaxing a namespace

Relaxes all resources (see above) whose URIs are in the given namespace.

Contextually relaxing a namespace

Relax a resource when it is within a given namespace and appears as the object of a given predicate. (e.g. namespaces-to-relax-in-ranges)
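
A sketch of the contextual test in Python. The dictionary shape (predicate mapped to a list of namespaces) and the sample URIs are assumptions for illustration and may not match how namespaces-to-relax-in-ranges is actually configured.

```python
# Hypothetical sample data.
triples = [
    ("ex:prof1", "ex:teachesAt", "http://example.org/school/RPI"),
    ("ex:article1", "ex:mentions", "http://example.org/school/RPI"),
]

# For each predicate, the namespaces whose resources are relaxed in its range.
namespaces_to_relax_in_ranges = {
    "ex:teachesAt": ["http://example.org/school/"],
}

def is_relaxed(triple, rules):
    """True when the object is in a namespace relaxed for this triple's predicate."""
    _, p, o = triple
    return any(o.startswith(ns) for ns in rules.get(p, []))

decisions = [is_relaxed(t, namespaces_to_relax_in_ranges) for t in triples]
```

The same school is relaxed when it appears as the object of ex:teachesAt but keeps a single shared visual element when referenced through ex:mentions.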

What is next

vsr2.xsl implements this
SPO Balance
https://github.com/timrdf/vsr/wiki/Characterizing-a-list-of-RDF-node-URIs

Related work

CoLD computes a color for any URI. e.g.
Similar Structures inside RDF-Graphs
ProLOD++ pdf homepage
DataSum homepage
