I've had an idea for a possible extension to SHACL for a while and I'm wondering what others think about it.
Over the past couple years, I have run into several situations where constraints are part of the domain of interest and those constraints should apply to other data in the domain. In those cases, it would be helpful to have shapes be defined as part of data instead of at the schema level, and it would be helpful if the SHACL engine knew how data were connected to these shapes they should be validated against via some existing path expressed in domain terminology.
Doing this would prevent users from needing to extend the ontology/schema to add new constraints. Also, it could prevent the use of metamodeling to accomplish a similar goal, which can get messy and confusing for users.
Here are three generic examples where this feature could be helpful, to convey the idea:
Example 1
Consider the Function Ontology (https://fno.io/spec/#ontology-abstract). If you look at the documentation for fno:Parameter and fno:Output, they look very similar to sh:PropertyShape in spirit, and the class fno:Function is therefore like sh:NodeShape. It might be useful to use SHACL to validate that function arguments and outputs match what is expected based on the function definition.
However, if instances of fno:Function were Node Shapes, then there would be no convenient way to configure each fno:Function instance to target the right nodes with current SHACL. You'd have to either make each one a class and have the corresponding instances of fno:Execution be instances of each (which might tempt the introduction of metamodeling similar to SPIN Functions), write a clunky custom target type using SHACL-AF that wouldn't be supported by all SHACL engines, or use sh:targetNode to connect each fno:Function instance to the corresponding fno:Execution instances instead of the domain property fno:executes (or in addition to it, which would be redundant).
In this case, it would be convenient if each fno:Execution could be validated against whatever node it was connected to via fno:executes.
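To make the situation concrete, here is a minimal sketch of what such data might look like. Everything in the `ex:` namespace is hypothetical, and treating the function itself as a node shape with one property shape per argument is a simplification of how FnO actually models parameters.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@prefix ex:  <http://example.org/> .

# Hypothetical function that doubles as a node shape
ex:sum a fno:Function, sh:NodeShape ;
    sh:property [
        sh:path ex:a ;
        sh:datatype xsd:integer ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:b ;
        sh:datatype xsd:integer ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .

# Execution linked to its function only through the domain property
ex:sumExecution a fno:Execution ;
    fno:executes ex:sum ;
    ex:a 2 ;
    ex:b "four" .   # not an xsd:integer, so it should fail against ex:sum
```

With current SHACL, nothing in this graph tells an engine that `ex:sumExecution` should be validated against `ex:sum`; the `fno:executes` link is the only connection between them.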
Example 2
Consider some future state of the W3C Data Cube ontology. Data Structure Definitions (https://www.w3.org/TR/vocab-data-cube/#dsd-dsd) and Component Specifications are data in this domain. However, they could be modified to be represented as Node Shapes and Property Shapes respectively such that SHACL could be used to validate that the Observations that are part of DataSets that have that Data Structure Definition actually conform to that structure.
The same challenges exist for trying to validate a qb:DataStructureDefinition as a Node Shape as for fno:Function; there is no convenient way to configure each qb:DataStructureDefinition instance to target the right nodes with current SHACL.
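In this case, it would be convenient if each qb:Observation could be validated against whatever node it was connected to via the path qb:dataSet/qb:structure.
Furthermore, this would allow fancier data cube behavior more easily, like how shapes are used for datatypes of QB components here: https://docs.allotrope.org/ADF%20Data%20Cube%20Ontology.html (see examples 5 and 11)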
Example 3
Consider the EP-PLAN ontology (https://trustlens.github.io/EP-PLAN/, documentation: https://trustlens.github.io/EP-PLAN/widoco_output/index-en.html), an extension to W3C PROV for capturing in detail the plans that go along with the Activities in PROV. It may be desired to use SHACL to determine whether an activity went according to plan or if some deviation occurred. Note that ep-plan:Step and ep-plan:Variable both could be similar in spirit to sh:NodeShape.
The same challenges exist for trying to validate instances of these classes as Node Shapes as for fno:Function; there is no convenient way to configure each ep-plan:Step and ep-plan:Variable instance to target the right nodes with current SHACL.
In this case, it would be convenient if each ep-plan:Activity could be validated against whatever node it was connected to via ep-plan:correspondsToStep and if each ep-plan:Entity could be validated against whatever node it was connected to via ep-plan:correspondsToVariable.
Possible Implementation
I've thought of a few different ways to implement this behavior, but I think the simplest and most efficient one so far is to create a new Constraint Component.
This new Constraint Component would function somewhat like the one for sh:node. However, instead of specifying the URI of a Node Shape that value nodes must also conform to, it specifies a SHACL path using a parameter called, e.g., sh:nodesPath. For each value node for the shape with a value for sh:nodesPath, that value node is also validated against any Node Shape(s) found at the specified path from the value node (if any resources at that path exist and are Node Shapes).
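As a rough sketch, the component could be declared along the lines of the other SHACL constraint components. The component IRI `sh:NodesPathConstraintComponent` is a hypothetical name, and `sh:nodesPath` is only the tentative parameter name from this proposal; neither exists in SHACL today.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .

# Hypothetical component IRI; sh:nodesPath is the proposed parameter name
sh:NodesPathConstraintComponent
    a sh:ConstraintComponent ;
    sh:parameter [
        sh:path sh:nodesPath ;
    ] ;
.
```

No validator is attached in this sketch; the assumption is that the behavior would be implemented natively by the engine, the same way sh:node is.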
This would enable the following addition for the Function Ontology in order to validate that all instances of fno:Execution conform to any corresponding instance of fno:Function:
fno:Execution
    sh:nodesPath fno:executes ;
.
And this addition for the Data Cube Ontology in order to validate that all instances of qb:Observation conform to any corresponding instance of qb:DataStructureDefinition:
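A sketch of what that addition might look like, following the same pattern as the fno:Execution case. It assumes qb:Observation is already declared as a node shape elsewhere, and it writes the path qb:dataSet/qb:structure as a SHACL sequence path (an RDF list).

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix qb: <http://purl.org/linked-data/cube#> .

# sh:nodesPath is the proposed parameter, not part of SHACL today
qb:Observation
    sh:nodesPath ( qb:dataSet qb:structure ) ;
.
```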
And these additions for the EP-PLAN Ontology in order to validate that all instances of ep-plan:Activity conform to any corresponding instance(s) of ep-plan:Step and that all instances of ep-plan:Entity conform to any corresponding instance(s) of ep-plan:Variable:
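A sketch of what those additions might look like, again following the fno:Execution pattern. It assumes both classes are already declared as node shapes elsewhere, and the `ep-plan:` prefix IRI shown is an assumption to be checked against the ontology documentation.

```turtle
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix ep-plan: <https://w3id.org/ep-plan#> .   # prefix IRI assumed

# sh:nodesPath is the proposed parameter, not part of SHACL today
ep-plan:Activity
    sh:nodesPath ep-plan:correspondsToStep ;
.

ep-plan:Entity
    sh:nodesPath ep-plan:correspondsToVariable ;
.
```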
My main reservation with this approach is that when sh:node fails validation, many SHACL engines don't include the nested results via sh:detail in their reports, and this constraint would probably behave the same way. I hope that more validators will make use of sh:detail in the future in general.
I have added a prototype implementation of this to this branch in this fork of pyshacl (just because I happen to be the most familiar with the internals of that SHACL engine) and have been playing around with it. Included in this folder in the repo is a file with example data and shapes that demonstrates how it works, as well as the output from the modified version of pyshacl (cleaned up a bit for readability).
I'm curious to know what the community thinks of this, both as a concept and also this particular method of implementation.
It looks to me like this Issue veers close to a philosophy-of-SHACL question, which I'm not sure would be in scope of the WG or not. (I know scoping like this was mentioned in the meeting yesterday, but that was an early hour for me, so apologies if I misremember.)
#185 poses a question about how to carve up the graphs involved in a SHACL validation process. For the duration of this comment, I'll assume there is a divide, but not necessarily a partitioning, into a data graph (to be reviewed), a shapes graph (providing review rules), and an ontology graph (which helps with the data to be reviewed, but incidentally also gets reviewed due to mix-in). It's not "partitioning" because triples could be in multiple of these graphs simultaneously.
SHACL does support reviewing SHACL. SHACL-SHACL specifically does that.
Having an ontology that uses and extends SHACL as a more-foundational model doesn't seem inconsistent with the nature of RDF modeling. (Apologies for the double-negative.) At some point, the ontology developer (and/or data implementer) would need to decide on whether there would be shapes that need to review only the "TBox" -- but, at least the Function Ontology you noted sounds like a case where "ABox" and "TBox" have a pretty blurry divide.
Is there any change to the core SHACL specification suggested by these use cases? There's already discussion on #215 related to sh:path.