CIM XML (IEC 61970-552) as Lang based on current RDF/XML parser (read-only at first) #2529

arne-bdt · 2024-06-09T16:29:04Z

arne-bdt
Jun 9, 2024
Collaborator

The ENTSO-E RDF-SYNTAX USER GUIDE describes some key differences between CIM XML (IEC 61970-552) and RDF XML (W3C).

Currently, CIM XML is the dominant format for IT processes and standards defined by ENTSO-E for the European transmission system operators (TSOs). The most important is the CGMES (Common Grid Model Exchange Standard), but there are many additional application profiles with published RDF Schemas and SHACL constraints.

The European Network of Transmission System Operators for Electricity (ENTSO-E) and its member TSOs are among the most important drivers of the energy transition in the European electricity grid. A working toolchain for CIM XML would greatly support these projects. It could also lead to more software architectures that are truly graph-based and oriented on RDF, SPARQL, and SHACL.

In my opinion, it would help a lot if there were a Lang/parser in Jena that uses a given RDF Schema to determine the datatypes for parsing.

That parser could also provide the missing xml:base from the corresponding RDF Schema as a default.
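
To make the idea concrete, here is a minimal, language-neutral sketch in Python (standard library only; this is not Jena code). It assumes, purely for illustration, that the schema declares each property's datatype via rdfs:range pointing at an XSD datatype; actual CIM RDFS files may express datatypes differently. The function name load_datatypes, the example.org IRIs, and the tiny schema are all made up.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical, heavily simplified schema; not a real CGMES RDFS file.
SCHEMA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://example.org/cim">
  <rdf:Property rdf:about="#ACLineSegment.r">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </rdf:Property>
  <rdf:Property rdf:about="#Switch.open">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#boolean"/>
  </rdf:Property>
</rdf:RDF>"""


def load_datatypes(schema_xml):
    """Return (base IRI, {property IRI: datatype IRI}) read from the schema."""
    root = ET.fromstring(schema_xml)
    # The schema's xml:base doubles as the default base for data files.
    base = root.get(f"{{{XML_NS}}}base", "")
    datatypes = {}
    for prop in root.iter(f"{{{RDF}}}Property"):
        about = prop.get(f"{{{RDF}}}about")
        rng = prop.find(f"{{{RDFS}}}range")
        if about is None or rng is None:
            continue
        # Only fragment-style references ("#Name") are resolved in this sketch.
        iri = base + about if about.startswith("#") else about
        datatypes[iri] = rng.get(f"{{{RDF}}}resource")
    return base, datatypes


base, datatypes = load_datatypes(SCHEMA)
```

The resulting dictionary is the lookup a parser would consult per property element, and base is the value it could fall back to when a data file has no xml:base of its own.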

I already have the necessary code snippets and sample data to build such a parser.

My current implementations are all built as StreamRDF implementations. However, that causes most of the literal object nodes and their triples to be created twice: once as a string literal and once parsed with the correct datatype.
An efficient implementation could modify the determination of the datatype in org.apache.jena.riot.lang.rdfxml.rrx.ParserRDFXML_SAX#startPropertyElement. I would like to avoid a second parser implementation and duplicated tests when the difference is so small. But I read the comment "parsing is sensitive to the JIT optimizer" by @afs on the mailing list yesterday, so I am not sure about the best approach here.

What do you think? Would the project benefit from CIM XML as Lang? What else should I consider?

Note: A corresponding CIM XML writer could be implemented later. But that would be another discussion.

afs · 2024-06-10T11:35:09Z

afs
Jun 10, 2024
Collaborator

To meet the objective of precisely following CIM XML, there should be a separate Lang with a derived RDF/XML parser.

Adding a Lang+parser is not so bad. Only differences need new tests.

There are items in CIM that are not correct RDF/XML.

While PR #2477 can be justified on the grounds of "compatibility", if the claim is "Jena parses CIM", there needs to be detailed compliance. The way to do that is a parser adjusted for CIM, derived (as a subclass) from RRX. That gives the freedom for precise control.

What do other CIM parsers do?

Do they accept arbitrary (non-CIM) RDF/XML or enforce CIM features?

"I would like to avoid a second parser implementation and duplicated tests when the difference is so small."

What I want to avoid is CIM features/profile of RDF/XML leaking back into general use. The Postel's law argument of "be conservative in what you send, be liberal in what you accept" only goes so far when people look to Jena as an accurate implementation of the W3C standards.

2 replies

afs Jun 10, 2024
Collaborator

This feels like it would fit in nicely as a new "extras" module jena-extras/jena-cim-xml.

arne-bdt Jun 10, 2024
Collaborator Author

I have been working with CIMXML and CGMES since 2015, and with Apache Jena since 2017. Only the RDF-Syntax User Guide from 2024 made me realize that I have not been too stupid to use Apache Jena properly all that time.
We discovered a workaround for every problem but never doubted the specifications, which stated, for example:

However, the CIMXML format (described in IEC 61970-552:2016) is based on RDF technology
and RDF-compliant software natively can validate CIMXML files against profiles,
or rather an ontology generated from the profile. (source)

The IEC standard claims to use a simplified RDF syntax, which is a proper subset of the RDF standard and can be deserialized by RDF software like SiRPAC.

"What do other CIM parsers do?
Do they accept arbitrary (non-CIM) RDF/XML or enforce CIM features?"

--> So I am not sure if there are many real CIM parsers or rather a lot of pre- and post-processing around RDF parsers.

"What I want to avoid is CIM features/profile of RDF/XML leaking back into general use."

I had absolutely no intention to make the RDF/XML parser more general or mix implementations. I wanted to know if a copy of the RDF parser was necessary or if there are any pain points when I inherit the parser, as I would need to split some methods and make them protected.

Implementation Levels

There are different levels at which CIMXML could be supported:

1. Datatype Parsing Only
- Reading all predicates and the corresponding datatypes from one or more RDFS files (there is an RDFS for the file header and one for the content).
- For each triple, trying to parse with the matching datatype.
- There could be warnings if a property was not part of the RDFS.
- There should be errors when a property could not be properly parsed.
2. Semantic Matching of CIMXML File Header model.profile Against the RDFS owl:versionIRI
- The given RDFS file must match the profile of the CIMXML file.
- Maybe a schema registry would be nice, where the corresponding profile is looked up by the header data.
3. Variant of 2, Where the File Header Data is Read into a Different Graph
- This makes sense since a separation of the file header is already planned when switching to JSON-LD.
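
The datatype-parsing level described above could behave roughly like the following Python sketch (standard library only; not Jena code). The function type_literal, the property IRIs, and the handful of supported datatypes are assumptions for illustration, and Python's float() and int() only approximate the XSD lexical rules.

```python
import logging

XSD = "http://www.w3.org/2001/XMLSchema#"


def parse_boolean(lexical):
    """Parse the xsd:boolean lexical forms true/false/1/0."""
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not an xsd:boolean: {lexical!r}")


# Lexical-to-value parsers for a few XSD datatypes (deliberately incomplete).
PARSERS = {
    XSD + "float": float,
    XSD + "integer": int,
    XSD + "boolean": parse_boolean,
}


def type_literal(prop, lexical, datatypes):
    """Return (value, datatype IRI) for one literal-valued property."""
    dt = datatypes.get(prop)
    if dt is None:
        # Property is not part of the schema: warn and keep a plain string.
        logging.warning("property not in schema: %s", prop)
        return lexical, XSD + "string"
    parser = PARSERS.get(dt)
    if parser is None:
        # Datatype is known from the schema but not checked by this sketch.
        return lexical, dt
    try:
        return parser(lexical), dt
    except ValueError as e:
        # Value does not fit the schema's datatype: this is an error.
        raise ValueError(f"{prop}: cannot parse {lexical!r} as {dt}") from e


# Hypothetical schema lookup table and usage.
datatypes = {"http://example.org/cim#ACLineSegment.r": XSD + "float"}
typed = type_literal("http://example.org/cim#ACLineSegment.r", "0.5", datatypes)
untyped = type_literal("http://example.org/cim#Unknown.p", "x", datatypes)
```

The key point is that the datatype is decided once, while the literal is being created, so no string-typed intermediate triple has to be emitted and replaced afterwards.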

In my opinion, level 1 would be best suited for Jena. It would pave the way and remove the most disturbing hurdles. That would suffice to load CIMXML with parsed datatypes into Jena. With a proper xml:base, users would stop removing the source path from every subject's IRI with code like BIND (strafter(str(?ConductingEquipment), "#") AS ?eqMRID) to join data from two different CIMXML files.
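
The effect of a shared xml:base can be illustrated with plain IRI resolution. This is a small Python sketch (standard library only); the file paths, the base http://example.org/cim, and the identifier _4f2a are invented, and it assumes a parser that falls back to the document location when no base is given.

```python
from urllib.parse import urljoin


def resolve_rdf_id(base, rdf_id):
    """rdf:ID="x" denotes the IRI obtained by resolving "#x" against the base."""
    return urljoin(base, "#" + rdf_id)


def resolve_rdf_about(base, about):
    """rdf:about holds an IRI reference that is resolved against the base."""
    return urljoin(base, about)


# No xml:base: each file's own location is used, so the same equipment
# gets a different subject IRI in each file.
eq = resolve_rdf_id("file:///data/grid_EQ.xml", "_4f2a")
ssh = resolve_rdf_about("file:///data/grid_SSH.xml", "#_4f2a")

# Common default base (e.g. taken from the RDF Schema): the IRIs coincide.
common = "http://example.org/cim"
eq_shared = resolve_rdf_id(common, "_4f2a")
ssh_shared = resolve_rdf_about(common, "#_4f2a")

print(eq == ssh)
print(eq_shared == ssh_shared)
```

With the shared base, the two files can be joined directly on the subject IRI, and the strafter workaround is no longer needed.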

xml:base

In QUALITY OF CGMES DATASETS AND CALCULATIONS, some peculiarities around rdf:ID/rdf:about are explained, and interestingly they define xml:base="urn:uuid:". --> I have not tried using this as a base for the parser yet.
