-
To meet the objective of following CIM XML precisely, there should be a separate Lang with a derived RDF/XML parser. Adding a Lang and parser is not too costly: only the differences need new tests. Some items in CIM are not correct RDF/XML. While PR #2477 can be justified on the grounds of "compatibility", if the claim is "Jena parses CIM", there needs to be detailed compliance. The way to do that is a parser adjusted for CIM, derived (subclassed) from RRX. That gives the freedom for precise control. What do other CIM parsers do? Do they accept arbitrary (non-CIM) RDF/XML or enforce CIM features?
What I want to avoid is the CIM features/profile of RDF/XML leaking back into general use. The Postel's law argument of "be conservative in what you send, be liberal in what you accept" only goes so far when people look to Jena as an accurate implementation of the W3C standards.
-
The ENTSO-E RDF-SYNTAX USER GUIDE describes some key differences between CIM XML (IEC 61970-552) and RDF XML (W3C).
Currently, CIM XML is the dominant format for IT processes and standards defined by ENTSO-E for the European transmission system operators. The most important is the CGMES (Common Grid Model Exchange Standard), but there are many additional application profiles with published RDF Schemas and SHACL constraints.
The European Network of Transmission System Operators for Electricity (ENTSO-E) and its member TSOs are among the most important drivers of the energy transition in the European electricity grid. A working toolchain for CIM XML would greatly support these projects. It could also lead to more software architectures that are truly graph-based and oriented on RDF, SPARQL, and SHACL.
In my opinion, it would help a lot if there were a Lang/parser in Jena that uses a given RDF Schema to determine the datatypes for parsing.
That parser could also provide the missing xml:base from the corresponding RDF Schema as a default.
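To make the two points above concrete, here is a minimal sketch in plain Java with no Jena types. Everything in it is an assumption for illustration: the class name `CimSchemaHints`, its methods, and the idea that the property-to-datatype mapping has already been extracted from the RDF Schema. It is not an existing Jena API.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: datatype and base-IRI hints taken from an RDF Schema. */
class CimSchemaHints {
    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

    // Property IRI -> datatype IRI, assumed to be read from the schema beforehand.
    private final Map<String, String> datatypeByProperty;
    // Base IRI to use when the document declares no xml:base.
    private final String defaultBase;

    CimSchemaHints(Map<String, String> datatypeByProperty, String defaultBase) {
        this.datatypeByProperty = Map.copyOf(datatypeByProperty);
        this.defaultBase = defaultBase;
    }

    /** Datatype IRI for a property; falls back to xsd:string when the schema has no entry. */
    String datatypeFor(String propertyIri) {
        return datatypeByProperty.getOrDefault(propertyIri, XSD_STRING);
    }

    /** Prefer the xml:base declared in the document; otherwise use the schema default. */
    String effectiveBase(Optional<String> documentBase) {
        return documentBase.orElse(defaultBase);
    }
}
```

A parser holding such an object could consult `datatypeFor` at the moment it creates a literal, and call `effectiveBase` once before resolving relative IRIs.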
I already have the necessary code snippets and sample data to build such a parser.
My current implementations are all written as StreamRDF implementations. But that causes most of the object literal nodes and their triples to be created twice: once as a string literal and once parsed with the correct datatype.
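The duplication described above can be modelled in a few lines of plain Java. This is only a sketch of the general post-processing shape, not the actual code: `RetypingSink`, `Literal`, and `Triple` are hypothetical stand-ins for a StreamRDF wrapper and Jena's node types, and only triples with literal objects are modelled.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical model of retyping literals after the parser has already produced them. */
class RetypingSink implements Consumer<RetypingSink.Triple> {
    record Literal(String lexical, String datatype) {}
    record Triple(String subject, String predicate, Literal object) {}

    private final Map<String, String> datatypeByProperty;
    private final Consumer<Triple> downstream;
    private int rebuilt = 0;

    RetypingSink(Map<String, String> datatypeByProperty, Consumer<Triple> downstream) {
        this.datatypeByProperty = Map.copyOf(datatypeByProperty);
        this.downstream = downstream;
    }

    @Override
    public void accept(Triple t) {
        String wanted = datatypeByProperty.get(t.predicate());
        if (wanted == null || wanted.equals(t.object().datatype())) {
            downstream.accept(t); // nothing to change, pass the parser's triple through
            return;
        }
        // The parser already built a string literal and a triple; both are rebuilt here.
        rebuilt++;
        Literal typed = new Literal(t.object().lexical(), wanted);
        downstream.accept(new Triple(t.subject(), t.predicate(), typed));
    }

    /** Number of triples that had to be created a second time. */
    int rebuiltCount() {
        return rebuilt;
    }
}
```

In this model every retyped statement costs one extra literal and one extra triple, which is the overhead that choosing the datatype inside the parser would avoid.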
An efficient implementation could modify the determination of the datatype in org.apache.jena.riot.lang.rdfxml.rrx.ParserRDFXML_SAX#startPropertyElement. I would like to avoid a second parser implementation and duplicated tests when the difference is so small. But I read the comment "parsing is sensitive to the JIT optimizer" by @afs on the mailing list yesterday, so I am not sure about the best approach here.
What do you think? Would the project benefit from CIM XML as Lang? What else should I consider?
Note: A corresponding CIM XML writer could be implemented later. But that would be another discussion.