Saving a cube with a mesh does not include coordinate attribute, as required by UGrid conventions
#5202
Comments
Iris can only load each coordinate from a NetCDF file once per data variable, and the edge/face coordinates are needed within Iris' This therefore relates to #4215 - Iris does not have an explicit 'contract' to perform file-to-file roundtripping, and is only guaranteed to preserve the information that is important to Iris. We can of course try to make allowances if preserving certain elements is important to users. Ways forward
|
@trexfeathers - round tripping is a side issue here - we're already pretty familiar with the issue that what we put in to iris is not necessarily what we get out the other end. What we do need to do is produce files that are CF compliant, which is what the round trip example here shows i.e. there is information that is expected/needed that is not being saved/handled. Based on the Iris docs statement of "Iris implements a data model based on the CF Conventions" I'd have naively expected that the item @hdyson flagged from http://ugrid-conventions.github.io/ugrid-conventions/#data-variables were being carried out, reproduced here for brevity:
regardless of whether info could be reconstructed from elsewhere in the file or not. Indeed, the UGrid conventions currently favour redundancy over brevity, meaning that any downstream system processing or ingesting UGrid data could reasonably expect information to come from any one of a number of possible entry points. There would likely be no scope for changing things downstream as that would need work on XIOS itself... |
(N.B. the docs at: https://scitools-iris.readthedocs.io/en/latest/further_topics/ugrid/operations.html?highlight=ngvat#save also say that the save routine "saves the file in a UGRID-conformant format") |
Thanks for the detail @arjclark. I just talked through a hypothetical with @hdyson that helped me break the link between this and roundtripping concerns... During development we had interpreted the redundancy as meaning it didn't matter about the data variable so long as there were attributes in the mesh variable. But the interpretation in this issue is that the referenced section of the UGRID docs is THE correct way things should be. That would mean that: even if Iris loaded a file where edge/face coordinates were only referenced as attributes on the mesh variable, Iris should also add these as attributes on the data variable when saving. This sounds like a defensible idea to me, and simpler to implement than any of my previous suggestions. |
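A minimal sketch of the rule proposed above, in plain Python rather than Iris code. It assumes netCDF variables are modelled as a dict mapping each variable name to a dict of its attributes. The function name add_location_coordinates and the example variable names are hypothetical; only the attribute names (mesh, location, face_coordinates, coordinates) come from the UGRID and CF conventions.

```python
def add_location_coordinates(variables):
    """Copy `variables`, adding mesh location coordinates to data variables.

    Hypothetical helper, not part of Iris. `variables` maps a netCDF
    variable name to a dict of its attributes.
    """
    result = {name: dict(attrs) for name, attrs in variables.items()}
    for attrs in result.values():
        mesh_name = attrs.get("mesh")
        location = attrs.get("location")
        if mesh_name is None or location is None:
            continue  # not a data variable on a mesh
        mesh_attrs = result.get(mesh_name, {})
        # e.g. "face_coordinates" or "edge_coordinates" on the mesh variable
        wanted = mesh_attrs.get(f"{location}_coordinates", "").split()
        existing = attrs.get("coordinates", "").split()
        merged = existing + [c for c in wanted if c not in existing]
        if merged:
            attrs["coordinates"] = " ".join(merged)
    return result


# Illustrative input: the face coordinates are only referenced by the mesh.
example = {
    "Mesh2": {
        "cf_role": "mesh_topology",
        "node_coordinates": "Mesh2_node_x Mesh2_node_y",
        "face_coordinates": "Mesh2_face_x Mesh2_face_y",
    },
    "temperature": {"mesh": "Mesh2", "location": "face"},
}
fixed = add_location_coordinates(example)
print(fixed["temperature"]["coordinates"])  # Mesh2_face_x Mesh2_face_y
```

Any names already listed in coordinates are kept, and the mesh's coordinates for that location are appended, so ordinary CF auxiliary coordinates would not be lost.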
I think the result is CF (and UGRID) compliant. In this case, as the load occurs with UGRID interpretation "turned on", Iris has interpreted this data as mesh coordinates, rather than regular CF auxiliary coordinates, and outputs them as such. So, UGRID makes it clear that face/edge-coordinates are optional. (BTW, it happens that Iris itself cannot at present do without them, though that is a known undesirable limitation.) More generally, while you can interpret UGRID data purely from a CF point of view, you will generally lose information as not all UGRID information is supplied in purely CF terms. |
I hadn't thought of that! |
@pp-mo I think your first comment here is conflating mesh topology with data variables. I'm going to be explicit here about the distinction, in part for my own future reference. For data variables (i.e. netCDF variables with "The use of the coordinates attribute is copied from the CF-conventions. It is used to map the values of variables defined on the unstructured meshes directly to their location: latitude, longitude, or other spatial coordinates, and optional elevation. " Following on to the CF conventions, we have: "The latitude and longitude coordinates of a horizontal grid that was not defined as a Cartesian product of latitude and longitude axes, can sometimes be represented using two-dimensional coordinate variables. These variables are identified as coordinates by use of the coordinates attribute." For mesh variables (i.e. the netCDF variables referenced by a netCDF variable with a |
I see what you mean, and I agree that para does sound like it is saying that an entry in the "coordinates" attribute should be present where possible.
I totally agree. Those, "variables referenced by a netCDF variable with a cf_role of topology" (? really I think For completeness though, I should explain that I think some of your terminology is not correct, according to my best understanding ...
data variables
In CF terms, I believe that a data variable is effectively just any variable that does not have another role, such as being a coordinate (aka "dimension" coordinate in Iris terms), an auxiliary coordinate, a cell measure, a grid-mapping, a mesh or whatever.
While the CF document makes repeated references to this term, I don't think there is a clear simple definition anywhere. I would say that, logically, data variables were originally intended to carry "dependent" values, while everything else is sort of a-priori. But I think the more recent acceptance of flag and ancillary variables does muddy that, since these contain dependent or measured information which is nevertheless subsidiary to the "main" data.
At any rate, that definition by exclusion is precisely how the Iris loader works: if you remove the reference on another variable which identifies a variable as an aux-coord or cell-measure or grid-mapping, it simply becomes a data variable, and is presented in its own cube. This principle also applies to bounds variables (which are referenced by coordinates), and finally extends to mesh connectivities and coordinates, which is where this entire discussion begins!
BTW I must say that I do wish CF marked components unambiguously with their intended role, and not use this style of "classification by reference".
mesh variables
"Mesh variable" is not a term used anywhere in the UGRID spec (and, of course, CF itself has absolutely nothing to say about meshes, at least not yet). |
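The "definition by exclusion" described in the comment above can be illustrated with a toy sketch. This is not the Iris loader's code: it assumes variables are modelled as a dict of attribute dicts, it ignores dimension coordinates, and the names REFERENCE_ATTRIBUTES and find_data_variables are hypothetical.

```python
# Attributes whose values name other variables (a simplified, assumed list).
REFERENCE_ATTRIBUTES = (
    "coordinates", "bounds", "cell_measures",
    "grid_mapping", "ancillary_variables", "mesh",
)


def find_data_variables(variables):
    """Return, sorted, the variables that no other variable refers to."""
    referenced = set()
    for attrs in variables.values():
        for key, value in attrs.items():
            is_reference = (
                key in REFERENCE_ATTRIBUTES
                or key.endswith("_coordinates")   # mesh coordinates
                or key.endswith("_connectivity")  # mesh connectivities
            )
            if not is_reference:
                continue
            for token in str(value).split():
                # Skip keys such as "area:" in 'cell_measures = "area: cell_area"'.
                if not token.endswith(":"):
                    referenced.add(token)
    return sorted(name for name in variables if name not in referenced)


variables = {
    "temperature": {"coordinates": "height", "mesh": "Mesh2", "location": "face"},
    "height": {},
    "Mesh2": {
        "cf_role": "mesh_topology",
        "node_coordinates": "node_x node_y",
        "face_node_connectivity": "face_nodes",
    },
    "node_x": {},
    "node_y": {},
    "face_nodes": {},
}
print(find_data_variables(variables))  # ['temperature']

# Removing the reference turns "height" into a data variable in its own right.
del variables["temperature"]["coordinates"]
print(find_data_variables(variables))  # ['height', 'temperature']
```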
Yes, you're right - I should have said
I think this is where UGrid is more explicit: http://ugrid-conventions.github.io/ugrid-conventions/#data-variables does indicate that UGrid data variables need a The The At the risk of a tangent, I would expect potential future use cases to be mixing CF data variables and UGrid data variables in the same file. In the ancillary space, I would not rule out e.g. full 4D (time, hybrid height, mesh) aerosols mixed with zonal mean aerosols, for example. For some data variables, there's a science benefit for full 4D while for others the science benefit may be sufficiently negligible that the extra data storage for 4D is a waste.
I think there is a value in distinguishing between which netCDF variables are used to construct the UGrid mesh, and which are data that is on the UGrid mesh. If there's a better term than "mesh variable", I'd be happy to use it. |
I totally agree. But my point was that when UGRID refers to "data variables", it is not a UGRID concept, it is a CF one. In terms of their role, "UGRID data variables" are still CF data variables, which importantly means that they are still subject to all the usual interpretations of CF-controlled metadata (attributes), such as standard_name / long_name / units / axis / coordinates etc (all of CF appendix A). |
It certainly seems to me that this principle ought to extend to mesh-coordinates and connectivities also, but I don't think UGRID make a clear statement of that anywhere. |
It's really a big aside, but I want to point out a known problem that comes with this ... The LAM data is on faces, and its "primary" mesh coordinates are the usual node coordinates. Thus, I would prefer that we can say that they connect via the "mesh" attribute, and are not listed in "coordinates". So, we probably want to assume a rule (but it is nowhere stated) that, if a mesh is referenced by a data-variable, which also specifies a grid-mapping, then all the mesh-coordinates are referred to that coordinate system. Unfortunately the CDL examples in the UGRID spec are incomplete, and none of them actually contains a data-variable at all, although how it should work is discussed in general in its own section. |
Probably a lot of the above content should go into separate discussions. Maybe even on the UGRID project, but unfortunately there seems to be little activity there at present. |
Would you like me to close this issue and open up a new one without the sidetracks? I think we do now have a consensus that the |
Not just yet, I think. But maybe we can start a general discussion. We can create a PR to target this, and further discuss any specific details there |
I've now raised this here |
Following offline conversation with @hdyson, I'm coming around to this view.
This also feeds back into the nascent conformance rules and the ugrid-checker |
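As an illustration of what such a conformance rule might look like, here is a hedged sketch in plain Python. It is not the ugrid-checker's API: the function name find_missing_coordinates and the dict-of-attributes model of a file are assumptions made for this example.

```python
def find_missing_coordinates(variables):
    """Map each mesh data variable to the location coordinates it fails to list.

    Hypothetical check: `variables` maps a netCDF variable name to a dict
    of its attributes.
    """
    problems = {}
    for name, attrs in variables.items():
        mesh_name = attrs.get("mesh")
        location = attrs.get("location")
        if mesh_name is None or location is None:
            continue  # only data variables located on a mesh are checked
        mesh_attrs = variables.get(mesh_name, {})
        expected = mesh_attrs.get(f"{location}_coordinates", "").split()
        listed = attrs.get("coordinates", "").split()
        missing = [c for c in expected if c not in listed]
        if missing:
            problems[name] = missing
    return problems


file_vars = {
    "Mesh2": {
        "cf_role": "mesh_topology",
        "face_coordinates": "Mesh2_face_x Mesh2_face_y",
    },
    "temperature": {"mesh": "Mesh2", "location": "face"},
    "pressure": {
        "mesh": "Mesh2",
        "location": "face",
        "coordinates": "Mesh2_face_x Mesh2_face_y",
    },
}
print(find_missing_coordinates(file_vars))
# {'temperature': ['Mesh2_face_x', 'Mesh2_face_y']}
```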
I just put something very similar to this in an email to @pp-mo, so apologies for the redundancy but I think it's worth flagging somewhere more public. In the linked UGrid issue: ugrid-conventions/ugrid-conventions#63 there's discussion of the potential of UGrid compliant files that are not CF compliant. My personal opinion is that iris should aim to be writing mesh cubes that are both UGrid and CF compliant - I think the inconsistency between CF compliant times/vertical levels/etc in the same file as CF non-compliant meshes would lead to user confusion. |
Yes that scared me a bit too. I would certainly hope + expect Iris output to be both CF and UGRID compliant. |
I really think this one has run its course and we can't see the wood for the trees now. I'm going to close this ticket and open up a new issue with the iris specific parts of this, and let the conventions conversations carry on in the UGrid repository. |
🐛 Bug Report
It looks like the mesh support in iris is not setting the coordinates attribute for the data variables correctly. Worse, when working with a source data file that does have the correct attributes, these are being lost in the save. This is causing an issue when trying to load the resulting files into LFRic.
How To Reproduce
Steps to reproduce the behaviour:
Expected behaviour
Saving a cube with a mesh should set the coordinates attribute for the data variables, as per: http://ugrid-conventions.github.io/ugrid-conventions/#data-variables
Environment