Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saving a cube with a mesh does not include coordinate attribute, as required by UGrid conventions #5202

Closed
hdyson opened this issue Mar 20, 2023 · 20 comments
Assignees

Comments

@hdyson
Copy link
Contributor

hdyson commented Mar 20, 2023

🐛 Bug Report

It looks like the mesh support in iris is not setting the coordinates attribute for the data variables correctly. Worse, when working with a source data file that does have the correct attributes, these are being lost in the save. This is causing an issue when trying to load the resulting files into LFRic.

How To Reproduce

Steps to reproduce the behaviour:

In [1]: import iris

In [2]: # Using scitools default version – same behaviour is seen with default-next too.

In [3]: iris.__file__
Out[3]: '.../default-2022_11_22/lib/python3.9/site-packages/iris/__init__.py'

In [4]: # Example data does have coordinate attributes for the data variables:

In [5]: !ncdump -h ./lib/ants/tests/resources/stock/data_C4.nc | grep 'coordinates'
           sample_data:coordinates = "example_C4_face_x example_C4_face_y" ;
           additional_sample_data:coordinates = "example_C4_face_x example_C4_face_y" ;
           example_C4:node_coordinates = "example_C4_node_y example_C4_node_x" ;
           example_C4:face_coordinates = "example_C4_face_y example_C4_face_x" ;

In [6]: # (First two results are attributes on data variables, second two are for topology attributes)

In [7]: from iris.experimental.ugrid import PARSE_UGRID_ON_LOAD

In [8]: with PARSE_UGRID_ON_LOAD.context():
   ...:     cubes = iris.load('./lib/ants/tests/resources/stock/data_C4.nc')
   ...: 

In [9]: iris.save(cubes, 'result.nc')

In [10]: # Notice that the coordinate attributes have been dropped from the data variables

In [11]: !ncdump -h result.nc | grep 'coordinates'
           example_C4:node_coordinates = "example_C4_node_x example_C4_node_y" ;
           example_C4:face_coordinates = "example_C4_face_x example_C4_face_y" ;

In [12]: # Full ncdump for completeness

In [13]: !ncdump -h result.nc
netcdf result {
dimensions:
     num_node = 98 ;
     dim0 = 96 ;
     example_C4_face_N_nodes = 4 ;
variables:
     int example_C4 ;
           example_C4:cf_role = "mesh_topology" ;
           example_C4:topology_dimension = 2 ;
           example_C4:long_name = "Topology data of 2D unstructured mesh" ;
           example_C4:node_coordinates = "example_C4_node_x example_C4_node_y" ;
           example_C4:face_coordinates = "example_C4_face_x example_C4_face_y" ;
           example_C4:face_node_connectivity = "face_nodes" ;
     double example_C4_node_x(num_node) ;
           example_C4_node_x:units = "degrees_east" ;
           example_C4_node_x:standard_name = "longitude" ;
           example_C4_node_x:long_name = "longitude of 2D mesh nodes." ;
     double example_C4_node_y(num_node) ;
           example_C4_node_y:units = "degrees_north" ;
           example_C4_node_y:standard_name = "latitude" ;
           example_C4_node_y:long_name = "latitude of 2D mesh nodes." ;
     double example_C4_face_x(dim0) ;
           example_C4_face_x:units = "degrees_east" ;
           example_C4_face_x:standard_name = "longitude" ;
           example_C4_face_x:long_name = "longitude of 2D face centres" ;
     double example_C4_face_y(dim0) ;
           example_C4_face_y:units = "degrees_north" ;
           example_C4_face_y:standard_name = "latitude" ;
           example_C4_face_y:long_name = "latitude of 2D face centres" ;
     int face_nodes(dim0, example_C4_face_N_nodes) ;
           face_nodes:long_name = "Maps every quadrilateral face to its four corner nodes." ;
           face_nodes:Conventions = "UGRID-1.0" ;
           face_nodes:cf_role = "face_node_connectivity" ;
           face_nodes:start_index = 1 ;
     double sample_data(dim0) ;
           sample_data:long_name = "sample_data" ;
           sample_data:units = "1" ;
           sample_data:mesh = "example_C4" ;
           sample_data:location = "face" ;
     double additional_sample_data(dim0) ;
           additional_sample_data:long_name = "additional_sample_data" ;
           additional_sample_data:units = "1" ;
           additional_sample_data:mesh = "example_C4" ;
           additional_sample_data:location = "face" ;

// global attributes:
           :online_operation = "once" ;
           :Conventions = "CF-1.7" ;
}

Expected behaviour

Saving a cube with a mesh should set the coordinates attribute for the data variables, as per: http://ugrid-conventions.github.io/ugrid-conventions/#data-variables

Environment

  • OS & Version: RHEL7
  • Iris Version: 3.3.1 and 3.4.0
@trexfeathers
Copy link
Contributor

trexfeathers commented Mar 20, 2023

Iris can only load each coordinate from a NetCDF file once per data variable, and the edge/face coordinates are needed within Iris' Mesh object before it can be attached to a Cube (via a MeshCoord). Since the coordinates are being preserved within the Mesh, we have so far not worried about them not also being associated directly with the data variable.

This therefore relates to #4215 - Iris does not have an explicit 'contract' to perform file-to-file roundtripping, and is only guaranteed to preserve the information that is important to Iris. We can of course try to make allowances if preserving certain elements is important to users.

Ways forward

  • It would take a lot of work to represent cases like this within Iris' data model.
  • We could maybe make NetCDF save more configurable, although it is already famously complex.
  • I know file-to-file roundtripping problems are some of the motivating cases for @pp-mo's nc-data idea.
  • I don't know how much scope there is to change things downstream, but the saved file still associates example_C4_face_x with sample_data, exclusively via the Mesh - example_C4.

@arjclark
Copy link

@trexfeathers - round tripping is a side issue here - we're already pretty familiar with the issue that what we put in to iris is not necessarily what we get out the other end.

What we do need to do is produce files that are CF compliant, which is what the round trip example here shows i.e. there is information that is expected/needed that is not being saved/handled. Based on the Iris docs statement of "Iris implements a data model based on the CF Conventions" I'd have naively expected that the item @hdyson flagged from http://ugrid-conventions.github.io/ugrid-conventions/#data-variables were being carried out, reproduced here for brevity:

double Mesh2_waterlevel(time,nMesh2_face) ;
Mesh2_waterlevel:standard_name = "sea_surface_height_above_geoid" ;
Mesh2_waterlevel:units = "m" ;
Mesh2_waterlevel:mesh = "Mesh2"
Mesh2_waterlevel:location = "face" ;
Mesh2_waterlevel:coordinates = "Mesh2_face_x Mesh2_face_y" ;

regardless of whether info could be reconstructed from elsewhere in the file or not. Indeed, the UGRid conventions are such that they currently go for redundancy over brevity, meaning that any downstream system processing or ingesting UGrid data could reasonably expect information to come from one of a number of possible entry points.

There would likely be no scope for changing things downstream as that would need work on XIOS itself...

@arjclark
Copy link

(N.B. the docs at: https://scitools-iris.readthedocs.io/en/latest/further_topics/ugrid/operations.html?highlight=ngvat#save also say that the save routine "saves the file in a UGRID-conformant format")

@trexfeathers trexfeathers self-assigned this Mar 20, 2023
@trexfeathers
Copy link
Contributor

Thanks for the detail @arjclark. I just talked through a hypothetical with @hdyson that helped me break the link between this and roundtripping concerns...

During development we had interpreted the redundancy as meaning it didn't matter about the data variable so long as there were attributes in the mesh variable. But the interpretation in this issue is that the referenced section of the UGRID docs is THE correct way things should be.

That would mean that: even if Iris loaded a file where edge/face coordinates were only referenced as attributes on the mesh variable, Iris should also add these as attributes on the data variable when saving. This sounds like a defensible idea to me, and simpler to implement than any of my previous suggestions.

@pp-mo
Copy link
Member

pp-mo commented Mar 20, 2023

What we do need to do is produce files that are CF compliant ... saves the file in a UGRID-conformant format

I think the result is CF (and UGRID) compliant.

In this case, as the load occurs with UGRID interpretation "turned on", Iris has interpreted this data as mesh coordinates, rather than regular CF auxiliary coordinates, and outputs them as such.
No information is lost in this approach, it just means that the result must be interpreted with knowledge of UGRID.

So, UGRID makes it clear that face/edge-coordinates are optional.
These are the only "mesh" coordinates that can normally be treated as regular CF auxiliary coordinates -- i.e. listed in a data "coordinates" attribute : The required location info is the "node coordinates", but these can't be treated as aux-coords as they aren't indexed by a dimension of the data (for face/edge-based data).
So I think that inclusion in "coordinates" is basically also optional, and in fact redundant.
E.G. UGRID says, under "2d mesh topology", that "These auxiliary coordinate variables will have length nFaces and nEdges respectively, and may have in turn a bounds attribute that specifies the bounding coordinates of the face or edge (thereby duplicating the data in the node_coordinates)".

( BTW it happens that Iris itself cannot at present do without them, though that is a known undesirable limitation).

More generally, while you can interpret UGRID data purely from a CF point of view, you will generally lose information as not all UGRID information is supplied in purely CF terms.
E.G. In this case, if this data had not had 'face_coordinates', then in a purely CF view it would have no location information at all, since that is then only given via the mesh (i.e. node coordinates and face-node-connectivity).

@pp-mo
Copy link
Member

pp-mo commented Mar 20, 2023

Iris should also add these as attributes on the data variable when saving

I hadn't thought of that !
I guess it would be possible, and if so a sensible "defensive" approach.

@hdyson
Copy link
Contributor Author

hdyson commented Mar 20, 2023

@pp-mo I think your first comment here is conflating mesh topology with data variables. I'm going to be explicit here about the distinction, in part for my own future reference.

For data variables (i.e. netCDF variables with mesh and location attributes), the UGrid conventions are pretty clear:

"The use of the coordinates attribute is copied from the CF-conventions. It is used to map the values of variables defined on the unstructured meshes directly to their location: latitude, longitude, or other spatial coordinates, and optional elevation. "

Following on to the CF conventions, we have:

"The latitude and longitude coordinates of a horizontal grid that was not defined as a Cartesian product of latitude and longitude axes, can sometimes be represented using two-dimensional coordinate variables. These variables are identified as coordinates by use of the coordinates attribute."

For mesh variables (i.e. the netCDF variables referenced by a netCDF variable with a cf_role of topology), on the other hand, I don't think the coordinates attribute should be set at all. The coordinates attribute only really has meaning for a data variable.

@pp-mo
Copy link
Member

pp-mo commented Mar 20, 2023

@hdyson For data variables ... "The use of the coordinates attribute is copied from the CF-conventions"

I see what you mean, and I agree that para does sound like it is saying that an entry in "coordinates" attribute should be present where possible.
I think as @trexfeathers said, this is probably possible.

For mesh variables (i.e. the netCDF variables referenced by a netCDF variable with a cf_role of topology), on the other hand, I don't think the coordinates attribute should be set at all. The coordinates attribute only really has meaning for a data variable.

I totally agree. Those, "variables referenced by a netCDF variable with a cf_role of topology" (? really I think mesh_topology ?) : these are all either coordinates (of mesh elements = nodes/edges/faces etc), or connectivities : I don't think either could have "coordinates" of its own.


For completeness though, I should explain that I think some of your terminology is not correct, according to my best understanding ...

data variables

(i.e. netCDF variables with mesh and location attributes)

In CF terms, I believe that a data variable is effectively just any variable that does not have another role, such as being a coordinate (aka "dimension" coordinate in Iris terms), an auxiliary coordinate, a cell measure, a grid-mapping, a mesh or whatever.

While the CF document makes repeated references to this term, I don't think there is a clear simple definition anywhere.
It is mentioned in the "data model" appendix here and the following diagram : that doesn't really enable you to define a programmatic way of identifying one, but it does show its intended role.

I would say that, logically, data variables were originally intended to carry "dependent" values, while everything else is sort of a-priori. But I think the more recent acceptance of flag and ancillary variables does muddy that, since these contain dependent or measured information which is nevertheless subsidiary to the "main" data.

At any rate, that definition by exclusion is precisely how the Iris loader works : if you remove the reference on another variable which identifies a variable as an aux-coord or cell-measure or grid-mapping, it simply becomes a data variable, and is presented in its own cube. This principle also applies to bounds variables (which are referenced by coordinates), and finally extends to mesh connectivities and coordinates -- which is where this entire discussion begins!
( So, the face/edge coordinates are treated as belonging to the mesh, and not to the primary data variable. )

BTW I must say that I do wish CF marked components unambiguously with their intended role, and not use this style of "classification by reference".
But this is very longstanding so I guess simply "it is what it is".

mesh variables

(i.e. the netCDF variables referenced by a netCDF variable with a cf_role of topology)

"Mesh variable" is not a term used anywhere in the UGRID spec. (and, of course, CF itself has absolutely nothing to say about meshes, at least not yet).
I would say that if anything this "should" mean simply "the variable which contains a mesh" (or which is a mesh).
It's just personal preference obviously, but if I said that somewhere, that was definitely what I meant by it !
I would not use it to mean "a data variable which references a mesh", as opposed to "a 'normal' data variable" which does not.

@hdyson
Copy link
Contributor Author

hdyson commented Mar 21, 2023

I totally agree. Those, "variables referenced by a netCDF variable with a cf_role of topology" (? really I think mesh_topology ?)

Yes, you're right - I should have said mesh_topology.

In CF terms, I believe that a data variable is effectively just any variable that does not have another role, such as being a coordinate (aka "dimension" coordinate in Iris terms), an auxiliary coordinate, a cell measure, a grid-mapping, a mesh or whatever.

I think this is where UGrid is more explicit:

http://ugrid-conventions.github.io/ugrid-conventions/#data-variables does indicate that UGrid data variables need a mesh and location attribute.

The mesh attribute is definitely needed when we get to the point where there's multiple meshes in the same file.

The location attribute is needed to distinguish between e.g. face centred and node centred fields. In the ancillary space we'll need that for at least orography. It is likely that we will need ancillary files with two data variables with the standard name of surface_altitude that will only be distinguishable by the location attribute (i.e. node and face centred surface altitudes).

At the risk of a tangent, I would expect potential future use cases to be mixing CF data variables and UGrid data variables in the same file. In the ancillary space, I would not rule out e.g. full 4D (time, hybrid height, mesh) aerosols mixed with zonal mean aerosols, for example. For some data variables, there's a science benefit for full 4D while for others the science benefit may be sufficiently negligible that the extra data storage for 4D is a waste.

"Mesh variable" is not a term used anywhere in the UGRID spec.

I think there is a value in distinguishing between which netCDF variables are used to construct the UGrid mesh, and which are data that is on the UGrid mesh. If there's a better term than "mesh variable", I'd be happy to use it.

@pp-mo
Copy link
Member

pp-mo commented Mar 21, 2023

I think this is where UGrid is more explicit ... UGrid data variables need a mesh and location attribute

I totally agree. But my point was that when UGRID refers to "data variables", it is not a UGRID concept, it is a CF one.
So I would say that we don't have a definite distinction between a CF data variable and a UGRID one,
rather that a UGRID data variable is just a CF data variable that happens to also have a "mesh" attribute
(in which case, it definitely ought to have a 'location;' attribute also).

In terms of their role, "UGRID data variables" are still CF data variables, which importantly means that they are still subject to all the usual interpretations of CF-controlled metadata (attributes), such as standard_name / long_name / units / axis / coordinates etc (all of CF appendix A).
I think it is exactly this which allows us UGRID to assume that a lot of CF still works, and they are a pure "extension" of it.

@pp-mo
Copy link
Member

pp-mo commented Mar 21, 2023

"UGRID data variables" are still CF data variables ... a lot of CF still works

It certainly seems to me that this principle ought to extend to mesh-coordinates and connectivities also, but I don't think UGRID make a clear statement of that anywhere.
And I'm just not clear yet on whether it might be a problem or not : It will not matter for "generic" concepts from netcdf- or CF-conventions, which I think apply to all variables in a netcdf dataset. It just might matter for information that is specific to a certain CF role (like coordinates or flag variables).
So, in particular, I think we want a mesh-coordinate to behave like an auxiliary coordinate, but if it is not an actual auxiliary coordinate of some data-variable (however defined), then maybe there could be a problem with that ?
The Iris UGRID loading is, naturally, just assuming some of this at present !!

@pp-mo
Copy link
Member

pp-mo commented Mar 21, 2023

I think we want a mesh-coordinate to behave like an auxiliary coordinate

It's really a big aside, but I want to point out a known problem that comes with this ...
One thing that is definitely missing at present is a way for mesh coordinates to be given in an alternative coordinate system. This has already cropped up here at the MetOffice with LAM data which uses a grid based on rotated-pole coordinates.
The problem is basically that "classification by reference" mechanism again ...

The LAM data is on faces, and its "primary" mesh coordinates are the usual node coordinates.
We would really like these to be given in the rotated coordinate system, which implies they should have standard_names of grid_longitude and grid_latitude, and units of degrees.
However, they probably should not be listed in a data-variable "coordinates" attribute as auxiliary coordinates, since they index over the node dimension which is not a dimension of the data variable (naturally, the data has a face dimension,).

Thus, I would prefer that we can say that they connect via the "mesh" attribute, and are not listed in "coordinates".
But in CF, the only way that a coordinate is "connected" to a grid-mapping is by both being referenced by a data variable.
( Aside : I have always thought this a very odd and unobvious mechanism : Whereas it might seem natural to assume that a coordinate "has" a unique coordinate system, in CF this is simply not the case : Logically, it is even possible that the same coordinate value arrays can be used in different coordinate systems for different data.
)

So, we probably want to assume a rule (but it is nowhere stated) that, if a mesh is referenced by a data-variable, which also specifies a grid-mapping, then all the mesh-coordinates are referred to that coordinate system.
But this approach could prove inflexible in the future, and such a principle is certainly not stated anywhere.

Unfortunately the CDL examples in the UGRID spec are incomplete, and none of them actually contains a data-variable at all. Although how it should work is discussed in general in its own section.
In fact I believe we should definitely avoid depending on the UGRID CDL examples too much, as they are incomplete and in some places seemingly wrong (there are certainly some typos, which indicate that they have not been actualised + tested).

@pp-mo
Copy link
Member

pp-mo commented Mar 21, 2023

Probably a lot of the above content should do into separate discussions. Maybe even on the UGRID project, but unfortunately there seems to be little activity there at present.

@hdyson
Copy link
Contributor Author

hdyson commented Mar 21, 2023

Probably a lot of the above content should do into separate discussions. Maybe even on the UGRID project, but unfortunately there seems to be little activity there at present.

Would you like me to close this issue and open up a new one without the sidetracks? I think we do now have a consensus that the coordinates attribute should be included in the iris save (regardless of whether it was originally present or not, and regardless of whether the cube with a mesh came from a source file or not), so creating a clean issue for that may simplify things.

@pp-mo
Copy link
Member

pp-mo commented Mar 21, 2023

Would you like me to close this issue and open up a new one

Not just yet, I think.
I want to make some more replies here, to issue raised here.

But maybe we can start a general discussion.

We can create a PR to target this, and further discuss any specific details there

@pp-mo
Copy link
Member

pp-mo commented Mar 21, 2023

a big aside, but I want to point ... definitely missing at present is a way for mesh coordinates to be given in an alternative coordinate system

I've now raised this here

@pp-mo
Copy link
Member

pp-mo commented Mar 21, 2023

@arjclark What we do need to do is produce files that are CF compliant ... saves the file in a UGRID-conformant format

@pp-mo I think the result is CF (and UGRID) compliant.

Following offline conversation with @hdyson, I'm coming around to this view.
In which case, it will make sense for Iris to

  1. issue a warning if this connection is "missing" in input
  2. include relevant mesh coordinates in every data variable "coordinates" attributes.

This also feeds back into the nascent conformance rules and the ugrid-checker

@hdyson
Copy link
Contributor Author

hdyson commented Mar 22, 2023

I just put something very similar to this in an email to @pp-mo, so apologies for the redundancy but I think it's worth flagging somewhere more public. In the linked UGrid issue: ugrid-conventions/ugrid-conventions#63 there's discussion of the potential of UGrid compliant files that are not CF compliant.

My personal opinion is that iris should aim to be writing mesh cubes that are both UGrid and CF compliant - I think the inconsistency between CF compliant times/vertical levels/etc in the same file as CF non-compliant meshes would lead to user confusion.

@pp-mo
Copy link
Member

pp-mo commented Mar 22, 2023

UGrid compliant files that are not CF compliant

Yes that scared me a bit too.
I have posted a bit of a request for clarification

I would certainly hope + expect Iris output to be both CF and UGRID compliant.
We have an issue to include UGRID in the Conventions declaration #5030

@hdyson
Copy link
Contributor Author

hdyson commented Mar 23, 2023

I really think this one has run its course and we can't see the wood for the trees now. I'm going to close this ticket and open up a new issue with the iris specific parts of this, and let the conventions conversations carry on in the UGrid repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants