Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Experimental Support for Zarr Version 3 #2918

Closed
wants to merge 8 commits into from

Conversation

DennisHeimbigner
Copy link
Collaborator

I am putting up this PR just to get it on the record. It is still failing the github action workflows for mingw and for Visual Studio for cmake. Despite my best efforts I cannot fix them.

Change Overview

  • Add experimental support for the Zarr Version 3 Specification. This is a bit of a moving target, so it is subject to change. This required major re-factoring of the NCZarr code.
  • As discussed in a recent netcdf meeting, convert NCZarr V2 to store all netcdf-4 specific info as attributes. Backward read-compatibility with the older NCZARR format is maintained.
  • Add a new directory -- v3_nczarr_tests -- to test out the Zarr V3 implementation. This requires duplicating the tests in directory "nczarr_test" into the directory "v3_nczarr_tests". This is a bit of a complicated hack and is documented in docs/internal.md. One consequence is that V3 testing cannot be done under CMake.
  • Fix the -Wconversion and -Wsign-compare warnings in libnczarr, nczarr_test, and v3_nczarr_test.

Specific Changes

Automake and CMake Build Changes

  • Convert the ENABLE_XXX options to NETCDF_ENABLE_XXX options (part of the cmake overhaul).
  • Disable the NETCDF_ENABLE_NETCDF4/NETCDF_ENABLE_NETCDF_4 options in favor of NETCDF_ENABLE_HDF5. This will cause complaints.
  • Remove the ENABLE_CLIENTSIDE_FILTERS options since has been disabled for a while.
  • Add option to specify the default Zarr format to use when creating datasets. It is currently set to use Zarr Version 2.

NCZarr Specific Changes

  • Add an "hdf5raw" codec format for cases where an hdf5 filter is known but no corresponding codec format is defined.
  • The Zarr/Codec API has been modified to support including a property list argument to pass information about which Zarr format (Version2 vs Version 2) is required.

Misc. Unrelated Changes

  • The remotetest server is currently unstable, so provide a mechanism to force disabling calls to remotetest.unidata.ucar.edu. This is enabled by adding a repository variable named REMOTETESTDOWN with the value "yes".
  • Fix CMakeLists.txt to use the uname command as an alternate to using the hostname command (which does not work under cygwin).

Notes

  • The current refactor to handle Version 3 is a stop-gap. At some point, it needs fixing to remove as much common code as possible to minimize the size of zformat2.c and zformat3.c
  • Additional refactoring will be needed to support the new array->array codecs.
  • The NCZarr related CMakeLists.txt needs work to conform to the new cmake build methods.

I am putting up this PR just to get it on the record.  It is
still failing the github action workflows for mingw and for
Visual Studio for cmake. Despite my best efforts I cannot fix
them.

## Change Overview
* Add experimental support for the Zarr Version 3 Specification.
  This is a bit of a moving target, so it is subject to change.
  This required major re-factoring of the NCZarr code.
* As discussed in a recent netcdf meeting, convert NCZarr V2 to
  store all netcdf-4 specific info as attributes.  Backward
  read-compatibility with the older NCZARR format is maintained.
* Add a new directory -- v3_nczarr_tests -- to test out the Zarr V3
  implementation. This requires duplicating the tests in directory
  "nczarr_test" into the directory "v3_nczarr_tests".
  This is a bit of a complicated hack and is documented in docs/internal.md.
* Fix the -Wconversion and -Wsign-compare warnings in
  libnczarr, nczarr_test, and v3_nczarr_test.

## Specific Changes

### Automake and CMake Build Changes
* Convert the ENABLE_XXX options to NETCDF_ENABLE_XXX options
  (part of the cmake overhaul).
* Disable the ENABLE_NETCDF4/ENABLE_NETCDF_4 options in favor of ENABLE_HDF5.
  This will cause complaints.
* Remove the ENABLE_CLIENTSIZE_FILTERS options since has been disabled for a while.
* Add option to specify the default Zarr format to use when creating datasets.
  It is currently set to use Zarr Version 2.

### NCZarr Specific Changes
* Add an "hdf5raw" codec format for cases where an hdf5 filter is known
  but no corresponding codec format is defined.
* The Zarr/Codec API has been modified to support including
  a property list argument to pass information about which Zarr
  format (Version2 vs Version 2) is required.

### Misc. Unrelated Changes
* The remotetest server is currently unstable, so provide a mechanism
  for force disabling calls to remotetest.unidata.ucar.edu.
  This is enabled by adding a repository variable named
  REMOTETESTDOWN with the value "yes".
* Fix CMakeLists.txt to use the uname command as an alternate
  to using the hostname command (which does not work under cygwin).

## Notes
* The current refactor to handle Version 3 is a stop-gap.
  At some point, it needs fixing to remove as much common code
  as possible to minimize the size of zformat2.c and zformat3.c
* Additional refactoring will be needed to support the new array->array
  codecs.
* The NCZarr related CMakeLists.txt need work to conform to the new
  cmake build methods.
@DennisHeimbigner DennisHeimbigner requested a review from WardF as a code owner May 5, 2024 01:31
@ZedThree
Copy link
Contributor

  • Convert the ENABLE_XXX options to NETCDF_ENABLE_XXX options (part of the cmake overhaul).
  • Disable the NETCDF_ENABLE_NETCDF4/NETCDF_ENABLE_NETCDF_4 options in favor of NETCDF_ENABLE_HDF5. This will cause complaints.
  • Remove the ENABLE_CLIENTSIDE_FILTERS options since has been disabled for a while.
  • Fix CMakeLists.txt to use the uname command as an alternate to using the hostname command (which does not work under cygwin).

Is there any chance that these changes could be pulled out into (at least one) separate PR? These add up to some really big changes that seem quite orthogonal to the main feature of this PR. A single commit touching 350+ files and adding 27k lines makes it really difficult to navigate and understand repo history when it has many unrelated changes like this.

I'm also worried that disabling NETCDF_ENABLE_NETCDF4 will break a lot of builds downstream. Why not just emit a warning that this has changed? That gives downstream some time to notice and change things.

@DennisHeimbigner
Copy link
Collaborator Author

I'm also worried that disabling NETCDF_ENABLE_NETCDF4 will break a lot of builds downstream. Why not just emit a warning that this has changed? That gives downstream some time to notice and change things.

Such a warning has been active for some time now and we have received no complaints. I can leave it
for another release iteration. For how many releases do you think we should wait?

@DennisHeimbigner
Copy link
Collaborator Author

I expect that the conversion of option names is the source of the non-zarr file changes.
I can do this as a separate PR, but those changes must be done immediately in order
to operate with the existing code base.

@DennisHeimbigner
Copy link
Collaborator Author

Closed to accomodate Comment #2918 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants