NCZarr does not support reading of many zarr files #2474
@DennisHeimbigner, do you have some insight?
Sorry for the delay. It appears to be related to the use of '/' as the
I was wrong. The problem is this (it's complicated).
Since both are in the same group (the root group), this is an inconsistency
OME NGFF specifies that "Metadata about an image can be found under the "multiscales" key in the group-level metadata." As far as I can tell, multiple arrays are designed to have different dimensions to represent different levels of the image pyramid. @joshmoore, am I interpreting this correctly? Could this be contrary to NCZarr's constraints? Is there a way to reconcile this? Or is NCZarr a bad choice of Zarr backend for an NGFF implementation?
I do not know anything about the OME standard, but why do you think
@DennisHeimbigner: but that should only be the case for … As a side note, @dzenanz, ome/ngff#114 is intended to play nicely with xarray and nczarr.
It is actually an Xarray problem since the _ARRAY_DIMENSIONS attribute
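Xarray (and NCZarr, when it honors `_ARRAY_DIMENSIONS`) treats dimension names as shared within a group, so every array in the group that uses a given name must agree on its length. A multiscale group whose pyramid levels reuse the same axis names with different lengths breaks that rule. The following is a minimal Python sketch of such a check, assuming a Zarr v2 directory store where each child array has a `.zarray` holding its `shape` and a `.zattrs` holding `_ARRAY_DIMENSIONS`. The function names, the toy helper, and the shapes are hypothetical and not taken from the OME sample data or from netcdf-c.

```python
import json
import tempfile
from pathlib import Path


def dimension_conflicts(group_dir):
    """Map each _ARRAY_DIMENSIONS name that is bound to more than one length
    across the child arrays of a Zarr v2 group to {array_name: length}."""
    seen = {}  # dimension name -> {array name: length}
    for zarray in sorted(Path(group_dir).glob("*/.zarray")):
        array_dir = zarray.parent
        shape = json.loads(zarray.read_text())["shape"]
        zattrs = array_dir / ".zattrs"
        if not zattrs.exists():
            continue  # no Xarray dimension names on this array
        dims = json.loads(zattrs.read_text()).get("_ARRAY_DIMENSIONS", [])
        for name, length in zip(dims, shape):
            seen.setdefault(name, {})[array_dir.name] = length
    # Keep only the names whose lengths disagree between arrays.
    return {name: lengths for name, lengths in seen.items()
            if len(set(lengths.values())) > 1}


def write_toy_array(group_dir, name, shape, dims):
    """Hypothetical helper: write only the metadata keys read above
    (this is not a complete Zarr array)."""
    array_dir = Path(group_dir) / name
    array_dir.mkdir(parents=True)
    (array_dir / ".zarray").write_text(json.dumps({"shape": shape}))
    (array_dir / ".zattrs").write_text(json.dumps({"_ARRAY_DIMENSIONS": dims}))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as group:
        # Two pyramid levels in the same group that reuse the same axis names.
        write_toy_array(group, "0", [64, 512, 512], ["z", "y", "x"])
        write_toy_array(group, "1", [64, 256, 256], ["z", "y", "x"])
        print(dimension_conflicts(group))
        # {'y': {'0': 512, '1': 256}, 'x': {'0': 512, '1': 256}}
```

In this toy case `z` is not reported because both levels keep the same depth; only the downsampled axes conflict.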
It's definitely true that Zarr v2 is agnostic to
Not in Zarr.
I was speaking more about nczarr and Xarray than Zarr, because Zarr
I assume that the highest resolution is used by default in Xarray.
In any case, try changing the access URL to include the following:
I would be surprised if that were so. It would be seen as violating shape constraints,
@DennisHeimbigner you are right.
I would suggest two changes to the OME format:
There may be other attributes that should also be changed.
Suggestion noted, thanks, @DennisHeimbigner, but I don't think we should open that conversation here. Leaving nczarr, s3, and http out of the picture for a moment, I'm still struggling to understand whether you think these file-based pure-zarr examples from the original description should be covered by netcdf-c, @DennisHeimbigner:
In my mind, they should, especially if we are pointing the community to this as the C implementation.
Speaking for myself, I say no. The fact that they cannot be read by Xarray … As for the case of
that is more a matter of opinion and I would really think that using a dictionary
* re: Unidata#2278
* re: Unidata#2485
* re: Unidata#2474

This PR subsumes PR Unidata#2278. It is actually a bit of an omnibus covering several issues.

## PR Unidata#2278

Add support for the Zarr string type. Zarr strings are currently restricted to be of fixed size. The primary issue to be addressed is to provide a way for users to specify the size of the fixed-length strings. This is handled by providing the following new special attributes:

1. **_nczarr_default_maxstrlen**: This is an attribute of the root group. It specifies the default maximum string length for string types. If not specified, then it has the value of 64 characters.
2. **_nczarr_maxstrlen**: This is a per-variable attribute. It specifies the maximum string length for the string type associated with the variable. If not specified, then it is assigned the value of **_nczarr_default_maxstrlen**.

This PR also requires some hacking to handle the existing netcdf-c NC_CHAR type, which does not exist in zarr. The goal was to choose numpy types for both the netcdf-c NC_STRING type and the netcdf-c NC_CHAR type such that if a pure zarr implementation read them, it would still work, and an NC_CHAR type would be handled by zarr as a string of length 1.

For writing variables and NCZarr attributes, the type mapping is as follows:

* "|S1" for NC_CHAR
* ">S1" for NC_STRING && MAXSTRLEN==1
* ">Sn" for NC_STRING && MAXSTRLEN==n

Note that it is a bit of a hack to use endianness, but it should be OK since endianness has no meaning for string/char types. When reading attributes with pure zarr (i.e., with no nczarr attribute types defined), they will always be interpreted as of type NC_CHAR.

## Issue: Unidata#2474

This PR partly fixes this issue because it provides more comprehensive support for Zarr attributes that are JSON-valued expressions. This PR still does not address the problem in that issue where the _ARRAY_DIMENSIONS attribute is incorrectly set. That can only be fixed by the creator of the datasets.

## Issue: Unidata#2485

This PR also fixes the scalar failure shown in this issue, and generally cleans up scalar handling. It also adds a note to the documentation describing that NCZarr supports scalars while Zarr does not, and how scalar interoperability is achieved.

## Misc. Other Changes

1. Convert the nczarr special attributes and keys to be all lower case, so "_NCZARR_ATTR" is now "_nczarr_attr". Backward compatibility for the upper-case names is supported.
2. Clean up my too-clever-by-half handling of scalars in libnczarr.
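The string-length fallback and the write-side type mapping described in the PR can be sketched as follows. This is a minimal Python illustration, not netcdf-c code: the function names `effective_maxstrlen` and `zarr_dtype`, and the use of plain dicts for attributes, are hypothetical. Only the attribute names, the default of 64, and the dtype strings come from the PR text.

```python
DEFAULT_MAXSTRLEN = 64  # used when the root group has no _nczarr_default_maxstrlen


def effective_maxstrlen(root_attrs, var_attrs):
    """Per-variable _nczarr_maxstrlen wins; otherwise fall back to the root
    group's _nczarr_default_maxstrlen, and finally to 64."""
    default = root_attrs.get("_nczarr_default_maxstrlen", DEFAULT_MAXSTRLEN)
    return var_attrs.get("_nczarr_maxstrlen", default)


def zarr_dtype(nc_type, maxstrlen=1):
    """Return the dtype string written for a netcdf-c character type."""
    if nc_type == "NC_CHAR":
        return "|S1"  # '|': byte order not applicable; a string of length 1
    if nc_type == "NC_STRING":
        if maxstrlen < 1:
            raise ValueError("maxstrlen must be at least 1")
        # '>' only marks NC_STRING as distinct from NC_CHAR;
        # byte order has no meaning for byte strings.
        return f">S{maxstrlen}"
    raise ValueError(f"not a character type: {nc_type}")


if __name__ == "__main__":
    root = {"_nczarr_default_maxstrlen": 128}
    print(zarr_dtype("NC_CHAR"))                                   # |S1
    print(zarr_dtype("NC_STRING", effective_maxstrlen({}, {})))    # >S64
    print(zarr_dtype("NC_STRING", effective_maxstrlen(root, {})))  # >S128
    print(zarr_dtype("NC_STRING",
                     effective_maxstrlen(root, {"_nczarr_maxstrlen": 16})))  # >S16
```

Because both dtype strings are ordinary fixed-length byte strings to numpy, a pure Zarr reader can still open such variables; the byte-order character only lets NCZarr tell the two netcdf-c types apart.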
Partly fixed by #2492
@DennisHeimbigner, to clarify: is the intent for netcdf-c to be considered a Zarr library (as opposed to an nczarr library)?
@dzenanz: will you have a chance to test whether the partial fix works for you?
I am finishing up a grant proposal today and tomorrow; I plan to test #2492 later in the week.
The primary goal is to read/write netCDF-4 datasets stored in Zarr format. On the other hand, we do not want to cut ourselves off from pure Zarr
Thanks for the explanation, @DennisHeimbigner. I definitely understand the primary goal. But is there also a sub-goal of mapping all pure Zarr datasets to some subset of netcdf-4? |
The short answer is yes. For example, I have just added support for Zarr fixed-length
I guess I defer there to @dzenanz's testing, since in #2474 (comment) there appear to be Zarr datasets that aren't openable. If it would help to copy them somewhere for testing, I'm happy to do so.
Trying to read
That is because the OME use of _ARRAY_DIMENSIONS is still wrong and I
Removing either

```json
"compressor": {
    "blocksize": 0,
    "clevel": 5,
    "cname": "lz4",
    "id": "blosc",
    "shuffle": 1
}
```
How did you install netcdf-c? Are you building it yourself?
To have the latest version, I have to build it myself. My build setup is here: InsightSoftwareConsortium/ITKIOOMEZarrNGFF#6
So it looks like you are building with your own CMakeLists.txt and not
I modified netCDF's CMakeLists.txt in order to point to the HDF5 bundled with ITK, and to avoid some duplicate targets with other libraries I use alongside it.
I am trying to read sample zarr files provided here: https://github.com/ome/ome-ngff-prototypes#data-availability
Am I doing something wrong? I was under the impression that NCZarr could read most Zarr files. Which features of these examples break NCZarr?
Ubuntu 22.04.1 LTS, GCC 11.2.0, libnetcdf-dev Version: 1:4.8.1-1.