
Usage questions & feedback #121

Open
will-moore opened this issue Jan 17, 2025 · 18 comments


@will-moore

I'm testing the generation of OME-Zarr metadata with this script (then checking the output in ome-ngff-validator)...

from ome_zarr_models.v04.axes import Axis
from ome_zarr_models.v04.coordinate_transformations import (
    VectorScale,
    VectorTranslation,
)
from ome_zarr_models.v04.image import ImageAttrs
from ome_zarr_models.v04.omero import Channel, Omero, Window
from ome_zarr_models.v04.multiscales import Dataset, Multiscale
import os
from shutil import rmtree

import zarr

if os.path.exists("write_image.zarr"):
    rmtree("write_image.zarr")

pixel_sizes = (1, 0.45, 0.34, 0.34)
dataset_scales = [1, 2, 4]

# write Zarr v2 arrays manually...(pixel data omitted)
store = zarr.DirectoryStore('write_image.zarr', dimension_separator='/')
root = zarr.group(store=store)
for f in dataset_scales:
    root.create_dataset(f"scale{f}", shape=(1, 512 // f, 512 // f, 512 // f), chunks=(1, 32, 32, 32), dtype='uint8')

# create the image metadata
axes = (
    Axis(name="c", type="channel", unit=None),
    Axis(name="z", type="space", unit="meter"),
    Axis(name="x", type="space", unit="meter"),
    Axis(name="y", type="space", unit="meter"),
)
datasets = []
for f in dataset_scales:
    transforms_dset = (VectorScale.build((1, 0.45 * f, 0.34 * f, 0.34 * f)),
                        VectorTranslation.build((0, 0, 0, 0)))
    datasets.append(
        Dataset(path=f"scale{f}", coordinateTransformations=transforms_dset)
    )

multi = Multiscale(axes=axes, datasets=tuple(datasets), version="0.4")
w = Window(min=0, max=1024, start=100, end=200)
channel = Channel(color="FF0000", window=w)
om = Omero(channels=[channel])

image = ImageAttrs(multiscales=[multi], omero=om)

# populate the zarr group with the image metadata
for k, v in image.model_dump().items():
    root.attrs[k] = v

A couple of questions:

  • Initially I omitted to add the version="0.4" to the Multiscale. Although the spec doesn't require it (level is SHOULD), it causes issues in the validator as we don't know which schemas to load etc. Since the class is ome_zarr_models.v04.multiscales.Multiscale I wonder if it should automatically populate the version in this case?
  • With the code above, the validator reports the data is invalid since /multiscales/0/name is not a string (since it is null in the JSON). Although it is optional, if it's present it must be a string. Also multiscale.type and multiscale.metadata are also null. Maybe there's a better way to populate the zarr.attrs from the model that doesn't include null items?
  • Minor point: All the model classes are nice and easy to guess, apart from ImageAttrs. I guessed Image first and got some strange error.
  • Any other feedback on this example? I wondered if the docs / tutorial would benefit from something like this?

Cheers

@d-v-b
Collaborator

d-v-b commented Jan 17, 2025

  • Maybe there's a better way to populate the zarr.attrs from the model that doesn't include null items?

I think on each class where this is an issue, we can have a private class variable that lists the names of the fields that shouldn't be serialized if they are None, and when serializing to JSON we skip all the fields in that list if the value of the field is None. This worked for me over in pydantic-ome-ngff.
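A minimal sketch of that approach, assuming pydantic v2. The class and field names below are hypothetical stand-ins, not the actual ome-zarr-models classes; the point is the per-field skip list, which (unlike a blanket exclude_none) still serializes None on fields where it is meaningful:

```python
from typing import Any, ClassVar, Optional

from pydantic import BaseModel, model_serializer


class SkipNoneBase(BaseModel):
    # Field names to drop from serialized output when their value is None.
    _skip_if_none: ClassVar[tuple[str, ...]] = ()

    @model_serializer(mode="wrap")
    def _serialize(self, handler: Any) -> dict[str, Any]:
        dumped = handler(self)
        return {
            k: v
            for k, v in dumped.items()
            if not (k in self._skip_if_none and v is None)
        }


class MultiscaleLike(SkipNoneBase):
    """Hypothetical model: name/type are dropped when None, but the
    untyped metadata field keeps None (a null there may be deliberate)."""

    _skip_if_none: ClassVar[tuple[str, ...]] = ("name", "type")

    name: Optional[str] = None
    type: Optional[str] = None
    metadata: Any = None
```

With this, `MultiscaleLike().model_dump()` yields only `{"metadata": None}`, and the same filtering applies to `model_dump_json()`, so no extra serialization method is needed.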

@dstansby
Collaborator

Thanks for giving it a spin!

Initially I omitted to add the version="0.4" to the Multiscale. Although the spec doesn't require it (level is SHOULD), it causes issues in the validator as we don't know which schemas to load etc. Since the class is ome_zarr_models.v04.multiscales.Multiscale I wonder if it should automatically populate the version in this case?

I'm afraid that since the spec doesn't require it, we won't automatically populate it, in line with our ethos of sticking rigidly to the spec. I think this is solved in version 0.5 of the spec, where version is mandated?

With the code above, the validator reports the data is invalid since /multiscales/0/name is not a string (since it is null in the JSON). Although it is optional, if it's present it must be a string. Also multiscale.type and multiscale.metadata are also null. Maybe there's a better way to populate the zarr.attrs from the model that doesn't include null items?

Ooh, this is a bug on our part. We claim to not be serialising null at all (see known issues), so that should be fixed.

All the model classes are nice and easy to guess, apart from ImageAttrs. I guessed Image first and got some strange error.

What error did you get? As far as I know it should be Image (and our API docs agree that Image should exist)

I wondered if the docs / tutorial would benefit from something like this?

Yes please! We haven't got round to working out the best way of creating objects and then using them to write out metadata, so if you have done that a PR to add it to the tutorial would be great!

@will-moore
Author

With:

from ome_zarr_models.v04.image import Image
...
image = Image(multiscales=[multi], omero=om)

I get:

Traceback (most recent call last):
  File "/Users/wmoore/Desktop/python-scripts/zarr_scripts/ome-zarr-models-py/write_image_metadata.py", line 47, in <module>
    image = Image(multiscales=[multi], omero=om)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/opt/anaconda3/envs/test_zarr_libs_py312/lib/python3.12/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for Image
multiscales
  Extra inputs are not permitted [type=extra_forbidden, input_value=[Multiscale(axes=(Axis(na..., name=None, type=None)], input_type=list]
    For further information visit https://errors.pydantic.dev/2.10/v/extra_forbidden
omero
  Extra inputs are not permitted [type=extra_forbidden, input_value=Omero(channels=[Channel(c...art=100.0, end=200.0))]), input_type=Omero]
    For further information visit https://errors.pydantic.dev/2.10/v/extra_forbidden

@will-moore
Author

I see under "Known Issues" you have: "Any fields set to None in the Python objects are currently not written when they are saved back to the JSON metadata using this package." But it doesn't say how you should be saving the JSON metadata?

Maybe this isn't the way to do it?:

for k, v in image.model_dump().items():
    root.attrs[k] = v

@dstansby
Collaborator

Ah right, the Image vs. ImageAttrs thing is that Image represents the entire Zarr group, including attributes, and ImageAttrs represents the OME-Zarr attributes. So using ImageAttrs to create the attributes is expected. I tried to explain this at the top of our API docs, but suggestions for improvement there definitely welcome.

But it doesn't say how you should be saving the JSON metadata?

I think when we wrote that we were expecting the model to be serialised using Pydantic, which I think you're doing using model_dump()? Something probably needs fixing so model_dump() doesn't include None values.

@imagejan
Contributor

Something probably needs fixing so model_dump() doesn't include None values.

model_dump() or model_dump_json() should be used with exclude_none=True

Maybe we should have our own utility method that serializes the model that way by default?

@dstansby
Collaborator

Maybe we should have our own utility method that serializes the model that way by default?

I think this is a great idea 👍 - does anyone want to have a go implementing this?

For now, you could use the exclude_none argument to model_dump() to avoid writing out None values.
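For illustration, a minimal standalone sketch of what `exclude_none` changes, using a plain pydantic model rather than the actual ome-zarr-models classes:

```python
from typing import Optional

from pydantic import BaseModel


class Demo(BaseModel):
    name: Optional[str] = None
    version: str = "0.4"


# The default dump keeps the null field, which is what trips up the validator:
print(Demo().model_dump())  # {'name': None, 'version': '0.4'}

# exclude_none drops every None-valued field:
print(Demo().model_dump(exclude_none=True))  # {'version': '0.4'}
```

The same keyword works with `model_dump_json()` when writing the attributes out as JSON.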

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

If you add a new method, pydantic will not know about it, and your model will not serialize correctly. This will impair embedding these models inside other pydantic data structures.

I would recommend just copying exactly (or improving) what I did over in pydantic-ome-ngff, which was to hard-code for each model the fields that should not be serialized if they are None, and override the default pydantic serialization on such models to check and exclude such fields. You could distinguish between json and dict serialization if you want.

Just using exclude_none is probably not safe, because there is no guarantee that all None values will always be excluded. In 0.4 there are fields that are untyped, so None is valid there, and should be serialized.

@dstansby
Collaborator

Could we just use the .to_zarr() method in pydantic-zarr instead? I think that would solve the "I want to write some OME-Metadata from scratch" use case.

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

# populate the zarr group with the image metadata
for k, v in image.model_dump().items():
    root.attrs[k] = v

yeah I missed this part of the issue -- users should never have to write code like this. to_zarr exists for that reason.

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

similarly, users don't need to manually create zarr arrays, or a zarr group. to_zarr takes care of that.

@will-moore
Author

@d-v-b Do you mean that to_zarr already does that, or that's something that's planned?

So, in my example above, if you left out the creation of the root group and the root.create_dataset() calls, and at the end you did

root = image.to_zarr()

how would that also handle the array creation (since you've not provided any data, shape, dtype etc)?

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

to_zarr can already create arrays and groups. see this example in the docs: https://zarr.dev/pydantic-zarr/usage_zarr_v2/#flattening-and-unflattening-zarr-hierarchies. You have to tell pydantic-zarr what properties your arrays have, and then associate those arrays with a group, but after that you can create the entire zarr hierarchy in one shot.

There are utilities in ome-zarr-models-py for creating ome-zarr metadata from arrays + other parameters, but they are only used in tests because not everyone was convinced that these utilities would be useful. Maybe it's time to revisit that decision.

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

@will-moore look at this example in the tests:

group_model = from_arrays(
    arrays=arrays,
    axes=(Axis(name="x", type="space"), Axis(name="y", type="space")),
    paths=array_names,
    scales=((1, 1), (2, 2)),
    translations=((0, 0), (0.5, 0.5)),
)

I think this is the kind of API we should be aiming for.

@will-moore
Author

That looks nice, yes.
If I could specify my pixel size, e.g. sizes = (1, 0.45, 0.34, 0.34) (from above), and it could then calculate all the scales and translations for me based on the array dimensions, that would be even nicer (and avoid me making mistakes)!

Otherwise everyone will have to write their own helpers to do that.

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

it's not possible to calculate the translations from the pixel size, because that requires knowledge about how the image was downsampled. I address this by using xarray for downsampling, and using bindings between xarray and ome-zarr to ensure that the translation information gets propagated.
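As a concrete illustration of why the downsampling method matters: under one common convention (block-averaging aligned to the array origin), the centre of output pixel 0 shifts by half the extra block width, so scale and translation can be derived from the pixel size and factor. This helper is a sketch of that single convention, not part of any of the libraries discussed; a different downsampling scheme (e.g. strided subsampling) would give a different translation:

```python
def level_transform(pixel_size: tuple[float, ...], factor: int):
    """Scale/translation for a level downsampled by `factor`, assuming
    each output pixel averages a `factor`-wide block starting at the origin."""
    scale = tuple(s * factor for s in pixel_size)
    # The centre of the first averaged block sits (factor - 1) / 2 source
    # pixels from the centre of source pixel 0.
    translation = tuple(s * (factor - 1) / 2 for s in pixel_size)
    return scale, translation


print(level_transform((0.34, 0.34), 2))  # ((0.68, 0.68), (0.17, 0.17))
```

With plain strided subsampling (keeping every `factor`-th pixel) the translation would instead stay (0, 0), which is exactly the ambiguity being pointed out above.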

@dstansby
Collaborator

not everyone was convinced that these utilities would be useful.

My reluctance here was having to re-implement the interface/function signature in zarr-python (or another implementation) for making Zarr arrays.

How about a model where you create your metadata without any arrays, and then add arrays to it, e.g., in add_multiscale_array() below. When an array is added the metadata is saved to the zarr group, but no data is actually written.

from ome_zarr_models.v04.axes import Axis
from ome_zarr_models.v04.image import Image, ImageAttrs
from ome_zarr_models.v04.multiscales import Multiscale

multiscale = Multiscale(axes=(Axis(name="x"),), datasets=())
attrs = ImageAttrs(multiscales=[])
image = Image(attributes=attrs)
image.to_zarr(...)  # This just writes out groups, not arrays


image.add_multiscale_array(path=..., arr=zarr.empty(...), transform=...)

I haven't thought about whether this would work for other groups (e.g., HCS), but it might be a way forward without us having to re-implement function signatures from zarr-python?

@dstansby
Collaborator

Oh actually this is just a variation of #121 (comment)... so I think we agree that something like #121 (comment) or my comment above would be useful - would anyone be up for putting together a draft PR for how this could work for Image?
