
Usage questions & feedback #121

Open
will-moore opened this issue Jan 17, 2025 · 18 comments


@will-moore

I'm testing the generation of OME-Zarr metadata with this script (then checking the output in ome-ngff-validator)...

from ome_zarr_models.v04.axes import Axis
from ome_zarr_models.v04.coordinate_transformations import (
    VectorScale,
    VectorTranslation,
)
from ome_zarr_models.v04.image import ImageAttrs
from ome_zarr_models.v04.omero import Channel, Omero, Window
from ome_zarr_models.v04.multiscales import Dataset, Multiscale
import os
from shutil import rmtree

import zarr

if os.path.exists("write_image.zarr"):
    rmtree("write_image.zarr")

pixel_sizes = (1, 0.45, 0.34, 0.34)
dataset_scales = [1, 2, 4]

# write Zarr v2 arrays manually...(pixel data omitted)
store = zarr.DirectoryStore('write_image.zarr', dimension_separator='/')
root = zarr.group(store=store)
for f in dataset_scales:
    root.create_dataset(f"scale{f}", shape=(1, 512 // f, 512 // f, 512 // f), chunks=(1, 32, 32, 32), dtype='uint8')

# create the image metadata
axes = (
    Axis(name="c", type="channel", unit=None),
    Axis(name="z", type="space", unit="meter"),
    Axis(name="x", type="space", unit="meter"),
    Axis(name="y", type="space", unit="meter"),
)
datasets = []
for f in dataset_scales:
    transforms_dset = (VectorScale.build((1, 0.45 * f, 0.34 * f, 0.34 * f)),
                        VectorTranslation.build((0, 0, 0, 0)))
    datasets.append(
        Dataset(path=f"scale{f}", coordinateTransformations=transforms_dset)
    )

multi = Multiscale(axes=axes, datasets=tuple(datasets), version="0.4")
w = Window(min=0, max=1024, start=100, end=200)
channel = Channel(color="FF0000", window=w)
om = Omero(channels=[channel])

image = ImageAttrs(multiscales=[multi], omero=om)

# populate the zarr group with the image metadata
for k, v in image.model_dump().items():
    root.attrs[k] = v

A couple of questions:

  • Initially I omitted to add the version="0.4" to the Multiscale. Although the spec doesn't require it (level is SHOULD), it causes issues in the validator as we don't know which schemas to load etc. Since the class is ome_zarr_models.v04.multiscales.Multiscale I wonder if it should automatically populate the version in this case?
  • With the code above, the validator reports the data is invalid since /multiscales/0/name is not a string (since it is null in the JSON). Although it is optional, if it's present it must be a string. Also multiscale.type and multiscale.metadata are also null. Maybe there's a better way to populate the zarr.attrs from the model that doesn't include null items?
  • Minor point: All the model classes are nice and easy to guess, apart from ImageAttrs. I guessed Image first and got some strange error.
  • Any other feedback on this example? I wondered if the docs / tutorial would benefit from something like this?

Cheers

@d-v-b
Collaborator

d-v-b commented Jan 17, 2025

  • Maybe there's a better way to populate the zarr.attrs from the model that doesn't include null items?

I think on each class where this is an issue, we can have a private class variable that lists the names of the fields that shouldn't be serialized if they are None, and when serializing to JSON we skip all the fields in that list if the value of the field is None. This worked for me over in pydantic-ome-ngff.
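A minimal sketch of that approach, assuming pydantic v2. The class and field names below are hypothetical stand-ins, not the actual ome-zarr-models classes; the point is the per-field skip list, which (unlike a blanket exclude_none) still serializes None on fields where it is meaningful:

```python
from typing import Any, ClassVar, Optional

from pydantic import BaseModel, model_serializer


class SkipNoneBase(BaseModel):
    # Field names to drop from serialized output when their value is None.
    _skip_if_none: ClassVar[tuple[str, ...]] = ()

    @model_serializer(mode="wrap")
    def _serialize(self, handler: Any) -> dict[str, Any]:
        dumped = handler(self)
        return {
            k: v
            for k, v in dumped.items()
            if not (k in self._skip_if_none and v is None)
        }


class MultiscaleLike(SkipNoneBase):
    """Hypothetical model: name/type are dropped when None, but the
    untyped metadata field keeps None (a null there may be deliberate)."""

    _skip_if_none: ClassVar[tuple[str, ...]] = ("name", "type")

    name: Optional[str] = None
    type: Optional[str] = None
    metadata: Any = None
```

With this, `MultiscaleLike().model_dump()` yields only `{"metadata": None}`, and the same filtering applies to `model_dump_json()`, so no extra serialization method is needed.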

@dstansby
Collaborator

Thanks for giving it a spin!

Initially I omitted to add the version="0.4" to the Multiscale. Although the spec doesn't require it (level is SHOULD), it causes issues in the validator as we don't know which schemas to load etc. Since the class is ome_zarr_models.v04.multiscales.Multiscale I wonder if it should automatically populate the version in this case?

I'm afraid that since the spec doesn't require it, we won't automatically populate it, in line with our ethos of sticking rigidly to the spec. I think this is solved in version 0.5 of the spec, where version is mandated?

With the code above, the validator reports the data is invalid since /multiscales/0/name is not a string (since it is null in the JSON). Although it is optional, if it's present it must be a string. Also multiscale.type and multiscale.metadata are also null. Maybe there's a better way to populate the zarr.attrs from the model that doesn't include null items?

Ooh, this is a bug on our part. We claim to not be serialising null at all (see known issues), so that should be fixed.

All the model classes are nice and easy to guess, apart from ImageAttrs. I guessed Image first and got some strange error.

What error did you get? As far as I know it should be Image (and our API docs agree that Image should exist)

I wondered if the docs / tutorial would benefit from something like this?

Yes please! We haven't got round to working out the best way of creating objects and then using them to write out metadata, so if you have done that a PR to add it to the tutorial would be great!

@will-moore
Author

With:

from ome_zarr_models.v04.image import Image
...
image = Image(multiscales=[multi], omero=om)

I get:

Traceback (most recent call last):
  File "/Users/wmoore/Desktop/python-scripts/zarr_scripts/ome-zarr-models-py/write_image_metadata.py", line 47, in <module>
    image = Image(multiscales=[multi], omero=om)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/opt/anaconda3/envs/test_zarr_libs_py312/lib/python3.12/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for Image
multiscales
  Extra inputs are not permitted [type=extra_forbidden, input_value=[Multiscale(axes=(Axis(na..., name=None, type=None)], input_type=list]
    For further information visit https://errors.pydantic.dev/2.10/v/extra_forbidden
omero
  Extra inputs are not permitted [type=extra_forbidden, input_value=Omero(channels=[Channel(c...art=100.0, end=200.0))]), input_type=Omero]
    For further information visit https://errors.pydantic.dev/2.10/v/extra_forbidden

@will-moore
Author

I see under "Known Issues" you have: "Any fields set to None in the Python objects are currently not written when they are saved back to the JSON metadata using this package." But it doesn't say how you should be saving the JSON metadata?

Maybe this isn't the way to do it?:

for k, v in image.model_dump().items():
    root.attrs[k] = v

@dstansby
Collaborator

Ah right, the Image vs. ImageAttrs thing is that Image represents the entire Zarr group, including attributes, and ImageAttrs represents the OME-Zarr attributes. So using ImageAttrs to create the attributes is expected. I tried to explain this at the top of our API docs, but suggestions for improvement there definitely welcome.

But it doesn't say how you should be saving the JSON metadata?

I think when we wrote that we were expecting the model to be serialised using Pydantic, which I think you're doing using model_dump()? Something probably needs fixing so model_dump() doesn't include None values.

@imagejan
Contributor

Something probably needs fixing so model_dump() doesn't include None values.

model_dump() or model_dump_json() should be used with exclude_none=True

Maybe we should have our own utility method that serializes the model that way by default?

@dstansby
Collaborator

Maybe we should have our own utility method that serializes the model that way by default?

I think this is a great idea 👍 - does anyone want to have a go implementing this?

For now, you could use the exclude_none argument to model_dump() to avoid writing out None values.
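For illustration, a minimal standalone sketch of what `exclude_none` changes, using a plain pydantic model rather than the actual ome-zarr-models classes:

```python
from typing import Optional

from pydantic import BaseModel


class Demo(BaseModel):
    name: Optional[str] = None
    version: str = "0.4"


# The default dump keeps the null field, which is what trips up the validator:
print(Demo().model_dump())  # {'name': None, 'version': '0.4'}

# exclude_none drops every None-valued field:
print(Demo().model_dump(exclude_none=True))  # {'version': '0.4'}
```

The same keyword works with `model_dump_json()` when writing the attributes out as JSON.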

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

If you add a new method, pydantic will not know about it, and your model will not serialize correctly. This will impair embedding these models inside other pydantic data structures.

I would recommend just copying exactly (or improving) what I did over in pydantic-ome-ngff, which was to hard-code for each model the fields that should not be serialized if they are None, and override the default pydantic serialization on such models to check and exclude such fields. You could distinguish between json and dict serialization if you want.

Just using exclude_none is probably not safe, because there is no guarantee that all None values will always be excluded. In 0.4 there are fields that are untyped, so None is valid there, and should be serialized.

@dstansby
Collaborator

Could we just use the .to_zarr() method in pydantic-zarr instead? I think that would solve the "I want to write some OME-Metadata from scratch" use case.

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

# populate the zarr group with the image metadata
for k, v in image.model_dump().items():
    root.attrs[k] = v

yeah I missed this part of the issue -- users should never have to write code like this. to_zarr exists for that reason.

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

similarly, users don't need to manually create zarr arrays, or a zarr group. to_zarr takes care of that.

@will-moore
Author

@d-v-b Do you mean that to_zarr already does that, or that's something that's planned?

So, in my example above, if you left out the creation of the root group and the root.create_dataset() calls, and at the end you did

root = image.to_zarr()

how would that also handle the array creation (since you've not provided any data, shape, dtype etc)?

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

to_zarr can already create arrays and groups. see this example in the docs: https://zarr.dev/pydantic-zarr/usage_zarr_v2/#flattening-and-unflattening-zarr-hierarchies. You have to tell pydantic-zarr what properties your arrays have, and then associate those arrays with a group, but after that you can create the entire zarr hierarchy in one shot.

There are utilities in ome-zarr-models-py for creating ome-zarr metadata from arrays + other parameters, but they are only used in tests because not everyone was convinced that these utilities would be useful. Maybe it's time to revisit that decision.

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

@will-moore look at this example in the tests:

group_model = from_arrays(
    arrays=arrays,
    axes=(Axis(name="x", type="space"), Axis(name="y", type="space")),
    paths=array_names,
    scales=((1, 1), (2, 2)),
    translations=((0, 0), (0.5, 0.5)),
)

I think this is the kind of API we should be aiming for.

@will-moore
Author

That looks nice, yes.
If I could specify my pixel size, e.g. sizes = (1, 0.45, 0.34, 0.34) (from above), and it could then calculate all the scales and translations for me based on the array dimensions, that would be even nicer (and avoid me making mistakes)!

Otherwise everyone will have to write their own helpers to do that.

@d-v-b
Collaborator

d-v-b commented Jan 30, 2025

it's not possible to calculate the translations from the pixel size, because that requires knowledge about how the image was downsampled. I address this by using xarray for downsampling, and using bindings between xarray and ome-zarr to ensure that the translation information gets propagated.
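As a concrete illustration of why the downsampling method matters: under one common convention (block-averaging aligned to the array origin), the centre of output pixel 0 shifts by half the extra block width, so scale and translation can be derived from the pixel size and factor. This helper is a sketch of that single convention, not part of any of the libraries discussed; a different downsampling scheme (e.g. strided subsampling) would give a different translation:

```python
def level_transform(pixel_size: tuple[float, ...], factor: int):
    """Scale/translation for a level downsampled by `factor`, assuming
    each output pixel averages a `factor`-wide block starting at the origin."""
    scale = tuple(s * factor for s in pixel_size)
    # The centre of the first averaged block sits (factor - 1) / 2 source
    # pixels from the centre of source pixel 0.
    translation = tuple(s * (factor - 1) / 2 for s in pixel_size)
    return scale, translation


print(level_transform((0.34, 0.34), 2))  # ((0.68, 0.68), (0.17, 0.17))
```

With plain strided subsampling (keeping every `factor`-th pixel) the translation would instead stay (0, 0), which is exactly the ambiguity being pointed out above.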

@dstansby
Collaborator

not everyone was convinced that these utilities would be useful.

My reluctance here was having to re-implement the interface/function signature in zarr-python (or another implementation) for making Zarr arrays.

How about a model where you create your metadata without any arrays, and then add arrays to it, e.g., in add_multiscale_array() below. When an array is added the metadata is saved to the zarr group, but no data is actually written.

from ome_zarr_models.v04.axes import Axis
from ome_zarr_models.v04.image import Image, ImageAttrs
from ome_zarr_models.v04.multiscales import Multiscale

multiscale = Multiscale(axes=(Axis(name="x"),), datasets=())
attrs = ImageAttrs(multiscales=[])
image = Image(attributes=attrs)
image.to_zarr(...)  # This just writes out groups, not arrays


image.add_multiscale_array(path=..., arr=zarr.empty(...), transform=...)

I haven't thought about whether this would work for other groups (e.g., HCS), but it might be a way forward without us having to re-implement function signatures from zarr-python?

@dstansby
Collaborator

Oh actually this is just a variation of #121 (comment)... so I think we agree that something like #121 (comment) or my comment above would be useful - would anyone be up for putting together a draft PR for how this could work for Image?
