Field of view parallelization via OME-NGFF ROI tables #24
Comments
Here are two points to consider, based on discussions with @mfranzon and @jluethi.

**Parallel writing of ROIs**

Two ROIs can have different kinds of overlaps that make parallel writing of these regions unsafe. The first case is an actual overlap in physical space. This is probably not the case most of the time (think of ROIs=FOVs, or ROIs=organoids), and in some cases we could mitigate the problem by switching to complex 3D shapes. In any case, we can probably also check for this issue explicitly, if needed.

The second case is overlap in the underlying array structure, i.e. two ROIs may belong to the same chunk (a single ROI can also touch several chunks, but this doesn't change the current issue). The only case where this problem is absent is when ROIs=FOVs and we work at the highest-resolution level (where chunks=FOVs); in all other cases it is present. For this reason it is not safe to parallelize over ROIs, since processing two different ROIs could lead to simultaneous writes to the same chunk.

Current (planned) solution: parallelization happens over wells, and within each well we scan through ROIs sequentially. Note that this requires knowing which ROIs belong to a given well (which shouldn't be very hard to obtain).

**Storing data at lower-resolution levels**

One of the reasons for defining ROIs is that some processing (e.g. labeling) can happen for FOVs at pyramid levels larger than 0 (the highest-resolution one), say level 1. The output (e.g. a mask) then has a shape (in pixels) that differs from level 0 of the image array, but it will still be saved as level 0 in the new (labels) folder. There are at least two options for how to proceed, and we are now aiming at the second one:
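The planned scheduling can be sketched in a few lines. This is a minimal illustration, not the actual fractal implementation; the function names (`process_roi`, `process_well`, `run`) and the per-well in-memory arrays are hypothetical stand-ins for writes into a per-well zarr group:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def process_roi(well_array, roi):
    """Write into the region covered by one ROI (ROIs may share chunks)."""
    z0, z1, y0, y1, x0, x1 = roi
    well_array[z0:z1, y0:y1, x0:x1] += 1

def process_well(well_array, rois):
    # Sequential loop over ROIs: two ROIs touching the same chunk are
    # never written concurrently.
    for roi in rois:
        process_roi(well_array, roi)
    return well_array

def run(wells):
    # Parallelism only across wells, which live in disjoint arrays/groups.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda w: process_well(*w), wells))

# Toy data: two wells, the first with two ROIs, the second with one.
wells = [
    (np.zeros((1, 4, 4)), [(0, 1, 0, 2, 0, 2), (0, 1, 2, 4, 2, 4)]),
    (np.zeros((1, 4, 4)), [(0, 1, 0, 4, 0, 4)]),
]
results = run(wells)
```

The key property is that the only concurrency is between wells, so no chunk is ever touched by two workers at once.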
Question (regarding parallel writing): has someone else already thought about parallel writing to arbitrary irregular regions of an array? To be verified.
It could be worth looking into the advanced indexing for Zarr. I haven't fully understood it yet, but this part of the documentation does sound promising:
Extracting & updating subsets of data is what we are after here. I'm not sure whether this would still fail when multiple chunks are written to in parallel, but let's test this :) And let's keep looking for an existing example or implementation that gets around this issue.
Regarding the two options: personally, I think going with option 2 would be preferable. It is the general case and allows us to store multi-resolution data if we ever need to support this. If we can use the same …
Interesting, thanks for the link. The relevant section is https://zarr.readthedocs.io/en/stable/tutorial.html#parallel-computing-and-synchronization. Some quotes from a first quick read:
Unclear to me whether the synchronization is taken care of by zarr (with something like the synchronizers described there), or whether we need to handle it ourselves.
Closing this, moving to …
The problem
We need a way to process information per original field of view. Our initial attempt to do so by saving each field of view to a separate folder within the well (following the HCS spec) failed due to performance issues when browsing such data at low resolution. See #12 for the details.
Some of our tasks still rely on processing based on the original FOVs. For example, illumination correction runs per FOV and currently needs the image sizes as an input: https://github.com/fractal-analytics-platform/fractal/blob/529345b2d2f855a2b9cae7b237965fe35ee314bb/fractal/tasks/illumination_correction.py#L30
Labeling currently hard-codes FOV-sizes: https://github.com/fractal-analytics-platform/fractal/blob/529345b2d2f855a2b9cae7b237965fe35ee314bb/fractal/tasks/image_labeling.py#L112
We should generalize this and enable labeling at different pyramid levels (e.g. labeling may be faster and perform well enough at pyramid level 1). For this, we need information about where the individual FOVs sit within the Zarr file.
Plus, if we want to distribute the processing e.g. of labeling to multiple jobs, we need to be able to know what those jobs will be ahead of time.
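To make the level arithmetic concrete: once we know a FOV's pixel extent at level 0, its extent at a coarser level follows from the coarsening factor. A small sketch, assuming a factor of 2 per pyramid level in each listed axis and a hypothetical function name:

```python
def fov_slices_at_level(fov_level0, level, coarsening=2):
    """Scale a FOV's level-0 (start, stop) pixel ranges down to a coarser level.

    fov_level0: tuple of (start, stop) pairs, one per axis (e.g. (y, x)).
    coarsening: downsampling factor between consecutive levels (assumed 2).
    """
    factor = coarsening ** level
    return tuple((start // factor, stop // factor) for start, stop in fov_level0)

# A FOV covering rows 0-2160 and columns 0-2560 at level 0 ...
fov = ((0, 2160), (0, 2560))
# ... covers rows 0-1080 and columns 0-1280 at level 1.
print(fov_slices_at_level(fov, 1))
```

Being able to compute these ranges ahead of time is also what lets us enumerate the jobs before distributing them.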
Proposal
I suggest that we use the proposed OME-NGFF Tables for this. https://forum.image.sc/t/proposal-for-ome-ngff-table-specification/68908
The current OME-NGFF Tables proposal is for a table that is used to save features (see fractal-analytics-platform/fractal-client#27 for discussions on how we can use it for our feature saving). Saving regions of interest (ROIs) is mentioned, but more as an outlook of a potential type of table to be supported. The reason for this is that ROIs can get quite complicated if one wants to save arbitrary 3D polygons. See some early work on this here: https://github.com/openssbd/bdz
For our use case though, I think we can get away with a very simple ROI definition. It could be a table like the following:
The corner coordinates & lengths are in physical units; one can map them to pixels by combining them with the scale metadata of the given pyramid level.
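A minimal sketch of that mapping for one axis (the helper name and the 0.325 µm/pixel example value are illustrative, not part of any spec):

```python
def physical_to_pixel(start_um, length_um, pixel_size_um):
    """Convert one axis of a physical ROI to (start, stop) pixel indices.

    start_um / length_um: ROI corner and edge length in micrometers.
    pixel_size_um: physical size of one pixel at the chosen pyramid level.
    """
    i0 = round(start_um / pixel_size_um)
    i1 = round((start_um + length_um) / pixel_size_um)
    return i0, i1

# Example: a 416 um-wide FOV at 0.325 um/pixel spans 1280 pixels;
# at a 2x-coarsened level (0.65 um/pixel) it spans 640 pixels.
print(physical_to_pixel(0.0, 416.0, 0.325))
print(physical_to_pixel(0.0, 416.0, 0.65))
```

Because the table stays in physical units, the same ROI row can be resolved to pixel indices at any pyramid level just by swapping in that level's scale.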
We could populate this table based on the metadata parsed from the Yokogawa files, or generate it separately (see #25).
Using such a table would allow us to do two things:
If this works well, such an approach could eventually be generalized and applied to e.g. ROIs based on organoid segmentation, i.e. ROIs that no longer follow the original FOVs.
Limitations
There is not yet a defined way to save ROI information to the OME-NGFF Tables (the tables themselves are still very new, but they work well using this prototype: https://github.com/kevinyamauchi/ome-ngff-tables-prototype). Given that a generalized way of saving ROIs could get quite a bit more complex (arbitrary shapes etc.), I suggest we start with a simple ROI table that just handles rectangular cuboids (3D rectangles, i.e. all angles are 90 degrees, but side lengths aren't necessarily equal). If a standard for saving ROIs emerges later that is both general & still easy to use, we could switch to that.
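One side benefit of restricting ourselves to axis-aligned cuboids: the explicit physical-overlap check mentioned above becomes trivial. A sketch, assuming half-open `[start, stop)` intervals per axis and a hypothetical function name:

```python
def cuboids_overlap(a, b):
    """True if two axis-aligned cuboids overlap in physical space.

    a, b: tuples of (start, stop) intervals, one per axis, e.g.
    ((z0, z1), (y0, y1), (x0, x1)) in physical units.
    """
    # Cuboids overlap iff their intervals overlap on every axis.
    return all(a0 < b1 and b0 < a1 for (a0, a1), (b0, b1) in zip(a, b))

roi_a = ((0, 1), (0, 10), (0, 10))
roi_b = ((0, 1), (5, 15), (5, 15))   # overlaps roi_a in y and x
roi_c = ((0, 1), (10, 20), (0, 10))  # touches roi_a at y=10, no overlap
print(cuboids_overlap(roi_a, roi_b), cuboids_overlap(roi_a, roi_c))
```

Running this check pairwise over a well's ROI table would tell us up front whether sequential processing is actually required for safety in physical space.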