Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ROIs] Use FOV ROIs for labeling (at arbitrary pyramid level) #19

Closed
tcompa opened this issue Jul 20, 2022 · 9 comments
Closed

[ROIs] Use FOV ROIs for labeling (at arbitrary pyramid level) #19

tcompa opened this issue Jul 20, 2022 · 9 comments
Labels
Tables AnnData and ROI/feature tables

Comments

@tcompa
Copy link
Collaborator

tcompa commented Jul 20, 2022

Starting from discussion in

we are now splitting the work on ROIs into several steps/issues:

  1. Parsing pixel sizes and FOV ROIs.
  2. Transform physical-unit ROI into pixels, at arbitrary pyramid level.
  3. Use FOV ROIs for illumination correction (at highest-resolution level).
  4. Use FOV ROIs for labeling (at arbitrary pyramid level). (THIS ISSUE)

Useful resources:


This is the slightly more complex that the illumination-correction case, because:

  1. We should read/write at an arbitrary resolution level.
  2. We may have to force a sequential scan of ROIs (this is to be verified).

Other than that, the tentative plan is similar to the illumination-correction one:

  1. Read ROIs from some path, and transform them to indices at an arbitrary level.
  2. Switch from map_blocks to an explicit loop over ROIs which constructs an array by delayed correction functions. In v0, this should be constrained to never act on more than a single ROI at a time.
  3. Write to disk, triggering execution of the correction function for all FOVs. Keep in mind that when working at level>0 there are some additional scale prefactors to introduce - see https://github.com/fractal-analytics-platform/fractal/blob/tasks/fractal/tasks/image_labeling_whole_well.py#L159-L175.
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 21, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 21, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 22, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 25, 2022
* Generalize image_labeling to work with ROIs;
* Generalize image_labeling to work at arbitrary resolution level;
* Add lib_zattrs_utils.py, also with rescale_datasets;
* Rename extract_zyx_pixel_sizes_from_zattrs to extract_zyx_pixel_sizes;
@tcompa
Copy link
Collaborator Author

tcompa commented Jul 25, 2022

Current status of the per-FOV image-labeling task:

  1. It takes an argument labeling_level (in the same way as image_labeling_whole_well), which can be larger than 0.
  2. It loops over FOV ROIs at this arbitrary resolution level and performs segmentation.
  3. The loop is currently not in parallel - this is something to explore later.
  4. After segmentation, the task writes the mask with the correct scale factors. Since the rescaling operation is shared among the two labeling tasks, it is now obtained with the function rescale_datasets from lib_zattrs_utils.py:
def rescale_datasets(datasets : List[Dict] = None,
                     coarsening_xy : int = None,
                     reference_level : int = None,) -> List[Dict]:
    if datasets is None or coarsening_xy is None or reference_level is None:
        raise TypeError("Missing argument in rescale_datasets")

    # Construct rescaled datasets
    new_datasets = []
    for ds in datasets:
        new_ds = {}

        # Copy all keys that are not coordinateTransformations (e.g. path)
        for key in ds.keys():
            if key != "coordinateTransformations":
                new_ds[key] = ds[key]

        # Update coordinateTransformations
        old_transformations = ds["coordinateTransformations"]
        new_transformations = []
        for t in old_transformations:
            if t["type"] == "scale":
                new_t = {"type": "scale"}
                new_t["scale"] = [
                    t["scale"][0],
                    t["scale"][1] * coarsening_xy**reference_level,
                    t["scale"][2] * coarsening_xy**reference_level,
                ]
                new_transformations.append(new_t)
            else:
                new_transformations.append(t)
        new_ds["coordinateTransformations"] = new_transformations
        new_datasets.append(new_ds)

Note that we explicitly forbid the existence of a global coordinateTransformation, at the moment. We can also decide to allow this feature, and then we need to combine the two scale's (here and also somewhere else - e.g. when parsing pixel sizes).

There a few of relatively simple open points:

  1. Make sure that output is chunked correctly;
  2. Perhaps include the global scale transformation;
  3. Explore parallel writing.
  4. Make sure that image_labeling still works also for 2D (concerning this point: do we actually need to support the per-FOV 2D segmentation? Or should we drop it?)

tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 25, 2022
@jluethi
Copy link
Collaborator

jluethi commented Jul 25, 2022

Sounds great! 👏🏻

Make sure that image_labeling still works also for 2D (concerning this point: do we actually need to support the per-FOV 2D segmentation? Or should we drop it?)

Per FOV 2D segmentation can still be useful, because it allows for 2D segmentation at full resolution, while whole well 2D segmentation needs to be done at a lower resolution for memory reasons.

Perhaps include the global scale transformation;

If it's simple to do, why not. Otherwise, let's wait how transformations change in the v0.5 spec and eventually refactor for that

Explore parallel writing.

There are some cases we know would be save. e.g. at level 0. Is there a way to detect if parallel writing will be save? Also interesting to explore whether it can be safely done in cases where FOV are in overlapping chunks.

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 25, 2022

Make sure that image_labeling still works also for 2D (concerning this point: do we actually need to support the per-FOV 2D segmentation? Or should we drop it?)

Per FOV 2D segmentation can still be useful, because it allows for 2D segmentation at full resolution, while whole well 2D segmentation needs to be done at a lower resolution for memory reasons.

Ok.
This feature is in-place, and I just tested it on a MIP array. It cannot be tested within Fractal, at the moment, because of our "poor" handling of task arguments (the per-FOV segmentation always acts on the "main" zarr file, and in the current version it cannot be overridden), but it works smoothly from the command-line interface.

tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 25, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 25, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 25, 2022
@tcompa
Copy link
Collaborator Author

tcompa commented Jul 25, 2022

Partial update on status:

  1. Make sure that output is chunked correctly.
    Done. Chunk sizes correspond to images (e.g. 1x2160x2560), and the image size is obtained by combining FOV ROIs (from the AnnData table) and pixel sizes (from the zattrs file) - both for the highest-resolution level.
  2. Make sure that image_labeling still works also for 2D.
    Done (although not as part of Fractal).
  3. Perhaps include the global scale transformation.
    This is indeed something simple, but perhaps let's postpone since more changes will come with v0.5. As of fractal-analytics-platform/fractal-client@273ab07, we raise a NotImplementedError with a decent error message when a global transformation is found.
  4. Explore parallel writing.
    Ongoing. The first bunch of tests with num_threads>1 just ran through (with reasonable output), even if they included possibly-non-safe parallel writing. Maybe to_zarr already takes care of it? More tests/understanding needed. We may also provisionally change the chunking choices (making chunks very large), to create more edge cases where many processes should write to the same chunk.
  5. Re: Is there a way to detect if parallel writing will be save?
    There is a way, of course, but it requires an explicit check that each chunk only hosts a single FOV/ROI. If expected safe operations are only the ones acting on single images, then I wouldn't take the road towards a general automatic check of safety - since this can be achieved directly by requiring that chunks are of the size of images. If @jluethi has in mind something else (like checking that no pair of organoids belong to the same chunk), then the automatic check would be needed.
    I'd say let's re-assess after we explore point 4. In the best case, parallel writing is always safe and we can skip this discussion.
  6. Profile memory usage. This is just to make sure that we've not introduced any unexpected large memory usage.
    EDIT: this is under control - see next comment.

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 25, 2022

Concerning memory usage, here is the profiling for three single-well cases. TL;DR Everything looks OK.

Note that multi-well examples may lead to a known issue - see fractal-analytics-platform/fractal-client#109 (comment).

In all three cases below, segmentation takes place per-FOV, at the highest-resolution level, and with relabeling.

Case 1: 2x2 well with 10 Z planes (num_threads=2):

figure

Case 2: 2x2 well with 42 Z planes (num_threads=1):

figure

Case 3: 9x8 well with 19 Z planes (num_threads=2):

figure

@jluethi
Copy link
Collaborator

jluethi commented Jul 25, 2022

Awesome progress, very nice to see that memory is stable as well!

A NotImplementedError is harsh, but I guess we don't have good ways to parse warnings so far, right? We probably wouldn't want the whole pipeline to crash if someone specifies extra metadata. So as soon as we think the library may gain broader usage, this would have to go. But fine for now :)

Explore parallel writing.

Sound good. Let's make this the focus. I agree, only if parallel writing is possible, but causes issues in some cases will we need to implement some check as in 5. Eventually, we may also want to check that there is no ROI overlap (e.g. for the organoid case). But let's have a look first at how save the to_zarr is. I like the idea of provoking simultaneous writes, could also do this for some synthetic examples. If to_zarr handles this well, than we only need to worry about actual pixel overlap, not parallel chunk access :)

👏🏻👏🏻👏🏻

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 26, 2022

Here is some first anecdotal evidence of successful parallel writing.

For a single 9x8 well, I performed per-FOV segmentation at resolution level 4, where a FOV has YX shape 135x160 and a single 2160x2560 chunk may host 256 FOVs (that is, more than the 72 FOVs of the well).
I ran this test with num_threads=10 parallel executions of Cellpose (which is possible, memory-wise, because each FOV has shape 10x135x160), which is a highly "dangerous" operation because all segmentation calls write on the same (single) chunk.

Observations:

  1. The code ran through, with no obvious errors/warnings.
  2. The labels appear reasonable in napari (apart from the fact that they are extremely coarse and irregular, but that is an expected consequence of working at level 4).

This points in the good direction!
We should now find some more stringent tests to be sure that everything is happening safely (maybe we can quickly come up with an artificial test of to_zarr.. let's see).

@jluethi
Copy link
Collaborator

jluethi commented Jul 26, 2022

Oh, this sounds great! Very promising indeed! 🎉

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 28, 2022

Based on the discussion in #20 (and under the assumptions described there), we consider parallel writing as safe - until further notice. This ends the current work on per-ROI labeling, and I'm closing this issue.

Things that are not addressed here:

  1. There is still a broader memory issue which comes up during labeling, but it is not related to the per-ROI scheme -- see Labeling-tasks integration into fractal + (Torch) memory errors fractal-client#109.
  2. Another minor missing point (support for a global coordinate transformation) is not specific for labeling but rather shared among tasks. It can be addressed in the future, the first time that we actually need it. See Support global scale transformations at the multiscale level #50.

@tcompa tcompa closed this as completed Jul 28, 2022
@jluethi jluethi transferred this issue from fractal-analytics-platform/fractal-client Sep 2, 2022
@jluethi jluethi moved this from Done to Done Archive in Fractal Project Management Oct 5, 2022
@tcompa tcompa added the Tables AnnData and ROI/feature tables label Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Tables AnnData and ROI/feature tables
Projects
None yet
Development

No branches or pull requests

2 participants