Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ROIs] Transform physical-unit ROI into pixels, at arbitrary pyramid level #22

Closed
tcompa opened this issue Jul 20, 2022 · 15 comments
Closed
Labels
Tables AnnData and ROI/feature tables

Comments

@tcompa
Copy link
Collaborator

tcompa commented Jul 20, 2022

Starting from discussion in

we are now splitting the work on ROIs into several steps/issues:

  1. Parsing pixel sizes and FOV ROIs.
  2. Transform physical-unit ROI into pixels, at arbitrary pyramid level. (THIS ISSUE)
  3. Use FOV ROIs for illumination correction (at highest-resolution level).
  4. Use FOV ROIs for labeling (at arbitrary pyramid level).

Useful resources:


Here is a prototype of the function that returns indices corresponding to a ROI:

def ROI_to_indices(ROI, pxl_sizes=(1,1,6), level=0, coarsening_xy=2):
    x, y, z, len_x, len_y, len_z = ROI[:]
    prefactor = coarsening_xy ** level
    pxl_sizes = (pxl_size[0], pxl_size[1] * prefactor, pxl_size[2] * prefactor)

    start_x = x / pxl_size[0]
    end_x = (x + len_x) / pxl_size[0]
    start_y = y / pxl_size[1]
    end_y = (y + len_y) / pxl_size[1]
    start_z = z / pxl_size[2]
    end_z = (z + len_z) / pxl_size[2]

    return (start_x, end_x, start_y, end_y, start_z, end_z)

Note: this only works for orthogonal ROIs, which are the only ones we plan to support for the moment (AKA: non-orthogonal regions must be placed into a bounding box).

Missing: how to treat pixel rounding for higher levels.

@jluethi
Copy link
Collaborator

jluethi commented Jul 20, 2022

Great! This should then become a library function that different tasks can call.

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 20, 2022

This is our prototype:

import anndata as ad
import pandas as pd
import numpy as np
import math
import typing
from typing import List, Union, Dict

def convert_ROI_table_to_indices(ROI : ad.AnnData,
                                 level : int = 0,
                                 coarsening_xy : int = 2) -> List[List]:
    list_indices = []

    for FOV in range(ROI.n_obs):

        # Extract data from anndata table
        # FIXME: is there a better way to do this??
        x_micrometer = ROI[FOV, ROI.var_names == "x_micrometer"].X[0, 0]
        y_micrometer = ROI[FOV, ROI.var_names == "y_micrometer"].X[0, 0]
        z_micrometer = ROI[FOV, ROI.var_names == "z_micrometer"].X[0, 0]
        len_x_micrometer = ROI[FOV, ROI.var_names == "len_x_micrometer"].X[0, 0]
        len_y_micrometer = ROI[FOV, ROI.var_names == "len_y_micrometer"].X[0, 0]
        len_z_micrometer = ROI[FOV, ROI.var_names == "len_z_micrometer"].X[0, 0]
        pixel_size_x = ROI[FOV, ROI.var_names == "pixel_size_x"].X[0, 0]
        pixel_size_y = ROI[FOV, ROI.var_names == "pixel_size_y"].X[0, 0]
        pixel_size_z = ROI[FOV, ROI.var_names == "pixel_size_z"].X[0, 0]

        # Set pyramid-level pixel sizes
        prefactor = coarsening_xy ** level
        pixel_size_x *= prefactor
        pixel_size_y *= prefactor

        # Identify indices along the three dimensions
        # FIXME: We don't need Z indices for FOVs, but perhaps we will for
        #        more complex 3D shapes
        start_x = x_micrometer / pixel_size_x
        end_x = (x_micrometer + len_x_micrometer) / pixel_size_x
        start_y = y_micrometer / pixel_size_y
        end_y = (y_micrometer + len_y_micrometer) / pixel_size_y
        start_z = z_micrometer / pixel_size_z
        end_z = (z_micrometer + len_z_micrometer) / pixel_size_z
        indices = [start_z, end_z, start_y, end_y, start_x, end_x]

        # Round indices to lower integer
        # FIXME: to be checked
        indices = list(map(math.floor, indices))

        list_indices.append(indices)

    return list_indices

(I'm pushing it now)

@jluethi
Copy link
Collaborator

jluethi commented Jul 20, 2022

    # FIXME: is there a better way to do this??

I'm also no AnnData expert. If this is really the best way, we can also convert the AnnData dataframe to pandas and then use .loc and such.

Though according to the AnnData documentation, indexing with column names should be possible:

Indexing into an AnnData object can be performed by relative position with numeric indices (like pandas’ iloc()), or by labels (like loc()). To avoid ambiguity with numeric indexing into observations or variables, indexes of the AnnData object are converted to strings by the constructor.

    # FIXME: We don't need Z indices for FOVs, but perhaps we will for
    #        more complex 3D shapes

I think it's a good idea to support 3D ROIs from the beginning. Organoid ROIs may not always cover the whole Z span. Especially when we have 80 or 100 Z slices, it will be very valuable to be able to just select e.g. the 30-50 Z slices that an organoid is in for processing, instead of the whole stack

tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 20, 2022
@tcompa
Copy link
Collaborator Author

tcompa commented Jul 20, 2022

I think it's a good idea to support 3D ROIs from the beginning. Organoid ROIs may not always cover the whole Z span. Especially when we have 80 or 100 Z slices, it will be very valuable to be able to just select e.g. the 30-50 Z slices that an organoid is in for processing, instead of the whole stack

We need to better define what's the expected output for 2D ROIs, when used to index a 3D array.
Our proposal is to add a num_z_replicas argument, equal to 1 by default. If it is larger than 1, and if the ROIs are actually 2D (which is checked by verifying that their Z indices are equal to [0,1]), then we just create a stack of those 2D ROIs which also loops over Z planes.

The obvious use case is illumination correction, which we want to apply to a bunch of images of shape (1, 2160, 2560).

To be more explicit we are now adding this functionality

        # Default behavior
        if num_z_replicas == 1:
            list_indices.append(indices)
        # Create 3D stack of 2D ROIs
        else:
            # Check that this ROI is 2D, i.e. it has z indices [0:1]
            if start_z != 0 or end_z != 1:
                raise Exception(f"ERROR: num_z_replicas={num_z_replicas}, "
                                f"but [start_z,end_z]={[start_z,end_z]}")
            # Loop over Z planes
            for z_start in range(num_z_replicas):
                indices[0:2] = [z_start, z_start + 1]
                list_indices.append(indices[:])

Comments?

tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 20, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 20, 2022
@tcompa
Copy link
Collaborator Author

tcompa commented Jul 20, 2022

Though according to the AnnData documentation, indexing with column names should be possible:

True.
For some reason in our first test that method failing, and then we switched to this complex approach. Fixed now in fractal-analytics-platform/fractal-client@e925d1c.

I guess it won't be possible to remove the .X[0, 0] part because those are still two-dimensional tables, but the new version is already quite better - and I would call it acceptable. Also note that this part of code will hardly ever be critical in terms of runtime, so we can be a bit more flexible.

@jluethi
Copy link
Collaborator

jluethi commented Jul 21, 2022

We need to better define what's the expected output for 2D ROIs, when used to index a 3D array.

Good question. To my mind, ROIs are always 3D representations. If we have sites, it's some x & y location plus all the z. If we have an organoid, it is x & y box plus some to all of the z planes.

When a task wants to run per plane, it could lazily process the different planes of a ROI. That would be the more general solution, because then tasks should run on 2D & 3D, just depending on what ROI they get and how they process it.

Regarding the .X[0, 0] part: interesting question, will have to look into that. Do you have an example dataset somewhere on a share so that I could have a look at the AnnData object?

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 21, 2022

Good question. To my mind, ROIs are always 3D representations. If we have sites, it's some x & y location plus all the z. If we have an organoid, it is x & y box plus some to all of the z planes.

When a task wants to run per plane, it could lazily process the different planes of a ROI. That would be the more general solution, because then tasks should run on 2D & 3D, just depending on what ROI they get and how they process it.

This seems to match with what we proposed in #22: if ROIs are 2D then we replicate them over all Z planes, while if ROIs are 3D we just use the ones that are defined in the table.
Does this correspond to the expected behavior?

@jluethi
Copy link
Collaborator

jluethi commented Jul 21, 2022

Ah, now I understand. Yes, that is a good idea. As such, the Z parameter is optional and if it's not provided, it's assumed that the ROI contains all Z planes, right?

That actually is a nice generalization! :)

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 21, 2022

We should decide the default behavior.

The one which is in-place now simply returns the ROIs as they are defined in the table:

  • If they are 3D, no problem.
  • If they are 2D ROIs (with trivial Z position and length), then it returns a single set of 2D ROIs (corresponding to Z index equal to 0, which is not really what we care about). Returning the whole set of 2D ROIs is easily achieved by setting num_z_replicas equal to the number of planes, but this number of planes is currently not part of the table so it's not something we can guess.

We can improve this behavior with a constraint that says: if you provided only ROIs which have a trivial Z extensions (corresponding to indices [0:1]), but you did not provide the number of z planes (needed to replicate this ROIs in the Z direction), then we throw an error.

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 21, 2022

Regarding the .X[0, 0] part: interesting question, will have to look into that. Do you have an example dataset somewhere on a share so that I could have a look at the AnnData object?

You can use this snippet

import numpy as np
import anndata as ad
import zarr
from anndata.experimental import write_elem

adata = ad.AnnData(X=np.arange(6).reshape(2,3), dtype=int)
print(f"First table: {adata}")
adata.obs_names = ["row_A", "row_B"]
adata.var_names = ["col_1", "col_2", "col_3"]
zarr_group = zarr.group("test_anndata.zarr")
write_elem(zarr_group, "table", adata)


adata2 = ad.read_zarr("test_anndata.zarr/table")
print(f"Second table: {adata2}")

single_element = adata2["row_B", "col_2"]
print(f"Single element: {single_element}")
print(f"Single element (data): {single_element.X[0, 0]}")

@jluethi
Copy link
Collaborator

jluethi commented Jul 21, 2022

Maybe we should briefly discuss this, it's a good question. Can we have a quick call at 9:30? (or later today)

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 21, 2022

As of our last call with @jluethi:

  1. When we construct the original FOV ROIs, we define them as 3D shape that span the entire Z direction (AKA columns, AKA stacks of images).
  2. Some tasks (e.g. per-FOV labeling) require exactly this kind of shapes, so that they won't need any additional logic.
  3. In some other tasks (e.g. illumination correction), we will have to split each 3D ROI into a set of ROIs which correspond to 2D images. This logic is included into the task, and it doesn't lead to another table being saved within the zarr file. Note that 2D images are still 3D arrays, but with a Z depth of one pixel.
  4. When generating a MIP, we convert the original 3D ROIs to new ROIs which only have a Z depth equal to one pixel size in Z (e.g. 1 um). These new ROIs (2D information, stored in a 3D structure) are then saved as a table in the MIP zarr file.
  5. If some future task needs to act on a MIP array at the per-FOV level, it will seamlessly use the corresponding ROI table.

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 21, 2022

  1. When we construct the original FOV ROIs, we define them as 3D shape that span the entire Z direction (AKA columns, AKA stacks of images).

Can we also extract the Z dimension of a well from the metadata? Or should we parse the filenames?
At the moment this information is not immediately available in the dataframe parsed by metadata_parsing.py.

(this goes together with the other request of parsing the image size - see #23)

@jluethi
Copy link
Collaborator

jluethi commented Jul 21, 2022

Hmm, I think it's feasible to parse this. Will add this to the list of things to add to metadata_parsing.py 🙂

tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 21, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 21, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 21, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 21, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 21, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 21, 2022
@tcompa tcompa mentioned this issue Sep 2, 2022
tcompa referenced this issue in fractal-analytics-platform/fractal-client Jul 22, 2022
* Do not store pixel sizes in the FOV-ROI table;
* Always parse pixel sizes from the zattrs file, when needed;
* Add level arg to extract_zyx_pixel_sizes_from_zattrs.
@tcompa tcompa closed this as completed Jul 22, 2022
@tcompa
Copy link
Collaborator Author

tcompa commented Jul 22, 2022

I close this as lib_regions_of_interest.py now includes a comprehensive set of tools:

def prepare_FOV_ROI_table(df: pd.DataFrame) -> ad.AnnData:

def convert_ROI_table_to_indices(
    ROI: ad.AnnData,
    level: int = 0,
    coarsening_xy: int = 2,
    pixel_sizes_zyx: Union[List[float], Tuple[float]] = None,
) -> List[List[int]]:

def convert_ROI_table_to_indices(
    ROI: ad.AnnData,
    level: int = 0,
    coarsening_xy: int = 2,
    pixel_sizes_zyx: Union[List[float], Tuple[float]] = None,
) -> List[List[int]]:

def split_3D_indices_into_z_layers(
    list_indices: List[List[int]],
) -> List[List[int]]:

def _inspect_ROI_table(
    path: str = None,
    level: int = 0,
    coarsening_xy: int = 2,
    pixel_sizes_zyx=[1.0, 0.1625, 0.1625],
) -> None:

def extract_zyx_pixel_sizes_from_zattrs(zattrs_path: str, level: int = 0):

Let's reopen it if something else comes up.

@jluethi jluethi transferred this issue from fractal-analytics-platform/fractal-client Sep 2, 2022
@jluethi jluethi moved this from Done to Done Archive in Fractal Project Management Oct 5, 2022
@tcompa tcompa added the Tables AnnData and ROI/feature tables label Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Tables AnnData and ROI/feature tables
Projects
None yet
Development

No branches or pull requests

2 participants