-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ROI tables, dimension separator & build_omero_channel_metadata changes #32
Conversation
…nge default upper quantile to 0.999
for more information, see https://pre-commit.ci
Fixed the pre-commit issues now. It looks like one of my new test for the Fractal tasks seems to fail in the CI, but it's running locally. Will need to look into this afterwards. |
Hi @jluethi, Thank you for opening this PR. Extracting the information of the individual fields of view during the data parsing and handing it back makes a lot of sense. I also like adding support to write tables to AnnData following the emerging NGFF spec. However, I would like to keep the core processing functionality strictly separated from any workflow orchestration tool. Could you refactor the PR such that only core functionality is added to faim-hcs? We could also schedule a pair-programming session where we go over the PR together. Cheers! |
Hey @tibuch Thanks for the feedback. Yes, I can refactor it that way and make a new repository for the actual task, which will depend on the faim-hcs repository. Are there plans to deploy this repository to pypi for easier dependency handling? I'll ping you for a pair-programming session to go over the refactored PR once I have it ready. |
Currently there are no plans, but if this will make development easier on you side we can put it on pypi. We are using it like this at the moment:
|
Hey @tibuch I went through all the suggested changes from our meeting on Monday (cleanup, using the output of I have one additional suggestion to add to this PR (see below) that came up. Other than that, it would be ready from my side. During the work on pyramid level specification, I noticed that it's hard to solve ome/napari-ome-zarr#84 with just the two existing parameters. Thus, I'm suggesting we add an additional parameter there: Add a ⇒ A user should be able to overwrite this, i.e. set it to 512 or 256 to enforce that low resolution pyramids are created and it shouldn’t relate to optimal chunk sizes at higher resolutions. By default, the behavior will remain as it was before (I think that’s a reasonable default for anything below 384 well plates) There is some interaction with max_levels (as before): Whichever condition is hit first limits the number of pyramid levels. That can be a bit confusing, but also desired flexibility. I currently still kept it in (behavior as before). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added a few comments.
My main questions:
- Shouldn't the column naming (
x_micrometer
etc.) be a bit more flexible and not hard-code the unit? - Why did you remove the optional inclusion of the
z_position
from the metadata here? That's unrelated to the topic of this pull request, isn't it?
Update docstring Co-authored-by: Jan Eglinger <[email protected]>
Thanks @imagejan ! I'll update the PR to address these docstring & pytest suggestions. Regarding the main questions:
Yes, this should eventually become more general. The first version of Fractal ROIs has these hard-coded column names though and this PR adds support for such Fractal ROIs. There should be more generic ROIs eventually, but I'm not fully sure what direction they'd go (maybe the shapes variant of spatial data regions, see https://spatialdata.scverse.org/en/latest/design_doc.html#regions-of-interest ? Or some other broader approach?).
I assume you mean |
@imagejan I updated the docstrings as suggested and moved the column names in the tests into a fixture. From my side, this PR is ready to be merged (upon rerunning the github automations on the latest commit, which needs to be approved by a maintainer). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had another look at this pull request and added a few more nitpicky comments. I hope that @tibuch and I can finish reviewing this next week and finally merge.
src/faim_hcs/MetaSeriesUtils.py
Outdated
Given that Fractal ROI tables are always 3D, but we only stitch the xy | ||
planes here, the z starting position is always 0 and the | ||
z extent is set to 1. This is overwritten downsteam if the 2D planes are | ||
assembled into a 3D stack. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
faim-hcs
doesn't depend on or link to Fractal, so I would rephrase this sentence to not be fractal-specific. Are (field-of-view and well) ROIs always 3D as of this implementation? Or does it make sense to implement them as 2D here, and switch to 3D depending on context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't fully make this Fractal independent, as the ROI tables are created in the Fractal ROI table format. We are working towards having a specification page for v1 of our ROI tables and I'll propose it as an addition then. I updated the wording slightly to reflect that, but it still contains a Fractal reference.
Our v1 ROI tables are always 3D. If a future version of them is more flexible on dimensionality (& column names), I'll be happy to provide an update PR.
adata = ad.AnnData(X=df_roi) | ||
adata.obs_names = roi_table.index | ||
adata.var_names = list(map(str, roi_table.columns)) | ||
ad._io.specs.write_elem(group_tables, table_name, adata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Judging from the _
, the _io
package is meant to be private. Is it safe to use it here? I mean, is it likely to break with future updates of anndata
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. There isn't a more stable table implementation available at the moment though. Given that we're only using the anndata dependency for the table handling & writing: Are you ok with fixing the anndata dependency at a given version (0.8.0 works, I can test the most recent 0.9.2 to ensure it's still fine) and only manually updating it until the table writing gets a stable home?
Would be great if you could run the tests in the github CI again. Had some issues running them locally with the test data (yay Airport wifi? Didn't manage to debug before boarding) I think I've addressed all the issues mentioned above. Let me know if you'd be happy with fixing the AnnData version for the time being, given that the write_elem import comes from the experimental _io subpackage. |
Tl;dr: It wasn't a local CI issue. The same bug gets reproduced locally as in the Github CI. It seems to be an fsspec issue (see below for my investigation on it). The very confusing thing: Commit c471dd1, which passed the CI above (the one before my latest changes) now also fails when I run the test this way. Even the current main branch seems to fail with the same error as above, at least in my local testing. Thus, it appears that some dependency changed since to trigger this. I checked ome-zarr (I had it working in 0.7.1 before, now it's at 0.8.0. But tests in main still fail with it). Also checked Zarr (I had it working with zarr-2.13.6, now it's at 2.16.1. But test still fail with 2.13.6) Also checked fsspec (I had it working with fsspec==2023.5.0, now it's at 2023.9.0) => downgrading fsspec to 2023.5.0 made tests pass again locally. Independent of the ome-zarr & zarr version above |
The most recent commit fixes Would be great to run the online CI again with this, locally it's working again. And that would stop potential anndata changes to interfere with faim-hcs for the moment |
We're working on a formalized Fractal ROI table spec, it's planned for October. I made another reminder issue here for it: #41 |
@tibuch @imagejan Super cool library for converting MD images into OME-Zarrs! 👏🏻
I would like to add Fractal support to use this library. That means 2 things:
This PR contains a proposal to add both things to this repository, as well as a few minor changes I'll go in detail about below (and that we can also move to a different PR, if that's preferred).
ROI tables
We typically rely on 2 types of ROI tables being created by the OME-Zarr converter: Whole-well ROI (trivial, but allows us to use ROI-based loading in a general manner) and FOV ROIs (describing which region of the full image belong to which field of view from the microscope).
To add FOV tables, this needs to be integrated into the core calculations of loading the images (because that's when the decision is made on how to place the FOVs in the well image). Thus I have integrated this into the core functionality in
MetaSeriesUtils
and now a few additional functions return a roi_tables dictionary (keys: str = table_name, values: pd.DataFrame actual table).The actual table writing is optional, the parsing functions now just return a dictionary with tables that can be written to an AnnData table in the OME-Zarr. I also included a helper function to actually do this in
Zarr.py
aswrite_roi_table
.Unfortunately, this is does mean that it's a breaking change to the
write_cyx_image_to_well
&write_czyx_image_to_well
API in that it now returns an additional output (the roi_tables dictionary). I have updated all tests, examples and usage of this function in this package. But if e.g. the prefect tasks depend on this repository, it would require a small adaption in those scripts (e.g. just adding an, -
to the outputs ofwrite_cyx_image_to_well
&write_czyx_image_to_well
to ignore those roi_tables.All other changes should be transparent to the usage of this library, as far as I can tell.
Including the ROI tables writing in the main package (and being able to read them in the tests) does mean that anndata would become a core dependency.
Fractal tasks
I included 2 Fractal tasks in a subpackage. These tasks allow users to parse 2D, 3D or mixed MD images into an OME-Zarr & write the ROI tables.
We could also have these tasks in a separate package if you prefer. It should not interfere with anything in the main package though.
The tasks are in their sub package and running them in Fractal comes with an extra dependency on fractal-tasks-core (we're just refactoring this to be much lighter) & pydantic. This dependency is only needed when installing it to run in Fractal. Thus, they are in an extra
fractal-tasks
.In addition, there is a small json manifest for the tasks, some tests to check their output & a small example in the examples folder using your example data from resources.
Minor suggested changes
a) Changes to
build_omero_channel_metadata
: I added a check that start and end are not the same value (leads to issues in napari otherwise). Also, proposing to switch to 0.999 percentile rescaling to make the visualization a bit less harsh by default (alternatively, we could expose the percentiles as an option the user can set)b) dimension_separator: I hard-coded the dimension_separator = “/” now. I don’t fully understand how it picks the defaults. But sometimes, it picks “.” as a dimension separator for me, and that does not work with napari-ome-zarr for viewing the images. We can also make this an option of course to parse through everything.
Summary
I'm proposing:
write_cyx_image_to_well
&write_czyx_image_to_well
slightly to also output ROI tablesbuild_omero_channel_metadata
slightly/