RFC-3: more dimensions for thee #239

Merged
merged 36 commits into from
Dec 17, 2024
Changes from 30 commits
Commits
2281f3e
Add template for RFC-3
jni May 21, 2024
687cd8f
Complete draft RFC
jni May 21, 2024
96bf663
Add Talley Lambert, Norman Rzepka, and Davis Bennett as endorsers
jni May 21, 2024
35e10f2
Add Doug Shepherd, John Bogovic, Eric Perlman, Lachlan Deakin as endo…
jni May 22, 2024
be40c40
Add Sebastian Rhode to endorsers
jni Jul 5, 2024
4cf0457
Clarify in summary that number, name, ordering and type restrictions …
jni Sep 3, 2024
e32423a
Clarify motivation and make less heated
jni Sep 3, 2024
dd15fd8
Clarify historical context of OME model
jni Sep 3, 2024
d126b1c
Add SHOULD recommendation for 3 space and 1 time dim
jni Sep 3, 2024
f75105d
Add missing references
jni Sep 3, 2024
91d4863
are actively preventing -> prevent
jni Sep 3, 2024
28859f2
Update Davis's role
jni Sep 3, 2024
a53a252
Reviewers -> proposed reviewers
jni Sep 3, 2024
f0c12e6
Add section on partial implementations
jni Sep 3, 2024
f241fd5
Update testing section
jni Sep 3, 2024
8f28df6
remove reference to implementation
jni Sep 11, 2024
a3f0434
Merge remote-tracking branch 'origin/main' into all-the-dims
joshmoore Oct 8, 2024
05eea39
Add navigation
joshmoore Oct 8, 2024
d2437b9
Update status table
joshmoore Oct 8, 2024
e152fb4
Fix codespell issue
joshmoore Oct 8, 2024
f134ea0
Merge branch 'all-the-dims' of github.com:jni/ngff into all-the-dims
jni Oct 31, 2024
b767224
Merge branch 'main' into all-the-dims
jni Oct 31, 2024
33e5362
Remove git merge conflict markers :sweat-smile:
jni Oct 31, 2024
984a9f7
Remove d-v-b as implementer
jni Dec 2, 2024
785f705
Add two paragraphs about performance implications
jni Dec 2, 2024
6f7408a
Remove empty html comments
jni Dec 2, 2024
b19bbbc
Summarise discussion from the PR
jni Dec 2, 2024
7b7dbf5
Add dataset types as examples
jni Dec 2, 2024
7e0e205
NGFF->OME-Zarr where appropriate (I think)
jni Dec 2, 2024
21c8acf
Update spec update link alias
jni Dec 2, 2024
11aa79c
Typo
joshmoore Dec 3, 2024
f7d0629
RFC-3: update listing
joshmoore Dec 3, 2024
315c479
Make image.sc thread a link
joshmoore Dec 3, 2024
c6e6009
Add cryoet link
jni Dec 4, 2024
13f1798
Drop 'impl: tbd'
joshmoore Dec 4, 2024
6e558fa
Drop use of version numbers
joshmoore Dec 4, 2024
9 changes: 9 additions & 0 deletions rfc/3/comments/index.md
@@ -0,0 +1,9 @@
# Comments

Additional comments on RFC-3:

```{toctree}
:maxdepth: 1
:glob:
*/index
```
309 changes: 309 additions & 0 deletions rfc/3/index.md
@@ -0,0 +1,309 @@
# RFC-3: more dimensions for thee

```{toctree}
:hidden:
:maxdepth: 1
reviews/index
comments/index
responses/index
versions/index
```

Remove restrictions on the number, names, ordering, and type of dimensions
stored in OME-Zarr arrays.

## Status

This RFC is currently in RFC state `R1` (send for review).

```{list-table} Record
:widths: 8, 20, 20, 20, 15, 10
:header-rows: 1
:stub-columns: 1

* - Role
  - Name
  - GitHub Handle
  - Institution
  - Date
  - Status
* - Author
  - Juan Nunez-Iglesias
  - [jni](https://github.com/jni)
  - Monash University
  - 2024-05-21
  -
* - Endorser
  - Talley Lambert
  - [tlambert03](https://github.com/tlambert03)
  - Harvard Medical School
  - 2024-05-21
  - [Endorse](https://github.com/ome/ngff/pull/239#issuecomment-2122795327)
* - Endorser
  - Norman Rzepka
  - [normanrz](https://github.com/normanrz)
  - Scalable Minds
  - 2024-05-21
  - [Endorse](https://github.com/ome/ngff/pull/239#issue-2308436425)
* - Endorser
  - Davis Bennett
  - [d-v-b](https://github.com/d-v-b)
  -
  - 2024-05-21
  - [Endorse](https://github.com/ome/ngff/pull/239#issue-2308436425)
* - Endorser
  - Doug Shepherd
  - [dpshepherd](https://github.com/dpshepherd)
  - Arizona State University
  - 2024-05-22
  - [Endorse](https://github.com/ome/ngff/pull/239#issue-2308436425)
* - Endorser
  - John Bogovic
  - [bogovicj](https://github.com/bogovicj)
  - HHMI Janelia Research Campus
  - 2024-05-22
  - [Endorse](https://github.com/ome/ngff/pull/239#issue-2308436425)
* - Endorser
  - Eric Perlman
  - [perlman](https://github.com/perlman)
  -
  - 2024-05-22
  - [Endorse](https://github.com/ome/ngff/pull/239#issue-2308436425)
* - Endorser
  - Lachlan Deakin
  - [LDeakin](https://github.com/LDeakin)
  - Australian National University
  - 2024-05-22
  - [Endorse](https://github.com/ome/ngff/pull/239#issue-2308436425)
* - Endorser
  - Sebastian Rhode
  - [sebi06](https://github.com/sebi06)
  - Carl Zeiss Microscopy GmbH
  - 2024-06-05
  - [Endorse](https://github.com/ome/ngff/pull/239#issue-2308436425)
```

## Overview

OME-Zarr version 0.4 restricts the number, names, ordering, and type of axes
that are allowed in the axes metadata. These restrictions have limited
conversion of proprietary datasets, usage by microscope vendors[^1], and usage
by novel microscopy modalities[^2].

This RFC removes these restrictions, opening NGFF to many more users within its
target domain (and beyond). Because it *only* removes restrictions, existing
valid OME-Zarr datasets will remain valid after implementation of this
proposal.

## Background

OME-Zarr [aims][nat methods paper] to provide a unified open format for
bioimaging data and metadata to make it findable, accessible, interoperable,
and reusable. The [paper describing NGFF and OME-Zarr][nat methods paper] notes
that "the diversity of [biological imaging's] applications have prevented the
establishment of a community-agreed standardized data format", but, [for
historical reasons][ome-model], [version 0.4 of the OME-Zarr
specification][ngff 0.4] [imposes][ngff 0.4 multiscales metadata] strict
restrictions on the
applications:

> The length of "axes" must be between 2 and 5 and MUST be equal to the
> dimensionality of the zarr arrays storing the image data (see
> "datasets:path"). The "axes" MUST contain 2 or 3 entries of "type:space" and
> MAY contain one additional entry of "type:time" and MAY contain one additional
> entry of "type:channel" or a null / custom type. The order of the entries MUST
> correspond to the order of dimensions of the zarr arrays. In addition, the
> entries MUST be ordered by "type" where the "time" axis must come first (if
> present), followed by the "channel" or custom axis (if present) and the axes
> of type "space". If there are three spatial axes where two correspond to the
> image plane ("yx") and images are stacked along the other (anisotropic) axis
> ("z"), the spatial axes SHOULD be ordered as "zyx".

And:

> Each "datasets" dictionary MUST have the same number of dimensions and MUST
> NOT have more than 5 dimensions.

These restrictions prevent users from converting existing
datasets to OME-Zarr. For example, Zeiss .czi datasets [may contain][czi format
dimensions] dimensions such as H, I, and V to store different phases,
illumination directions, or views, respectively. The same holds for synthetic
data, which may contain "artificial" dimensions such as principal components or
the axes of other dimensionality-reduction techniques computed from many images.
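
To make the quoted rules concrete, here is a minimal Python sketch of a checker for them. It is a simplified reading of the quoted v0.4 text, not the official validator, and it ignores the SHOULD recommendation on "zyx" ordering. The function name `check_axes_v04` and the `"illumination"` axis type are hypothetical, chosen only for illustration.

```python
def check_axes_v04(axes):
    """Return a list of violations of the quoted v0.4 rules (empty if none).

    `axes` is a list of dicts with "name" and optional "type" keys.
    This is an illustrative sketch, not the official OME-Zarr validator.
    """
    problems = []
    types = [ax.get("type") for ax in axes]
    n_space = types.count("space")
    n_time = types.count("time")
    # Everything that is neither space nor time counts as channel/custom/null.
    n_other = len(axes) - n_space - n_time

    if not 2 <= len(axes) <= 5:
        problems.append(f"expected 2 to 5 axes, got {len(axes)}")
    if n_space not in (2, 3):
        problems.append(f"expected 2 or 3 'space' axes, got {n_space}")
    if n_time > 1:
        problems.append("more than one 'time' axis")
    if n_other > 1:
        problems.append("more than one 'channel' or custom axis")

    # Ordering rule: time first, then channel/custom, then space.
    rank = [0 if t == "time" else 2 if t == "space" else 1 for t in types]
    if rank != sorted(rank):
        problems.append("axes are not ordered time, channel/custom, space")
    return problems


tczyx = [
    {"name": "t", "type": "time"},
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]

# A .czi-like dataset with an extra illumination axis (hypothetical type name).
czi_like = [
    {"name": "I", "type": "illumination"},
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]

print(check_axes_v04(tczyx))
print(check_axes_v04(czi_like))
```

Under this reading, the TCZYX metadata passes, while the .czi-like metadata is rejected only because it carries a second non-spatial, non-time axis.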

## Motivation

In addition to the .czi datasets mentioned in the preceding paragraph, this
section describes six dataset types that are currently impossible to represent
in OME-Zarr:

- in [electron backscatter diffraction (EBSD)][ebsd], a microscopy technique
common in materials science, a beam of electrons is scanned over a surface,
and for each (2D) position in the scan, a full 2D diffraction pattern is
recorded, resulting in a 4-dimensional data array.
- from the diffraction patterns, it is possible to obtain an *orientation map*,
containing a 3D angle at each 2D position of the material.
- the same principles apply to [diffusion tensor imaging][dti], where a
three-dimensional diffusion tensor is measured at each voxel.
- it is common to compute Fourier transforms of 3D images. The datasets have
three dimensions but they are measured in *frequency*, not space.
- when computing segmentations, one may use finer or coarser priors, resulting
in overlapping, equally valid segmentations, for example, of organelles at
one level, cells at another, and tissues at yet another. One common way to
store such a segmentation is to add a dimension for "coarseness".
- computed spaces may have arbitrary dimensions related to the computation. For
example, in subtomogram averaging of [cryo electron tomography][CryoET],
single particles from a tomogram are picked and aligned, producing many
instances of the same 3-dimensional particle. One may wish to store all the
instances in a single 4-dimensional array (one dimension being the *instance
number*). Or, one may use dimension-reduction techniques such as PCA, then
browse average particles along each PCA axis. This creates a virtual 5D space
containing the three spatial dimensions, then a "component number" axis for
the PCA components and a "position" axis for the position along that
component.
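
As an illustration, the axes metadata for some of the dataset types above might look like the following Python sketch. All array sizes are made up, and every axis name and type other than "z", "y", "x" and "space" (for example `"diffraction"`, `"frequency"`, `"pca-component"`) is a hypothetical placeholder, not something defined by the specification. The only rule checked is the existing requirement that there is one "axes" entry per array dimension.

```python
def axes_match_shape(axes, shape):
    """True if there is exactly one axes entry per array dimension."""
    return len(axes) == len(shape)


# 4D EBSD scan: a 2D diffraction pattern at each 2D scan position.
ebsd_shape = (200, 300, 60, 60)  # made-up sizes
ebsd_axes = [
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
    {"name": "ky", "type": "diffraction"},  # hypothetical name and type
    {"name": "kx", "type": "diffraction"},  # hypothetical name and type
]

# Fourier transform of a 3D image: three axes measured in frequency.
fft_shape = (64, 256, 256)  # made-up sizes
fft_axes = [
    {"name": name, "type": "frequency"}  # hypothetical names and type
    for name in ("fz", "fy", "fx")
]

# Virtual 5D space from PCA on aligned subtomograms.
pca_shape = (8, 11, 64, 64, 64)  # made-up sizes
pca_axes = [
    {"name": "component", "type": "pca-component"},  # hypothetical
    {"name": "position", "type": "pca-position"},    # hypothetical
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]

for axes, shape in [(ebsd_axes, ebsd_shape),
                    (fft_axes, fft_shape),
                    (pca_axes, pca_shape)]:
    assert axes_match_shape(axes, shape)
```

None of these three examples satisfies the v0.4 restrictions quoted in the Background section: the first two have more than one custom axis or no spatial axes at all, and the third has two custom axes.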

## Proposal

This document proposes removing any restrictions on the number of dimensions
stored in OME-Zarr arrays. Additionally, it removes restrictions on the names
and types of included dimensions.

To maximise compatibility with existing software, this proposal recommends that
images with 2-3 spatial dimensions SHOULD name them from the subset of "zyx"
and that they SHOULD have type "space". Similarly, if a dataset contains a
single time dimension, it SHOULD have name "t" and type "time".

After this specification change, tools may encounter OME-Zarr files that don't
match the earlier expectations of containing a subset of the TCZYX axes. This
proposal is agnostic as to what to do in those situations, and indeed the
appropriate action depends on the tool, but some suggestions include:
- fail with an informative error message (i.e., *partial* implementations are
OK, especially if well documented).
- prompt the user about which axes to treat as spatial.
- arbitrarily choose which axes to treat as spatial.
- choose how to treat each axis based on heuristics such as size and position.
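
The first suggestion could be sketched as follows for a hypothetical viewer that only supports 2D or 3D spatial data. The names `UnsupportedAxesError` and `spatial_axis_indices` are assumptions made for this example; they do not belong to any existing library, and other tools may reasonably choose one of the other strategies listed above.

```python
class UnsupportedAxesError(ValueError):
    """Raised by this hypothetical reader when it cannot interpret the axes."""


def spatial_axis_indices(axes):
    """Return the positions of the axes with type "space".

    Raises UnsupportedAxesError with an informative message unless there
    are exactly 2 or 3 such axes.
    """
    indices = [i for i, ax in enumerate(axes) if ax.get("type") == "space"]
    if len(indices) not in (2, 3):
        names = [ax.get("name") for ax in axes]
        raise UnsupportedAxesError(
            "this viewer needs 2 or 3 axes of type 'space', "
            f"but found {len(indices)} among axes {names}"
        )
    return indices


tczyx = [
    {"name": "t", "type": "time"},
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]
print(spatial_axis_indices(tczyx))

# Axes with a hypothetical "frequency" type are rejected with a clear message.
frequency_axes = [{"name": n, "type": "frequency"} for n in ("fz", "fy", "fx")]
try:
    spatial_axis_indices(frequency_axes)
except UnsupportedAxesError as err:
    print(f"cannot open dataset: {err}")
```

Because the function looks at the "type" key rather than at axis names, it would also accept spatial axes that are not named x, y, or z.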

## Prior art and references

All of the above removals are part of the draft proposed [transformations
specification][trafo spec], with one exception: the draft currently specifies
that a dataset may only have up to three spatial axes. However, this limitation
is [not set in stone][space dims comment] and could be removed, partly to
improve backwards compatibility.

## Stakeholders

Who has a stake in whether this RFC is accepted?

* Facilitator: Josh Moore (OME)
* Proposed reviewers:
  - John Bogovic (HHMI Janelia Research Campus): lead author of draft
    [transformations specification proposal][trafo spec]
  - Will Moore (OME): maintainer of ome-zarr-py library
  - Norman Rzepka (Scalable Minds): maintainer of zarrita
* Consulted:
  - Every commenter [on this thread](https://github.com/ome/ngff/pull/239).
* Socialization:
  - image.sc: https://forum.image.sc/t/ome-ngff-update-postponing-transforms-previously-v0-5/95617/2

## Implementation

To Be Determined.

## Backwards Compatibility

Since this proposal only removes restrictions, these changes are backwards
compatible at the file level: v0.4 files would transparently be v0.5 files if
this proposal is approved.

Any readers or writers that proactively checked the dimension restrictions
(number of dimensions, dimension names, dimension types) would need to remove
those checks. However, this should be a small amount of work in most cases.

## Forward Compatibility

A draft proposal for [coordinate transformations][trafo spec] already includes
most of the changes proposed here, so we envision that this RFC is compatible
with future plans for the format. The proposal does currently limit the number
of dimensions of type "space" to at most 3, but that limit [could be
removed][space dims comment]. If this RFC is approved, the transformation
specification would need to be updated to reflect this. However, that is an easy
change and there seems to be sufficient support in the community for this idea.
Comment on lines +223 to +229
Contributor

Do we need to talk about that (stalled) PR at all? I don't see why it's relevant here

Contributor Author

It's relevant because it speaks to the forward compatibility of this RFC — ie it is in line with existing proposals for the format. That the PR is stalled is not really relevant — it's stalled because of minor details (e.g. array order) that don't have a bearing on this PR. Based on the discussion, other aspects, and certainly the ones relevant to this RFC, have quite broad consensus.

Contributor

in that case isn't it sufficient to just state that there are no known conflicts with other active proposals?


## Drawbacks, risks, alternatives, and unknowns

The main reason specifications make restrictions on a file format is to limit
the space of possible implementations. This reduces the overall complexity of
supporting a file format and the burden on implementations.

Comments on the [pull request adding this proposal][this pr] and on the related
pull request [updating the specification text][spec update] have indeed
centered on this complexity.

One particular concern that has been voiced is that in general,
software dealing with these images knows what to do with axes called x, y, and
z, but might not know what to do with axes called foo, bar, and baz. However,
this concern is properly addressed by the existence of the "type" key
in the "axes" metadata, and the special type called "space".

Further, this proposal recommends that, in the absence of other considerations,
spatial axes SHOULD be a subset of x, y, and z, to simplify implementations. It
also takes the position that partial implementations are OK: a software package
designed to view xyz volumetric, light microscopy data should feel free to
error when presented with axes foo, bar, and baz with type "arbitrary". This
mechanism allows maximum flexibility for the format while ensuring
domain-specific implementations do not need to grapple with its full
complexity.

The addition of "SHOULD" recommendations for common microscopy data [seems to
have assuaged most implementation concerns][recap comment].

## Performance

The current OME-Zarr specification ensures arrays are stored in order TCZYX.
With C-order array data, this ensures efficient access for *some* but not *all*
access patterns. By removing restrictions on axis orderings, a new class of
"mistake" is possible, as someone could save an array in order XYTCZ, which
would combine poorly with C-order arrays to view XY planes. However, it is
arguable that Zarr chunking is in fact more important here: XYTCZ *could* be
a perfectly cromulent axis ordering for XY planes if the Zarr chunk shape were
(1024, 1024, 1, 1, 1).

Therefore, this proposal argues that any performance implications are better
addressed through good documentation and good defaults. Indeed, more flexible
dimension ordering could *improve* performance in some scenarios, such as
"pixel drilling", that is, extracting the value of a single x/y position over
time.
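
The chunking argument can be made concrete with a back-of-the-envelope count of how many chunks a read touches. The Python sketch below is a deliberately crude model: it counts chunks only, ignores compression, storage latency and the memory order inside each chunk, and uses made-up array sizes. The function name `chunks_touched` is hypothetical.

```python
import math


def chunks_touched(shape, chunks, selection):
    """Count the chunks that intersect a selection.

    `selection` has one entry per axis: an int selects a single index
    (one chunk along that axis), None selects the whole axis.
    """
    count = 1
    for size, chunk, sel in zip(shape, chunks, selection):
        if sel is None:
            count *= math.ceil(size / chunk)
    return count


# XYTCZ ordering with plane-shaped chunks, as in the example above.
xytcz_shape = (1024, 1024, 100, 3, 50)
plane_xytcz = chunks_touched(
    xytcz_shape, (1024, 1024, 1, 1, 1), (None, None, 0, 0, 0)
)

# TCZYX ordering with two different chunk shapes.
tczyx_shape = (100, 3, 50, 1024, 1024)
plane_chunks = (1, 1, 1, 1024, 1024)  # one chunk per XY plane
time_chunks = (100, 1, 1, 32, 32)     # whole time series in small XY tiles

xy_plane = (0, 0, 0, None, None)      # read one full XY plane
pixel_drill = (None, 0, 0, 5, 7)      # read one x/y position over time

print(plane_xytcz)                                             # 1
print(chunks_touched(tczyx_shape, plane_chunks, xy_plane))     # 1
print(chunks_touched(tczyx_shape, plane_chunks, pixel_drill))  # 100
print(chunks_touched(tczyx_shape, time_chunks, xy_plane))      # 1024
print(chunks_touched(tczyx_shape, time_chunks, pixel_drill))   # 1
```

In this model, the chunk shape rather than the axis order decides whether plane viewing or pixel drilling is cheap, which is why the choice is better guided by documentation and defaults than by a fixed ordering.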

## Testing

If the RFC is accepted, sample datasets matching the new spec will be
produced for implementations to test against.

## License

This RFC is placed in the public domain.


[nat methods paper]: https://www.nature.com/articles/s41592-021-01326-w
[ome-model]: https://github.com/ome/ngff/pull/239/files#r1609781780
[ngff 0.4]: https://ngff.openmicroscopy.org/0.4/index.html
[ngff 0.4 multiscales metadata]: https://ngff.openmicroscopy.org/0.4/index.html#multiscale-md
[ngff 0.4 axes metadata]: https://ngff.openmicroscopy.org/0.4/index.html#axes-md
[czi format dimensions]: https://web.archive.org/web/20240521085825/https://zeiss.github.io/libczi/imagedocumentconcept.html#autotoc_md7
[spec update]: https://github.com/ome/ngff/pull/235
[this pr]: https://github.com/ome/ngff/pull/239
[recap comment]: https://github.com/ome/ngff/pull/239#issuecomment-2327451719
[trafo spec]: https://github.com/ome/ngff/pull/138
[space dims comment]: https://github.com/ome/ngff/pull/138#issuecomment-1852891720
[ebsd]: https://en.wikipedia.org/wiki/Electron_backscatter_diffraction
[dti]: https://en.wikipedia.org/wiki/Diffusion-weighted_magnetic_resonance_imaging

[^1]: https://github.com/ome/ngff/pull/239#issuecomment-2122809286
[^2]: https://github.com/ome/ngff/pull/239#issuecomment-2149119404

## Changelog

| Date | Description | Link |
| ---------- | ---------------------------- | ---------------------------------------------------------------------------- |
| 2024-10-08 | RFC assigned and published | [https://github.com/ome/ngff/pull/239](https://github.com/ome/ngff/pull/239) |
9 changes: 9 additions & 0 deletions rfc/3/responses/index.md
@@ -0,0 +1,9 @@
# Responses

Responses from the authors of RFC-3:

```{toctree}
:maxdepth: 1
:glob:
*/index
```
9 changes: 9 additions & 0 deletions rfc/3/reviews/index.md
@@ -0,0 +1,9 @@
# Reviews

Reviews of RFC-3:

```{toctree}
:maxdepth: 1
:glob:
*/index
```
9 changes: 9 additions & 0 deletions rfc/3/versions/index.md
@@ -0,0 +1,9 @@
# Versions

Key versions of RFC-3 which have been sent for review, etc.

```{toctree}
:maxdepth: 1
:glob:
*/index
```