
workflow RDF #478

Open: wants to merge 48 commits into base: main

Changes from 1 commit

Commits (48):
- aa7cdc6 draft workflow RDF (FynnBe, Oct 26, 2022)
- 7924b2f update passthrough module generation (FynnBe, Oct 26, 2022)
- 825cad3 Tensor -> Arg; ArgType (FynnBe, Oct 26, 2022)
- 891093f test examples (FynnBe, Oct 26, 2022)
- 8f06f9e add schema validation (FynnBe, Oct 26, 2022)
- 6889bf8 add test_workflow_rdf.py (FynnBe, Oct 26, 2022)
- 9f0eed2 fix missing workflow import (FynnBe, Oct 26, 2022)
- 8639d8d update generate_rdf_docs.py and generate_json_specs.py (FynnBe, Oct 26, 2022)
- 1476670 fix typing import (FynnBe, Oct 26, 2022)
- fe9243e test_steps and better workflow kwargs (FynnBe, Oct 27, 2022)
- 7afc2c6 Merge branch 'main' into workflow_rdf (FynnBe, Oct 27, 2022)
- 5645d4c Update example_specs/workflows/hpa/single_cell_classification.yaml (FynnBe, Oct 28, 2022)
- 218b67a wip discussion with constantin (FynnBe, Oct 28, 2022)
- 94d1292 wip2 (FynnBe, Oct 28, 2022)
- 5831b04 Merge branch 'main' into workflow_rdf (FynnBe, Oct 31, 2022)
- 428d605 axes and options (FynnBe, Oct 31, 2022)
- 491e7b4 Merge branch 'main' into workflow_rdf (FynnBe, Nov 3, 2022)
- 6dacb68 update workflow RDF schema and raw_nodes (FynnBe, Nov 3, 2022)
- 9768848 finish first draft of workflow RDF spec (FynnBe, Nov 3, 2022)
- e3d963e inputs/options/outputs -> *_spec (FynnBe, Nov 4, 2022)
- cd4bd4c enforce unique step ids (FynnBe, Nov 4, 2022)
- d894da9 detect type workflow (FynnBe, Nov 4, 2022)
- 7ace197 don't accept emtpy strings (FynnBe, Nov 4, 2022)
- f5af22f also log binarized (FynnBe, Nov 4, 2022)
- 96376e5 Merge branch 'main' into workflow_rdf (FynnBe, Nov 8, 2022)
- 9fe4ca2 wip remove wf steps (FynnBe, Nov 24, 2022)
- 31ecba9 rename importable sources (FynnBe, Nov 24, 2022)
- ea1b826 black (FynnBe, Nov 24, 2022)
- 283da9a update changelog (FynnBe, Nov 24, 2022)
- 73b31b9 remove steps from workflow spec (FynnBe, Nov 24, 2022)
- 9c4a81d split up CallableSource field (FynnBe, Nov 24, 2022)
- 91e6783 set format_version as default (FynnBe, Nov 24, 2022)
- 8bdb9c9 prohibit serializing a list from a string (FynnBe, Nov 24, 2022)
- a89bf07 remove specialized axes classes (FynnBe, Nov 24, 2022)
- 5670cbc remove redundant brackets (FynnBe, Nov 24, 2022)
- 9b82e90 update workflow tests (FynnBe, Nov 25, 2022)
- 0243665 rename DEFAULT_TYPE_NAME_MAP (FynnBe, Nov 28, 2022)
- 91d141f rename ArbitraryAxes to UnknownAxes (FynnBe, Nov 28, 2022)
- a3d97c8 make nested_errors optional (FynnBe, Nov 30, 2022)
- b7b51a9 assert for mypy (FynnBe, Dec 6, 2022)
- eb4e3f8 some aliases for backward compatibility (FynnBe, Dec 6, 2022)
- 1102f6a add AXIS_LETTER_TO_NAME and AXIS_NAME_TO_LETTER (FynnBe, Dec 8, 2022)
- 6780a70 Merge branch 'main' into workflow_rdf (FynnBe, Feb 1, 2023)
- 924d667 Merge branch 'main' into workflow_rdf (FynnBe, Feb 9, 2023)
- b66798a update hello workflow example (FynnBe, Feb 9, 2023)
- c90cdd4 Merge branch 'main' into workflow_rdf (FynnBe, Mar 3, 2023)
- ff5cc6e Merge branch 'main' into workflow_rdf (FynnBe, Mar 15, 2023)
- 052c553 remove +\n from CLI help (FynnBe, Mar 15, 2023)
test_steps and better workflow kwargs
FynnBe committed Oct 27, 2022

Unverified: this commit is not signed, but one or more authors require that any commit attributed to them be signed.
commit fe9243e2694acdf34ccaa8bd35248f17e68fa811
4 changes: 4 additions & 0 deletions bioimageio/spec/shared/fields.py
@@ -82,6 +82,10 @@ def deserialize(self, value: typing.Any, attr: str = None, data: typing.Mapping[
        return value


class Boolean(DocumentedField, marshmallow_fields.Boolean):
    pass


class DateTime(DocumentedField, marshmallow_fields.DateTime):
    """
    Parses datetime in ISO8601 or if value already has datetime.datetime type
24 changes: 18 additions & 6 deletions bioimageio/spec/workflow/v0_2/raw_nodes.py
@@ -4,6 +4,7 @@
serialization and deserialization are defined in schema:
RDF <--schema--> raw nodes
"""
import typing
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Union
@@ -15,18 +16,29 @@
from bioimageio.spec.shared.raw_nodes import RawNode

try:
    from typing import Literal
    from typing import Literal, get_args
except ImportError:
    from typing_extensions import Literal  # type: ignore
    from typing_extensions import Literal, get_args  # type: ignore

FormatVersion = FormatVersion
ArgType = Literal["tensor", "string", "object"]
ArgType = Literal["tensor", "int", "float", "string", "boolean", "list", "dict", "any"]
DefaultType = Union[int, float, str, bool, list, dict, None]
TYPE_NAME_MAP = {int: "int", float: "float", str: "string", bool: "boolean", list: "list", dict: "dict", None: "null"}


@dataclass
class Arg(RawNode):
    name: str = missing
    type: ArgType = missing
    default: Union[_Missing, DefaultType] = missing
    description: Union[_Missing, str] = missing


@dataclass
class WorkflowKwarg(RawNode):
    name: str = missing
    type: ArgType = missing
    default: DefaultType = missing
    description: Union[_Missing, str] = missing


@@ -46,7 +58,7 @@ class Workflow(_RDF):
    inputs: List[Arg] = missing
    outputs: List[Arg] = missing

    test_inputs: List[Union[URI, Path]] = missing
    test_outputs: List[Union[URI, Path]] = missing

    steps: List[Step] = missing
    test_steps: List[Step] = missing

    kwargs: Union[_Missing, List[WorkflowKwarg]] = missing
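The compatibility rule between the new `ArgType` names and Python default values can be sketched as a standalone check. This is a sketch only: `default_matches_type` is a hypothetical helper written for illustration, and this `TYPE_NAME_MAP` keys `NoneType` via `type(None)` rather than the literal `None` used in the diff.

```python
# Standalone sketch of the ArgType/default compatibility rule introduced above.
TYPE_NAME_MAP = {
    int: "int",
    float: "float",
    str: "string",
    bool: "boolean",
    list: "list",
    dict: "dict",
    type(None): "null",
}


def default_matches_type(default, arg_type_name):
    """Return True if `default` is compatible with the declared ArgType name."""
    if default is None or arg_type_name == "any":
        # unset defaults and the catch-all "any" type always pass
        return True
    # note: type(True) is bool, not int, so bool defaults map to "boolean"
    return TYPE_NAME_MAP.get(type(default)) == arg_type_name
```

This mirrors the early-return branches of the schema's `default_has_compatible_type` validator: a `None` default and the `"any"` type are always accepted, everything else must match by exact type name.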
121 changes: 86 additions & 35 deletions bioimageio/spec/workflow/v0_2/schema.py
@@ -27,9 +27,66 @@ class Arg(_BioImageIOSchema):
        validate=field_validators.OneOf(get_args(raw_nodes.ArgType)),
        bioimageio_description=f"Argument type. One of: {get_args(raw_nodes.ArgType)}",
    )
    default = fields.Raw(
        required=False,
        bioimageio_description="Default value compatible with type given by `type` field.",
        allow_none=True,
    )

    @validates_schema
    def default_has_compatible_type(self, data, **kwargs):
        if data.get("default") is None:
            return

        arg_type_name = data.get("type")
        if arg_type_name == "any":
            return

        default_type = type(data["default"])
        type_name = raw_nodes.TYPE_NAME_MAP[default_type]
        if type_name != arg_type_name:
            raise ValidationError(
                f"Default value of type {default_type} (type name: {type_name}) does not match type: {arg_type_name}"
            )

    description = fields.String(bioimageio_description="Description of argument/tensor.")


class WorkflowKwarg(_BioImageIOSchema):
    name = fields.String(
        required=True,
        bioimageio_description="Key word argument name. No duplicates are allowed.",
    )
    type = fields.String(
        required=True,
        validate=field_validators.OneOf(get_args(raw_nodes.ArgType)),
        bioimageio_description=f"Argument type. One of: {get_args(raw_nodes.ArgType)}",
    )
    default = fields.Raw(
        required=True,
        bioimageio_description="Default value compatible with type given by `type` field.",
        allow_none=True,
    )

    @validates_schema
    def default_has_compatible_type(self, data, **kwargs):
        if data.get("default") is None:
            return

        arg_type_name = data.get("type")
        if arg_type_name == "any":
            return

        default_type = type(data["default"])
        type_name = raw_nodes.TYPE_NAME_MAP[default_type]
        if type_name != arg_type_name:
            raise ValidationError(
                f"Default value of type {default_type} (type name: {type_name}) does not match type: {arg_type_name}"
            )

    description = fields.String(required=False, bioimageio_description="Description of key word argument.")


class Step(_BioImageIOSchema):
    id = fields.String(
        required=False,
@@ -69,7 +126,7 @@ class Workflow(_BioImageIOSchema, RDF):
        fields.Nested(Arg()),
        validate=field_validators.Length(min=1),
        required=True,
        bioimageio_description="Describes the inputs expected by this model.",
        bioimageio_description="Describes the inputs expected by this workflow.",
    )

    @validates("inputs")
@@ -84,7 +141,7 @@ def no_duplicate_input_names(self, value: typing.List[raw_nodes.Arg]):
    outputs = fields.List(
        fields.Nested(Arg()),
        validate=field_validators.Length(min=1),
        bioimageio_description="Describes the outputs from this model.",
        bioimageio_description="Describes the outputs from this workflow.",
    )

    @validates("outputs")
@@ -115,41 +172,12 @@ def inputs_and_outputs(self, data, **kwargs):
        if len(names) > len(set(names)):
            raise ValidationError("Duplicate names are not allowed.")

    test_inputs = fields.List(
        fields.Union([fields.URI(), fields.Path()]),
        validate=field_validators.Length(min=1),
        required=True,
        bioimageio_description="List of URIs or local relative paths to test inputs as described in inputs for "
        "**a single test case**. "
        "This means if your workflow has more than one input, you should provide one URI for each input."
        "Each test input should be a file with a ndarray in "
        "[numpy.lib file format](https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html#module-numpy.lib.format)."
        "The extension must be '.npy'.",
    )

    test_outputs = fields.List(
        fields.Union([fields.URI(), fields.Path()]),
        validate=field_validators.Length(min=1),
        required=True,
        bioimageio_description="Analog to test_inputs.",
    kwargs = fields.List(
        fields.Nested(WorkflowKwarg()),
        required=False,
        bioimageio_description="Key word arguments for this workflow.",
    )

    @validates_schema
    def test_outputs_match(self, data, **kwargs):
        steps = data.get("steps")
        if not steps or not isinstance(steps, list) or not isinstance(steps[-1], raw_nodes.Step):
            raise ValidationError("invalid 'steps'")

        test_outputs = data.get("test_outputs")
        if not isinstance(test_outputs, list):
            raise ValidationError("invalid 'test_outputs'")

        if steps[-1].op == "select_outputs":
            if steps[-1].outputs:
                raise ValidationError("Unexpected 'outputs' defined for op: 'select_outputs'. Did you mean 'inputs'?")
            if len(test_outputs) != len(steps[-1].inputs):
                raise ValidationError(f"Expected {len(steps[-1].inputs)} 'test_inputs', but found {len(test_outputs)}")

    steps = fields.List(
        fields.Nested(Step()),
        validate=field_validators.Length(min=1),
@@ -175,3 +203,26 @@ def step_input_references_exist(self, data, **kwargs):

            if step.outputs:
                references.update({f"{step.id}.outputs.{out}" for out in step.outputs})

    test_steps = fields.List(
        fields.Nested(Step()),
        validate=field_validators.Length(min=1),
        required=True,
        bioimageio_description="Test steps to be executed consecutively.",
    )

    @validates_schema
    def test_step_input_references_exist(self, data, **kwargs):
        steps = data.get("test_steps")
        if not steps or not isinstance(steps, list) or not isinstance(steps[0], raw_nodes.Step):
            raise ValidationError("Missing/invalid 'test_steps'")

        references = set()
        for step in steps:
            if step.inputs:
                for si in step.inputs:
                    if si not in references:
                        raise ValidationError(f"Invalid test step input reference '{si}'")

            if step.outputs:
                references.update({f"{step.id}.outputs.{out}" for out in step.outputs})
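The dataflow rule enforced by `test_step_input_references_exist` above (a step may only consume `<step_id>.outputs.<name>` references produced by an earlier step) can be illustrated with a minimal standalone sketch. `check_step_references` and the dict-based step representation are illustrative only, not the spec's API:

```python
# Minimal sketch of the reference check performed by the validator above:
# walk the steps in order, collecting "<id>.outputs.<name>" references and
# rejecting any input that no earlier step produced.
def check_step_references(steps):
    """`steps`: list of dicts with 'id', 'inputs' and 'outputs' keys."""
    references = set()
    for step in steps:
        for si in step.get("inputs", []):
            if si not in references:
                raise ValueError(f"Invalid test step input reference '{si}'")
        for out in step.get("outputs", []):
            references.add(f"{step['id']}.outputs.{out}")
    return True
```

Because `references` starts empty, the first test step must take no inputs, which matches the `load_tensors` pattern used in the example workflows below.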
35 changes: 26 additions & 9 deletions example_specs/workflows/hpa/single_cell_classification.yaml
@@ -9,36 +9,53 @@ inputs:
  - name: protein
    type: tensor

test_inputs:
  - nuclei.npy
  - protein.npy
kwargs:
  - name: seg_prep
    type: boolean
    default: false


outputs:
  - name: cells
    type: tensor
  - name: scores
    type: tensor

test_outputs:
  - cells.npy
  - scores.npy

steps:
  - op: set_
  - id: segmentation
    op: model_inference
    inputs: [inputs.nuclei]  # take the first output of step 1 (id: data) as the only input
    outputs: [cells]
    kwargs:
      model_id: conscientious-seashell
      preprocessing: true
      rdf_source: conscientious-seashell
      preprocessing: ${{ kwargs.seg_prep }}
      postprocessing: false
  - id: classification
    op: model_inference
    inputs: [inputs.protein, segmentation.outputs.cells]  # take the second output of step1 and the output of step 2
    outputs: [scores]
    kwargs:
      model_id: straightforward-crocodile
      rdf_source: straightforward-crocodile
      preprocessing: true
      postprocessing: false
  - op: select_outputs
    inputs: [segmentation.outputs.cells, classification.outputs.scores]

test_steps:
  - id: test_tensors
    op: load_tensors
    outputs: [nuclei, protein, cells, scores]
    kwargs:
      sources: [nuclei.npy, protein.npy, cells.npy, scores.npy]
  - id: workflow
    op: run_workflow
    inputs: [test_tensors.outputs.nuclei, test_tensors.outputs.protein]
    outputs: [cells, scores]
    kwargs:
      rdf_source: ${{ self.rdf_source }}
  - op: assert_close
    inputs: [test_tensors.outputs.cells, workflow.outputs.cells]
  - op: assert_close
    inputs: [test_tensors.outputs.scores, workflow.outputs.scores]
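The `${{ kwargs.seg_prep }}` expression in the example above reads as a template placeholder to be resolved against the workflow's kwargs at execution time. A minimal sketch of such a substitution (assumed semantics; `resolve_kwarg_placeholder` is hypothetical, and the actual resolution would live in the workflow runner, not in this spec PR):

```python
import re

# Matches a whole "${{ kwargs.<name> }}" placeholder string.
PLACEHOLDER = re.compile(r"^\$\{\{\s*kwargs\.(\w+)\s*\}\}$")


def resolve_kwarg_placeholder(value, workflow_kwargs):
    """Replace a '${{ kwargs.<name> }}' string with the matching kwarg value.

    Non-string values and plain strings are returned unchanged.
    """
    if isinstance(value, str):
        match = PLACEHOLDER.match(value.strip())
        if match:
            return workflow_kwargs[match.group(1)]
    return value
```

Under this reading, `preprocessing: ${{ kwargs.seg_prep }}` resolves to the `seg_prep` default (`false`) unless the caller overrides it.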
51 changes: 41 additions & 10 deletions example_specs/workflows/stardist/stardist_example.yaml
@@ -8,9 +8,6 @@ inputs:
    type: tensor
    description: image with star-convex objects

test_inputs:
  - raw.npy

outputs:
  - name: labels
    type: tensor
@@ -21,19 +18,53 @@ outputs:
  - name: prob
    type: tensor

test_outputs:
  - labels.npy
  - coord.npy
  - points.npy
  - prob.npy
kwargs:
  - name: diameter
    type: float
    default: 2.3

steps:
  - op: zero_mean_unit_variance
  - op: model_inference
    kwargs:
      model_id: fearless-crab
      rdf_source: fearless-crab
      preprocessing: false  # disable the preprocessing
      postprocessing: false  # disable the postprocessing
  - op: stardist_postprocessing
    kwargs:
      diameter: 2.3
      diameter: ${{ kwargs.diameter }}

test_steps:
  - id: test_tensors
    op: load_tensors
    outputs:
      - raw
      - labels
      - coord
      - points
      - prob
    kwargs:
      sources:
        - raw.npy
        - labels.npy
        - coord.npy
        - points.npy
        - prob.npy
  - id: workflow
    op: run_workflow
    inputs: [test_tensors.outputs.raw]
    outputs:
      - labels
      - coord
      - points
      - prob
    kwargs:
      rdf_source: ${{ self.rdf_source }}
  - op: assert_close
    inputs: [test_tensors.outputs.labels, workflow.outputs.labels]
  - op: assert_close
    inputs: [test_tensors.outputs.coord, workflow.outputs.coord]
  - op: assert_close
    inputs: [test_tensors.outputs.points, workflow.outputs.points]
  - op: assert_close
    inputs: [test_tensors.outputs.prob, workflow.outputs.prob]
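Both example workflows end their test steps with `assert_close` ops, which presumably compare a workflow output tensor against the stored reference tensor within some tolerance. A pure-Python sketch of that semantics on flat number sequences (the real op would operate on ndarrays, likely via `numpy.testing.assert_allclose`, and the tolerance values here are made up):

```python
import math


def assert_close(actual, expected, rtol=1e-4, atol=1e-4):
    """Fail loudly if two flat sequences of numbers are not element-wise close."""
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} != {len(expected)}")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not math.isclose(a, e, rel_tol=rtol, abs_tol=atol):
            raise AssertionError(f"values differ at index {i}: {a} != {e}")
```

This mirrors the test pattern above: `load_tensors` provides both the inputs and the expected outputs, `run_workflow` recomputes the outputs, and one `assert_close` per output tensor closes the loop.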