Resolve `Annotated` types (PEP 593) on dataflow visualizations #1276

cswartzvi · 2025-02-08T05:49:57Z

Recently, I was attempting to use Annotated types (PEP 593) for the output of some nodes (see A):

from typing import Annotated

def A() -> Annotated[int, "metadata"]:
    return 42

def B(A: int) -> float:
    """Divide A by 3"""
    return A / 3

def C(A: int, B: float) -> float:
    """Square A and multiply by B"""
    return A**2 * B

if __name__ == "__main__":
    import __main__
    from hamilton import driver

    dr = driver.Builder().with_modules(__main__).build()
    result = dr.execute(["C"])
    print(result)

The execution works as expected, but when I run dr.display_all_functions("dag.png") the type of A is masked as Annotated on the resulting graphic:

Ideally, IMHO, we would want to see the underlying type such as:

Changes

I updated hamilton.htypes.get_type_as_string so that it will consider Annotated types. Note that it appears that typing.get_origin and typing_inspect.get_origin do not produce the same result for Annotated types (see ilevkivskyi/typing_inspect#109).
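
For illustration, here is a minimal sketch of the standard-library behavior the change relies on (Python 3.9+). It uses only `typing` and is not the code in `hamilton/htypes.py`; the variable names are made up for the example.

```python
from typing import Annotated, get_args, get_origin

annotated = Annotated[int, "metadata"]

# typing.get_origin reports Annotated itself as the origin of an Annotated type.
origin = get_origin(annotated)

# typing.get_args returns the underlying type first, followed by the metadata.
underlying, *metadata = get_args(annotated)

print(origin is Annotated)  # True
print(underlying.__name__)  # int
print(metadata)             # ['metadata']
```

Taking the first element of `get_args` is what lets the visualization show `int` rather than `Annotated`.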

How I tested this

I updated the current test for get_type_as_string in test_type_utils.py and added an additional check focusing on Annotated types.

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

ellipsis-dev

👍 Looks good to me! Reviewed everything up to ace24a3 in 1 minute and 10 seconds

More details

Looked at 15 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. hamilton/htypes.py:102

Draft comment:
Consider using _get_origin (already imported from typing or typing_extensions) instead of typing.get_origin for consistency across the file.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%

2. hamilton/htypes.py:102

Draft comment:
Add a Sphinx documentation note under docs/ explaining how Annotated types are handled in visualizations.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%

3. hamilton/htypes.py:102

Draft comment:
To ensure compatibility with Python versions below 3.9, consider using the existing alias (e.g. column) instead of directly referencing Annotated. In the current conditional, Annotated might be undefined when running on older Python versions.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The comment suggests using column alias instead of Annotated for compatibility, but the code already handles this through version-specific imports. The direct use of Annotated here is inside a function that's used after the proper imports are set up. The code is actually well-structured for version compatibility.
I could be missing some edge case where the imports fail. Maybe there's a reason they specifically created the column alias that I'm not seeing.
The code explicitly handles Python version compatibility through proper imports and version checks. The use of Annotated here is safe because it will be properly defined regardless of Python version.
Delete the comment. The code already properly handles Python version compatibility through explicit version checks and appropriate imports.

4. hamilton/htypes.py:114

Draft comment:
Add Sphinx documentation to explain how Annotated types are resolved (e.g. in a new docs/type_annotations.rst section) to help users understand the visualization and metadata behavior.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_3xrEuMQOW72G2x0l

skrawcz

I think this looks good. @elijahbenizzy any thoughts?

Tests seem to be failing due to dask releasing a new version -- will try to see if I can make a fix on a separate branch for them.

zilto · 2025-02-08T21:30:52Z

The changes seem very clean. It's reasonable to think that Annotated is never the type of interest to display.

Regarding typing_inspect, there's an open ticket. It would be easiest to remove it as a dependency once Python 3.9 reaches end of life (#472).

elijahbenizzy · 2025-02-09T05:41:46Z

This looks good! Technically this could be construed as backwards incompatible (e.g. if you were using the node -> dict representation), but I'm not particularly concerned about that. Specifically, Annotated is metadata that may not be serialized, so I'm OK with it. We also don't make guarantees on the stability of the node-to-dict function. But if you wanted to be a stickler about backwards compatibility, you could add a parameter (hide_annotated) that defaults to false.

Up to you -- happy to merge either way!

cswartzvi · 2025-02-09T14:15:43Z

@elijahbenizzy Ah, wow, I didn't even think about that incompatibility. Thinking out loud here... I see that the result of HamiltonNode.to_dict is used in both the mlflow and openlineage plugins as a way (I believe) to recreate a hamilton run. I unfortunately do not use either mlflow or openlineage (yet), but wouldn't including Annotated in the serialized result prevent it from recreating the run (in a type checked manner)?

I completely understand that it is a breaking change that someone out there could be relying on. Perhaps it's better to introduce something like this as a temporary provision:

def get_type_as_string(type_: Type, resolve_annotated: bool = True) -> Optional[str]:
    """Get a string representation of a type.

    The logic supports the evolution of the type system between 3.8 and 3.10.
    :param type_: Any Type object. Typically the node type found at Node.type.
    :param resolve_annotated: Determines if Annotated types should be resolved to their underlying type.
    :return: string representation of the type. An empty string if everything fails.
    """

    if resolve_annotated and _is_annotated_type(type_):
        type_string = get_type_as_string(typing.get_args(type_)[0])
    elif getattr(type_, "__name__", None):
        type_string = type_.__name__
    elif typing_inspect.get_origin(type_):
        base_type = typing_inspect.get_origin(type_)
        type_string = get_type_as_string(base_type)
    elif getattr(type_, "__repr__", None):
        type_string = type_.__repr__()
    else:
        type_string = None

    return type_string

With all that said, if you think that mlflow or openlineage would benefit from having the actual type (vice Annotated), then my vote is to merge as is.
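
To make the effect of the flag concrete, here is a simplified, standard-library-only stand-in for the proposal above. The name `type_as_string` is hypothetical, and `typing.get_origin` replaces both `_is_annotated_type` and `typing_inspect.get_origin`, so this is a sketch and not Hamilton's implementation.

```python
from typing import Annotated, Any, Optional, get_args, get_origin


def type_as_string(type_: Any, resolve_annotated: bool = True) -> Optional[str]:
    """Hypothetical stand-in for get_type_as_string, stdlib only."""
    if resolve_annotated and get_origin(type_) is Annotated:
        # The first argument of Annotated[...] is the underlying type.
        return type_as_string(get_args(type_)[0], resolve_annotated)
    name = getattr(type_, "__name__", None)
    if name:
        return name
    origin = get_origin(type_)
    if origin is not None:
        # Fall back to the origin of a generic alias.
        return type_as_string(origin, resolve_annotated)
    return repr(type_)


resolved = type_as_string(Annotated[int, "metadata"])
unresolved = type_as_string(Annotated[int, "metadata"], resolve_annotated=False)

print(resolved)    # int
print(unresolved)  # exact string depends on the Python version
```

With `resolve_annotated=True` the result is always the name of the underlying type. The unresolved string is whatever the `typing` internals report, which is part of why it is not very useful to a consumer.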

elijahbenizzy · 2025-02-09T18:04:30Z

Yeah, so thinking about it, Annotated (without the right information, as it was in the diagram) is a relatively useless type (the prior string), so I think this is strictly better -- we can call it a bug that we fixed?

cswartzvi · 2025-02-10T17:32:50Z

@elijahbenizzy That sounds good to me! I also had an idea (that I mention only for the sake of posterity), that we could separate the Annotated type from the metadata and return both (in a more formalized result) if it was ever needed by a consumer.
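
For the record, a rough sketch of what that could look like, assuming Python 3.9+. `ResolvedType` and `split_annotated` are hypothetical names and do not exist in Hamilton.

```python
from typing import Annotated, Any, NamedTuple, Tuple, get_args, get_origin


class ResolvedType(NamedTuple):
    """Hypothetical result container; not part of Hamilton."""

    underlying: Any
    metadata: Tuple[Any, ...]


def split_annotated(type_: Any) -> ResolvedType:
    """Separate an Annotated type into its underlying type and metadata."""
    if get_origin(type_) is Annotated:
        underlying, *metadata = get_args(type_)
        return ResolvedType(underlying, tuple(metadata))
    # Non-Annotated types pass through with empty metadata.
    return ResolvedType(type_, ())


print(split_annotated(Annotated[int, "metadata"]))
print(split_annotated(float))
```

A consumer that only needs the display name could read `underlying`, while one that cares about the metadata would still have access to it.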

skrawcz · 2025-02-10T23:39:31Z

okay if you rebase from main then dask issues should go away.

elijahbenizzy · 2025-02-11T03:40:38Z

tests/test_type_utils.py

@@ -188,6 +194,12 @@ def test_get_type_as_string(type_):
        pytest.fail(f"test get_type_as_string raised: {e}")


+def test_type_as_string_with_annotated_type():


If you feel like doing more testing work we should really test the string output of these, not just that they fail. But that's out of scope.

ellipsis-dev bot reviewed Feb 8, 2025

skrawcz reviewed Feb 8, 2025

cswartzvi added 4 commits February 10, 2025 18:59

Use first argument of Annotated in string representation

578073e

Updated and add tests

fc5f3c5

Use existing _is_annotated_type

d97d145

Using typing_extensions for Annotated backport

023a384

cswartzvi force-pushed the allow_annotated branch from a48b6c1 to 023a384 on February 11, 2025 00:00

elijahbenizzy reviewed Feb 11, 2025

elijahbenizzy approved these changes Feb 11, 2025

elijahbenizzy merged commit b0929b0 into DAGWorks-Inc:main Feb 11, 2025
24 checks passed
