Resolve Annotated types (PEP 593) on dataflow visualizations #1276

Merged on Feb 11, 2025 (4 commits)

Conversation

cswartzvi
Contributor

Recently, I was attempting to use Annotated types (PEP 593) for the output of some nodes (see A):

from typing import Annotated

def A() -> Annotated[int, "metadata"]:
    return 42

def B(A: int) -> float:
    """Divide A by 3"""
    return A / 3

def C(A: int, B: float) -> float:
    """Square A and multiply by B"""
    return A**2 * B

if __name__ == "__main__":
    import __main__
    from hamilton import driver

    dr = driver.Builder().with_modules(__main__).build()
    result = dr.execute(["C"])
    print(result)

The execution works as expected, but when I run dr.display_all_functions("dag.png") the type of A is masked as Annotated on the resulting graphic:

dag1

Ideally, IMHO, we would want to see the underlying type such as:

dag2

Changes

I updated hamilton.htypes.get_type_as_string so that it will consider Annotated types. Note that it appears that typing.get_origin and typing_inspect.get_origin do not produce the same result for Annotated types (see ilevkivskyi/typing_inspect#109).
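As a rough illustration of the idea (not the code from this PR), the underlying type can be recovered with the standard library alone. The helper name `unwrap_annotated` is hypothetical, and the sketch assumes Python 3.9+, where `Annotated` lives in `typing`:

```python
from typing import Annotated, Any, get_args, get_origin


def unwrap_annotated(type_: Any) -> Any:
    """Return the first argument of an Annotated alias, or type_ unchanged.

    Hypothetical helper for illustration; not part of hamilton.
    """
    # The standard-library get_origin reports Annotated itself as the origin.
    if get_origin(type_) is Annotated:
        # get_args yields (underlying_type, *metadata); keep only the type.
        return get_args(type_)[0]
    return type_


print(unwrap_annotated(Annotated[int, "metadata"]))  # <class 'int'>
print(unwrap_annotated(float))  # <class 'float'>
```

The check relies on `typing.get_origin` specifically, since, as noted above, `typing_inspect.get_origin` does not appear to give the same result for Annotated types.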

How I tested this

I updated the current test for get_type_as_string in test_type_utils.py and added an additional check focusing on Annotated types.

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Reviewed everything up to ace24a3 in 1 minute and 10 seconds

More details
  • Looked at 15 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. hamilton/htypes.py:102
  • Draft comment:
    Consider using _get_origin (already imported from typing or typing_extensions) instead of typing.get_origin for consistency across the file.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
2. hamilton/htypes.py:102
  • Draft comment:
    Add a Sphinx documentation note under docs/ explaining how Annotated types are handled in visualizations.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
3. hamilton/htypes.py:102
  • Draft comment:
    To ensure compatibility with Python versions below 3.9, consider using the existing alias (e.g. column) instead of directly referencing Annotated. In the current conditional, Annotated might be undefined when running on older Python versions.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The comment suggests using column alias instead of Annotated for compatibility, but the code already handles this through version-specific imports. The direct use of Annotated here is inside a function that's used after the proper imports are set up. The code is actually well-structured for version compatibility.
    I could be missing some edge case where the imports fail. Maybe there's a reason they specifically created the column alias that I'm not seeing.
    The code explicitly handles Python version compatibility through proper imports and version checks. The use of Annotated here is safe because it will be properly defined regardless of Python version.
    Delete the comment. The code already properly handles Python version compatibility through explicit version checks and appropriate imports.
4. hamilton/htypes.py:114
  • Draft comment:
    Add Sphinx documentation to explain how Annotated types are resolved (e.g. in a new docs/type_annotations.rst section) to help users understand the visualization and metadata behavior.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Collaborator

@skrawcz skrawcz left a comment

I think this looks good. @elijahbenizzy any thoughts?

Tests seem to be failing due to dask releasing a new version -- will try to see if I can make a fix on a separate branch for them.

@zilto
Collaborator

zilto commented Feb 8, 2025

The changes seem very clean. It's reasonable to think that Annotated is never the type of interest to display.

Regarding typing_inspect, there's an open ticket (#472). It would be easiest to remove it as a dependency once Python 3.9 reaches end of life.

@elijahbenizzy
Collaborator

This looks good! Technically this could be construed as backwards incompatible (e.g. if you were using the node -> dict representation), but I'm not particularly concerned about that. Specifically, Annotated is metadata that may not be serialized, so I'm OK with it. We also don't make guarantees on the stability of the node-to-dict function. But if you wanted to be a stickler about backwards compatibility, you could add a parameter (hide_annotated) that defaults to false.

Up to you -- happy to merge either way!

@cswartzvi
Contributor Author

cswartzvi commented Feb 9, 2025

@elijahbenizzy Ah, wow, I didn't even think about that incompatibility. Thinking out loud here... I see that the result of HamiltonNode.to_dict is used in both the mlflow and openlineage plugins as a way (I believe) to recreate a hamilton run. I unfortunately do not use either mlflow or openlineage (yet), but wouldn't including Annotated in the serialized result prevent it from recreating the run (in a type checked manner)?

I completely understand that it is a breaking change that someone out there could be relying on, so perhaps it's better to introduce something like this as a temporary provision:

def get_type_as_string(type_: Type, resolve_annotated: bool = True) -> Optional[str]:
    """Get a string representation of a type.

    The logic supports the evolution of the type system between 3.8 and 3.10.
    :param type_: Any Type object. Typically the node type found at Node.type.
    :param resolve_annotated: Determines if Annotated types should be resolved to their underlying type.
    :return: string representation of the type. None if everything fails.
    """

    if resolve_annotated and _is_annotated_type(type_):
        type_string = get_type_as_string(typing.get_args(type_)[0])
    elif getattr(type_, "__name__", None):
        type_string = type_.__name__
    elif typing_inspect.get_origin(type_):
        base_type = typing_inspect.get_origin(type_)
        type_string = get_type_as_string(base_type)
    elif getattr(type_, "__repr__", None):
        type_string = type_.__repr__()
    else:
        type_string = None

    return type_string

With all that said, if you think that mlflow or openlineage would benefit from having the actual type (rather than Annotated), then my vote is to merge as is.

@elijahbenizzy
Collaborator

Yeah, so thinking about it, Annotated (without the right information, as it was in the diagram) is a relatively useless type (the prior string), so I think this is strictly better -- we can call it a bug that we fixed?

@cswartzvi
Contributor Author

cswartzvi commented Feb 10, 2025

@elijahbenizzy That sounds good to me! I also had an idea (that I mention only for the sake of posterity), that we could separate the Annotated type from the metadata and return both (in a more formalized result) if it was ever needed by a consumer.
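For posterity, one possible shape for that idea is sketched below. The names `ResolvedType` and `split_annotated` are hypothetical and nothing like this exists in the PR; the sketch assumes Python 3.9+:

```python
from typing import Annotated, Any, NamedTuple, Tuple, get_args, get_origin


class ResolvedType(NamedTuple):
    """Hypothetical result type: the underlying type plus any metadata."""

    base: Any
    metadata: Tuple[Any, ...]


def split_annotated(type_: Any) -> ResolvedType:
    """Separate an Annotated alias into its base type and metadata."""
    if get_origin(type_) is Annotated:
        # get_args returns the base type first, then the metadata items.
        base, *metadata = get_args(type_)
        return ResolvedType(base, tuple(metadata))
    # Non-Annotated types carry no metadata.
    return ResolvedType(type_, ())


resolved = split_annotated(Annotated[int, "metadata"])
print(resolved.base.__name__, resolved.metadata)  # int ('metadata',)
```

A consumer that only needs the display name could read `base`, while one that cares about the annotations could read `metadata`.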

@skrawcz
Collaborator

skrawcz commented Feb 10, 2025

okay if you rebase from main then dask issues should go away.

@@ -188,6 +194,12 @@ def test_get_type_as_string(type_):
        pytest.fail(f"test get_type_as_string raised: {e}")


def test_type_as_string_with_annotated_type():
Collaborator

If you feel like doing more testing work, we should really test the string output of these, not just that they don't raise. But that's out of scope.

@elijahbenizzy elijahbenizzy merged commit b0929b0 into DAGWorks-Inc:main Feb 11, 2025
24 checks passed