Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deduplicate_inputs should not try to group inputs #617

Open
0x26res opened this issue Jan 2, 2024 · 1 comment
Open

deduplicate_inputs should not try to group inputs #617

0x26res opened this issue Jan 2, 2024 · 1 comment
Labels
enhancement New feature or request visualization

Comments

@0x26res
Copy link

0x26res commented Jan 2, 2024

When calling Drive.visualize_execution with the flag deduplicate_inputs=True the resulting graph has got one node for each possible combination of inputs that are connected to the dag.

Current behavior

In the example below, inputs a and b should be indenpent nodes. But there is a is also a node for a and b. So you run the risk of having as many inputs node as there are combination of inputs in the downstream functions.

Screenshots

image

Steps to replicate behavior

def a_plus_b(a: int, b: int) -> int:
    return a + b


def a_times_2(a: int) -> int:
    return a * 2


def b_times_2(b: int) -> int:
    return b * 2


def everything(a_plus_b: int, a_times_2: int, b_times_2: int) -> int:
    return a_plus_b + a_times_2 + b_times_2
def test_wrong_inputs():
    driver = Driver(
        {},
        example_module,
        adapter=SimplePythonGraphAdapter(DictResult()),
    )
    inputs = {"a": 1, "b": 2}
    return driver.visualize_execution(
        final_vars=["everything"],
        inputs=inputs,
        output_file_path="./visualize_inputs.dot",
        deduplicate_inputs=True,
    )

Library & System Information

hamilton==1.42.0

Expected behavior

There should be one node of a, one for b and no node for a and b.

@0x26res 0x26res added the triage label for issues that need to be triaged. label Jan 2, 2024
@zilto
Copy link
Collaborator

zilto commented Jan 2, 2024

(copying my reply from a Slack thread)

What's displayed is the intended behavior of 2 changes compared to the early viz feature, but I overlooked one feature.

Change 1: In a viz, a function node actually has a single "input" node that contains "the set of inputs node" of the function node. Therefore, a_plus_b would always have the box with a and b.
Motivation: this ways, inputs are always locate together to better understand what the node requires (would get very messy otherwise, following lengthy edges)

Change 2: Then, the deduplicate_inputs flag deduplicates "input nodes in the viz" which are actually "the set of inputs node" mentioned in Change 1.
Motivation: If True, you guarantee that inputs are near the function node. If False, you can reduce clutter. It was difficult to pick a default because the best option depends on the specifics of the DAG

Potential solution: The issue raised appears related to Change 1 because there is currently no way to ungroup "the set of inputs node". It shouldn't be difficult to implement an group_inputs flag. Then, the deduplicate_inputs
would work as expected. I now realize that the semantics of deduplicate_inputs are confusing given the grouping feature. If you have suggestions for a better argument name, I'll take it!

@skrawcz skrawcz added enhancement New feature or request and removed triage label for issues that need to be triaged. labels Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request visualization
Projects
None yet
Development

No branches or pull requests

3 participants