Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reintroduce slice fusion #11638

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Reintroduce slice fusion #11638

wants to merge 2 commits into from

Conversation

phofl
Copy link
Collaborator

@phofl phofl commented Jan 7, 2025

  • Closes #xxxx
  • Tests added / passed
  • Passes pre-commit run --all-files

This is mostly what the old implementation did...

I still have to add more docs and probably more tests too, but the general idea is there. We are only looking at fused tasks,
since they might have the pattern we care about that caused pydata/xarray#9926

The way we are currently modifying these fused tasks isn't great though, we are using the dictionary that defines the fused task and replace fused getitem tasks with an alias before modifying the parent task with the new slice, i.e.


inner_graph = {
    "a": Task("a", getitem, TaskRef("original_open_dataset"), slice(0, 100)),
    "b": Task("b", getitem, TaskRef("a"), slice(30, 50)),
}

# translates to

inner_graph = {
    "a": Task("a", getitem, TaskRef("original_open_dataset"), slice(30, 50)),
    "b": Alias("b", "a"),
}

We might want to consider adding an API for this instead of modifying the dictionary inplace? cc @fjetter for thoughts?

Copy link
Contributor

github-actions bot commented Jan 7, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ± 0       15 suites  ±0   3h 54m 47s ⏱️ + 3m 53s
 12 613 tests + 1   11 450 ✅ + 1   1 163 💤 ±0  0 ❌ ±0 
156 911 runs  +13  140 256 ✅ +14  16 655 💤  - 1  0 ❌ ±0 

Results for commit 21241c4. ± Comparison against base commit d280e16.

@fjetter
Copy link
Member

fjetter commented Jan 8, 2025

We might want to consider adding an API for this instead of modifying the dictionary inplace?

I'm open to adding new APIs as long as they are somewhat general. I wouldn't want to add a Task.maybe_fuse_slices method but a Task.unfuse which returns a list of the nested tasks (i.e. disassemble the dict for you) is something I could see working. Such an API is more costly than the inplace mutation, of course.

Is there any other API you have in mind that would be useful to have?

@fjetter
Copy link
Member

fjetter commented Jan 8, 2025

cc @hendrikmakait who's thinking about slimming down fuse and possibly changing its internal representation

@phofl
Copy link
Collaborator Author

phofl commented Jan 8, 2025

I honestly don't know what a good API here would look like. This all feel like horrible abstraction leaks. We might consider just living with the ugly thing here?

Rebuilding the list of nested tasks just puts the ugliness somewhere else

@fjetter
Copy link
Member

fjetter commented Jan 8, 2025

I find the fuse/unfuse to not be horrible. In fact, I could see this being useful outside of this application, e.g. if we encountered two sequential fused tasks and wanted to fuse those more efficiently like... Task.fuse(*t.unfuse() for t in all_tasks)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants