Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow get_entity_subset_from_asset_graph_subset to stil return EntitySubsets for serialized AssetGraphSubsets when the underlying partitions definition has gotten larger #27470

Merged
merged 1 commit into from
Feb 5, 2025

Conversation

gibsondan
Copy link
Member

@gibsondan gibsondan commented Jan 31, 2025

Summary & Motivation

Fixes an issue where a relatively benign change to a partitions definition like extending the range of the partition keys could cause a backfill to fail due to the partitions not exactly matching. Instead of rejecting the subset as invalid, it returns a corrected subset object with the latest partitions definition in it, as long as it is valid.

How I Tested These Changes

New test case

Changelog

NOCHANGELOG

Comment on lines 94 to 107
if (
not isinstance(self.value, TimeWindowPartitionsSubset)
or not isinstance(partitions_def, TimeWindowPartitionsDefinition)
or current_partitions_def.timezone != partitions_def.timezone
or current_partitions_def.fmt != partitions_def.fmt
or current_partitions_def.cron_schedule != current_partitions_def.cron_schedule
):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isinstance(self.value, TimeWindowPartitionsSubset) check can be removed since it's redundant - this code is already inside an elif block that confirms self.value is a TimeWindowPartitionsSubset. The redundant check will always evaluate to False, making the condition harder to understand.

Spotted by Graphite Reviewer

Is this helpful? React 👍 or 👎 to let us know.

or not isinstance(partitions_def, TimeWindowPartitionsDefinition)
or current_partitions_def.timezone != partitions_def.timezone
or current_partitions_def.fmt != partitions_def.fmt
or current_partitions_def.cron_schedule != current_partitions_def.cron_schedule
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cron_schedule comparison is comparing the schedule with itself (current_partitions_def.cron_schedule != current_partitions_def.cron_schedule). This should be comparing current_partitions_def.cron_schedule with partitions_def.cron_schedule instead.

Spotted by Graphite Reviewer

Is this helpful? React 👍 or 👎 to let us know.

@gibsondan gibsondan changed the title Allow get_entity_subset_from_asset_graph_subsetto stil return EntitySubsets for serialized AssetGraphSubsets when the underlying partitions definition has gotten larger Allow get_entity_subset_from_asset_graph_subset to stil return EntitySubsets for serialized AssetGraphSubsets when the underlying partitions definition has gotten larger Jan 31, 2025
@@ -66,6 +72,60 @@ def is_empty(self) -> bool:
else:
return not self.bool_value

def with_asset_graph_partitions_def(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did not yet change get_entity_subset_from_asset_graph_subset because it has existing callsites in DA that seem to expect it to return None if its invalid - wanted to run the possible gameplan by you there first @OwenKephart since you had mentioned making is_compatible_with_partitions_def more of a DA-specific thing...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's reasonable!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify - you think leaving the method as it is today is reasonable? Or bringing it in line with the changes I made in this PR?

@gibsondan gibsondan requested a review from OwenKephart January 31, 2025 23:24
@gibsondan
Copy link
Member Author

gibsondan commented Jan 31, 2025

Still planning to flesh out the test cases more but I think this is ready for a first look based on our conversation earlier owen (also curious for your thoughts on the inline question).

@@ -66,6 +72,60 @@ def is_empty(self) -> bool:
else:
return not self.bool_value

def with_asset_graph_partitions_def(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the name, maybe just with_current_partitions_def?

I also think this would be a bit cleaner if we passed in the AssetGraphView object (which has access to all of the parameters we're passing in)

Alternatively, we could just make this a private method on AssetGraphView itself, which I might bias towards personally

Copy link
Contributor

@OwenKephart OwenKephart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think the logic makes sense

It's possible that we'll want to refactor this in the future (or create a parallel version of this) that does just return None on incompatibilities instead of erroring, so that we can use this same logic in cases where those incompatibilities can be handled gracefully. However, I can't think of an easy way of doing that while preserving these error messages so this is ok for now.

Had some comments on structure in-line

Copy link
Contributor

@OwenKephart OwenKephart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

…ubsets for serialized AssetGraphSubsets when the underlying partitions definition has gotten larger

> Insert changelog entry or delete this section.
@gibsondan gibsondan merged commit 16ec75b into master Feb 5, 2025
4 of 5 checks passed
@gibsondan gibsondan deleted the viewvalidity branch February 5, 2025 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants