Add Exchange before GroupId to improve Partial Aggregation #24047

Open
wants to merge 1 commit into base: master

aaneja
Contributor

@aaneja aaneja commented Nov 14, 2024

See #23475 for more details

Previously closed PR - #11741

Description

Motivation and Context

See Javadoc of the new AddExchangesBelowPartialAggregationOverGroupIdRuleSet

Impact

Better performance for TPCDS Q22, Q67
See plan diffs (TPCDS SF 1000, unpartitioned) - https://aaneja.github.io/mypages/PR_24047_AddExchangesBelowPartialAggregationOverGroupId_OffVsOn.html

Test Plan

TODO: Add a new planner test

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Added a new optimizer rule to add exchanges below a combination of partial aggregation+ GroupId. Enabled with the boolean session property `enable_forced_exchange_below_group_id`


@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Nov 14, 2024
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from 95fb49c to 62aaab6 Compare November 14, 2024 12:13
@steveburnett
Contributor

Thanks for the release note entry! Minor formatting nits, and include the PR number.

== RELEASE NOTES ==

General Changes
* Add a new optimizer rule to add exchanges below a combination of partial aggregation+ GroupId. Enabled with the boolean session property ``enable_forced_exchange_below_group_id``. :pr:`24047`

@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from 62aaab6 to fa61dfd Compare November 18, 2024 12:58
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from fa61dfd to 39222b3 Compare January 9, 2025 17:03
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch 2 times, most recently from 86b145a to 5868a2f Compare January 21, 2025 14:34
@aaneja aaneja marked this pull request as ready for review January 21, 2025 14:35
Contributor

@ZacBlanco ZacBlanco left a comment

High-level first pass. Seems good for the most part. I will take another pass and look at the details of the rule tomorrow.

.filter(entry -> entry.getCount() >= groupId.getGroupingSets().size() * GROUPING_SETS_SYMBOL_REQUIRED_FREQUENCY)
.map(Multiset.Entry::getElement)
// And only the symbols used in the aggregation (these are usually all symbols)
.peek(symbol -> verify(groupingKeys.contains(symbol)))
Contributor

Do we really want peek+verify here? I'm concerned about the case where verify fails. Will an exception fail the query?

Contributor Author

Yes, the query should fail. I will add a verification message to make this clearer.
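
To illustrate the behavior discussed above, here is a minimal sketch of how a failed check inside `peek` surfaces once the stream is consumed. It is not the rule's code: `PeekVerifySketch`, `keepVerified`, and the local `verify` helper are hypothetical names, and the helper is only a stand-in for Guava's `Verify.verify(boolean, String, Object...)` (which throws `VerifyException`; the stand-in throws `IllegalStateException` so the example needs no dependencies).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PeekVerifySketch
{
    // Hypothetical stand-in for Guava's Verify.verify(boolean, String, Object...).
    static void verify(boolean condition, String message, Object... args)
    {
        if (!condition) {
            throw new IllegalStateException(String.format(message, args));
        }
    }

    // Passes every candidate through unchanged, but throws as soon as one
    // candidate is not among the grouping keys.
    static List<String> keepVerified(List<String> candidates, Set<String> groupingKeys)
    {
        return candidates.stream()
                .peek(symbol -> verify(groupingKeys.contains(symbol), "Symbol %s is not a grouping key", symbol))
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        Set<String> groupingKeys = Set.of("a", "b");
        System.out.println(keepVerified(List.of("a", "b"), groupingKeys));
        try {
            keepVerified(List.of("a", "c"), groupingKeys);
        }
        catch (IllegalStateException e) {
            // The exception escapes the terminal operation, so the caller fails.
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The exception is raised from the terminal operation (`collect`), not where the pipeline is declared, which is why a descriptive message helps when reading the resulting failure.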

private static final double GROUPING_SETS_SYMBOL_REQUIRED_FREQUENCY = 0.5;
private static final double ANTI_SKEWNESS_MARGIN = 3;
Contributor

Was there any experimentation with these parameters?

Contributor Author

@aaneja aaneja Jan 22, 2025

No, I just left them as-is while porting the change over from trinodb/trino#105. Tests for this weren't added either, and I could not come up with a good integration test to experiment with these.
As users test this feature out (it is disabled by default), we can tweak and test.
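
For context on what the frequency parameter controls, the filter quoted earlier keeps a symbol only if it appears in at least `GROUPING_SETS_SYMBOL_REQUIRED_FREQUENCY` (0.5) of the grouping sets. The sketch below reproduces that threshold with plain collections instead of a Guava `Multiset`; `FrequencyThresholdSketch` and `frequentSymbols` are hypothetical names, symbols are modeled as strings, and each symbol is assumed to appear at most once per grouping set. It does not cover `ANTI_SKEWNESS_MARGIN`, whose use is not shown in this conversation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FrequencyThresholdSketch
{
    // Same value as the constant quoted in the review thread.
    private static final double GROUPING_SETS_SYMBOL_REQUIRED_FREQUENCY = 0.5;

    // Returns the symbols that occur in at least half of the grouping sets.
    static List<String> frequentSymbols(List<List<String>> groupingSets)
    {
        // Count in how many grouping sets each symbol occurs.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> groupingSet : groupingSets) {
            for (String symbol : groupingSet) {
                counts.merge(symbol, 1, Integer::sum);
            }
        }
        double threshold = groupingSets.size() * GROUPING_SETS_SYMBOL_REQUIRED_FREQUENCY;
        return counts.entrySet().stream()
                .filter(entry -> entry.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        // Grouping sets of ROLLUP(a, b, c): (a, b, c), (a, b), (a), ().
        List<List<String>> rollup = List.of(
                List.of("a", "b", "c"),
                List.of("a", "b"),
                List.of("a"),
                List.<String>of());
        // a occurs 3 times, b 2 times, c once; the threshold is 4 * 0.5 = 2.
        System.out.println(frequentSymbols(rollup)); // prints [a, b]
    }
}
```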

aaneja added a commit to aaneja/mypages that referenced this pull request Jan 24, 2025
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from 5868a2f to 437a09a Compare January 24, 2025 05:16
return false;
}

return isEnabledAddExchangeBelowGroupId(session);
Contributor

Maybe this should have three possible values: ALWAYS, COST_BASED, and NEVER (similar to partial aggregation pushdown). That way someone can enable this if they don't have stats or if the stats estimates are poor.

Contributor Author

What would we repartition on if ALWAYS is chosen (for the non-trivial case of more than one partition variable)?
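
To make the three-valued proposal concrete, here is a hypothetical sketch of how such a setting could gate the rule. Nothing in it comes from the PR, which uses a boolean session property: the `Strategy` enum, `shouldAddExchange`, and both boolean inputs are assumptions for illustration. It also leaves the question above open, since it only decides whether to add an exchange, not which symbols to repartition on.

```java
public class ExchangeStrategySketch
{
    // Hypothetical values mirroring the reviewer's suggestion.
    enum Strategy
    {
        ALWAYS,
        COST_BASED,
        NEVER
    }

    // Decides whether the rule would fire. The two boolean inputs are assumed
    // to be computed elsewhere (for example, from table statistics).
    static boolean shouldAddExchange(Strategy strategy, boolean statsAvailable, boolean costModelFavorsExchange)
    {
        switch (strategy) {
            case ALWAYS:
                // Fires even without usable statistics.
                return true;
            case COST_BASED:
                // Fires only when statistics exist and favor the exchange.
                return statsAvailable && costModelFavorsExchange;
            case NEVER:
            default:
                return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(shouldAddExchange(Strategy.ALWAYS, false, false));    // true
        System.out.println(shouldAddExchange(Strategy.COST_BASED, true, false)); // false
        System.out.println(shouldAddExchange(Strategy.NEVER, true, true));       // false
    }
}
```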

}

@Test
public void testAddExchangesWithoutProjection()
Contributor

What about a withProjection test? Also a test that it doesn't fire if it's disabled, only has one grouping set, has pass-through keys.

Contributor Author

'only has one grouping set': added with this test.

withProjection, does not fire if disabled: will add.

only has one grouping set, has pass-through keys: I could not build a use case where this occurs, and it is unclear to me when it could occur. Can you help me out with an example?

Based on: trinodb/trino@dc1d66fb
co-authored-by: Piotr Findeisen <[email protected]>
Based on: trinodb/trino@c573b34
co-authored-by: Lukasz Stec <[email protected]>
Based on: trinodb/trino@29328d3
co-authored-by: praveenkrishna <[email protected]>
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from 437a09a to 979d204 Compare February 3, 2025 12:47
@steveburnett
Contributor

Thanks for the release note entry! Minor formatting nits, and include the PR number.

== RELEASE NOTES ==

General Changes
* Add a new optimizer rule to add exchanges below a combination of partial aggregation+ GroupId. Enabled with the boolean session property ``enable_forced_exchange_below_group_id``. 

The minor formatting nits should still apply, but new release note guidelines as of last week: PR #24354 automatically adds links to this PR to the release notes. Please remove the manual PR link in the following format from the release note entries for this PR.

:pr:`12345`

I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.

Labels
from:IBM PR from IBM

5 participants