Cherry-pick commit Support intermediate aggs in Orca plans (#13707) #741

leborchuk · 2024-11-29T11:08:22Z

Here I cherry-picked the Jun 28, 2022 commit "Support intermediate aggs in Orca plans (#13707)", greenplum-db/gpdb-archive@5bf9fd5.

This commit conflicts with 0a2bc5d, "Move per-agg and per-trans duplicate finding to the planner.", a refactoring made by hlinnaka@.

The interesting thing is that that commit is from November 24th, 2020 (more than a year before the CBDB fork date) and is absent from the original gpdb repository. Also, I didn't find a PR where it was committed. It seems to me like a CBDB feature.

So I decided to create a separate PR only for that commit.

jiaqizho · 2024-12-02T05:46:33Z

We need to wait for ic-good-opt-on to be enabled. The current CI does not cover ORCA in the ICW tests.

my-ship-it · 2024-12-02T08:33:46Z

We need to wait for ic-good-opt-on to be enabled. The current CI does not cover ORCA in the ICW tests.

@edespino Ed, I'd appreciate your help with it.

edespino · 2024-12-02T09:04:46Z

We need to wait for ic-good-opt-on to be enabled. The current CI does not cover ORCA in the ICW tests.

@edespino Ed, I'd appreciate your help with it.

I have a branch with the changes. My repo's workflow is running: https://github.com/edespino/cloudberry/actions/runs/12115945979

edespino · 2024-12-02T09:19:40Z

@my-ship-it & @jiaqizho

My branch has a single commit based on this PR branch that is necessary to enable the ic-good-opt-on test configuration. Please note, it may have diffs that may or may not be related to this PR.

My branch: https://github.com/edespino/cloudberry/tree/refs/heads/CherryPickOrcaFeb2023_3-ic-good-opt-on
My workflow: https://github.com/edespino/cloudberry/actions/runs/12115945979

leborchuk · 2024-12-05T08:28:48Z

It seems to me that the ic-good-opt-off tests are flaky.

@edespino Ed, I appreciate your help. The ic-good-opt-on tests from Ed's branch are green.

my-ship-it · 2024-12-09T05:17:27Z

It seems to me that the ic-good-opt-off tests are flaky.

@edespino Ed, I appreciate your help. The ic-good-opt-on tests from Ed's branch are green.

@leborchuk @jiaqizho We need to investigate what is flaky and fix it, thanks!

edespino · 2024-12-09T05:21:14Z

It seems to me that the ic-good-opt-off tests are flaky.

@edespino Ed, I appreciate your help. The ic-good-opt-on tests from Ed's branch are green.

Thanks for flagging this. Since these flaky test failures involve the orca query optimizer behavior, we should first have an engineer familiar with the orca query optimizer review the test issues. Once they can determine whether this is a product issue or an environmental/infrastructure problem, I can assist with any infrastructure-related fixes needed. Please loop me back in after that assessment.

leborchuk · 2024-12-16T14:20:41Z

The only failures I see here, in the Apache Cloudberry Build / ic-good-opt-off (pull_request) workflow run, are in these tests:

test uao_compaction/stats         ... FAILED      333 ms (diff  123 ms)
test uao_compaction/index_stats   ... FAILED      575 ms (diff  139 ms)
test uao_compaction/index         ... ok          161 ms (diff   59 ms)
test uao_compaction/drop_column   ... FAILED      165 ms (diff  122 ms)

I'm sure they will succeed if I run them again. (I cannot do it myself; I do not have the rights to launch the tests one more time.)

I opened an issue to fix it: #789

jiaqizho · 2024-12-24T05:58:31Z

Please rebase. ic-good-opt-on has already been added to CI.

jiaqizho · 2024-12-24T06:08:59Z

CREATE SEQUENCE complex_seq CACHE 1;
CREATE TABLE complex_ttbl (id INT4 DEFAULT NEXTVAL('complex_seq'), c COMPLEX) DISTRIBUTED BY (c);

INSERT INTO complex_ttbl(c) VALUES (COMPLEX(NEXTVAL('complex_seq'), NEXTVAL('complex_seq')));
INSERT INTO complex_ttbl(c) VALUES (COMPLEX(NEXTVAL('complex_seq'), NEXTVAL('complex_seq')));
INSERT INTO complex_ttbl(c) VALUES (COMPLEX(NEXTVAL('complex_seq'), NEXTVAL('complex_seq')));
INSERT INTO complex_ttbl(c) VALUES (COMPLEX(NEXTVAL('complex_seq'), NEXTVAL('complex_seq')));
INSERT INTO complex_ttbl(c) VALUES (COMPLEX(NEXTVAL('complex_seq'), NEXTVAL('complex_seq')));
INSERT INTO complex_ttbl(c) VALUES (COMPLEX(NEXTVAL('complex_seq'), NEXTVAL('complex_seq')));

INSERT INTO complex_ttbl(c) VALUES (COMPLEX(0, 0));
INSERT INTO complex_ttbl(c) VALUES (COMPLEX(0, -0));
INSERT INTO complex_ttbl(c) VALUES (COMPLEX(-0, 0));
INSERT INTO complex_ttbl(c) VALUES (COMPLEX(-0, -0));
SELECT COUNT(c) = 4 AS tr, COUNT(DISTINCT gp_segment_id) = 1 AS tr 
	FROM complex_ttbl 
	WHERE re(c) = 0 AND im(c) = 0;

Hi @leborchuk, please check this case. I think the executor is missing some implementation.

leborchuk · 2025-01-05T10:36:44Z

Sorry, I tried to understand what could be wrong but failed. Everything works fine in my dev environment.

The query

SELECT COUNT(c) = 4 AS tr, COUNT(DISTINCT gp_segment_id) = 1 AS tr 
	FROM complex_ttbl 
	WHERE re(c) = 0 AND im(c) = 0;

works fine on the current head, and the execution plan is the same after I added the patch "Support intermediate aggs in Orca plans (#13707)":

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=0.00..431.00 rows=1 width=16)
   Output: (count(c) = 4), (count(gp_segment_id) = 1)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=12)
         Output: gp_segment_id, (PARTIAL count(c))
         ->  Partial GroupAggregate  (cost=0.00..431.00 rows=1 width=12)
               Output: gp_segment_id, PARTIAL count(c)
               Group Key: complex_ttbl.gp_segment_id
               ->  Sort  (cost=0.00..431.00 rows=1 width=20)
                     Output: c, gp_segment_id
                     Sort Key: complex_ttbl.gp_segment_id
                     ->  Seq Scan on public.complex_ttbl  (cost=0.00..431.00 rows=1 width=20)
                           Output: c, gp_segment_id
                           Filter: ((re(complex_ttbl.c) = '0'::double precision) AND (im(complex_ttbl.c) = '0'::double precision))
 Optimizer: Pivotal Optimizer (GPORCA)
(14 rows)

The only difference I see between head and the fix I cherry-picked is in this test query:

CREATE TABLE onek (
        unique1         int4,
        unique2         int4,
        two             int4,
        four            int4,
        ten             int4,
        twenty          int4,
        hundred         int4,
        thousand        int4,
        twothousand     int4,
        fivethous       int4,
        tenthous        int4,
        odd             int4,
        even            int4,
        stringu1        name,
        stringu2        name,
        string4         name
);
copy onek from '/home/xifos/git/cloudberry/src/test/regress/data/onek.data';
analyze onek;
SET optimizer_trace_fallback to on;
select ten, count(four), sum(DISTINCT four) from onek
group by ten order by ten;

The new EXPLAIN plan is as follows (GPORCA no longer falls back to the Postgres optimizer):

postgres=# explain verbose select ten, count(four), sum(DISTINCT four) from onek
group by ten order by ten;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.14 rows=10 width=20)
   Output: ten, (count(four)), (sum(four))
   Merge Key: ten
   ->  Sort  (cost=0.00..431.14 rows=4 width=20)
         Output: ten, (count(four)), (sum(four))
         Sort Key: onek.ten
         ->  Finalize HashAggregate  (cost=0.00..431.14 rows=4 width=20)
               Output: ten, count(four), sum(four)
               Group Key: onek.ten
               ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=0.00..431.14 rows=8 width=16)
                     Output: four, ten, (PARTIAL count(four))
                     Hash Key: ten
                     ->  Partial HashAggregate  (cost=0.00..431.14 rows=8 width=16)
                           Output: four, ten, PARTIAL count(four)
                           Group Key: onek.ten, onek.four
                           ->  Redistribute Motion 3:3  (slice3; segments: 3)  (cost=0.00..431.14 rows=8 width=16)
                                 Output: four, ten, (PARTIAL count(four))
                                 Hash Key: ten, four
                                 ->  Streaming Partial HashAggregate  (cost=0.00..431.14 rows=8 width=16)
                                       Output: four, ten, PARTIAL count(four)
                                       Group Key: onek.ten, onek.four
                                       ->  Seq Scan on public.onek  (cost=0.00..431.05 rows=334 width=8)
                                             Output: four, ten
 Optimizer: Pivotal Optimizer (GPORCA)
(24 rows)
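
For readers following the plan above, here is a rough C sketch of how its three aggregate stages could fit together for count(four) and sum(DISTINCT four) grouped by ten. It is only one reading of the plan, not the executor's code: the types and functions (Row, PartialCounts, partial_agg, intermediate_agg, final_agg) are hypothetical, segments are modeled as plain arrays, and ten and four are assumed to be small non-negative, non-NULL integers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_TEN  10   /* assumed range of "ten":  0..9 */
#define N_FOUR 4    /* assumed range of "four": 0..3 */

/* Hypothetical input row: just the two columns the query reads. */
typedef struct Row { int ten; int four; } Row;

/* Partial state: PARTIAL count(four) for every (ten, four) group. */
typedef struct PartialCounts { long cnt[N_TEN][N_FOUR]; } PartialCounts;

/* Final output for one value of "ten". */
typedef struct FinalRow { long count_four; long sum_distinct_four; } FinalRow;

/* Stage 1 (Streaming Partial HashAggregate): each segment groups its own
 * rows by (ten, four) and runs the transition step of count(four). */
void partial_agg(const Row *rows, size_t n, PartialCounts *out)
{
    memset(out, 0, sizeof(*out));
    for (size_t i = 0; i < n; i++)
    {
        assert(rows[i].ten >= 0 && rows[i].ten < N_TEN);
        assert(rows[i].four >= 0 && rows[i].four < N_FOUR);
        out->cnt[rows[i].ten][rows[i].four]++;
    }
}

/* Stage 2 (Partial HashAggregate after redistributing on ten, four):
 * combine the partial counts of each (ten, four) group. The output is
 * still a partial state, and each (ten, four) pair now appears once. */
void intermediate_agg(const PartialCounts *segs, size_t nsegs, PartialCounts *out)
{
    memset(out, 0, sizeof(*out));
    for (size_t s = 0; s < nsegs; s++)
        for (int t = 0; t < N_TEN; t++)
            for (int f = 0; f < N_FOUR; f++)
                out->cnt[t][f] += segs[s].cnt[t][f];
}

/* Stage 3 (Finalize HashAggregate grouped by ten): count(four) combines
 * partial counts, while sum(four) sees each distinct "four" exactly once. */
void final_agg(const PartialCounts *in, FinalRow out[N_TEN])
{
    for (int t = 0; t < N_TEN; t++)
    {
        out[t].count_four = 0;
        out[t].sum_distinct_four = 0;
        for (int f = 0; f < N_FOUR; f++)
        {
            out[t].count_four += in->cnt[t][f];
            if (in->cnt[t][f] > 0)
                out[t].sum_distinct_four += f;
        }
    }
}
```

The point of the sketch is that the two aggregates are treated differently in the last stage: the ride-along count(four) only combines partial states, whereas the distinct aggregate consumes plain values that the earlier grouping on (ten, four) has already deduplicated.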

jiaqizho · 2025-01-08T02:32:33Z

Would you mind rebasing again? I'm not sure why CI was not triggered.

Orca in GPDB6 has support for intermediate aggs, which isn't used in postgres. This is useful when we have a DQA and a regular "ride-along" agg. However, we need to differentiate when we should run the combine/final/trans functions when this ride-along agg is present. This commit re-adds support for intermediate aggs. The logic here is the same as 6X, however, instead of explicitly using the aggstage, we use the aggsplit, which is determined from the aggstage. The logic is defined in `AGGSPLIT_INTERNMEDIATE`. The changes in nodeAgg.c are to allow the aggref and aggstate to differ for an aggregate. This is necessary and expected in the case of an intermediate agg, as the loop will iterate over each aggstate->aggs, but the aggsplit can now be different between the aggref and the aggstate. Thus the aggsplit references are also changed to use aggref instead of aggstate.
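
To make the commit message above more concrete, here is a minimal C sketch of how a per-aggregate split value could decide between the transition, combine, and final functions. The AGGSPLITOP_* names and values follow PostgreSQL's option bits; everything prefixed with Sketch or sketch_ is hypothetical, and the bit composition of the intermediate split is an assumption inferred from the description (it consumes partial states and emits partial states), not something confirmed by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Option bits with the same names and values as PostgreSQL's AGGSPLITOP_* macros. */
#define AGGSPLITOP_COMBINE      0x01  /* consume partial states via the combine function */
#define AGGSPLITOP_SKIPFINAL    0x02  /* emit a partial state, so skip the final function */
#define AGGSPLITOP_SERIALIZE    0x04  /* serialize the emitted partial state */
#define AGGSPLITOP_DESERIALIZE  0x08  /* deserialize incoming partial states */

/* Hypothetical enum for this sketch; only the bit composition matters. */
typedef enum SketchAggSplit
{
    SKETCH_AGGSPLIT_SIMPLE  = 0,
    SKETCH_AGGSPLIT_INITIAL = AGGSPLITOP_SKIPFINAL | AGGSPLITOP_SERIALIZE,
    SKETCH_AGGSPLIT_FINAL   = AGGSPLITOP_COMBINE | AGGSPLITOP_DESERIALIZE,
    /* ASSUMPTION: an intermediate stage both consumes and emits partial states. */
    SKETCH_AGGSPLIT_INTERMEDIATE = AGGSPLITOP_COMBINE | AGGSPLITOP_DESERIALIZE |
                                   AGGSPLITOP_SKIPFINAL | AGGSPLITOP_SERIALIZE
} SketchAggSplit;

/* Hypothetical stand-ins for one aggregate reference and its Agg node state. */
typedef struct SketchAggref   { SketchAggSplit aggsplit; } SketchAggref;
typedef struct SketchAggState { SketchAggSplit aggsplit; } SketchAggState;

/* True when this aggregate merges partial states instead of running its
 * transition function on raw input values. */
bool sketch_uses_combinefn(const SketchAggref *aggref)
{
    assert(aggref != NULL);
    return (aggref->aggsplit & AGGSPLITOP_COMBINE) != 0;
}

/* True when this aggregate should run its final function in this node. */
bool sketch_runs_finalfn(const SketchAggref *aggref)
{
    assert(aggref != NULL);
    return (aggref->aggsplit & AGGSPLITOP_SKIPFINAL) == 0;
}

/* True when the aggregate's own split differs from the node-level split,
 * which is the situation the commit message says must be allowed. */
bool sketch_split_differs(const SketchAggState *node, const SketchAggref *aggref)
{
    assert(node != NULL && aggref != NULL);
    return node->aggsplit != aggref->aggsplit;
}
```

In this sketch the decisions are keyed off the aggregate's own split rather than the node's, mirroring the statement that the aggsplit references were changed to use the aggref instead of the aggstate.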

leborchuk · 2025-01-09T09:26:52Z

Rebased :)

jiaqizho · 2025-01-10T05:33:58Z

@edespino Hi Eddie, why have these CI checks been skipped?

edespino · 2025-01-15T02:18:01Z

@jiaqizho CI is being skipped because the PR uses the old PR template.

Additional Context
⚠️ To skip CI: Add [skip ci] to your PR title. Only use when necessary! ⚠️

I will remove it from the body of the PR and CI should start.

jiaqizho · 2025-01-16T09:13:08Z

@leborchuk some of the cases failed in ic-good-opt-on. Please take a look. :)

My guess (not verified) is that we also need to cherry-pick these commits, which are related to intermediate aggs:

0c7e41148c562e7e758748e08e7bf9adde050054 Fix incorrect plan / output in multi stage agg
0f4af19b71f2f31acd28c39d0d0a865cbf1d3bab Fix crash of AggNode in executor casued by ORCA plan (#14577)
701291cbab42f032bdb6ec70023f4b6d2f53abdc Force two-stage local aggregate to remove duplicates

my-ship-it requested a review from jiaqizho December 2, 2024 02:34

avamingli added the cherry-pick cherry-pick upstream commts label Dec 4, 2024

leborchuk force-pushed the CherryPickOrcaFeb2023_3 branch from 71fd840 to c2d4bfa Compare January 4, 2025 18:12

leborchuk force-pushed the CherryPickOrcaFeb2023_3 branch from c2d4bfa to 7ff1872 Compare January 9, 2025 09:25
