[CALCITE-6786] ANY/SOME operator yields multiple rows in correlated queries #4147
// only in the case when one of the values is true.
// When true value is absent then we are interested
// only in false value.
builder.sortLimit(0, 1,
This includes the `order by cs limit 1` clause in the correlated query case. Without it, multiple rows are returned, as described in the issue.
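As a rough illustration of why a sort followed by a limit of 1 collapses the result to a single row per outer row, here is a minimal Java sketch. It is not Calcite code: the class and method names (`SomeRowPicker`, `pickOne`) are hypothetical, it assumes the candidate values are non-null booleans, and it assumes the ordering places true ahead of false. The actual sort key `cs` and its direction are not shown in this thread.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Calcite. */
final class SomeRowPicker {
  private SomeRowPicker() {}

  /**
   * Mimics "ORDER BY ... LIMIT 1" over the candidate rows of one outer row:
   * a true value wins if one is present, otherwise a false value is returned.
   * Assumes the list contains no nulls.
   */
  static Optional<Boolean> pickOne(List<Boolean> candidates) {
    return candidates.stream()
        // Natural Boolean order is false < true, so reverse it to put true first.
        .sorted(Comparator.reverseOrder())
        // Equivalent of LIMIT 1: keep only the first row, if any.
        .findFirst();
  }
}
```

Under these assumptions, `SomeRowPicker.pickOne(List.of(false, true, false))` yields a single true value rather than three rows, which is the behaviour the comment in the diff describes.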
It looks like this was written by @vvysotskyi about 6 years ago; it would be great if he could comment.
EnumerableTableScan(table=[[scott, EMP]])
EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], DEPTNO=[$t0], $f1=[$t3])
EnumerableTableScan(table=[[scott, DEPT]])
EnumerableCalc(expr#0..2=[{inputs}], expr#3=[RAND()], expr#4=[CAST($t3):INTEGER NOT NULL], expr#5=[2], expr#6=[MOD($t4, $t5)], expr#7=[3], expr#8=[=($t6, $t7)], expr#9=[OR($t8, $t2)], SAL=[$t0], $condition=[$t9])
All of these plans now contain correlated subqueries.
I wonder whether these tests expect uncorrelated results. This would indicate that either the decorrelator fails on the new plans, or perhaps it's run too early.
In my understanding, the plan change is basically that the `EnumerableMergeJoin` was replaced with an `EnumerableCorrelate`. Aren't they similar in this case?
Naturally, the sort + limit is also introduced.
I agree that the plan after this PR is correct.
The decorrelation failure is already reported in [CALCITE-6652]. I plan to submit a PR for it next month; after that, the plan should be better.
I believe that correlated queries in general are much less efficient than uncorrelated ones.
Moreover, the Calcite decorrelator can only decorrelate a limited number of patterns.
That's why a decorrelated plan is preferred to a plan that has correlated subqueries.
However, correctness is more important than performance, so we should take this change if it fixes a correctness bug.
The question was whether we could have both correctness and performance, but that can be part of a separate issue.
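To make the performance argument concrete, here is a minimal Java sketch contrasting the two evaluation styles for a per-key maximum. It is not Calcite code and does not model `EnumerableCorrelate` or `EnumerableMergeJoin` faithfully: the class `CorrelateVsDecorrelate`, its methods, and the `innerRowsVisited` counter are hypothetical, and the counter is only a crude stand-in for execution cost under a naive nested-loop assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Calcite. */
final class CorrelateVsDecorrelate {
  private CorrelateVsDecorrelate() {}

  /** Number of inner rows visited; a crude stand-in for cost. */
  static long innerRowsVisited = 0;

  /** Correlated style: rescan the inner input for every outer key. */
  static List<Integer> correlated(List<Integer> outerKeys, List<int[]> inner) {
    List<Integer> result = new ArrayList<>();
    for (int key : outerKeys) {
      Integer max = null;
      for (int[] row : inner) { // each row is {key, value}
        innerRowsVisited++;
        if (row[0] == key && (max == null || row[1] > max)) {
          max = row[1];
        }
      }
      result.add(max); // null when no inner row matches
    }
    return result;
  }

  /** Decorrelated style: aggregate the inner input once, then look up per outer key. */
  static List<Integer> decorrelated(List<Integer> outerKeys, List<int[]> inner) {
    Map<Integer, Integer> maxByKey = new HashMap<>();
    for (int[] row : inner) {
      innerRowsVisited++;
      maxByKey.merge(row[0], row[1], Integer::max);
    }
    List<Integer> result = new ArrayList<>();
    for (int key : outerKeys) {
      result.add(maxByKey.get(key)); // null when no inner row matches
    }
    return result;
  }
}
```

Both methods return the same values, but in this sketch the correlated version visits the inner rows once per outer row while the decorrelated version visits them once in total. That gap is the reason a decorrelated plan is usually preferred when it is also correct.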