[CALCITE-6786] ANY/SOME operator yields multiple rows in correlated queries #4147

Merged

racevedoo
Contributor

No description provided.

// only in the case when one of the values is true.
// When true value is absent then we are interested
// only in false value.
builder.sortLimit(0, 1,
Contributor Author


This adds the ORDER BY cs LIMIT 1 clause in the correlated query case. Without it, multiple rows are returned, as described in the issue.
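
To illustrate the effect, here is a minimal, self-contained sketch in plain Java. It does not use the Calcite API. The class `SomeIndicatorSketch` and its methods are hypothetical names, and the descending sort direction is an assumption based on the code comment above, which says a TRUE value is preferred and FALSE is used only when TRUE is absent. The list stands for the values of the `cs` column produced for one outer row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SomeIndicatorSketch {

    // Hypothetical stand-in for the correlated subquery without a limit:
    // every indicator value found for the outer row is returned as its own row.
    static List<Boolean> withoutLimit(List<Boolean> cs) {
        return new ArrayList<>(cs);
    }

    // Hypothetical stand-in for "ORDER BY cs DESC LIMIT 1" (direction assumed):
    // TRUE sorts before FALSE, so TRUE wins whenever it is present.
    static List<Boolean> withSortLimit(List<Boolean> cs) {
        List<Boolean> sorted = new ArrayList<>(cs);
        sorted.sort(Comparator.reverseOrder());
        // Keep at most one row; an empty input stays empty.
        return new ArrayList<>(sorted.subList(0, Math.min(1, sorted.size())));
    }

    public static void main(String[] args) {
        List<Boolean> cs = Arrays.asList(false, true, false);
        System.out.println("rows without limit: " + withoutLimit(cs).size());
        System.out.println("rows with sort + limit: " + withSortLimit(cs));
    }
}
```

With several indicator values per outer row, the unlimited version emits one output row per value, while the sorted and limited version emits at most one.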

Contributor


It looks like this was written by @vvysotskyi about 6 years ago; it would be great if he could comment.

EnumerableTableScan(table=[[scott, EMP]])
EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], DEPTNO=[$t0], $f1=[$t3])
EnumerableTableScan(table=[[scott, DEPT]])
EnumerableCalc(expr#0..2=[{inputs}], expr#3=[RAND()], expr#4=[CAST($t3):INTEGER NOT NULL], expr#5=[2], expr#6=[MOD($t4, $t5)], expr#7=[3], expr#8=[=($t6, $t7)], expr#9=[OR($t8, $t2)], SAL=[$t0], $condition=[$t9])
Contributor


All of these plans now contain correlated subqueries.
I wonder whether these tests expect uncorrelated results. This would indicate that either the decorrelator fails on the new plans, or perhaps it's run too early.

Contributor Author


In my understanding, the plan change is basically that the EnumerableMergeJoin was replaced with an EnumerableCorrelate. Aren't they similar in this case?

Naturally, the sort + limit is also introduced.

Contributor


I agree that the plan after this PR is correct.
The decorrelation failure is already reported in [CALCITE-6652]. I plan to submit a PR for it next month, and after that, the plan should be better.

Contributor


I believe that correlated queries in general are much less efficient than uncorrelated ones.
Moreover, the Calcite decorrelator can only decorrelate a limited number of patterns.
That's why a decorrelated plan is preferred to a plan that has correlated subqueries.
However, correctness is more important than performance, so we should take this change if it fixes a correctness bug.
The question was whether we could have both correctness and performance, but that can be part of a separate issue.

@mihaibudiu added the LGTM-will-merge-soon label (Overall PR looks OK. Only minor things left.) on Jan 20, 2025
@mihaibudiu merged commit 3fce658 into apache:main on Jan 21, 2025
36 checks passed