-
Notifications
You must be signed in to change notification settings - Fork 907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't add unneded null test in sorted query plan for minmax #6442
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #6442 +/- ##
==========================================
- Coverage 87.36% 87.32% -0.04%
==========================================
Files 187 187
Lines 41895 41851 -44
Branches 9331 9312 -19
==========================================
- Hits 36602 36548 -54
- Misses 3624 3632 +8
- Partials 1669 1671 +2 ☔ View full report in Codecov by Sentry. |
@antekresic, @fabriziomello: please review this pull request.
|
src/planner/agg_bookend.c
Outdated
/* | ||
* Build "sort IS NOT NULL" expression. Not that target can still be NULL. | ||
* We don't need it if the order is NULLS LAST. | ||
*/ | ||
if (nulls_first) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dependent on if descending or ascending order is used. Do we have tests for all four combinations of ASC/DESC and NULLS FIRST/LAST?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to hit the desc nulls last
case. ts_preprocess_first_last_aggregates
tries the desc nulls first
ordering first, and it always succeeds (no reason why it would not, because it just builds a sorted path). I tried to address this as well by trying both paths and making a cost-based decision, but Postgres has zero cost for is null
clauses, so it doesn't work. I decided to stick to the most simple change in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you cannot hit this case, can you add an Assert
(or Ensure
) to make sure that we do not sort this incorrectly in some yet-to-be-found case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really want to change the ts_preprocess_first_last_aggregates
, because it's an old function that is a copy of Postgres' preprocess_minmax_aggregates
. As I mentioned, I tried to switch it to the cost-based decision, and it didn't work well because of the limitations of the PG cost model. If we limit the changes in this PR to this single if
statement that it's adding, do you think it has correctness problems by itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, the coverage problem that I mentioned exists in the current TimescaleDB code. I don't want to tacke it because it's a different change that is potentially problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct me if I have understood something wrong, but this discussion is based on that you are doing a refactoring to avoid pruning NULL values when not necessary, which would be when you want the first value in an ascending order (with null values sorted after non-null values) and when you want the last value in descending order (with null values sorted before non-null values).
AFAICT, the current code will remove the IS NOT NULL
when nulls_first
is true, which is when reverse
in the calling function is true, which in turn is when the ordering operator is ">" (descending order). This is correct when you want the last value (because null values are treated as larger than other values), but it is not correct when you want the first value (because then you would get a null value where you in the previous code would get a non-null value). The opposite holds when the ordering operator is "<" (ascending order).
If you can either explain to me where I misunderstood the change, add tests to make sure that we cannot pick the wrong value, or add assertions that capture a potential case where we could have picked the wrong value, it would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code path always takes the first value (LIMIT 1
). NULLS LAST
is independent of DESC/ASC, it means that nulls go last, so in this case we don't have to filter out the null values. Like this:
test=# select * from (values (1), (2), (null)) t(x) order by x nulls last;
x
───
1
2
¤
test=# select * from (values (1), (2), (null)) t(x) order by x desc nulls last;
x
───
2
1
¤
nulls last
is specified in SortGroupClause
explicitly, independent of ascending/descending order. So it doesn't matter if the sorting is desc or asc, for the purpose of removing the null test we only have to check if nulls go first or last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The coverage problem is that the calling code currently doesn't exercise the combinations of desc nulls last
and asc nulls first
, so effectively this optimization is only applied for asc nulls last
, i.e. the first
function. But it is still correct for desc nulls last
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the column is a partitioning column we can always skip the NULL check independent of sort attributes as those have NOT NULL constraint
Yeah, I also thought about this, but it is not very straightforward, let's merge this simple optimization for now. |
If we use NULLS LAST, we don't need the null test. This saves some CPU.
If we use NULLS LAST, we don't need the null test. This saves some CPU.
Disable-check: force-changelog-file