-
Notifications
You must be signed in to change notification settings - Fork 255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: don't eagerly materialize fields that the user hasn't asked for #3442
Conversation
if !self.projection_plan.physical_schema.fields.is_empty() { | ||
return Err(Error::invalid_input( | ||
"count_rows should not be called on a plan selecting columns".to_string(), | ||
location!(), | ||
)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little torn on this error. Ideally, we would just silently blank out the projection plan and then create the count plan. However, to do that we either have to clone the scan, which is a pretty big thing to be cloning, or modify the scanner, which would maybe not be what users would expect from count_rows
.
For now, I want to get something out soon, so I'm just raising an error, with the assumption that Scanner::count_rows
is a mostly internal method anyways (users should use Dataset::count_rows
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that is pretty internal. Plus easy to work around.
// Start with the desired schema | ||
.union_schema(desired_schema) | ||
// Subtract columns that are expensive | ||
.subtract_predicate(|f| !self.is_early_field(f)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do "early" and "eager" mean the same thing in the vocabulary?
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3442 +/- ##
==========================================
- Coverage 78.93% 78.91% -0.03%
==========================================
Files 251 251
Lines 92267 92390 +123
Branches 92267 92390 +123
==========================================
+ Hits 72833 72910 +77
- Misses 16463 16504 +41
- Partials 2971 2976 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
We added logic a while back to eagerly materialize fields if they are narrow and there is a filter. However, we forgot to ensure that those fields are actually part of the final projection. The result is that we end up loading many columns the user doesn't want and then throwing them away.
This fix changes the set of fields we load to only be those that are asked for.