-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize SQL expression comparison by caching hash codes (currently probably not needed) #34149
Comments
Side note: another case that could be interesting is the negation of an expression. Having it "ready to use" would make some predicate-related optimizations faster/cheaper. |
Blocked on making our entire SqlExpression tree immutable (#32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (thanks @ranma42), we can't cache the hashcode. |
Though on second thought, SelectExpression should never be mutable when it's already composed upon... |
@ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits - any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants: only the top-level select in the tree may be mutable. I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here. |
Note also #19859, which is about not exhaustively calculating hash codes for the entire tree, but just some shallower subset; this would further improve our performance here. |
If that is the case, I believe I can try and implement this. What would be the best way to evaluate the performance? (would a micro-benchmark make sense?) |
Sure, an ad-hoc BenchmarkDotNet benchmark could work - not sure we need to commit it etc. Though this is one of the cases where the benefits depend on the tree depth/complexity which you choose to benchmark, which is a bit arbitrary... I think it's OK to do this improvement in any case? |
For example, would it be legitimate to capture enumerable values and "clone" them into immutable lists during the processing? (I would assume that it should be fine for constants, as non-constants would instead be handled as parameters) |
@ranma42 yes, that makes sense, and unless I'm mistaken we already make that assumption in various places (i.e. SqlConstantExpression already calculates the value's hash code - though I'm indeed not sure how/where that's currently used). In regular compiler-generated incoming expression trees, ConstantExpression is only given for inline literals inside the query, which of course can't be mutated. We indeed generate SqlConstantExpressions later inside the query pipeilne itself, but I think that if we then mutated the value referenced by SqlConstantExpressions, that would be a bug... |
I did a couple of experiments in https://github.com/ranma42/efcore/tree/wip/hash-lazy and https://github.com/ranma42/efcore/tree/wip/hash-eager to evaluate the impact; the results are benchmarks.zip The In addition to that, I implemented some benchmarks that compare other affected operations:
Some notes:
Unsurprisingly, each implementation has its own strengths:
|
@ranma42 do you mind posting the benchmark results here in the issue (possibly inside Thanks for doing this work... Yeah, everything here makes sense of course. What's your opinion on where we should go? I don't have a super clear preference - but my leaning here is maybe slightly towards lazy: at least in the current query pipeline, I think most nodes are unlikely to actually be compared/hashed, since we don't compare for equality that often, and even when we do, most cases will fail early as the trees aren't identical. However, as we introduce more optimizations requiring node comparisons, this may shift the balance towards eager calculation. One possible interesting thing to measure would be to just run the full EF test suite for SQL Server with both systems, and compare the running time (but that requires fully implementing both, which seems like some lost efort). |
I ran the benchmarks on my laptop:
This is a WSL2 on a Windows 11, so scheduling might be kind of messy. main
lazy
eager
Some further details for those inspecting the values:
@roji implementing both is quite trivial (in fact, it's little more than a find & replace 😄). I can try to run them against SqlServer, but for benchmarking I believe Sqlite might be more effective (in-process, faster). |
Thanks. Yeah, if it's trivial for you to run, then it would be nice to see how the functional tests fare on all three scenarios (FWIW both SQL Server and SQLite could be interesting, with the former a bit more real-world). Whatever you feel like doing and have time for :) |
I tried running the EFCore.Sqlite.FunctionalTests and EFCore.SqlServer.FunctionalTests multiple times to collect some data.
All measurements are in seconds. As I mentioned before, my environment is probably not ideal for benchmarking and SqlServer is in general harder to measure reliably (see the standard deviation 😅 ).
Note that this "benchmark" is neither representative of the real-world nor of worst-case behavior (for that, my previous benchmarks show a major improvement in comparison of huge expressions). I am not sure what to make of this; I am somewhat wary of introducing such a change without seeing any improvement. |
Thanks for doing this and posting the results @ranma42! Yeah, I'm slightly surprised as well, but I think this mainly shows that we're simply not doing a lot of optimizations requiring deep comparisons. I suspect that as we start to add more stuff, this will become more relevant. Stuff like #12634 and #34507 specifically comes to mind: these would be very broad optimizations that go over lots of nodes in the tree (i.e. are not subject to a very specific pattern-match that happens beforehand, such as the recent NULLIF and related work in #35327, which only does the deep comparison for CASE/WHEN over equality/inequality). I'd say this... Let's keep this in the backlog for now, and as we implement more equality-based optimizations, we should keep an eye on this and rerun the benchmarks to see where we are. You can keep the branches (linking to them from here would be great) - I think we it's likely we'd find ourselves implementing this later... How does that sound? |
I am not sure this will actually be the case; note that
Yes, additional optimizations might make it more relevant, but it is also likely that the smallish expression used in the test suite will never cause a significant improvement (on average) in caching the hash codes.
LGTM 👍 The branches are mentioned above (https://github.com/ranma42/efcore/tree/compute-hash-lazy and https://github.com/ranma42/efcore/tree/compute-hash-eager). |
You may very well be right... Though it may still be better to do caching to limit the worse-case compilation performance, even if it regresses the average case; I'm thinking about those cases where the expression trees are deep (this is indeed rare), and multiple fragments are equal except for one node somewhere in the bottom. In this sort of worse-case scenario (which is probably pretty contrived), lots of deep equality could become problematic... But anyway, I think we both agree that for now, we should continue with optimizations leveraging deep comparison without worrying about caching hash codes - but that we should keep an eye on it and revisit as needed. I'll keep this issue open in the backlog for now. |
In the query pipeline sometimes expressions are compared for (deep) equality.
This currently is based on a recursive visit of the subtree, which can be very costly if done multiple times while visiting the expression (it is quite easy to construct expression trees that require O(n^2) operations when visited).
This could be improved by computing a hash that filters out most of the inequalities as suggested in #34133 (comment)
The text was updated successfully, but these errors were encountered: