
Optimize SQL expression comparison by caching hash codes (currently probably not needed) #34149

Open
ranma42 opened this issue Jul 3, 2024 · 17 comments


@ranma42
Contributor

ranma42 commented Jul 3, 2024

In the query pipeline, expressions are sometimes compared for (deep) equality.
This is currently based on a recursive visit of the subtree, which can be very costly if done multiple times while visiting the expression (it is quite easy to construct expression trees that require O(n^2) operations when visited).

This could be improved by computing a hash that filters out most of the inequalities, as suggested in #34133 (comment).
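The idea can be sketched outside of EF Core as follows (illustrative Python, not EF Core code; `SqlBinary` and its fields are hypothetical stand-ins for immutable SqlExpression nodes): each node caches a structural hash at construction, so an equality check on two unequal trees can usually bail out without recursing.

```python
# Illustrative sketch (not EF Core code): SqlBinary is a hypothetical
# stand-in for an immutable SqlExpression node. Caching a structural hash
# at construction lets deep equality bail out in O(1) for most unequal trees.

class SqlBinary:
    """Immutable binary expression node with an eagerly cached hash."""
    __slots__ = ("op", "left", "right", "_hash")

    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right
        # Eager: children have already cached their hashes, so this is O(1)
        # per node rather than O(subtree) per comparison.
        self._hash = hash((op, hash(left), hash(right)))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if self is other:
            return True
        if not isinstance(other, SqlBinary):
            return NotImplemented
        # Cheap filter: different hashes imply structural inequality,
        # skipping the recursive walk in the common "not equal" case.
        if self._hash != other._hash:
            return False
        return (self.op == other.op
                and self.left == other.left
                and self.right == other.right)
```

Note that this is only sound if nodes are truly immutable once constructed, which is exactly the constraint discussed below.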

@ranma42
Contributor Author

ranma42 commented Jul 3, 2024

Side note: another case that could be interesting is the negation of an expression. Having it "ready to use" would make some predicate-related optimizations faster/cheaper.

@roji roji added this to the Backlog milestone Jul 3, 2024
@roji
Member

roji commented Jul 3, 2024

Blocked on making our entire SqlExpression tree immutable (#32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (thanks @ranma42), we can't cache the hashcode.

@roji
Member

roji commented Jul 4, 2024

> Blocked on making our entire SqlExpression tree immutable (#32927); SelectExpression is currently mutable, and since it can be contained inside most SqlExpressions (#34133 (comment)), we can't cache the hashcode.

Though on second thought, SelectExpression should never be mutable when it's already composed upon...

@roji roji changed the title Comparing expressions can be slow Optimize SQL expression comparison by caching hash codes Aug 22, 2024
@roji
Member

roji commented Aug 22, 2024

@ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits - any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants: only the top-level select in the tree may be mutable.

I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here.

@roji
Member

roji commented Aug 22, 2024

Note also #19859, which is about not exhaustively calculating hash codes for the entire tree, but just some shallower subset; this would further improve our performance here.

@ranma42
Contributor Author

ranma42 commented Aug 22, 2024

> @ranma42 just to continue on my comment just above, I think this should be safe to implement even in the current bits - any case in which a mutable SelectExpression is contained within another expression (except ShapedQueryExpression) should be a violation of our current invariants: only the top-level select in the tree may be mutable.

> I still absolutely want to make SelectExpression fully immutable (#32927), but that's quite a big task, and I don't think it needs to block this optimization here.

If that is the case, I believe I can try and implement this. What would be the best way to evaluate the performance? (would a micro-benchmark make sense?)

@roji
Member

roji commented Aug 22, 2024

Sure, an ad-hoc BenchmarkDotNet benchmark could work - not sure we need to commit it etc. Though this is one of the cases where the benefits depend on the tree depth/complexity which you choose to benchmark, which is a bit arbitrary... I think it's OK to do this improvement in any case?

@ranma42
Contributor Author

ranma42 commented Dec 22, 2024

SqlConstantExpressions can contain any Value. Is it fine to assume that they are immutable? (once they are being processed by the EFCore query pipeline)

For example, would it be legitimate to capture enumerable values and "clone" them into immutable lists during the processing? (I would assume that it should be fine for constants, as non-constants would instead be handled as parameters)
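A tiny sketch of what that "clone into immutable" step could look like (hypothetical Python illustration; the real change would apply to SqlConstantExpression values in C#):

```python
# Hypothetical illustration: snapshot an enumerable constant value into an
# immutable, hashable form when it enters the query pipeline, so later
# mutation of the caller's collection cannot invalidate a cached hash.

def freeze_constant(value):
    """Return an immutable, hashable snapshot of a constant's value."""
    if isinstance(value, list):
        return tuple(freeze_constant(v) for v in value)
    return value

ids = [1, 2, 3]
frozen = freeze_constant(ids)
ids.append(4)                  # mutating the original list...
assert frozen == (1, 2, 3)     # ...does not affect the captured constant
```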

@roji
Member

roji commented Dec 22, 2024

@ranma42 yes, that makes sense, and unless I'm mistaken we already make that assumption in various places (i.e. SqlConstantExpression already calculates the value's hash code - though I'm indeed not sure how/where that's currently used).

In regular compiler-generated incoming expression trees, ConstantExpression is only given for inline literals inside the query, which of course can't be mutated. We indeed generate SqlConstantExpressions later inside the query pipeline itself, but I think that if we then mutated the value referenced by SqlConstantExpressions, that would be a bug...

@ranma42
Contributor Author

ranma42 commented Dec 23, 2024

I did a couple of experiments in https://github.com/ranma42/efcore/tree/wip/hash-lazy and https://github.com/ranma42/efcore/tree/wip/hash-eager to evaluate the impact; the results are in benchmarks.zip

The CompareEquals benchmark (not in the attachment) compares different expressions (of varying sizes) and shows that both the lazy and the eager approach are very effective in avoiding needless deep comparison, effectively making the check constant-time when expressions are not equal.

In addition to that, I implemented some benchmarks that compare other affected operations:

  • Hash measures the time spent to GetHashCode on an expression that has been already created and whose hash has already been computed
  • Create measures the time spent when creating expressions (and the related memory usage)
  • CreateAndHash measures the time spent when creating and hashing an expression (this is useful to know the time taken to hash in the lazy case, by difference from the Create benchmark)
  • CompareSameEquals compares two expressions that are structurally equal, but which share no objects

Some notes:

  • both lazy and eager result in negligible time being spent on already-computed hashes (as expected)
  • lazy requires an additional 8 bytes per object vs 4 bytes in the eager case
  • eager hashing makes creation about 40-45% slower
  • lazy hashing is slower than eager hashing: the create+hash scenario requires more than twice the time of create-only; this is probably caused by the need to visit the expression tree twice (in fact, lazy hashing and no cache have matching times for this)
  • the overhead of the nullable hash seems to cause a significant regression (about 70%) in the time needed to compare two equal structures

Unsurprisingly, each implementation has its own strengths:

  • no caching is best when expressions are unlikely to be compared/hashed
  • eager is best when expressions are likely to be compared/hashed (even just once)
  • lazy is best when expressions are likely to fall into one of the two categories:
    • expressions are compared/hashed multiple times against unequal expressions
    • expressions are never compared/hashed
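For contrast with the eager sketch, the lazy variant can be illustrated like this (Python for illustration only, not the actual branch code): the hash field starts unset and the subtree is visited only on first use, so trees that are never hashed pay only one extra field per node (that nullable field is the 8-vs-4-bytes difference noted earlier).

```python
# Illustrative sketch of lazy hash caching: the hash is memoized on first
# use. Trees that are never compared/hashed pay nothing beyond one unset
# field per node; repeated hashing is O(1) after the first walk.

class Node:
    __slots__ = ("op", "children", "_hash")

    def __init__(self, op, *children):
        self.op = op
        self.children = children
        self._hash = None              # lazy: not computed yet

    def __hash__(self):
        if self._hash is None:
            # First call visits the whole subtree once; later calls are O(1).
            self._hash = hash((self.op, *map(hash, self.children)))
        return self._hash
```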

@roji
Member

roji commented Dec 24, 2024

@ranma42 do you mind posting the benchmark results here in the issue (possibly inside <details>)? It's always nice to just see things directly here etc.

Thanks for doing this work... Yeah, everything here makes sense of course. What's your opinion on where we should go?

I don't have a super clear preference - but my leaning here is maybe slightly towards lazy: at least in the current query pipeline, I think most nodes are unlikely to actually be compared/hashed, since we don't compare for equality that often, and even when we do, most cases will fail early as the trees aren't identical. However, as we introduce more optimizations requiring node comparisons, this may shift the balance towards eager calculation.

One possible interesting thing to measure would be to just run the full EF test suite for SQL Server with both systems, and compare the running time (but that requires fully implementing both, which seems like some lost effort).

@ranma42
Contributor Author

ranma42 commented Dec 24, 2024

I ran the benchmarks on my laptop:

```
BenchmarkDotNet v0.14.0, Kali GNU/Linux 2024.4
13th Gen Intel Core i7-13850HX, 1 CPU, 28 logical and 14 physical cores
.NET SDK 10.0.100-alpha.1.24573.1
  [Host]     : .NET 10.0.0 (10.0.24.57009), X64 RyuJIT AVX2
  DefaultJob : .NET 10.0.0 (10.0.24.57009), X64 RyuJIT AVX2
```

This is a WSL2 on a Windows 11, so scheduling might be kind of messy.
In order to reduce the variance, I set the affinity to only use the P cores.

**main**

| Method | Size | Mean | Error | StdDev | Op/s | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Hash | 1 | 22.87 ns | 0.243 ns | 0.228 ns | 43,718,312.6 | - | - | - |
| Create | 1 | 41.38 ns | 0.404 ns | 0.338 ns | 24,163,894.1 | 0.0117 | - | 184 B |
| CreateAndHash | 1 | 60.49 ns | 0.974 ns | 0.863 ns | 16,531,922.8 | 0.0117 | - | 184 B |
| CompareSameEquals | 1 | 24.63 ns | 0.325 ns | 0.304 ns | 40,602,219.0 | - | - | - |
| Hash | 10 | 198.37 ns | 2.955 ns | 2.619 ns | 5,041,006.8 | - | - | - |
| Create | 10 | 266.70 ns | 4.667 ns | 4.138 ns | 3,749,472.6 | 0.0806 | - | 1264 B |
| CreateAndHash | 10 | 455.03 ns | 8.318 ns | 7.780 ns | 2,197,660.2 | 0.0806 | - | 1264 B |
| CompareSameEquals | 10 | 207.53 ns | 2.867 ns | 2.682 ns | 4,818,677.6 | - | - | - |
| Hash | 100 | 2,507.36 ns | 32.763 ns | 30.646 ns | 398,826.1 | - | - | - |
| Create | 100 | 2,481.69 ns | 35.566 ns | 29.699 ns | 402,950.5 | 0.7668 | 0.0153 | 12064 B |
| CreateAndHash | 100 | 4,982.31 ns | 55.039 ns | 51.483 ns | 200,709.9 | 0.7629 | 0.0153 | 12064 B |
| CompareSameEquals | 100 | 1,391.11 ns | 14.889 ns | 13.198 ns | 718,850.9 | - | - | - |
| Hash | 1000 | 27,762.86 ns | 481.691 ns | 427.007 ns | 36,019.3 | - | - | - |
| Create | 1000 | 24,970.55 ns | 367.987 ns | 344.215 ns | 40,047.2 | 7.6294 | 1.4954 | 120064 B |
| CreateAndHash | 1000 | 52,963.58 ns | 788.848 ns | 699.293 ns | 18,880.9 | 7.6294 | 1.4648 | 120064 B |
| CompareSameEquals | 1000 | 15,234.68 ns | 226.824 ns | 212.171 ns | 65,639.7 | - | - | - |

**lazy**

| Method | Size | Mean | Error | StdDev | Op/s | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Hash | 1 | 0.8337 ns | 0.0182 ns | 0.0152 ns | 1,199,415,629.7 | - | - | - |
| Create | 1 | 41.9007 ns | 0.6305 ns | 0.5898 ns | 23,865,930.9 | 0.0132 | - | 208 B |
| CreateAndHash | 1 | 61.3182 ns | 0.9792 ns | 0.8680 ns | 16,308,364.7 | 0.0132 | - | 208 B |
| CompareSameEquals | 1 | 21.9323 ns | 0.3565 ns | 0.3334 ns | 45,594,911.1 | - | - | - |
| Hash | 10 | 0.8320 ns | 0.0470 ns | 0.0440 ns | 1,201,898,864.0 | - | - | - |
| Create | 10 | 273.2060 ns | 5.1626 ns | 4.5765 ns | 3,660,241.8 | 0.0911 | - | 1432 B |
| CreateAndHash | 10 | 460.5862 ns | 5.8491 ns | 5.1850 ns | 2,171,146.1 | 0.0911 | - | 1432 B |
| CompareSameEquals | 10 | 196.0948 ns | 3.6676 ns | 3.4307 ns | 5,099,575.4 | - | - | - |
| Hash | 100 | 0.8618 ns | 0.0371 ns | 0.0329 ns | 1,160,410,233.5 | - | - | - |
| Create | 100 | 2,575.5453 ns | 29.0094 ns | 27.1354 ns | 388,267.3 | 0.8698 | 0.0191 | 13672 B |
| CreateAndHash | 100 | 5,131.1822 ns | 48.4635 ns | 40.4692 ns | 194,886.9 | 0.8698 | 0.0153 | 13672 B |
| CompareSameEquals | 100 | 2,433.2913 ns | 38.0084 ns | 35.5530 ns | 410,966.0 | - | - | - |
| Hash | 1000 | 0.5273 ns | 0.0316 ns | 0.0296 ns | 1,896,286,776.3 | - | - | - |
| Create | 1000 | 25,080.6950 ns | 306.3839 ns | 286.5917 ns | 39,871.3 | 8.6670 | 1.9531 | 136072 B |
| CreateAndHash | 1000 | 54,738.8424 ns | 695.1416 ns | 650.2359 ns | 18,268.6 | 8.6670 | 1.8921 | 136072 B |
| CompareSameEquals | 1000 | 25,852.6325 ns | 396.2357 ns | 351.2526 ns | 38,680.8 | - | - | - |

**eager**

| Method | Size | Mean | Error | StdDev | Median | Op/s | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Hash | 1 | 0.0013 ns | 0.0053 ns | 0.0047 ns | 0.0000 ns | 799,449,543,540.5 | - | - | - |
| Create | 1 | 62.0325 ns | 1.2414 ns | 1.2748 ns | 61.9899 ns | 16,120,569.7 | 0.0126 | - | 200 B |
| CreateAndHash | 1 | 55.9104 ns | 1.1136 ns | 1.0937 ns | 55.6630 ns | 17,885,749.5 | 0.0127 | - | 200 B |
| CompareSameEquals | 1 | 23.7113 ns | 0.3387 ns | 0.3168 ns | 23.7238 ns | 42,174,041.6 | - | - | - |
| Hash | 10 | 0.0045 ns | 0.0109 ns | 0.0107 ns | 0.0000 ns | 224,473,417,467.3 | - | - | - |
| Create | 10 | 396.1307 ns | 7.8899 ns | 7.3802 ns | 394.0148 ns | 2,524,419.5 | 0.0858 | - | 1352 B |
| CreateAndHash | 10 | 381.1490 ns | 4.5019 ns | 3.9908 ns | 381.5535 ns | 2,623,645.8 | 0.0858 | - | 1352 B |
| CompareSameEquals | 10 | 192.6524 ns | 2.3442 ns | 2.1928 ns | 192.6032 ns | 5,190,694.5 | - | - | - |
| Hash | 100 | 0.0029 ns | 0.0086 ns | 0.0081 ns | 0.0000 ns | 345,063,648,378.4 | - | - | - |
| Create | 100 | 3,605.7217 ns | 44.5567 ns | 41.6784 ns | 3,590.7824 ns | 277,337.0 | 0.8202 | 0.0191 | 12872 B |
| CreateAndHash | 100 | 3,576.9620 ns | 35.8328 ns | 33.5181 ns | 3,573.0936 ns | 279,566.8 | 0.8202 | 0.0191 | 12872 B |
| CompareSameEquals | 100 | 1,504.7692 ns | 22.5100 ns | 19.9545 ns | 1,504.5868 ns | 664,553.7 | - | - | - |
| Hash | 1000 | 0.0106 ns | 0.0180 ns | 0.0168 ns | 0.0000 ns | 94,335,649,749.4 | - | - | - |
| Create | 1000 | 36,023.6415 ns | 581.7447 ns | 544.1643 ns | 36,260.6855 ns | 27,759.5 | 8.1177 | 1.6479 | 128072 B |
| CreateAndHash | 1000 | 34,980.4683 ns | 484.8567 ns | 453.5353 ns | 34,912.6100 ns | 28,587.4 | 8.1177 | 1.6479 | 128072 B |
| CompareSameEquals | 1000 | 14,012.0293 ns | 114.7950 ns | 101.7627 ns | 13,998.1726 ns | 71,367.3 | - | - | - |

Some further details for those inspecting the values:

  • the expression "trees" the benchmark builds are basically lists with the structure (true && (true && (true && ...)))
  • N = 1 is (true && true)
  • N = 2 is (true && (true && true))
  • in general, the expression is made of N operators and N+1 constants (keep that in mind to understand my 4/8 bytes per object)
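The shape described above can be reproduced with a tiny generator (a hypothetical reconstruction, not the actual benchmark code, using nested tuples in place of expression nodes):

```python
# Hypothetical reconstruction of the benchmark tree shape:
# (true && (true && (true && ...))) with n '&&' operators over n+1 constants,
# nested on the right-hand side.

def make_tree(n):
    """Build a right-nested chain of n '&&' nodes over n+1 'true' constants."""
    expr = True
    for _ in range(n):
        expr = ("&&", True, expr)
    return expr
```

Right-nested chains like this are exactly the kind of tree where repeated deep comparison degenerates to O(n^2).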

@roji implementing both is quite trivial (in fact, it's little more than a find & replace 😄). I can try to run them against SqlServer, but for benchmarking I believe Sqlite might be more effective (in-process, faster).

@roji
Member

roji commented Dec 24, 2024

Thanks. Yeah, if it's trivial for you to run, then it would be nice to see how the functional tests fare on all three scenarios (FWIW both SQL Server and SQLite could be interesting, with the former a bit more real-world). Whatever you feel like doing and have time for :)

@ranma42
Contributor Author

ranma42 commented Dec 26, 2024

I tried running the EFCore.Sqlite.FunctionalTests and EFCore.SqlServer.FunctionalTests multiple times to collect some data.

|  | Sqlite: Main | Sqlite: Lazy | Sqlite: Eager | SqlServer: Main | SqlServer: Lazy | SqlServer: Eager |
|---|---:|---:|---:|---:|---:|---:|
|  | 228.87 | 258.05 | 265.12 | 637.81 | 737.05 | 778.36 |
|  | 234.41 | 270.49 | 267.56 | 688.09 | 774.12 | 783.14 |
|  | 234.71 | 274.75 | 268.68 | 770.71 | 785.96 | 788.48 |
|  | 249.32 | 275.30 | 270.73 | 773.66 | 822.89 | 816.26 |
|  | 260.91 | 277.97 | 274.67 | 783.01 | 861.23 | 831.61 |
|  | 272.13 | 287.54 | 275.55 | 835.76 | 891.60 | 906.10 |
|  | 284.14 | 289.38 | 276.52 | 932.33 | 916.53 | 910.58 |
|  | 292.41 | 292.49 | 278.70 | 1012.65 | 956.55 | 968.55 |
| median | 255.12 | 276.64 | 272.70 | 778.34 | 842.06 | 823.94 |
| average | 257.11 | 278.25 | 272.19 | 804.25 | 843.24 | 847.89 |
| stddev | 24.18 | 11.35 | 4.85 | 122.20 | 76.18 | 71.41 |

All measurements are in seconds.
The "lazy" column contains measurements from https://github.com/ranma42/efcore/tree/compute-hash-lazy
The "eager" column contains measurements from https://github.com/ranma42/efcore/tree/compute-hash-eager
The "main" column contains measurements from their merge base, i.e. ranma42@c4d6aa0 (it is not the real main branch: it picks up a couple of fixes to the SqlServer baselines, aligns a couple of Equals implementations with the usual pattern, and fixes a broken GetHashCode)

As I mentioned before, my environment is probably not ideal for benchmarking and SqlServer is in general harder to measure reliably (see the standard deviation 😅 ).
Given these results I would say that:

  • no caching is actually quite cheap (the main branch is outperforming both eager and lazy caching)
  • eager and lazy are barely distinguishable (comparing median or average)
  • if I had to choose between any of the 3 implementations, the current behavior is probably good enough or even the best for the test suite (with the current set of optimizations)

Note that this "benchmark" is neither representative of the real-world nor of worst-case behavior (for that, my previous benchmarks show a major improvement in comparison of huge expressions).

I am not sure what to make of this; I am somewhat wary of introducing such a change without seeing any improvement.
Maybe an acceptable outcome of this is the knowledge that such a change is reasonably easy to perform when it is deemed convenient?
OTOH, when implementing query optimizations, in some cases Equals was frowned upon (and IIRC in a couple of cases I implemented incomplete optimizations in order to avoid a deep comparison), so I wonder what the best tradeoff is here 🤔

@roji
Member

roji commented Dec 28, 2024

Thanks for doing this and posting the results @ranma42! Yeah, I'm slightly surprised as well, but I think this mainly shows that we're simply not doing a lot of optimizations requiring deep comparisons. I suspect that as we start to add more stuff, this will become more relevant. Stuff like #12634 and #34507 specifically comes to mind: these would be very broad optimizations that go over lots of nodes in the tree (i.e. are not subject to a very specific pattern-match that happens beforehand, such as the recent NULLIF and related work in #35327, which only does the deep comparison for CASE/WHEN over equality/inequality).

I'd say this... Let's keep this in the backlog for now, and as we implement more equality-based optimizations, we should keep an eye on this and rerun the benchmarks to see where we are. You can keep the branches (linking to them from here would be great) - I think it's likely we'd find ourselves implementing this later...

How does that sound?

@ranma42
Contributor Author

ranma42 commented Dec 29, 2024

> Thanks for doing this and posting the results @ranma42! Yeah, I'm slightly surprised as well, but I think this mainly shows that we're simply not doing a lot of optimizations requiring deep comparisons. I suspect that as we start to add more stuff, this will become more relevant.

I am not sure this will actually be the case; note that

  1. as soon as the recursive comparison finds a difference, it immediately stops
  2. most expression trees in EFCore are smallish (at least in the test suite; it is possible that it is not representative of actual use as it is mostly aimed at exercising different optimizations/cases/...)

> Stuff like #12634 and #34507 specifically comes to mind: these would be very broad optimizations that go over lots of nodes in the tree (i.e. are not subject to a very specific pattern-match that happens beforehand, such as the recent NULLIF and related work in #35327, which only does the deep comparison for CASE/WHEN over equality/inequality).

Yes, additional optimizations might make it more relevant, but it is also likely that the smallish expressions used in the test suite will never show a significant improvement (on average) from caching the hash codes.

> I'd say this... Let's keep this in the backlog for now, and as we implement more equality-based optimizations, we should keep an eye on this and rerun the benchmarks to see where we are. You can keep the branches (linking to them from here would be great) - I think it's likely we'd find ourselves implementing this later...

> How does that sound?

LGTM 👍 The branches are mentioned above (https://github.com/ranma42/efcore/tree/compute-hash-lazy and https://github.com/ranma42/efcore/tree/compute-hash-eager).
I will try to periodically rebase them and run them (especially if we add a considerable amount of optimizations that would benefit from them).
On the other hand, I will worry less about optimizations that risk being slow because we don't have this ;)

@roji
Member

roji commented Dec 30, 2024

> I am not sure this will actually be the case

You may very well be right... Though it may still be better to do caching to limit the worst-case compilation performance, even if it regresses the average case; I'm thinking about those cases where the expression trees are deep (this is indeed rare), and multiple fragments are equal except for one node somewhere at the bottom. In this sort of worst-case scenario (which is probably pretty contrived), lots of deep equality checks could become problematic...

But anyway, I think we both agree that for now, we should continue with optimizations leveraging deep comparison without worrying about caching hash codes - but that we should keep an eye on it and revisit as needed. I'll keep this issue open in the backlog for now.

@roji roji changed the title Optimize SQL expression comparison by caching hash codes Optimize SQL expression comparison by caching hash codes (currently probably not needed) Dec 30, 2024