Hash aggregate finalization parallelization #4655
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #4655 +/- ##
==========================================
+ Coverage 86.29% 86.35% +0.05%
==========================================
Files 1396 1397 +1
Lines 59848 60082 +234
Branches 7372 7387 +15
==========================================
+ Hits 51645 51881 +236
+ Misses 8036 8035 -1
+ Partials 167 166 -1
☔ View full report in Codecov by Sentry.
Benchmarks adapted from https://github.com/ClickHouse/ClickBench/, run on a 128 thread runner (2x AMD EPYC 7551).
Results are Cold/Hot, where hot is on subsequent queries in the same process. OS VM caches are dropped after each set of queries.
Thanks Ben! I have some comments which we've discussed, and you can collapse any of them that you're already working on locally.
computeVectorHashes(flatKeyVectors, unFlatKeyVectors);

auto startingNumTuples = getNumEntries();
if (startingNumTuples + numFlatTuples > maxNumHashSlots ||
I would extract the if check as a separate function, e.g., requireResize(), and refactor it together with resizeHashTableIfNecessary.
Actually, I have a question about why we choose to check numTuples here. Ideally we should only check the number of distinct groups, which should happen inside findHashSlots.
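Roughly what I have in mind is something like the sketch below (requireResize is just a placeholder name, the struct is a stand-in for the hash table, and the condition only covers the part of the check visible in the quoted diff):

```cpp
#include <cstdint>

// Hypothetical sketch of the suggested refactor; member names mirror the quoted diff,
// and this struct is only a stand-in for AggregateHashTable.
struct HashTableSketch {
    uint64_t numEntries = 0;
    uint64_t maxNumHashSlots = 16384;

    bool requireResize(uint64_t numTuplesToAppend) const {
        // The quoted condition continues past what is shown above ("|| ..."),
        // so a real version would fold those extra clauses in as well.
        return numEntries + numTuplesToAppend > maxNumHashSlots;
    }

    void resizeHashTableIfNecessary(uint64_t numTuplesToAppend) {
        if (requireResize(numTuplesToAppend)) {
            maxNumHashSlots *= 2; // stand-in for the actual resize logic
        }
    }
};
```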
findHashSlots currently also inserts into the table, which makes things a little tricky, but I think that could be changed.
But the bigger issue is that while we could do it after figuring out the number of distinct groups, we'd have to re-run findHashSlots if we empty the hash table (even ignoring handling duplicates, the position they are inserted into may not be the same as the position they initially hashed to with linear probing, so if the original position is cleared it would break future lookups of that key).
On the other hand, with the current fixed capacity of 16384 entries (256KB with 16B entries) and always inserting <=2048 tuples at a time, it should always be the load factor check which causes it to be emptied, so what we could do is assert that we have enough space to hold all of them, and then resize afterwards if we've exceeded the load factor after the insertions. But really all that does is make it resize once we've exceeded the load factor, instead of never exceeding the load factor, and I don't really know which would be preferable.
Given we insert up to 2048 at a time into a table holding up to 16384 entries, I think it would mean a load factor of 0.66-0.79 instead of a load factor of 0.54-0.66, noting that in the code we're defining the load factor as 1.5, which I'm fairly sure is inverted and really should be 0.66.
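For concreteness, here is the back-of-the-envelope arithmetic behind those ranges, using the 16384-entry capacity and 2048-tuple batches mentioned above (a sanity-check sketch, not code from the PR):

```cpp
#include <cstdio>

int main() {
    constexpr double capacity = 16384; // fixed hash table capacity discussed above
    constexpr double batch = 2048;     // maximum tuples inserted per call
    constexpr double target = 1 / 1.5; // ~0.66, i.e. the inverted "1.5" load factor

    // Resize/flush *before* a batch that would exceed the target: occupancy stays
    // between roughly (target - batch/capacity) and target after each batch.
    std::printf("resize before insert: %.2f - %.2f\n", target - batch / capacity, target);

    // Resize/flush only *after* the target has been exceeded: occupancy can end up
    // between roughly target and (target + batch/capacity).
    std::printf("resize after insert:  %.2f - %.2f\n", target, target + batch / capacity);
    return 0;
}
```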
const std::vector<common::LogicalType>& distinctAggKeyTypes,
    FactorizedTableSchema tableSchema)
    : AggregateHashTable(memoryManager, std::move(keyTypes), std::move(payloadTypes),
          aggregateFunctions, distinctAggKeyTypes, NUM_PARTITIONS * 1024, tableSchema.copy()),
I think we should follow a heuristic rule, which is to keep the thread-local HT small enough to fit in cache. Thus, there shouldn't be a fixed HT capacity; instead, we should dynamically calculate it based on the aggregation keys and payloads, and the cache size of the machine. (For the cache size of the machine, one option is to access it through a library, such as libcpuid; the other option is to use a conservative constant. The constant should probably be fine in many cases.)
For aggregation with lots of keys and payloads, where row_width is large, we may end up with a very small capacity by calculation. To avoid this, we should have a lower bound such as 2048.
I should probably try and find a good benchmark for comparing performance with different row widths, but for now I've got it set to a minimum capacity of 2048 (or whatever fits in one 256KB block, if larger).
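In effect the sizing rule looks something like the following sketch, assuming a 256KB block and the 2048-entry floor mentioned above (the constant and function names are illustrative, not the exact ones in the code):

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative capacity heuristic: keep the thread-local table within one block so it
// stays cache-friendly, but never drop below a floor of 2048 entries for wide rows.
static constexpr uint64_t kBlockSize = 256 * 1024; // bytes, assumed block size
static constexpr uint64_t kMinCapacity = 2048;

uint64_t initialHashTableCapacity(uint64_t bytesPerTuple) {
    return std::max(kMinCapacity, kBlockSize / bytesPerTuple);
}
```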
    std::this_thread::sleep_for(std::chrono::microseconds(500));
    sharedState->tryMergeQueue();
}
sharedState->finalizeAggregateHashTable();
Can you also think a bit about merging the aggregate scan into aggregate? Ideally, we shouldn't need to wait until the next pipeline to start scanning out from the aggregate result.
Working on it, though I wonder if this would be better as a separate PR so that the rest of this can be merged first.
Updated performance
For the query on the msmarco dataset in the PR description the runtime is now 44 seconds with a peak memory usage of 21GB. ClickHouse benchmarks have more modest improvements (compare with #4655 (comment)), and I encountered a segfault that I'm going to look into.
Benchmarks adapted from https://github.com/ClickHouse/ClickBench/, run on a 128 thread runner (2x AMD EPYC 7551).
Results are Cold/Hot, where hot is on subsequent queries in the same process. OS VM caches are dropped after each set of queries.
I wonder if we should also benchmark a query that has a wide row in the aggregation table (more group keys and payloads)?
Looks great, Ben. Have some minor comments.
Also, can you update the numbers here with the latest? I don't think we have the seg fault now, right? #4655 (comment)
auto sourcePos = sourceStartOffset + idx;
memcpy(slot.entry, sourceTable.getTuple(sourcePos),
    getTableSchema()->getNumBytesPerTuple());
// TODO: Ideally we should actually copy the overflow so that the original overflow data can
What's preventing this TODO? Are you going to address it separately, or are you not planning to address it any time soon?
It might have a significant impact on performance, so I wasn't rushing to complete it, but it probably should be addressed.
I think this would only work if we can get the overflow to support concurrent appends. At this point in the code that wouldn't matter, but we still wouldn't be able to free the overflow unless we also copy the overflow when partitioning.
Concurrent appends should be possible, but I was seeing a reasonably large difference in runtime (as big as 0.5s->0.9s on one query with long strings), so maybe it's best to just leave it for now.
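For reference, the concurrency requirement would look something like an atomically bumped offset per overflow block. The sketch below is purely illustrative and is not the existing overflow buffer API:

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative sketch only: a fixed-size overflow block that supports concurrent
// appends by atomically bumping an offset. A real implementation would chain a new
// block when this one fills up instead of failing.
class ConcurrentOverflowBlock {
public:
    explicit ConcurrentOverflowBlock(size_t capacity) : data(capacity) {}

    // Copies `len` bytes into the block and returns a stable pointer to them,
    // or nullptr if the block is out of space.
    const uint8_t* append(const uint8_t* src, size_t len) {
        size_t offset = nextOffset.fetch_add(len, std::memory_order_relaxed);
        if (offset + len > data.size()) {
            return nullptr;
        }
        std::memcpy(data.data() + offset, src, len);
        return data.data() + offset;
    }

private:
    std::vector<uint8_t> data;
    std::atomic<size_t> nextOffset{0};
};
```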
Edit: Missed one thing when rebasing (one of the changes from #4709 needed to be applied to a function newly added in this PR), so these benchmarks aren't fully up to date; however, I'm not convinced that the change is significant. There was a large improvement the first time I ran the criterion benchmark again, and then it regressed in the other direction to be more or less the same as the initial benchmarks when I ran it again without changes.

Updated performance (again)
For the query on the msmarco dataset in the PR description the runtime is now 41 seconds with a peak memory usage of 21GB. ClickHouse benchmarks have more modest improvements (compare with #4655 (comment)).
Benchmarks adapted from https://github.com/ClickHouse/ClickBench/, run on a 128 thread runner (2x AMD EPYC 7551).
Results are Cold/Hot, where hot is on subsequent queries in the same process. OS VM caches are dropped after each set of queries.

Performance scaling
Below are some benchmarks showing scaling. Violin plots show the distribution; the queries were run 10 times with some warmups. The results show some discrepancies from the results above, which might be a measuring issue, but I think they do a good job of showing how well the work scales. The ClickBench queries were run through the Python API and the scaling benchmarks below through the Rust API, using criterion.rs to produce the plots.
Fixes kuzudb/internal#10 and #4547.
I switched back to having the partitioning hash table use linear probing, insert entries until it reaches a fixed capacity, and then empty itself and flush to the global partitions. The aggregation update code only works on the data within the vectors being inserted and requires that they are all available in the hash table at once, so kicking elements out of the hash table would require some complicated changes. I increased the default size of the hash table significantly to compensate, as performance was poor with hash tables of the default size.
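As a condensed sketch of that flow (the struct and method names below are placeholders, not the actual classes/methods in the PR):

```cpp
#include <cstdint>
#include <functional>

// Stand-in sketch for the partitioning hash table behaviour described above:
// batches are aggregated in a fixed-capacity, linear-probing local table, and when
// the next batch might no longer fit, the whole table is flushed to the global
// partitions and reused empty. Nothing is evicted mid-batch, so the aggregate
// update code can assume every group it touches stays resident for the whole batch.
struct PartitioningTableSketch {
    uint64_t numEntries = 0;
    uint64_t capacity = 0;
    std::function<void()> flushToGlobalPartitions; // placeholder; set by the owner

    void appendBatch(uint64_t maxNewGroupsInBatch) {
        if (numEntries + maxNewGroupsInBatch > capacity) {
            flushToGlobalPartitions(); // hand current contents to the shared partitions
            numEntries = 0;            // table is empty again; capacity stays fixed
        }
        // ... findHashSlots + aggregate updates for the batch would happen here ...
        numEntries += maxNewGroupsInBatch; // upper bound; duplicates reuse slots
    }
};
```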
It took a while to hunt down the bug in BaseAggregateScan::writeAggregateResultToVector. I've added an extra test to try to get slightly better coverage, but I think we really need some larger tests (the issue was caused by vector re-use, which previously we would only have encountered with results that have more than DEFAULT_VECTOR_CAPACITY values, some of which must be null).

The performance isn't scaling well with the number of partitions at the moment. On workloads which don't need very many threads it's much faster if built with fewer partitions: e.g. on the query in #4547 I found that the total runtime was about 2x faster (3x for just the HashAggregate code) with 16 partitions on a machine with 12 threads compared to 256 partitions (that query/dataset also seems to be getting a maximum of ~14 worker threads, presumably due to the way the input is being divided up). I've reduced the number of partitions to 128 as a compromise, given that's the number of threads on our largest testing machine, but I'll work on improving this next; it should be possible to do the partitioning logically, without physically partitioning the data, to improve cache locality.
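As an illustration of what the logical partitioning could look like (the bit layout and names are assumptions for illustration, not a planned implementation): derive the partition from bits of the hash each tuple already carries, so tuples are only routed to a partition when the local table is flushed rather than being physically copied into per-partition tables up front.

```cpp
#include <cstdint>

constexpr uint64_t NUM_PARTITIONS = 128; // the compromise mentioned above
static_assert(NUM_PARTITIONS == 1u << 7, "partition count assumed to be a power of two");

// Use the top bits for the partition so the choice is independent of the low bits
// a local table would typically use for slot selection.
inline uint64_t partitionIdx(uint64_t hash) {
    return hash >> (64 - 7); // top 7 bits -> 0 .. NUM_PARTITIONS - 1
}
```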
I've increased the default buffer pool size, since anything involving aggregation was requiring a bunch of extra memory for constructing the PartitioningAggregateHashTables, but that should be able to be reverted with the optimization mentioned above, as it would also reduce the minimum memory requirements.

Performance
On the query from #4547 (msmarco v2.1, 1st segment) with the query:
MATCH (b:doc) WITH tokenize(b.segment) AS tk, OFFSET(ID(b)) AS id UNWIND tk AS t RETURN STEM(t, 'porter'), id, count(*);
Run on a 128 thread machine (but see the earlier note about not scaling past 14 threads).
(as an example of the performance improvements we might expect to achieve with better cache locality)
Note that memory usage has improved significantly: previously, the per-thread AggregateHashTables would be filled first and exist in memory simultaneously with the final merged global AggregateHashTable; now the per-thread tables get merged into the global ones in small chunks.
I'd run some benchmarks using queries from ClickBench and seen an improvement of about 2-5x (I'll update with more details later).