Index entry serialization #4686

Merged: 10 commits into master from index-entry-serialization, Jan 8, 2025

@acquamarin (Collaborator) commented Jan 7, 2025

This PR adds support for serializing index catalog entries to disk.

General idea:
Serialization: the full index entry is written to disk.
Deserialization: only the base class is deserialized; the auxiliary information is kept in an array, still in serialized form. When the extension is loaded, it deserializes the auxiliary information stored in that array and recreates the index.
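
The two-phase idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Bytes`, `IndexEntry`, `FtsAuxInfo`, and every function name are hypothetical, and the length-prefixed layout is an assumed encoding chosen so that the core can skip a payload it cannot interpret.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>; // hypothetical stand-in for the on-disk stream

// Append a 64-bit value in little-endian order.
void writeU64(Bytes& out, uint64_t v) {
    for (int i = 0; i < 8; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Read a 64-bit little-endian value; at() throws if the buffer is truncated.
uint64_t readU64(const Bytes& in, size_t& pos) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v |= static_cast<uint64_t>(in.at(pos++)) << (8 * i);
    return v;
}

// Length-prefixed byte run (assumed layout), used for strings and the aux payload.
void writeBlob(Bytes& out, const uint8_t* data, size_t len) {
    writeU64(out, len);
    out.insert(out.end(), data, data + len);
}

Bytes readBlob(const Bytes& in, size_t& pos) {
    const uint64_t len = readU64(in, pos);
    if (len > in.size() - pos) throw std::out_of_range("truncated blob");
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    Bytes blob(first, first + static_cast<std::ptrdiff_t>(len));
    pos += len;
    return blob;
}

void writeString(Bytes& out, const std::string& s) {
    writeBlob(out, reinterpret_cast<const uint8_t*>(s.data()), s.size());
}

std::string readString(const Bytes& in, size_t& pos) {
    const Bytes b = readBlob(in, pos);
    return std::string(b.begin(), b.end());
}

// Extension-owned auxiliary information; the fields are invented for illustration.
struct FtsAuxInfo {
    std::string stemmer;
    uint64_t numDocs;
};

Bytes serializeAux(const FtsAuxInfo& aux) {
    Bytes out;
    writeString(out, aux.stemmer);
    writeU64(out, aux.numDocs);
    return out;
}

// Runs only once the extension is loaded and knows the payload layout.
FtsAuxInfo deserializeAux(const Bytes& buf) {
    size_t pos = 0;
    FtsAuxInfo aux{};
    aux.stemmer = readString(buf, pos);
    aux.numDocs = readU64(buf, pos);
    assert(pos == buf.size()); // the payload should be fully consumed
    return aux;
}

// What the core keeps after loading the catalog: base fields plus raw aux bytes.
struct IndexEntry {
    std::string type;
    std::string name;
    Bytes auxBuffer; // still serialized
};

// Serialization: the full entry, including the aux payload, goes to disk.
void serializeEntry(Bytes& out, const std::string& type, const std::string& name,
                    const FtsAuxInfo& aux) {
    writeString(out, type);
    writeString(out, name);
    const Bytes payload = serializeAux(aux);
    writeBlob(out, payload.data(), payload.size());
}

// Deserialization in the core: read base fields, copy aux bytes uninterpreted.
IndexEntry deserializeBase(const Bytes& in, size_t& pos) {
    IndexEntry entry;
    entry.type = readString(in, pos);
    entry.name = readString(in, pos);
    entry.auxBuffer = readBlob(in, pos);
    return entry;
}
```

In this sketch, the core would call `deserializeBase` for each entry when the catalog is read, and the extension would later call `deserializeAux(entry.auxBuffer)` on load to rebuild its index. The length prefix is what lets the core step over a payload whose layout only the extension understands.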

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 25.00000% with 33 lines in your changes missing coverage. Please review.

Project coverage is 86.17%. Comparing base (080cbc0) to head (bb8dc8c).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/catalog/catalog_entry/index_catalog_entry.cpp | 0.00% | 21 Missing ⚠️ |
| ...nclude/catalog/catalog_entry/index_catalog_entry.h | 0.00% | 6 Missing ⚠️ |
| src/catalog/catalog_entry/catalog_entry.cpp | 0.00% | 3 Missing ⚠️ |
| src/catalog/catalog.cpp | 50.00% | 2 Missing ⚠️ |
| src/include/common/serializer/buffered_reader.h | 83.33% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4686      +/-   ##
==========================================
- Coverage   86.22%   86.17%   -0.06%     
==========================================
  Files        1369     1371       +2     
  Lines       58232    58270      +38     
  Branches     7206     7206              
==========================================
+ Hits        50213    50214       +1     
- Misses       7855     7892      +37     
  Partials      164      164              

@acquamarin force-pushed the index-entry-serialization branch from 0731b7b to 7feb3cd on January 8, 2025 01:51

github-actions bot commented Jan 8, 2025

Benchmark Result

Master commit hash: 0c6a394a32b9092e874b144516d9366a706e7d0b
Branch commit hash: 6b61e714fcc69ac1f80a6e7d8a5bc361b01a9a9e

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 647.34 | 666.67 | -19.32 (-2.90%) |
| aggregation | q28 | 12634.46 | 11065.46 | 1569.00 (14.18%) |
| copy | node-Comment | 72654.50 | N/A | N/A |
| copy | node-Forum | 5595.89 | N/A | N/A |
| copy | node-Organisation | 1236.70 | N/A | N/A |
| copy | node-Person | 2003.18 | N/A | N/A |
| copy | node-Place | 1156.93 | N/A | N/A |
| copy | node-Post | 29437.27 | N/A | N/A |
| copy | node-Tag | 1247.89 | N/A | N/A |
| copy | node-Tagclass | 1131.00 | N/A | N/A |
| copy | rel-comment-hasCreator | 57629.73 | N/A | N/A |
| copy | rel-comment-hasTag | 89031.03 | N/A | N/A |
| copy | rel-comment-isLocatedIn | 72820.37 | N/A | N/A |
| copy | rel-containerOf | 15097.80 | N/A | N/A |
| copy | rel-forum-hasTag | 4002.93 | N/A | N/A |
| copy | rel-hasInterest | 3096.23 | N/A | N/A |
| copy | rel-hasMember | 117475.55 | N/A | N/A |
| copy | rel-hasModerator | 1297.11 | N/A | N/A |
| copy | rel-hasType | 293.48 | N/A | N/A |
| copy | rel-isPartOf | 251.96 | N/A | N/A |
| copy | rel-isSubclassOf | 256.06 | N/A | N/A |
| copy | rel-knows | 13575.54 | N/A | N/A |
| copy | rel-likes-comment | 174871.56 | N/A | N/A |
| copy | rel-likes-post | 71899.10 | N/A | N/A |
| copy | rel-organisation-isLocatedIn | 252.45 | N/A | N/A |
| copy | rel-person-isLocatedIn | 467.67 | N/A | N/A |
| copy | rel-post-hasCreator | 14442.46 | N/A | N/A |
| copy | rel-post-hasTag | 22404.31 | N/A | N/A |
| copy | rel-post-isLocatedIn | 18314.80 | N/A | N/A |
| copy | rel-replyOf-comment | 49884.16 | N/A | N/A |
| copy | rel-replyOf-post | 36786.32 | N/A | N/A |
| copy | rel-studyAt | 779.38 | N/A | N/A |
| copy | rel-workAt | 1641.12 | N/A | N/A |
| filter | q14 | 128.33 | 135.68 | -7.34 (-5.41%) |
| filter | q15 | 132.37 | 137.85 | -5.48 (-3.97%) |
| filter | q16 | 303.87 | 311.89 | -8.01 (-2.57%) |
| filter | q17 | 445.70 | 458.85 | -13.16 (-2.87%) |
| filter | q18 | 1939.80 | 1951.28 | -11.48 (-0.59%) |
| filter | zonemap-node | 89.18 | 97.66 | -8.48 (-8.69%) |
| filter | zonemap-node-lhs-cast | 91.52 | 99.19 | -7.67 (-7.73%) |
| filter | zonemap-node-null | 86.46 | 95.14 | -8.68 (-9.12%) |
| filter | zonemap-rel | 5687.05 | 5771.06 | -84.02 (-1.46%) |
| fixed_size_expr_evaluator | q07 | 572.66 | 598.74 | -26.09 (-4.36%) |
| fixed_size_expr_evaluator | q08 | 803.57 | 820.53 | -16.96 (-2.07%) |
| fixed_size_expr_evaluator | q09 | 801.54 | 830.88 | -29.33 (-3.53%) |
| fixed_size_expr_evaluator | q10 | 236.44 | 257.47 | -21.03 (-8.17%) |
| fixed_size_expr_evaluator | q11 | 231.52 | 251.51 | -19.99 (-7.95%) |
| fixed_size_expr_evaluator | q12 | 226.23 | 247.01 | -20.78 (-8.41%) |
| fixed_size_expr_evaluator | q13 | 1466.42 | 1487.80 | -21.38 (-1.44%) |
| fixed_size_seq_scan | q23 | 115.93 | 134.46 | -18.53 (-13.78%) |
| join | q29 | 615.94 | 618.88 | -2.94 (-0.48%) |
| join | q30 | 10659.82 | 10218.46 | 441.36 (4.32%) |
| join | q31 | 5.59 | 8.03 | -2.44 (-30.35%) |
| join | SelectiveTwoHopJoin | 53.51 | 53.38 | 0.13 (0.24%) |
| ldbc_snb_ic | q35 | 2544.33 | 2584.14 | -39.82 (-1.54%) |
| ldbc_snb_ic | q36 | 461.22 | 437.49 | 23.72 (5.42%) |
| ldbc_snb_is | q32 | 3.60 | 7.15 | -3.55 (-49.70%) |
| ldbc_snb_is | q33 | 9.90 | 17.29 | -7.39 (-42.74%) |
| ldbc_snb_is | q34 | 1.32 | 1.32 | 0.00 (0.10%) |
| multi-rel | multi-rel-large-scan | 1356.17 | 1370.98 | -14.81 (-1.08%) |
| multi-rel | multi-rel-lookup | 19.67 | 21.75 | -2.08 (-9.56%) |
| multi-rel | multi-rel-small-scan | 68.79 | 93.15 | -24.36 (-26.15%) |
| order_by | q25 | 145.35 | 140.08 | 5.27 (3.76%) |
| order_by | q26 | 456.61 | 466.46 | -9.85 (-2.11%) |
| order_by | q27 | 1461.55 | 1488.59 | -27.04 (-1.82%) |
| recursive_join | recursive-join-bidirection | 290.45 | 290.18 | 0.27 (0.09%) |
| recursive_join | recursive-join-dense | 7396.94 | 7491.91 | -94.97 (-1.27%) |
| recursive_join | recursive-join-path | 23623.51 | 24141.69 | -518.19 (-2.15%) |
| recursive_join | recursive-join-sparse | 1068.27 | 1075.83 | -7.56 (-0.70%) |
| recursive_join | recursive-join-trail | 7301.20 | 7408.65 | -107.45 (-1.45%) |
| scan_after_filter | q01 | 170.88 | 183.67 | -12.79 (-6.96%) |
| scan_after_filter | q02 | 157.75 | 169.27 | -11.52 (-6.81%) |
| shortest_path_ldbc100 | q37 | 95.23 | 84.03 | 11.20 (13.33%) |
| shortest_path_ldbc100 | q38 | 348.34 | 373.50 | -25.16 (-6.74%) |
| shortest_path_ldbc100 | q39 | 63.07 | 59.41 | 3.65 (6.15%) |
| shortest_path_ldbc100 | q40 | 452.17 | 415.67 | 36.50 (8.78%) |
| var_size_expr_evaluator | q03 | 2051.10 | 2096.91 | -45.80 (-2.18%) |
| var_size_expr_evaluator | q04 | 2269.05 | 2260.65 | 8.40 (0.37%) |
| var_size_expr_evaluator | q05 | 2605.38 | 2665.41 | -60.03 (-2.25%) |
| var_size_expr_evaluator | q06 | 1326.23 | 1349.59 | -23.36 (-1.73%) |
| var_size_seq_scan | q19 | 1447.00 | 1483.26 | -36.26 (-2.44%) |
| var_size_seq_scan | q20 | 2614.76 | 2691.53 | -76.77 (-2.85%) |
| var_size_seq_scan | q21 | 2276.64 | 2302.03 | -25.40 (-1.10%) |
| var_size_seq_scan | q22 | 126.70 | 129.64 | -2.94 (-2.27%) |
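
For readers checking the Diff column: the figures are consistent with commit time minus master time, with the percentage taken relative to master. That reading is inferred from the numbers and is not stated by the benchmark bot. A minimal sketch under that assumption, with hypothetical names (`BenchDiff`, `computeDiff`, `formatDiff`):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Absolute and relative difference between the two mean times.
struct BenchDiff {
    double absMs; // commit - master, in milliseconds
    double pct;   // relative to master, in percent
};

// Assumed formula: positive values mean the commit is slower than master.
BenchDiff computeDiff(double commitMs, double masterMs) {
    assert(masterMs > 0.0); // rows reported as N/A have no master time to compare
    const double absMs = commitMs - masterMs;
    return BenchDiff{absMs, 100.0 * absMs / masterMs};
}

// Render in the same "abs (pct%)" shape as the Diff column.
std::string formatDiff(const BenchDiff& d) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.2f (%.2f%%)", d.absMs, d.pct);
    return std::string(buf);
}
```

For aggregation q28, `formatDiff(computeDiff(12634.46, 11065.46))` reproduces the reported `1569.00 (14.18%)`. Some rows can differ in the last digit when recomputed from the rounded means shown here (q24 gives -19.33 rather than -19.32), which suggests the bot computes the difference from unrounded values.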

@ray6080 self-requested a review January 8, 2025 04:12

@ray6080 (Contributor) left a comment

Can you also add a high-level description of how serialization and deserialization of index entries work for extensions?

Otherwise looks good to me. See some comments below.

src/include/common/serializer/buffered_reader.h (outdated, resolved)
extension/fts/test/test_files/fts_small.test (resolved)
src/catalog/catalog_entry/index_catalog_entry.cpp (outdated, resolved)
extension/fts/src/fts_extension.cpp (resolved)
@acquamarin merged commit 6c78434 into master Jan 8, 2025
22 checks passed
@acquamarin deleted the index-entry-serialization branch January 8, 2025 21:11

github-actions bot commented Jan 8, 2025

Benchmark Result

Master commit hash: decc6d5823880dbb9ae7194d12d608decb7ec5a8
Branch commit hash: ca59ba3cbbea5c714ae4ea0ed2c96fa60fba75d5

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 649.35 | 643.95 | 5.40 (0.84%) |
| aggregation | q28 | 11901.10 | 11660.75 | 240.35 (2.06%) |
| copy | node-Comment | 73667.68 | 68451.44 | 5216.24 (7.62%) |
| copy | node-Forum | 5695.32 | 5488.48 | 206.84 (3.77%) |
| copy | node-Organisation | 1224.71 | 1208.26 | 16.45 (1.36%) |
| copy | node-Person | 2294.77 | 2139.74 | 155.03 (7.25%) |
| copy | node-Place | 1219.10 | 1147.13 | 71.97 (6.27%) |
| copy | node-Post | 28926.21 | 30123.40 | -1197.19 (-3.97%) |
| copy | node-Tag | 1258.68 | 1226.60 | 32.08 (2.62%) |
| copy | node-Tagclass | 1144.80 | 1173.21 | -28.41 (-2.42%) |
| copy | rel-comment-hasCreator | 55767.81 | 57910.42 | -2142.61 (-3.70%) |
| copy | rel-comment-hasTag | 89295.38 | 90209.68 | -914.30 (-1.01%) |
| copy | rel-comment-isLocatedIn | 73000.89 | 71367.13 | 1633.76 (2.29%) |
| copy | rel-containerOf | 14534.76 | 15333.46 | -798.70 (-5.21%) |
| copy | rel-forum-hasTag | 4051.38 | 4014.23 | 37.15 (0.93%) |
| copy | rel-hasInterest | 3088.07 | 3075.03 | 13.04 (0.42%) |
| copy | rel-hasMember | 119759.28 | 123809.85 | -4050.57 (-3.27%) |
| copy | rel-hasModerator | 1334.27 | 1345.72 | -11.45 (-0.85%) |
| copy | rel-hasType | 294.15 | 284.08 | 10.07 (3.54%) |
| copy | rel-isPartOf | 289.25 | 325.02 | -35.77 (-11.01%) |
| copy | rel-isSubclassOf | 250.69 | 271.51 | -20.82 (-7.67%) |
| copy | rel-knows | 13562.07 | 13915.39 | -353.32 (-2.54%) |
| copy | rel-likes-comment | 178418.43 | 181294.75 | -2876.32 (-1.59%) |
| copy | rel-likes-post | 70683.92 | 67694.13 | 2989.79 (4.42%) |
| copy | rel-organisation-isLocatedIn | 247.91 | 244.20 | 3.71 (1.52%) |
| copy | rel-person-isLocatedIn | 436.71 | 513.90 | -77.19 (-15.02%) |
| copy | rel-post-hasCreator | 14506.86 | 14630.62 | -123.76 (-0.85%) |
| copy | rel-post-hasTag | 22760.86 | 22502.55 | 258.31 (1.15%) |
| copy | rel-post-isLocatedIn | 18048.07 | 18303.12 | -255.05 (-1.39%) |
| copy | rel-replyOf-comment | 52930.79 | 47388.20 | 5542.59 (11.70%) |
| copy | rel-replyOf-post | 38754.74 | 37680.53 | 1074.21 (2.85%) |
| copy | rel-studyAt | 800.99 | 761.69 | 39.30 (5.16%) |
| copy | rel-workAt | 1543.69 | 1564.97 | -21.28 (-1.36%) |
| filter | q14 | 128.67 | 129.08 | -0.41 (-0.32%) |
| filter | q15 | 131.88 | 125.65 | 6.23 (4.96%) |
| filter | q16 | 312.24 | 309.40 | 2.84 (0.92%) |
| filter | q17 | 452.98 | 446.79 | 6.19 (1.39%) |
| filter | q18 | 1992.42 | 1934.79 | 57.63 (2.98%) |
| filter | zonemap-node | 90.47 | 89.14 | 1.32 (1.49%) |
| filter | zonemap-node-lhs-cast | 90.56 | 89.31 | 1.25 (1.40%) |
| filter | zonemap-node-null | 86.33 | 85.19 | 1.14 (1.34%) |
| filter | zonemap-rel | 5785.17 | 5756.87 | 28.29 (0.49%) |
| fixed_size_expr_evaluator | q07 | 572.40 | 576.03 | -3.63 (-0.63%) |
| fixed_size_expr_evaluator | q08 | 820.00 | 803.63 | 16.37 (2.04%) |
| fixed_size_expr_evaluator | q09 | 806.27 | 807.16 | -0.89 (-0.11%) |
| fixed_size_expr_evaluator | q10 | 237.05 | 239.15 | -2.10 (-0.88%) |
| fixed_size_expr_evaluator | q11 | 230.60 | 233.61 | -3.01 (-1.29%) |
| fixed_size_expr_evaluator | q12 | 227.13 | 226.41 | 0.73 (0.32%) |
| fixed_size_expr_evaluator | q13 | 1489.18 | 1459.02 | 30.16 (2.07%) |
| fixed_size_seq_scan | q23 | 115.22 | 108.29 | 6.93 (6.40%) |
| join | q29 | 598.53 | 617.79 | -19.26 (-3.12%) |
| join | q30 | 10566.24 | 10120.48 | 445.76 (4.40%) |
| join | q31 | 5.08 | 7.32 | -2.24 (-30.56%) |
| join | SelectiveTwoHopJoin | 53.37 | 52.58 | 0.79 (1.50%) |
| ldbc_snb_ic | q35 | 2578.32 | 2506.95 | 71.38 (2.85%) |
| ldbc_snb_ic | q36 | 436.66 | 469.19 | -32.53 (-6.93%) |
| ldbc_snb_is | q32 | 4.54 | 3.87 | 0.67 (17.41%) |
| ldbc_snb_is | q33 | 12.25 | 13.72 | -1.48 (-10.77%) |
| ldbc_snb_is | q34 | 1.39 | 1.50 | -0.11 (-7.13%) |
| multi-rel | multi-rel-large-scan | 1324.72 | 1302.00 | 22.72 (1.74%) |
| multi-rel | multi-rel-lookup | 18.02 | 24.38 | -6.36 (-26.08%) |
| multi-rel | multi-rel-small-scan | 77.65 | 76.69 | 0.96 (1.25%) |
| order_by | q25 | 132.04 | 129.16 | 2.88 (2.23%) |
| order_by | q26 | 450.77 | 448.84 | 1.93 (0.43%) |
| order_by | q27 | 1477.69 | 1456.60 | 21.09 (1.45%) |
| recursive_join | recursive-join-bidirection | 303.72 | 281.94 | 21.78 (7.72%) |
| recursive_join | recursive-join-dense | 7387.19 | 7322.47 | 64.72 (0.88%) |
| recursive_join | recursive-join-path | 23975.75 | 24035.02 | -59.27 (-0.25%) |
| recursive_join | recursive-join-sparse | 1063.37 | 1056.00 | 7.37 (0.70%) |
| recursive_join | recursive-join-trail | 7353.57 | 7294.89 | 58.69 (0.80%) |
| scan_after_filter | q01 | 173.90 | 170.53 | 3.37 (1.98%) |
| scan_after_filter | q02 | 159.36 | 155.05 | 4.31 (2.78%) |
| shortest_path_ldbc100 | q37 | 83.50 | 90.35 | -6.84 (-7.58%) |
| shortest_path_ldbc100 | q38 | 370.96 | 371.71 | -0.75 (-0.20%) |
| shortest_path_ldbc100 | q39 | 62.00 | 61.64 | 0.36 (0.58%) |
| shortest_path_ldbc100 | q40 | 421.27 | 400.90 | 20.36 (5.08%) |
| var_size_expr_evaluator | q03 | 2103.97 | 2059.42 | 44.55 (2.16%) |
| var_size_expr_evaluator | q04 | 2313.80 | 2222.65 | 91.14 (4.10%) |
| var_size_expr_evaluator | q05 | 2662.32 | 2606.73 | 55.58 (2.13%) |
| var_size_expr_evaluator | q06 | 1322.09 | 1317.42 | 4.68 (0.36%) |
| var_size_seq_scan | q19 | 1447.37 | 1444.53 | 2.84 (0.20%) |
| var_size_seq_scan | q20 | 2617.18 | 2679.22 | -62.03 (-2.32%) |
| var_size_seq_scan | q21 | 2314.97 | 2317.01 | -2.04 (-0.09%) |
| var_size_seq_scan | q22 | 129.18 | 126.87 | 2.30 (1.82%) |
