In memory mode #4012

Merged
ray6080 merged 19 commits into master from in-mem
Aug 8, 2024

Conversation

ray6080
Contributor

@ray6080 ray6080 commented Aug 2, 2024

Description

Add support for in-memory mode. See #1816.

Feature

When the database path is omitted, empty, or set to :memory: (following DuckDB's convention here), the database is opened in in-memory mode.

The main differences between on-disk and in-memory mode are:
In on-disk mode, all data is persisted to disk. All transactions are logged in the WAL, and their changes are merged into the database files during checkpoint.
In in-memory mode, there are no writes to the WAL, no data is persisted to disk, and CHECKPOINT does nothing. All data is lost when the process finishes.

Restrictions on in-memory mode:

  • The database cannot be opened as read-only.
  • When using the httpfs extension, the file cache is not supported.
  • Attaching an in-memory database is not allowed. (Maybe we should allow it?)
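
As a rough illustration of the path convention and the read-only restriction above, here is a minimal standalone sketch. The names `isInMemoryPath` and `validateOpenMode` are hypothetical and are not taken from this PR's code; the real check lives in the database config and may differ.

```cpp
#include <stdexcept>
#include <string_view>

// Hypothetical helper (not the actual API): returns true when the path selects
// in-memory mode, i.e. it is omitted (defaulted), empty, or ":memory:".
inline bool isInMemoryPath(std::string_view dbPath = {}) {
    return dbPath.empty() || dbPath == ":memory:";
}

// Hypothetical check mirroring the first restriction: an in-memory database
// cannot be opened as read-only. On-disk paths are accepted either way.
inline void validateOpenMode(std::string_view dbPath, bool readOnly) {
    if (isInMemoryPath(dbPath) && readOnly) {
        throw std::invalid_argument("Cannot open an in-memory database as read-only.");
    }
}
```

With this sketch, `validateOpenMode(":memory:", true)` throws, while `validateOpenMode("/tmp/example-db", true)` does not.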

Implementation

After the MVCC changes, supporting in-memory mode becomes much easier.
New pages in BMFileHandle are pinned in the buffer manager (BM) and are only unpinned when the BMFileHandle is destructed. Accesses to these pages grab frames directly from the BM.
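
The pinning behaviour described above can be pictured with a toy model. This is only a sketch under simplifying assumptions: `ToyBufferManager`, `ToyInMemFileHandle`, and `kPageSize` are made-up names, eviction is not modelled, and the real BMFileHandle/BufferManager interfaces are different.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

constexpr std::size_t kPageSize = 4096; // assumed page size, for the sketch only

// Toy buffer manager: hands out zeroed frames and counts how many are pinned.
class ToyBufferManager {
public:
    std::uint8_t* allocatePinnedFrame() {
        frames.push_back(std::make_unique<std::uint8_t[]>(kPageSize));
        ++numPinned;
        return frames.back().get();
    }
    void unpin() { --numPinned; }
    std::size_t pinnedCount() const { return numPinned; }

private:
    std::vector<std::unique_ptr<std::uint8_t[]>> frames;
    std::size_t numPinned = 0;
};

// Toy in-memory file handle: every new page stays pinned for the lifetime of
// the handle and is only unpinned in the destructor.
class ToyInMemFileHandle {
public:
    explicit ToyInMemFileHandle(ToyBufferManager& bm) : bm{bm} {}
    ToyInMemFileHandle(const ToyInMemFileHandle&) = delete;
    ToyInMemFileHandle& operator=(const ToyInMemFileHandle&) = delete;
    ~ToyInMemFileHandle() {
        for (std::size_t i = 0; i < pageFrames.size(); ++i) {
            bm.unpin();
        }
    }

    // Adds a page, pins its frame, and returns the new page index.
    std::size_t addNewPage() {
        pageFrames.push_back(bm.allocatePinnedFrame());
        return pageFrames.size() - 1;
    }

    // Page access goes straight to the pinned frame; there is no disk read.
    std::uint8_t* getFrame(std::size_t pageIdx) { return pageFrames.at(pageIdx); }

private:
    ToyBufferManager& bm;
    std::vector<std::uint8_t*> pageFrames;
};
```

Because the frames are never unpinned while the handle is alive, they cannot be chosen for eviction, which is what lets page accesses skip the on-disk path entirely.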

Benchmark

I ran a few simple benchmarks on ldbc-100.

COPY

| table       | in mem (ms) | on disk (ms) |
| ----------- | ----------- | ------------ |
| Comment     | 14879       | 34696        |
| Person      | 513         | 749          |
| knows       | 1781        | 2123         |
| likeComment | 29246       | 34581        |

| space usage | in-mem mode | on-disk mode |
| ----------- | ----------- | ------------ |
| mem         | 56.7GB      | 32.2GB       |
| disk        | 0           | 29GB         |

Read only queries

Q1: match (c:Comment) return min(c.ID), min(c.creationDate), min(c.locationIP), min(c.browserUsed), min(c.content), min(c.length)
Q2: MATCH (a:Person)-[:knows]->(b:Person)-[:knows]->(c:Person) RETURN MIN(a.birthday), MIN(b.birthday), MIN(c.birthday)
Q3: MATCH (a:Person)-[r:knows* 2..2]->(b:Person) RETURN COUNT(*);

| query | in-mem | on-disk (cold) | on-disk (warm) |
| ----- | ------ | -------------- | -------------- |
| Q1    | 1896   | 5330           | 1791           |
| Q2    | 1004   | 948            | 893            |
| Q3    | 72133  | 84419          | 

Insertions

1M node insertions.

CREATE NODE TABLE Person (id INT64, name STRING, age INT64, net_worth FLOAT, PRIMARY KEY (id));

| 1M txn in-mem (ms) | 1M txn on-disk (ms) | 1 txn in-mem (ms) | 1 txn on-disk (ms) |
| ------------------ | ------------------- | ---------------- | ------------------- |
| 47814              | 79313               | 38895            | 40054               |

TODO

  • rewrite copy to / export tests (copy to should not rely on db path)

@ray6080 ray6080 force-pushed the in-mem branch 2 times, most recently from abdccb1 to ea4bd38 on August 5, 2024 05:06

codecov bot commented Aug 5, 2024

Codecov Report

Attention: Patch coverage is 79.21687% with 69 lines in your changes missing coverage. Please review.

Project coverage is 85.11%. Comparing base (7139a6a) to head (53cb599).

| Files | Patch % | Lines |
| ----- | ------- | ----- |
| test/test_runner/test_parser.cpp | 60.00% | 5 Missing and 3 partials ⚠️ |
| src/storage/buffer_manager/bm_file_handle.cpp | 74.07% | 7 Missing ⚠️ |
| src/storage/storage_structure/db_file_utils.cpp | 50.00% | 7 Missing ⚠️ |
| src/storage/storage_structure/disk_array.cpp | 83.33% | 4 Missing ⚠️ |
| test/c_api/database_test.cpp | 50.00% | 3 Missing and 1 partial ⚠️ |
| test/include/graph_test/base_graph_test.h | 66.66% | 3 Missing and 1 partial ⚠️ |
| test/include/test_helper/test_helper.h | 75.00% | 3 Missing and 1 partial ⚠️ |
| test/main/system_config_test.cpp | 0.00% | 3 Missing and 1 partial ⚠️ |
| tools/shell/shell_runner.cpp | 0.00% | 4 Missing ⚠️ |
| src/main/attached_database.cpp | 0.00% | 3 Missing ⚠️ |

... and 13 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4012      +/-   ##
==========================================
- Coverage   85.16%   85.11%   -0.05%     
==========================================
  Files        1297     1297              
  Lines       50536    50597      +61     
  Branches     6956     6977      +21     
==========================================
+ Hits        43037    43067      +30     
- Misses       7368     7388      +20     
- Partials      131      142      +11     


github-actions bot commented Aug 5, 2024

Benchmark Result

Master commit hash: b88aca112024631118b667bbe41441a542347475
Branch commit hash: dbcad479a81b8c0cf81e0db298e6d1ccc9c977fe

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| ----------- | ---------- | ----------------------- | ----------------------- | ---- |
| aggregation | q24 | 676.92 | 684.84 | -7.92 (-1.16%) |
| aggregation | q28 | 12077.59 | 12119.40 | -41.81 (-0.34%) |
| filter | q14 | 152.92 | 160.90 | -7.98 (-4.96%) |
| filter | q15 | 155.75 | 159.48 | -3.72 (-2.34%) |
| filter | q16 | 333.31 | 334.71 | -1.40 (-0.42%) |
| filter | q17 | 475.59 | 481.08 | -5.49 (-1.14%) |
| filter | q18 | 1982.65 | 1960.56 | 22.09 (1.13%) |
| fixed_size_expr_evaluator | q07 | 562.34 | 571.87 | -9.53 (-1.67%) |
| fixed_size_expr_evaluator | q08 | 771.73 | 786.68 | -14.95 (-1.90%) |
| fixed_size_expr_evaluator | q09 | 774.25 | 785.94 | -11.69 (-1.49%) |
| fixed_size_expr_evaluator | q10 | 265.05 | 273.14 | -8.09 (-2.96%) |
| fixed_size_expr_evaluator | q11 | 259.70 | 267.41 | -7.72 (-2.89%) |
| fixed_size_expr_evaluator | q12 | 259.62 | 266.59 | -6.96 (-2.61%) |
| fixed_size_expr_evaluator | q13 | 1494.06 | 1504.77 | -10.71 (-0.71%) |
| fixed_size_seq_scan | q23 | 142.46 | 150.88 | -8.42 (-5.58%) |
| join | q31 | 49.56 | 52.73 | -3.17 (-6.02%) |
| ldbc_snb_ic | q35 | 3793.59 | 3690.90 | 102.69 (2.78%) |
| ldbc_snb_ic | q36 | 132.46 | 130.27 | 2.19 (1.68%) |
| ldbc_snb_is | q32 | 9.89 | 9.66 | 0.23 (2.42%) |
| ldbc_snb_is | q33 | 94.75 | 97.21 | -2.47 (-2.54%) |
| ldbc_snb_is | q34 | 84.20 | 79.92 | 4.28 (5.36%) |
| multi-rel | multi-rel-large-scan | 3828.36 | 2777.88 | 1050.47 (37.82%) |
| multi-rel | multi-rel-lookup | 66.03 | 80.21 | -14.18 (-17.67%) |
| multi-rel | multi-rel-small-scan | 60.80 | 62.56 | -1.76 (-2.82%) |
| order_by | q25 | 152.16 | 164.96 | -12.80 (-7.76%) |
| order_by | q26 | 473.94 | 499.50 | -25.56 (-5.12%) |
| order_by | q27 | 1427.65 | 1432.58 | -4.94 (-0.34%) |
| scan_after_filter | q01 | 197.95 | 207.78 | -9.83 (-4.73%) |
| scan_after_filter | q02 | 186.42 | 196.56 | -10.14 (-5.16%) |
| shortest_path_ldbc100 | q39 | 92.16 | 93.81 | -1.64 (-1.75%) |
| var_size_expr_evaluator | q03 | 2108.80 | 2097.71 | 11.08 (0.53%) |
| var_size_expr_evaluator | q04 | 2309.78 | 2317.44 | -7.66 (-0.33%) |
| var_size_expr_evaluator | q05 | 2645.02 | 2575.45 | 69.57 (2.70%) |
| var_size_expr_evaluator | q06 | 1350.41 | 1367.18 | -16.77 (-1.23%) |
| var_size_seq_scan | q19 | 1484.59 | 1498.41 | -13.82 (-0.92%) |
| var_size_seq_scan | q20 | 3260.70 | 3216.88 | 43.82 (1.36%) |
| var_size_seq_scan | q21 | 2493.18 | 2528.56 | -35.39 (-1.40%) |
| var_size_seq_scan | q22 | 136.16 | 138.40 | -2.24 (-1.62%) |

@ray6080 ray6080 marked this pull request as ready for review August 5, 2024 18:53

github-actions bot commented Aug 5, 2024

Benchmark Result

Master commit hash: b88aca112024631118b667bbe41441a542347475
Branch commit hash: c48b1f2d286adfcbc109782a1a1e2005a1b682f4

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| ----------- | ---------- | ----------------------- | ----------------------- | ---- |
| aggregation | q24 | 677.91 | 684.84 | -6.93 (-1.01%) |
| aggregation | q28 | 10844.93 | 12119.40 | -1274.48 (-10.52%) |
| filter | q14 | 151.38 | 160.90 | -9.52 (-5.92%) |
| filter | q15 | 157.96 | 159.48 | -1.51 (-0.95%) |
| filter | q16 | 334.77 | 334.71 | 0.06 (0.02%) |
| filter | q17 | 473.33 | 481.08 | -7.75 (-1.61%) |
| filter | q18 | 1969.92 | 1960.56 | 9.36 (0.48%) |
| fixed_size_expr_evaluator | q07 | 570.67 | 571.87 | -1.20 (-0.21%) |
| fixed_size_expr_evaluator | q08 | 782.41 | 786.68 | -4.27 (-0.54%) |
| fixed_size_expr_evaluator | q09 | 789.39 | 785.94 | 3.45 (0.44%) |
| fixed_size_expr_evaluator | q10 | 265.31 | 273.14 | -7.83 (-2.87%) |
| fixed_size_expr_evaluator | q11 | 261.40 | 267.41 | -6.01 (-2.25%) |
| fixed_size_expr_evaluator | q12 | 258.75 | 266.59 | -7.84 (-2.94%) |
| fixed_size_expr_evaluator | q13 | 1494.14 | 1504.77 | -10.63 (-0.71%) |
| fixed_size_seq_scan | q23 | 147.48 | 150.88 | -3.40 (-2.25%) |
| join | q31 | 48.63 | 52.73 | -4.10 (-7.77%) |
| ldbc_snb_ic | q35 | 3771.96 | 3690.90 | 81.05 (2.20%) |
| ldbc_snb_ic | q36 | 130.52 | 130.27 | 0.25 (0.19%) |
| ldbc_snb_is | q32 | 10.50 | 9.66 | 0.84 (8.70%) |
| ldbc_snb_is | q33 | 90.24 | 97.21 | -6.97 (-7.17%) |
| ldbc_snb_is | q34 | 99.81 | 79.92 | 19.89 (24.89%) |
| multi-rel | multi-rel-large-scan | 2765.13 | 2777.88 | -12.75 (-0.46%) |
| multi-rel | multi-rel-lookup | 63.83 | 80.21 | -16.38 (-20.42%) |
| multi-rel | multi-rel-small-scan | 51.05 | 62.56 | -11.52 (-18.41%) |
| order_by | q25 | 159.12 | 164.96 | -5.84 (-3.54%) |
| order_by | q26 | 473.48 | 499.50 | -26.02 (-5.21%) |
| order_by | q27 | 1432.02 | 1432.58 | -0.57 (-0.04%) |
| scan_after_filter | q01 | 197.14 | 207.78 | -10.64 (-5.12%) |
| scan_after_filter | q02 | 185.58 | 196.56 | -10.97 (-5.58%) |
| shortest_path_ldbc100 | q39 | 93.69 | 93.81 | -0.12 (-0.13%) |
| var_size_expr_evaluator | q03 | 2094.65 | 2097.71 | -3.06 (-0.15%) |
| var_size_expr_evaluator | q04 | 2312.42 | 2317.44 | -5.02 (-0.22%) |
| var_size_expr_evaluator | q05 | 2640.44 | 2575.45 | 64.99 (2.52%) |
| var_size_expr_evaluator | q06 | 1344.85 | 1367.18 | -22.33 (-1.63%) |
| var_size_seq_scan | q19 | 1487.80 | 1498.41 | -10.61 (-0.71%) |
| var_size_seq_scan | q20 | 3243.07 | 3216.88 | 26.19 (0.81%) |
| var_size_seq_scan | q21 | 2546.06 | 2528.56 | 17.50 (0.69%) |
| var_size_seq_scan | q22 | 137.23 | 138.40 | -1.17 (-0.84%) |


github-actions bot commented Aug 6, 2024

Benchmark Result

Master commit hash: b88aca112024631118b667bbe41441a542347475
Branch commit hash: 39ff58799bd145d2d8523d1744d1a8475a150ba7

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| ----------- | ---------- | ----------------------- | ----------------------- | ---- |
| aggregation | q24 | 676.60 | 684.84 | -8.24 (-1.20%) |
| aggregation | q28 | 11309.67 | 12119.40 | -809.73 (-6.68%) |
| filter | q14 | 150.49 | 160.90 | -10.41 (-6.47%) |
| filter | q15 | 154.57 | 159.48 | -4.91 (-3.08%) |
| filter | q16 | 335.27 | 334.71 | 0.56 (0.17%) |
| filter | q17 | 472.29 | 481.08 | -8.79 (-1.83%) |
| filter | q18 | 1977.09 | 1960.56 | 16.53 (0.84%) |
| fixed_size_expr_evaluator | q07 | 567.45 | 571.87 | -4.42 (-0.77%) |
| fixed_size_expr_evaluator | q08 | 783.53 | 786.68 | -3.15 (-0.40%) |
| fixed_size_expr_evaluator | q09 | 785.93 | 785.94 | -0.01 (-0.00%) |
| fixed_size_expr_evaluator | q10 | 265.17 | 273.14 | -7.97 (-2.92%) |
| fixed_size_expr_evaluator | q11 | 261.02 | 267.41 | -6.40 (-2.39%) |
| fixed_size_expr_evaluator | q12 | 258.30 | 266.59 | -8.29 (-3.11%) |
| fixed_size_expr_evaluator | q13 | 1497.28 | 1504.77 | -7.49 (-0.50%) |
| fixed_size_seq_scan | q23 | 147.14 | 150.88 | -3.74 (-2.48%) |
| join | q31 | 49.87 | 52.73 | -2.86 (-5.43%) |
| ldbc_snb_ic | q35 | 3716.71 | 3690.90 | 25.81 (0.70%) |
| ldbc_snb_ic | q36 | 129.18 | 130.27 | -1.09 (-0.84%) |
| ldbc_snb_is | q32 | 10.48 | 9.66 | 0.82 (8.50%) |
| ldbc_snb_is | q33 | 96.89 | 97.21 | -0.32 (-0.33%) |
| ldbc_snb_is | q34 | 91.84 | 79.92 | 11.92 (14.92%) |
| multi-rel | multi-rel-large-scan | 2770.63 | 2777.88 | -7.25 (-0.26%) |
| multi-rel | multi-rel-lookup | 78.84 | 80.21 | -1.37 (-1.71%) |
| multi-rel | multi-rel-small-scan | 53.81 | 62.56 | -8.76 (-14.00%) |
| order_by | q25 | 158.26 | 164.96 | -6.70 (-4.06%) |
| order_by | q26 | 475.42 | 499.50 | -24.08 (-4.82%) |
| order_by | q27 | 1430.93 | 1432.58 | -1.65 (-0.12%) |
| scan_after_filter | q01 | 197.93 | 207.78 | -9.85 (-4.74%) |
| scan_after_filter | q02 | 185.97 | 196.56 | -10.59 (-5.39%) |
| shortest_path_ldbc100 | q39 | 94.39 | 93.81 | 0.58 (0.62%) |
| var_size_expr_evaluator | q03 | 2101.85 | 2097.71 | 4.14 (0.20%) |
| var_size_expr_evaluator | q04 | 2311.33 | 2317.44 | -6.11 (-0.26%) |
| var_size_expr_evaluator | q05 | 2646.82 | 2575.45 | 71.37 (2.77%) |
| var_size_expr_evaluator | q06 | 1349.62 | 1367.18 | -17.55 (-1.28%) |
| var_size_seq_scan | q19 | 1483.23 | 1498.41 | -15.17 (-1.01%) |
| var_size_seq_scan | q20 | 3240.09 | 3216.88 | 23.22 (0.72%) |
| var_size_seq_scan | q21 | 2497.03 | 2528.56 | -31.53 (-1.25%) |
| var_size_seq_scan | q22 | 136.23 | 138.40 | -2.17 (-1.57%) |

src/storage/storage_structure/db_file_utils.cpp (outdated)
@@ -33,6 +34,11 @@ class DBFileUtils {
public:
constexpr static common::page_idx_t NULL_PAGE_IDX = common::INVALID_PAGE_IDX;

static uint8_t* pinPage(BMFileHandle& fileHandle, common::page_idx_t pageIdx,
Collaborator

I'm a little concerned that this adds new ways of doing pinning/optimistic reads without making it obvious what the differences are at a glance.
Maybe BufferManager::pin/BufferManager::optimisticRead should be private, or maybe we shouldn't be passing around the BufferManager at all for these file operations. The BMFileHandle already stores a pointer to the BufferManager, so we could instead have a BMFileHandle::pin(page_idx_t, PageReadPolicy) function that handles the memory mode. While that doesn't really make the differences any clearer, it would at least make the BMFileHandle functions the more obvious choice. We could then remove the BufferManager arguments from DBFileUtils, avoid passing it around as much, and make the BufferManager more of an internal detail that you shouldn't need to interact with when working with file handles. (#3743 will still mean that the MemoryManager needs to get passed around, but with this removed maybe we can even remove MemoryManager::getBufferManager.)

const auto frame = dataFH->getBM()->getFrame(*dataFH, startPageIdx);
memcpy(frame, buffer, bufferSize);
} else {
dataFH->getFileInfo()->writeFile(buffer, bufferSize,
Collaborator

Maybe we should move away from exposing the FileInfo through the file handle and add a write function that is more or less equivalent to this block (i.e. memcpy in in-memory mode and use the FileInfo otherwise).

Contributor Author

Separating this to another PR for refactoring.
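
For reference, the write function suggested in this thread might look roughly like the sketch below. Everything here is hypothetical: `ToyFileHandle` and its members are illustrative stand-ins (using a `std::vector` for memory and a stdio `FILE*` for disk), not the actual BMFileHandle or FileInfo interfaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical file handle that hides the storage mode behind one write().
class ToyFileHandle {
public:
    // In-memory handle: bytes live in a buffer owned by the handle.
    static ToyFileHandle inMemory(std::size_t numBytes) {
        return ToyFileHandle(true, numBytes, nullptr);
    }
    // On-disk handle: writes go to the given stdio file (not owned here).
    static ToyFileHandle onDisk(std::FILE* file) { return ToyFileHandle(false, 0, file); }

    // Callers no longer branch on the mode; the handle does it for them.
    void write(const std::uint8_t* buffer, std::size_t size, std::size_t offset) {
        if (isInMemory) {
            if (offset + size > memory.size()) {
                throw std::out_of_range("write past the end of the in-memory file");
            }
            std::memcpy(memory.data() + offset, buffer, size);
        } else {
            if (std::fseek(file, static_cast<long>(offset), SEEK_SET) != 0 ||
                std::fwrite(buffer, 1, size, file) != size) {
                throw std::runtime_error("failed to write to the on-disk file");
            }
        }
    }

    // Read-only view of the in-memory bytes (empty for on-disk handles).
    const std::vector<std::uint8_t>& bytes() const { return memory; }

private:
    ToyFileHandle(bool isInMemory, std::size_t numBytes, std::FILE* file)
        : isInMemory{isInMemory}, memory(numBytes), file{file} {}

    bool isInMemory;
    std::vector<std::uint8_t> memory;
    std::FILE* file;
};
```

The point of the design is that call sites would stop reaching for the FileInfo directly, so the in-memory branch exists in one place only.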


github-actions bot commented Aug 7, 2024

Benchmark Result

Master commit hash: 7139a6a92aaf66166602c58f0afedc83fab783c4
Branch commit hash: 4fcf1c03843d33267630e5ff42d2509440ff11f0

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| ----------- | ---------- | ----------------------- | ----------------------- | ---- |
| aggregation | q24 | 675.89 | 675.13 | 0.75 (0.11%) |
| aggregation | q28 | 11371.01 | 11729.63 | -358.62 (-3.06%) |
| filter | q14 | 151.43 | 150.67 | 0.76 (0.51%) |
| filter | q15 | 155.72 | 151.85 | 3.87 (2.55%) |
| filter | q16 | 353.45 | 331.11 | 22.34 (6.75%) |
| filter | q17 | 473.49 | 475.82 | -2.33 (-0.49%) |
| filter | q18 | 1958.42 | 1961.21 | -2.79 (-0.14%) |
| fixed_size_expr_evaluator | q07 | 562.34 | 566.85 | -4.52 (-0.80%) |
| fixed_size_expr_evaluator | q08 | 776.39 | 776.11 | 0.28 (0.04%) |
| fixed_size_expr_evaluator | q09 | 771.09 | 779.11 | -8.02 (-1.03%) |
| fixed_size_expr_evaluator | q10 | 264.16 | 266.72 | -2.56 (-0.96%) |
| fixed_size_expr_evaluator | q11 | 259.52 | 260.32 | -0.80 (-0.31%) |
| fixed_size_expr_evaluator | q12 | 259.19 | 258.80 | 0.39 (0.15%) |
| fixed_size_expr_evaluator | q13 | 1497.15 | 1501.45 | -4.30 (-0.29%) |
| fixed_size_seq_scan | q23 | 143.71 | 140.44 | 3.28 (2.33%) |
| join | q31 | 48.79 | 48.92 | -0.13 (-0.26%) |
| ldbc_snb_ic | q35 | 3705.18 | 3567.69 | 137.50 (3.85%) |
| ldbc_snb_ic | q36 | 121.94 | 131.64 | -9.70 (-7.37%) |
| ldbc_snb_is | q32 | 10.27 | 10.29 | -0.02 (-0.18%) |
| ldbc_snb_is | q33 | 100.53 | 94.38 | 6.15 (6.51%) |
| ldbc_snb_is | q34 | 95.47 | 89.88 | 5.59 (6.22%) |
| multi-rel | multi-rel-large-scan | 3084.11 | 2761.02 | 323.09 (11.70%) |
| multi-rel | multi-rel-lookup | 82.36 | 72.11 | 10.25 (14.21%) |
| multi-rel | multi-rel-small-scan | 66.65 | 71.32 | -4.66 (-6.54%) |
| order_by | q25 | 157.18 | 157.09 | 0.10 (0.06%) |
| order_by | q26 | 483.78 | 480.98 | 2.80 (0.58%) |
| order_by | q27 | 1428.66 | 1424.57 | 4.09 (0.29%) |
| scan_after_filter | q01 | 198.68 | 197.52 | 1.16 (0.59%) |
| scan_after_filter | q02 | 185.98 | 185.85 | 0.13 (0.07%) |
| shortest_path_ldbc100 | q39 | 78.18 | 87.51 | -9.33 (-10.66%) |
| var_size_expr_evaluator | q03 | 2078.34 | 2070.41 | 7.93 (0.38%) |
| var_size_expr_evaluator | q04 | 2271.37 | 2265.51 | 5.86 (0.26%) |
| var_size_expr_evaluator | q05 | 2551.50 | 2627.90 | -76.40 (-2.91%) |
| var_size_expr_evaluator | q06 | 1355.44 | 1357.48 | -2.04 (-0.15%) |
| var_size_seq_scan | q19 | 1484.42 | 1489.80 | -5.38 (-0.36%) |
| var_size_seq_scan | q20 | 3139.92 | 3142.24 | -2.32 (-0.07%) |
| var_size_seq_scan | q21 | 2451.15 | 2411.98 | 39.17 (1.62%) |
| var_size_seq_scan | q22 | 135.26 | 134.69 | 0.57 (0.42%) |

@@ -32,63 +32,59 @@ class PageState {

PageState() { stateAndVersion.store(EVICTED << NUM_BITS_TO_SHIFT_FOR_STATE); }

inline uint64_t getState() const { return getState(stateAndVersion.load()); }
inline static uint64_t getState(uint64_t stateAndVersion) {
uint64_t getState() const { return getState(stateAndVersion.load()); }
Contributor

We should probably align this uint64_t with a typedef

Contributor Author

Will do this in a separate refactoring PR on BMFileHandle along with this comment #4012 (comment).

src/include/storage/buffer_manager/bm_file_handle.h (outdated)
class WALReplayer;
class WAL {
friend class WALReplayer;

public:
WAL(const std::string& directory, bool readOnly, BufferManager& bufferManager,
common::VirtualFileSystem* vfs, main::ClientContext* context);
WAL(const std::string& directory, bool readOnly, common::VirtualFileSystem* vfs,
Contributor

If you are passing around clientContext, you probably don't need to pass around vfs.

Contributor Author

This is something I need to discuss with @acquamarin regarding how vfs should make use of ClientContext. We are passing both in several different places, which should be refactored altogether if possible.

src/main/database.cpp
@@ -118,7 +119,7 @@ void Database::addExtensionOption(std::string name, LogicalTypeID type, Value de
extensionOptions->addExtensionOption(name, type, std::move(defaultValue));
}

ExtensionOption* Database::getExtensionOption(std::string name) {
ExtensionOption* Database::getExtensionOption(std::string name) const {
Contributor

ditto

src/storage/wal/wal.cpp
@ray6080 ray6080 merged commit 6ab3e4b into master Aug 8, 2024
25 of 26 checks passed
@ray6080 ray6080 deleted the in-mem branch August 8, 2024 02:14
@andyfengHKU andyfengHKU mentioned this pull request Aug 13, 2024
ted-wq-x pushed a commit to ted-wq-x/kuzu that referenced this pull request Nov 14, 2024
* draft: in memory mode

* tools for in mem

* fix

* wip: adapt tests for in mem

* skip reloaddb for in mem db; fixes; add SKIP_IN_MEM to testing framework

* clean tests

* add clang in mem ci workflow

* Run clang-format

* update tests; rework DBConfig::isDBPathInMemory

* update api tests

* update tests

* bump extension version

* add kuzu api

* add ku_destroy in c api test

* update tests

* rework BMFileHandle interfaces

* fix clang tidy; skip force checkpoint test for in mem mode

* Run clang-format

* fix clang tidy

---------

Co-authored-by: CI Bot <[email protected]>

(cherry picked from commit 6ab3e4b)