feat: add a debug page with basic information about split store #10182

Merged
merged 2 commits into from
Nov 16, 2023

Conversation

jancionear
Contributor

Add a new debug page: /debug/pages/split_store
This page provides basic information about the state of split store.
An example report looks like this:

image

Fixes: #9549

Add a debug endpoint which provides information about
the split store.
I reused the implementation that is used for the jsonrpc endpoint
`EXPERIMENTAL_split_storage_info`.

This new endpoint will be used by a debug page to display information
about the split store.
Add a new page: /debug/pages/split_store.
This page provides basic information about the state of split store.
An example report looks like this:
```
Cold head height: null
Final head height: 1831
Head height: 1833
Hot db kind: RPC
```

Fixes: near#9549
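
For illustration, the example report above can be modeled as a small struct with optional fields, where a missing value is printed as `null`. This is only a sketch: `SplitStoreReport` and `or_null` are hypothetical names chosen to mirror the report labels, and they are not claimed to match the actual response type behind `EXPERIMENTAL_split_storage_info`.

```rust
use std::fmt;

/// Hypothetical stand-in for the data shown on the split store debug page.
/// Field names mirror the report labels; they are assumptions, not nearcore's type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SplitStoreReport {
    pub cold_head_height: Option<u64>,
    pub final_head_height: Option<u64>,
    pub head_height: Option<u64>,
    pub hot_db_kind: Option<String>,
}

/// Render an optional value, printing `null` when it is absent.
fn or_null<T: fmt::Display>(value: &Option<T>) -> String {
    match value {
        Some(v) => v.to_string(),
        None => "null".to_string(),
    }
}

impl fmt::Display for SplitStoreReport {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "Cold head height: {}", or_null(&self.cold_head_height))?;
        writeln!(f, "Final head height: {}", or_null(&self.final_head_height))?;
        writeln!(f, "Head height: {}", or_null(&self.head_height))?;
        // No trailing newline after the last line of the report.
        write!(f, "Hot db kind: {}", or_null(&self.hot_db_kind))
    }
}
```

Calling `to_string()` on a value with no cold head, final head 1831, head 1833 and hot db kind `RPC` yields the four lines of the example report.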
@jancionear jancionear requested a review from a team as a code owner November 15, 2023 14:56
@jancionear jancionear requested a review from akhi3030 November 15, 2023 14:56
@jancionear
Contributor Author

Tbf I'm not entirely sure about this solution. I still don't understand what the cold store does or why it is even needed. I looked around a bit but couldn't find a clear explanation.
But I put something together, I think it fixes #9549 as the issue only asked to display the cold store head.
I'd appreciate feedback.

@jancionear jancionear changed the title from "Split store page" to "@jancionear feat: add a debug page with basic information about split store" Nov 15, 2023
@jancionear jancionear changed the title from "@jancionear feat: add a debug page with basic information about split store" to "feat: add a debug page with basic information about split store" Nov 15, 2023
@wacban wacban requested review from wacban and posvyatokum November 15, 2023 15:11
@akhi3030
Collaborator

Looks like wacban is on top of this. Removing myself from reviewers.

@akhi3030 akhi3030 removed their request for review November 15, 2023 15:14
@wacban
Contributor

wacban commented Nov 15, 2023

I can't find good documentation on that topic but perhaps @posvyatokum or @andrei-near will know.
TLDR

  • regular nodes only store the most recent chain history
  • archival nodes store full chain history
  • legacy archival nodes store everything in a single RocksDB, on a single, huge and expensive SSD disk
  • split store archival nodes keep the most recent history on one disk (hot) and the rest of the history on another disk (cold)
  • the hot disk is typically SSD, it's nice and fast, the cold disk is typically HDD, it's big and slow but cheap
  • benefit 1 - the hot store, being small, is much faster, so the node can keep up with the network
  • benefit 2 - massive cost savings, since HDD is much cheaper, especially for such large volumes
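
The hot/cold split described in the list above can be sketched as a toy in-memory model: new blocks go to the hot store, older blocks are moved to the cold store, and the cold head is the highest height that has been moved so far. Everything here (`ToySplitStore`, `migrate_below`, the use of `BTreeMap`) is a hypothetical illustration under those assumptions, not nearcore's storage code, which sits on RocksDB.

```rust
use std::collections::BTreeMap;

/// Toy model of a split store: recent blocks live in `hot`, older ones in `cold`.
/// All names are hypothetical; this is not nearcore's implementation.
#[derive(Debug, Default)]
pub struct ToySplitStore {
    hot: BTreeMap<u64, String>,
    cold: BTreeMap<u64, String>,
    /// Highest height moved to cold storage so far, if any.
    cold_head: Option<u64>,
}

impl ToySplitStore {
    /// New blocks are always written to the hot store first.
    pub fn insert(&mut self, height: u64, block: &str) {
        self.hot.insert(height, block.to_string());
    }

    /// Move every block below `keep_from` from hot to cold and advance the cold head.
    pub fn migrate_below(&mut self, keep_from: u64) {
        // `split_off` leaves keys < keep_from in `self.hot` and returns the rest.
        let recent = self.hot.split_off(&keep_from);
        let old = std::mem::replace(&mut self.hot, recent);
        if let Some(&highest) = old.keys().next_back() {
            self.cold_head = Some(self.cold_head.map_or(highest, |h| h.max(highest)));
        }
        self.cold.extend(old);
    }

    /// Reads try the small hot store first and fall back to the cold store.
    pub fn get(&self, height: u64) -> Option<&str> {
        self.hot
            .get(&height)
            .or_else(|| self.cold.get(&height))
            .map(String::as_str)
    }

    pub fn cold_head(&self) -> Option<u64> {
        self.cold_head
    }

    pub fn hot_len(&self) -> usize {
        self.hot.len()
    }
}
```

In this model a store that has never migrated anything reports no cold head, which is one way to read the `Cold head height: null` line in the example report.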

Contributor

@wacban wacban left a comment

LGTM

Comment on lines +46 to +48
// Implementing From<RpcSplitStorageInfoError> for RpcStatusError causes cargo to spit out hundreds
// of lines of compilation errors. I don't want to spend time debugging this, so let's use this function instead.
// It's good enough.
Contributor

lol
If you want, we can try together for the sake of learning Rust, but as you said, it's just fine as is.

Contributor Author

Eh I'm fine with leaving it as is. The errors looked pretty intimidating, and there is no point in spending mental energy on trying to figure out what it is complaining about. "Focus on what matters", right? :P

Contributor Author

I'd even argue that into_rpc_status_error is more readable than just into :P
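
As a rough sketch of the approach discussed in this thread, an explicitly named conversion function can stand in for a `From` impl. The two enums below are hypothetical stand-ins with assumed variants, not the real `RpcSplitStorageInfoError` and `RpcStatusError` definitions; only the function name `into_rpc_status_error` comes from the discussion.

```rust
/// Hypothetical stand-in for the split storage info error type.
/// The variant is an assumption made for this sketch.
#[derive(Debug, PartialEq)]
pub enum SplitStorageInfoError {
    InternalError { error_message: String },
}

/// Hypothetical stand-in for the status error type.
#[derive(Debug, PartialEq)]
pub enum StatusError {
    InternalError { error_message: String },
}

/// Explicitly named conversion, used instead of a `From` impl.
pub fn into_rpc_status_error(error: SplitStorageInfoError) -> StatusError {
    match error {
        SplitStorageInfoError::InternalError { error_message } => {
            StatusError::InternalError { error_message }
        }
    }
}

/// Example call site: map the error explicitly rather than relying on `?` and `From`.
pub fn split_storage_status(
    result: Result<u64, SplitStorageInfoError>,
) -> Result<u64, StatusError> {
    result.map_err(into_rpc_status_error)
}
```

The trade-off is that call sites need an explicit `map_err(into_rpc_status_error)` instead of a bare `?`, which also makes the conversion visible where it happens.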

@jancionear
Contributor Author

> I can't find good documentation on that topic but perhaps @posvyatokum or @andrei-near will know.
> TLDR
>   • regular nodes only store the most recent chain history
>   • archival nodes store full chain history
>   • legacy archival nodes store everything in a single RocksDB, on a single, huge and expensive SSD disk
>   • split store archival nodes keep the most recent history on one disk (hot) and the rest of the history on another disk (cold)
>   • the hot disk is typically SSD, it's nice and fast, the cold disk is typically HDD, it's big and slow but cheap
>   • benefit 1 - the hot store, being small, is much faster, so the node can keep up with the network
>   • benefit 2 - massive cost savings, since HDD is much cheaper, especially for such large volumes

Ahh, thanks, it starts making sense now :)

I think I saw something similar before, but it was done using something like lvm-cache/bcachefs. The SSD and HDD were connected into a single logical device, which used the SSD as a cache under the hood. All of the frequently accessed data was kept on the SSD automatically, while the cold data stayed on the HDD. The plus was that it didn't require any modifications in the application, but I have no idea if that would work with RocksDB.

@jancionear jancionear added this pull request to the merge queue Nov 16, 2023
Merged via the queue into near:master with commit bfb3b58 Nov 16, 2023
@jancionear jancionear deleted the split-store-page branch November 16, 2023 15:41
Development

Successfully merging this pull request may close these issues.

[first-issue] Create a simple split-store debug page
3 participants