Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(splitstore): merry christmas lotus! Remove ~120 G from lotus datastore #12803

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

ZenGround0
Copy link
Contributor

@ZenGround0 ZenGround0 commented Dec 25, 2024

Related Issues

#10554

Proposed Changes

As part of the christmas spirit I wanted to do something that everyone in the network would see as a gift. This year I decided to get us the gift of 100 fewer gigs in the lotus datastore

Basically today we hold onto two copies of the state. One original in the cold store and one hot copy. As far as I can tell there is zero reason to do this.

This change puts the snapshot directly into the hotstore since most of us are syncing from snapshots now.

Additional Info

As far as I can tell the original add snapshot to cold then warmup hot was meant to fit the broader pattern of the splitstore working with existing large datastores. However in today's environment the default and most commonly used pattern of operation is to sync from a snapshot and run in discard mode.

Somebody tell me if this breaks something? I'm running pretty good on mainnet right now.

Note: in the first version I removed the warmup procedure in favor of a config for setting full warmup. At this point I think that is the wrong move. We would pay for additional complexity of one config entry to skip warmup work at startup. This work is cheap enough to run once and by including it we automatically handle all upgrade paths to using splitstore without any user complexity. And in the common case we'll walk the chain and do has checks, no blockstore puts required which will be much cheaper than the current warmup.

Checklist

Before you mark the PR ready for review, please make sure that:

@ZenGround0 ZenGround0 marked this pull request as ready for review December 25, 2024 20:01
@ZenGround0 ZenGround0 requested a review from rvagg December 25, 2024 20:01
@@ -64,7 +64,7 @@ type LockedRepo interface {
// The supplied context must only be used to initialize the blockstore.
// The implementation should not retain the context for usage throughout
// the lifecycle.
Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, error)
Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, func() error, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to document what the returned func is in the comment, I had to go to the bottom of the only function that implements it to find the closer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or type it, type CloserFn func() error

fsr.bsErr = err
return
}
func (fsr *fsLockedRepo) Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, func() error, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document the returned func here, or type it

@@ -52,7 +52,7 @@ var dealLabelCmd = &cli.Command{

defer lkrepo.Close() //nolint:errcheck

bs, err := lkrepo.Blockstore(ctx, repo.UniversalBlockstore)
bs, _, err := lkrepo.Blockstore(ctx, repo.UniversalBlockstore)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be using the closer in all of these?

return bs, nil
}

func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, error) {
return func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, error) {
func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, func() error, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, func() error, error) {
func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, error) {

I think this is both unnecessary and could cause a problem. It ends up passing func(...) (dtypes.SplitBlockstore, func() error, error) to fx.Provide which takes all arguments except the last (if it's an error) to be types to feed into the DI system to be available to match to anything that wants it. So we end up giving it both a dtypes.SplitBlockstore and a (noop) func() error. I'm not sure if fx is going to use it if it's not got a named type, but it's in the DI system at least and perhaps someone ends up adding some code that requires a func() error and magically gets this noop one and then can't figure out why it's not doing what they expect.

@rvagg
Copy link
Member

rvagg commented Jan 13, 2025

Well, it's already been quite an education trying to get my head around what this is doing and why it would work. Some things I've discovered that I didn't really understand previously:

  • ./datastore/chain is the 120G that we're "saving"
  • We're likely saving a fair bit less than that because: (a) we still need all the chain, regardless of where it's stored, and (b) it's only the churned state that would have been GCd that we'd be saving, there's a lot of static state that either never changes or won't change for a long time that we won't be saving—but the savings would add up over time for a splitstore node.
    • My chain is 120G but my splitstore is 62G. If we consider that a typical snapshot gives me approximately what splitstore's compaction gives me in terms of historical state, then I'm only wasting somewhere below 62G (chain growth is included in there so it is less), because a lot of that 120G is still useful - the chain, and static state. And I've been running this splitstore node since May.
  • This original behaviour looks to me like it comes from the original work being split between these two PRs: segregate chain and state blockstores #5695 & hot/cold blockstore segregation (aka. splitstore) #4992 - the former introducing the "split" blockstore that was intended to be able to have multiple backing stores for different purposes (a pipe dream ever-noticeable through Lotus code), this PR handled the Import case. Then the second PR introduced the hot store with GC, but it didn't touch the import case.
  • One of my main questions was whether this would impact the chain history that we need to have available - does moving it out of the universal store and into the hot store make it a candidate for GC? The answer is no: SplitStore#compact() is quite clear that it walks the chain, protects the block headers and even the tipsetkeys, but it only walks 4 finalities of stateroots (I wish this number were configurable, that'd be so much more helpful than the other splitstore config values).
  • SplitStore almost always will look in hot first and then look in cold, so it's this second read where we find most of the historic stuff that got imported when you started.

So I think this should be perfectly fine to do.

But, I wonder if it might be good to ship this along with a super-simple lotus-shed that you can run on a closed repo that would manually copy everything from the existing cold store into the hot store. Then you can take advantage of this without a resync. If that's too tricky then it's probably not worth it but it might be neat to ship this with that option.

@@ -9,6 +9,7 @@

# UNRELEASED

- Sync snapshots directly to hotstore. Save 120 G of lotus disk space. ([filecoin-project/lotus#12800](https://github.com/filecoin-project/lotus/pull/12803))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my comments, "120 G" is overselling it by quite a bit, we need to nuance this and probably not even put a number on it.

Copy link
Contributor Author

@ZenGround0 ZenGround0 Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically correct @rvagg but this is not really the christmas spirit. 🎅 😢

I really don't buy that this is significantly overselling it. Its just the difference in startup sizes immediately after syncing from snapshot and a lot of us do that pretty often. Plus it looks like its also exactly the high water mark which matters because that's the value that matters when you need to get bigger disks.

I guess we should have a graph of hotstore size over time to quantify it. After 8 months your hot store has leaned down as the state has replaced itself but its a slow process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it more I don't understand how you're seeing the numbers you do and the original claim seems to be incorrect. See latest comment in main thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you just missed the full copy of all cold objects to hot in the warmup procedure: https://github.com/filecoin-project/lotus/blob/master/blockstore/splitstore/splitstore_warmup.go#L89-L109

@rvagg
Copy link
Member

rvagg commented Jan 13, 2025

Another question I have about this is that might be important: is it going to increase CPU usage for hotstore compaction? Do we take shortcuts if we find that a CID is in the coldstore and essentially skip over it, while being more rigorous about hotstore objects? I don't think so but it's not crystal clear to me that this is the case.

@ZenGround0
Copy link
Contributor Author

ZenGround0 commented Jan 13, 2025

@rvagg your numbers aren't squaring up with my understanding of the system. Snapshot size is 1 state + 2.2 finalities of churn. After compaction we should have 1 state + 4 finalities of churn. So the hotstore should never be below snapshot size aka 120 G. In discussion above there is an assumption that data in the universal store is "useful" but I don't think that's ever the case. The discard store is backed by the universal store but 1) its read only and nothing is ever flushed to it 2) everything in the universal store is copied to the hotstore upon warmup. So the only time the universal store uniquely persists a block it is after that block has been discarded from hotstore and is definitionally not useful.

My claims above about the churn being slowly replaced explaining your datastore size are incorrect. The latest 4 finalities of state should always be kept in the hotstore.

Since you're seeing 60 G splitstore dir clearly I'm off about something here, do you see what I'm missing?

@rvagg
Copy link
Member

rvagg commented Jan 14, 2025

yeah, ok, warmup is something else... so I understand it less than I thought I did. I'll have to budget some time to dig further next week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🔎 Awaiting review
Development

Successfully merging this pull request may close these issues.

2 participants