From 615a92f3e4f7b05e74762f587ccb889398b40874 Mon Sep 17 00:00:00 2001
From: Andrea
Date: Fri, 15 Nov 2024 15:54:34 +0100
Subject: [PATCH] Add reference implementation for Flat state to resharding NEP (#575)

---
 neps/nep-0568.md | 85 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/neps/nep-0568.md b/neps/nep-0568.md
index 7627a75d4..720c165c9 100644
--- a/neps/nep-0568.md
+++ b/neps/nep-0568.md
@@ -225,6 +225,91 @@ During a resharding event, at the boundary of the epoch, when we need to split t
 
 ![Hybrid MemTrie diagram](assets/nep-0568/NEP-HybridMemTrie.png)
 
+### State Storage - Flat State
+
+Resharding Flat State is a time-consuming operation that runs in parallel with block processing for several block heights.
+Thus, there are a few important aspects to consider during implementation:
+- Flat State's own status should be resilient to application crashes.
+- The parent shard's Flat State should be split at the correct block height.
+- New shards' Flat States should eventually converge to the same representation the chain is using to process blocks (MemTries).
+- Resharding should work correctly in the presence of chain forks.
+- Retired shards should be cleaned up.
+
+Note that the Flat States of the newly created shards won't be available until resharding is completed. This is fine because the temporary MemTries are
+built instantly and they can satisfy all block processing needs.
+
+The main component responsible for carrying out resharding on Flat State is the [FlatStorageResharder](https://github.com/near/nearcore/blob/f4e9dd5d6e07089dfc789221ded8ec83bfe5f6e8/chain/chain/src/flat_storage_resharder.rs#L68).
+
+#### Flat State's status persistence
+Every shard's Flat State has an associated status, called `FlatStorageStatus`, which is stored in the database. We propose to extend the existing object
+by adding a new enum variant named `FlatStorageStatus::Resharding`. This approach has two benefits.
First, the progress of any Flat State resharding is
+persisted to disk, which makes the operation resilient to a node crash or restart. Second, resuming resharding on node restart shares the same code path as Flat
+State creation (see `FlatStorageShardCreator`), which reduces code duplication.
+
+`FlatStorageStatus` is updated at every committable step of resharding. The commit points are the following:
+- Beginning of resharding or, in other words, the last block of the old shard layout.
+- Scheduling of the _"split parent shard"_ task.
+- Execution, cancellation or failure of the _"split parent shard"_ task.
+- Execution or failure of any _"child catchup"_ task.
+
+#### Splitting a shard Flat State
+
+When the shard layout changes at the end of an epoch, we identify a so-called _resharding block_, which is the last block of the current epoch.
+A task to split the parent shard's Flat State is scheduled to happen after the _resharding block_ becomes final. The reason to wait for the finality condition
+is to avoid a split on a block that might be excluded from the canonical chain; such a situation would lock the node
+into an erroneous state.
+
+Inside the split task we iterate over the Flat State and copy each element into the child or children that inherit it. This routine is performed in batches in order to lessen the performance
+impact on the node.
+
+Finally, if the split completes successfully, the parent shard's Flat State is removed from the database and the children's Flat States enter a catch-up phase.
+
+One current technical limitation is that, upon a node crash or restart, the _"split parent shard"_ task will start copying all elements again from the beginning.
+
+A reference implementation of splitting a Flat State can be found in [FlatStorageResharder::split_shard_task](https://github.com/near/nearcore/blob/fecce019f0355cf89b63b066ca206a3cdbbdffff/chain/chain/src/flat_storage_resharder.rs#L295).
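
The batched copy described in this section can be illustrated with a minimal sketch. It is not the nearcore implementation: the `FlatState` alias, the `SplitResult` struct and the `split_in_batches` function are hypothetical names, the Flat State is modeled as an in-memory map keyed by bare account id, and only entries that belong to exactly one child are covered. A real implementation would commit each batch to the database and yield to block processing between batches.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a shard's Flat State: account id -> value.
/// Hypothetical: real Flat State keys are trie keys, not bare account ids.
type FlatState = BTreeMap<String, Vec<u8>>;

/// Children produced by a split, plus the number of batches processed.
#[derive(Debug, Default)]
struct SplitResult {
    left: FlatState,
    right: FlatState,
    batches: usize,
}

/// Copies every entry of `parent` into one child, `batch_size` entries at a time.
/// Accounts ordered before `boundary` go to the left child, all others to the right.
fn split_in_batches(parent: &FlatState, boundary: &str, batch_size: usize) -> SplitResult {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut result = SplitResult::default();
    let entries: Vec<(&String, &Vec<u8>)> = parent.iter().collect();
    for batch in entries.chunks(batch_size) {
        for (account, value) in batch {
            let child = if account.as_str() < boundary {
                &mut result.left
            } else {
                &mut result.right
            };
            child.insert(account.to_string(), value.to_vec());
        }
        // Assumption: this is where a real task would commit the batch
        // and give block processing a chance to run.
        result.batches += 1;
    }
    result
}
```

Keeping batches small bounds the amount of work done between commits, which is the property the text relies on to limit the impact on block processing.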
+
+#### Values assignment from parent to child shards
+Key-value pairs in the parent shard's Flat State are inherited by the children according to the rules stated below.
+
+Elements inherited by the child shard which tracks the `account_id` contained in the key:
+- `ACCOUNT`
+- `CONTRACT_DATA`
+- `CONTRACT_CODE`
+- `ACCESS_KEY`
+- `RECEIVED_DATA`
+- `POSTPONED_RECEIPT_ID`
+- `PENDING_DATA_COUNT`
+- `POSTPONED_RECEIPT`
+- `PROMISE_YIELD_RECEIPT`
+
+Elements inherited by both children:
+- `DELAYED_RECEIPT_OR_INDICES`
+- `PROMISE_YIELD_INDICES`
+- `PROMISE_YIELD_TIMEOUT`
+- `BANDWIDTH_SCHEDULER_STATE`
+
+Elements inherited only by the lowest-index child:
+- `BUFFERED_RECEIPT_INDICES`
+- `BUFFERED_RECEIPT`
+
+#### Bring children shards up to date with the chain's head
+Children shards' Flat States build a complete view of their content at the height of the `resharding block` sometime during the new epoch
+after resharding. By that time many new blocks have already been processed, and these will most likely contain updates for the new shards. A catch-up step is necessary to apply all Flat State deltas accumulated up to that point.
+
+This phase of resharding doesn't have to take extra steps to handle chain forks. On one hand, the catch-up task doesn't start until the parent shard
+splitting is done, and at that point we know the `resharding block` is final; on the other hand, Flat State deltas are capable of handling forks automatically.
+
+The catch-up task commits "batches" of Flat State deltas to the database. If the application crashes or restarts, the task will resume from where it left off.
+
+Once all Flat State deltas are applied, the child shard's status is changed to `Ready` and leftover Flat State deltas are cleaned up.
+
+A reference implementation of the catch-up task can be found in [FlatStorageResharder::shard_catchup_task](https://github.com/near/nearcore/blob/fecce019f0355cf89b63b066ca206a3cdbbdffff/chain/chain/src/flat_storage_resharder.rs#L564).
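
The resumable, batched catch-up described in this section could look roughly like the sketch below. It is an illustration under stated assumptions, not the nearcore code: `Delta`, `Status::CatchingUp`, `ChildShard` and `catch_up_step` are hypothetical names, deltas are keyed by block height on a single chain (fork handling is left out), and the database is replaced by in-memory maps. Only the `Ready` status name comes from the text.

```rust
use std::collections::BTreeMap;

/// Changes made to one shard by one block: key -> Some(new value) or None (deletion).
/// Hypothetical simplification of a Flat State delta.
type Delta = BTreeMap<String, Option<Vec<u8>>>;

#[derive(Debug, PartialEq)]
enum Status {
    /// Deltas up to and including this height are already applied (hypothetical variant).
    CatchingUp { applied_height: u64 },
    Ready,
}

struct ChildShard {
    state: BTreeMap<String, Vec<u8>>,
    status: Status,
}

/// Applies at most `batch_size` block deltas above the recorded height and then
/// records the progress in `status`. Returns true once no deltas up to
/// `head_height` remain to be applied.
fn catch_up_step(
    child: &mut ChildShard,
    deltas: &BTreeMap<u64, Delta>,
    head_height: u64,
    batch_size: usize,
) -> bool {
    assert!(batch_size > 0, "batch_size must be positive");
    let start = match child.status {
        Status::Ready => return true,
        Status::CatchingUp { applied_height } => applied_height,
    };
    if start >= head_height {
        child.status = Status::Ready;
        return true;
    }
    let mut last = start;
    for (height, delta) in deltas.range(start + 1..=head_height).take(batch_size) {
        for (key, change) in delta {
            match change {
                Some(value) => {
                    child.state.insert(key.clone(), value.clone());
                }
                None => {
                    child.state.remove(key);
                }
            }
        }
        last = *height;
    }
    // Assumption: a real task would persist the batch and the new status atomically here.
    let done = last >= head_height || deltas.range(last + 1..=head_height).next().is_none();
    child.status = if done {
        Status::Ready
    } else {
        Status::CatchingUp { applied_height: last }
    };
    done
}
```

Because the progress marker is updated together with each batch, a caller that restarts only has to invoke `catch_up_step` again; already applied heights are skipped.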
+
+#### Failure of Flat State resharding
+
+In the current proposal any failure during Flat State resharding is considered non-recoverable.
+`neard` will attempt resharding again on restart, but no automatic recovery is implemented.
+
 ### State Storage - State mapping
 
 To enable efficient shard state management during resharding, Resharding V3 uses the `DBCol::ShardUIdMapping` column.