[config change] DAS filestore trie layout migrator and expiry, fixes: NIT-2538, NIT-2537 #2385

Merged
merged 21 commits into master from das-filestore
Jul 9, 2024

Tristan-Wilson
Member

@Tristan-Wilson Tristan-Wilson commented Jun 12, 2024

This PR makes the daserver LocalFileStorageService backend support expiry, and future-proofs the layout by avoiding too many files directly under any one directory. It changes the layout from flat to trie-based, with two top-level directories: by-data-hash and by-expiry-timestamp.

The previous filesystem layout for the data stored by the das LocalFileStorageService was simply:
<datadir>/1b7a84fcdb5f467ec4889e99dba58b2e6d56e0154538ee2a3083aa1582ab833e
where the data hash is the dastree.Hash of the data. This has already grown to contain 70K+ files.

In the future for AnyTrust chains we will be configuring a default of a batch every 15 seconds for fast withdrawals support, with some chains going down to once a second. This gives us a worst case of:
(3600*24*365) seconds in a year * 1 batch per second = 31536000 batches per year

From experience we should try to avoid having directories with more than 65k (uint16 max) files, so 2 levels of nesting based on the first two bytes of the data hash will suffice, giving an average of 481 files per directory per year.
31536000 batches per year / 256 directories at first level / 256 directories at second level = 481.20 batches per year per directory at second level
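
The arithmetic above can be checked with a short Go sketch. The function and constant names are hypothetical and exist only for illustration; the sketch assumes batches spread evenly across byte-keyed directories.

```go
package main

const (
	secondsPerYear = 3600 * 24 * 365 // 31,536,000; also the worst-case batch count at 1 batch per second
	fanoutPerLevel = 256             // one directory per possible value of a hash byte
)

// avgFilesPerLeafDir is a hypothetical helper: it returns the average number
// of files per leaf directory when batchesPerYear files are spread evenly
// over `levels` levels of byte-keyed directories.
func avgFilesPerLeafDir(batchesPerYear float64, levels int) float64 {
	dirs := 1.0
	for i := 0; i < levels; i++ {
		dirs *= fanoutPerLevel
	}
	return batchesPerYear / dirs
}
```

With `levels` set to 0 this models the legacy flat layout, where a single directory would hold every batch for the year.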

In the new filesystem layout, each data item is stored as follows:
<datadir>/by-data-hash/1b/7a/1b7a84fcdb5f467ec4889e99dba58b2e6d56e0154538ee2a3083aa1582ab833e
They are now stored under the by-data-hash directory, in a trie-based directory structure keyed on the first two octets of the data hash.
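
As a rough illustration of the by-data-hash layout, the path could be derived as below. This is a sketch, not the code from this PR: `dataHashPath` is a hypothetical name, and the hash is assumed to arrive as a 64-character hex string.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// dataHashPath is a hypothetical helper that builds
// <datadir>/by-data-hash/<octet 1>/<octet 2>/<full hash>
// from a hex-encoded 32-byte data hash.
func dataHashPath(dataDir, hashHex string) (string, error) {
	if len(hashHex) != 64 {
		return "", fmt.Errorf("expected 64 hex characters, got %d", len(hashHex))
	}
	// Each octet is two hex characters, so the first two octets are
	// hashHex[0:2] and hashHex[2:4].
	return filepath.Join(dataDir, "by-data-hash", hashHex[0:2], hashHex[2:4], hashHex), nil
}
```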

If expiry is enabled for the LocalFileStorageService, hard links to the by-data-hash files are stored with the following layout:

<datadir>/by-expiry-timestamp/171702/5712/1b7a84fcdb5f467ec4889e99dba58b2e6d56e0154538ee2a3083aa1582ab833e -hard-link-to-> <datadir>/by-data-hash/1b/7a/1b7a84fcdb5f467ec4889e99dba58b2e6d56e0154538ee2a3083aa1582ab833e

They are stored in a directory hierarchy starting with by-expiry-timestamp, then expiry unix timestamp / 10000, then expiry unix timestamp % 10000, and finally a hard link to the original by-data-hash file. 10000 seconds is ~2.8 hours. If a batch is posted every 1 second then each directory should have at most 10k files. This also handles multiple batches with the same timestamp. Since the by-expiry-timestamp directory will only be used when expiry is enabled, there is no concern of the first level (higher order) of directories growing too much.
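
The split of the expiry timestamp into two directory levels can be sketched as follows. `expiryLinkPath` and `expiryDivisor` are hypothetical names, and the absence of zero-padding on the remainder is an assumption; the example path above does not show which the PR uses.

```go
package main

import (
	"path/filepath"
	"strconv"
)

// expiryDivisor groups expiry timestamps into buckets of 10000 seconds.
const expiryDivisor = 10000

// expiryLinkPath is a hypothetical helper that builds
// <datadir>/by-expiry-timestamp/<ts / 10000>/<ts % 10000>/<full hash>.
// Assumption: directory names are plain decimal with no zero-padding.
func expiryLinkPath(dataDir string, expiryUnix uint64, hashHex string) string {
	high := strconv.FormatUint(expiryUnix/expiryDivisor, 10)
	low := strconv.FormatUint(expiryUnix%expiryDivisor, 10)
	return filepath.Join(dataDir, "by-expiry-timestamp", high, low, hashHex)
}
```

For the example path shown earlier, the expiry timestamp 1717025712 yields the directories 171702 and 5712.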

Using named by-expiry-timestamp and by-data-hash directories allows us to version the data layout and add different indices in future if needed.

The daserver migrates from the legacy layout on startup. The by-expiry-timestamp index is only created if expiry is enabled on the LocalFileStorageService. Migrated batch files will be assigned an expiry time by adding the --data-availability.local-file-storage.max-retention to the min of creation time of the file. While migration is in progress the directories are named by-data-hash-migrating and by-expiry-timestamp-migrating; they are renamed to their final names when migration completes.

Batch files synced with the --data-availability.rest-aggregator.sync-to-storage.eager option are assigned an expiry time based on the block their batch was posted to L1 in, plus --data-availability.rest-aggregator.sync-to-storage.retention-period.

Added options:

      --data-availability.local-file-storage.enable-expiry                                         enable expiry of batches
      --data-availability.local-file-storage.max-retention duration                                store requests with expiry times farther in the future than max-retention will be rejected (default 504h0m0s)

Testing done

Added new unit tests for migration and expiry with the new layout.

Tested migration by starting daserver for Arb Nova with the data directory of an Arb Nova daserver mirror, using the following configuration:

{
  "data-availability": {
    "enable": true,
    "local-file-storage": {
      "data-dir": "your data dir",
      "enable": true,
      "enable-expiry": true,
      "max-retention": "12h0m"
    },
    "parent-chain-connection-attempts": 15,
    "parent-chain-node-url": "l1 url",
    "rest-aggregator": {
      "enable": true,
      "online-url-list": "https://nova.arbitrum.io/das-servers",
      "sync-to-storage": {
        "check-already-exists": false,
        "delay-on-error": "10ms",
        "eager": true,
        "eager-lower-bound-block": "15025611",
        "ignore-write-errors": false,
        "parent-chain-blocks-per-read": "100",
        "state-dir": "your state dir",
        "sync-expired-data": true,
        "retention-period": "11h0m"
      }
    },
    "sequencer-inbox-address": "0x211e1c4c7f1bf5351ac850ed10fd68cffcf6c21b"
  },
  "enable-rest": true,
  "enable-rpc": false,
  "log-level": "INFO",
  "log-type": "plaintext",
  "rest-addr": "localhost",
  "rest-port": "9877",
  "rpc-addr": "localhost",
  "rpc-port": "9876"
}
INFO [06-12|13:11:03.669] Local file store legacy layout migration complete migratedFiles=22280 skippedExpiredFiles=51908 removedFiles=74188 duration=2.499081602s
INFO [06-12|13:11:03.717] local file store pruned expired batches  count=10 pruneTil=2024-06-12T13:11:03-0700 duration=47.157458ms
INFO [06-12|13:11:03.982] REST Aggregator URLs                     urls="[https://arbitrum-anytrust-das.infura.io/mainnet/das-mirror https://das-mirror.nova-mainnet.quiknode.pro https://nova.arbitrum.io/west-das-mirror https://rpc.das.arbitrum.gcda.xyz https://arbitrum-das.seadn.io https://nova.arbitrum.io/west-das-data https://daserver-mainnet.arbitrum.p2p.org]"
INFO [06-12|13:11:03.982] Couldn't open sync state file, using default sync start block number err="open <datadir>/metadata/nextBlockNumberV2: no such file or directory" path=<datadir>/metadata/nextBlockNumberV2 default=15,025,611
INFO [06-12|13:11:03.982] Starting REST server                     addr=localhost port=9877 revision=7596da6-modified vcs.time=2024-06-12T18:00:18Z
INFO [06-12|13:16:03.860] local file store pruned expired batches  count=1851 pruneTil=2024-06-12T13:16:03-0700 duration=140.64027ms
INFO [06-12|13:21:04.018] local file store pruned expired batches  count=2504 pruneTil=2024-06-12T13:21:03-0700 duration=156.807305ms
INFO [06-12|13:26:04.188] local file store pruned expired batches  count=2495 pruneTil=2024-06-12T13:26:04-0700 duration=167.008812ms
INFO [06-12|13:31:04.350] local file store pruned expired batches  count=2527 pruneTil=2024-06-12T13:31:04-0700 duration=157.993311ms
INFO [06-12|13:36:04.505] local file store pruned expired batches  count=2510 pruneTil=2024-06-12T13:36:04-0700 duration=153.79782ms
INFO [06-12|13:41:04.653] local file store pruned expired batches  count=2454 pruneTil=2024-06-12T13:41:04-0700 duration=146.262306ms
INFO [06-12|13:46:04.799] local file store pruned expired batches  count=2473 pruneTil=2024-06-12T13:46:04-0700 duration=143.844132ms

Make it so that the l1SyncService can sync expired data and also set
sensible retention.
Also fix unsetting the migrating flag after migration is done.
We can have batches with identical data, so the trieLayout needs to be
able to handle being requested to store the same batch multiple times
with different expiry times.

The way it works now is that the leaf files in by-expiry-timestamp are
hard links to the batch files in by-data-hash. The link count is used to
know when there are no more links in by-expiry-timestamp pointing to the
batch file and that it can be deleted.
@cla-bot cla-bot bot added the s label Jun 12, 2024
@Tristan-Wilson Tristan-Wilson requested a review from kasey June 13, 2024 04:58
@Tristan-Wilson Tristan-Wilson marked this pull request as ready for review June 13, 2024 05:53
@Tristan-Wilson Tristan-Wilson requested a review from magicxyyz June 13, 2024 19:22
Base automatically changed from das-fix-l1SyncService to master June 14, 2024 21:43
@Tristan-Wilson Tristan-Wilson changed the title DAS filestore trie layout migrator and expiry [config change] DAS filestore trie layout migrator and expiry Jun 26, 2024
Contributor

@ganeshvanahalli ganeshvanahalli left a comment

LGTM (Nice work!)

@Tristan-Wilson Tristan-Wilson changed the title [config change] DAS filestore trie layout migrator and expiry [config change] DAS filestore trie layout migrator and expiry, fixes: NIT-2538 Jun 26, 2024
@Tristan-Wilson Tristan-Wilson changed the title [config change] DAS filestore trie layout migrator and expiry, fixes: NIT-2538 [config change] DAS filestore trie layout migrator and expiry, fixes: NIT-2538, NIT-2537 Jun 26, 2024
@joshuacolvin0 joshuacolvin0 enabled auto-merge July 9, 2024 22:14
@joshuacolvin0 joshuacolvin0 merged commit e23af60 into master Jul 9, 2024
11 checks passed
@joshuacolvin0 joshuacolvin0 deleted the das-filestore branch July 9, 2024 22:49
Labels
design-approved, s (automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA)
4 participants