[RFC] Hard fork data migration #14288

Merged: 14 commits, Nov 2, 2023

@psteckler psteckler commented Oct 6, 2023

RFC on getting data from the mainnet archive db to a db using the berkeley schema.

The branch feature/berkeley-db-migrator is used in #12906.

The branch feature/add-berkeley-account-tables is used in #14339.

Part of #12676.

@psteckler
Member Author

!ci-build-me

@@ -0,0 +1,141 @@
p## Summary
Contributor

You may want to delete the p here

Member Author

whoops

@psteckler
Member Author

!ci-build-me

These applications can be run in sequence to get a fully-migrated
database. They should be able to work incrementally, so that part of
the mainnet database can be migrated and, as new blocks are added on
mainnet, the new data in the databannnnnse can be migrated.
Contributor

Typo here.

Member Author

Whoops, again.

@kaozenn
Member

kaozenn commented Oct 10, 2023

  • After the HF (Hard Fork), will the "bucket for storing migrated database dumps" become the new reference bucket for retrieving Archive DB Dumps?

  • Post HF, could we designate an Archive Dump as a reference for data retrieval? Applications such as Rosetta would not only need to be refactored to read from the new schema but also to pull data from the new S3 Bucket.

@psteckler
Member Author

After the HF (Hard Fork), will the "bucket for storing migrated database dumps" become the new reference bucket for retrieving Archive DB Dumps?

I would expect there will be an archive dump cron job for the new mainnet, which will write to the existing mina-archive-dumps bucket (or maybe some new one). It won't be the bucket described here.

Post HF, could we designate an Archive Dump as a reference for data retrieval? Applications such as Rosetta would not only need to be refactored to read from the new schema but also to pull data from the new S3 Bucket.

The current mainnet archive dumps are named mainnet-archive-dump-<DATE>_nnnn.sql. That naming convention could continue, or a new name could be chosen. As mentioned, the same bucket could continue to be used, or a new one created, if desired.

The current bucket is in Google Cloud Storage, not S3, which is an Amazon product.

Member

@kantp kantp left a comment

Looks good to me.


How do we limit the migration to the final block of mainnet? There could be
flags to the migration apps to stop at a given state hash or height.

Member

I would suggest ending migration at slot_tx_end in RFC 51, #14138. We'll only have empty blocks for fork resolution after that anyway.

Member Author

OK, in the first-phase migration app, in #12906, I've added an --end-global-slot command-line arg, and tested that. The second-phase app doesn't need to worry about the end slot, because it can only process what the first-phase app has produced.

If you omit that arg, the app will migrate only canonical blocks. My understanding is that there may be some pending blocks to be migrated, so if you do provide that arg, the app will migrate both pending and canonical (but not orphaned) blocks.
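
To make the rule above concrete, here is a minimal sketch of the block selection, not the code in #12906. The `Block` record and the `select_blocks` function are hypothetical names for illustration, and treating the end slot as an inclusive upper bound is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a row of the mainnet archive blocks table."""
    state_hash: str
    global_slot: int
    chain_status: str  # "canonical", "pending", or "orphaned"


def select_blocks(
    blocks: Iterable[Block], end_global_slot: Optional[int] = None
) -> List[Block]:
    """Return the blocks a first-phase run would migrate (illustrative only)."""
    if end_global_slot is None:
        # Without --end-global-slot: canonical blocks only.
        return [b for b in blocks if b.chain_status == "canonical"]
    # With --end-global-slot: canonical and pending blocks, never orphaned ones.
    # Assumption: the bound is inclusive.
    return [
        b
        for b in blocks
        if b.chain_status in ("canonical", "pending")
        and b.global_slot <= end_global_slot
    ]
```

For example, given canonical blocks at slots 1 and 5, a pending block at slot 2, and an orphaned block at slot 2, omitting the end slot selects the two canonical blocks, while an end slot of 2 selects the canonical block at slot 1 and the pending block at slot 2.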

Member Author

I removed the unresolved question, because I think this command-line arg solves the problem.

@deepthiskumar
Member

!ci-build-me

@deepthiskumar
Member

!approved-for-mainnet

@psteckler psteckler merged commit 2d815c9 into berkeley Nov 2, 2023
1 check passed
@psteckler psteckler deleted the rfc/hard-fork-data-migration branch November 2, 2023 01:54
5 participants