-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sequencer-relayer)!: submit blobs directly to celestia app #963
feat(sequencer-relayer)!: submit blobs directly to celestia app #963
Conversation
crates/astria-sequencer-relayer/src/relayer/celestia_client/builder.rs
Outdated
Show resolved
Hide resolved
impl Display for GrpcResponseError { | ||
fn fmt(&self, formatter: &mut Formatter<'_>) -> fmt::Result { | ||
write!( | ||
formatter, | ||
"status: {}, message: {}, metadata: {:?}", | ||
self.0.code(), | ||
self.0.message(), | ||
self.0.metadata(), | ||
) | ||
} | ||
} | ||
|
||
impl From<Status> for GrpcResponseError { | ||
fn from(status: Status) -> Self { | ||
Self(status) | ||
} | ||
} | ||
|
||
impl std::error::Error for GrpcResponseError { | ||
fn source(&self) -> Option<&(dyn std::error::Error + 'static)> { | ||
self.0.source() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost equivalent to just decorating the GrpcResponseError
definition with #[error(transparent)]
. However, the Display
impl for Status
delegates to the Debug
impls of its member fields including the details: Bytes
field and is overly verbose. Hence I just hand-rolled these trait impls.
e315a03
to
42d3bea
Compare
8a09c2b
to
c00ed38
Compare
aa3b6ae
to
07ab634
Compare
07ab634
to
848972e
Compare
848972e
to
56858c6
Compare
56858c6
to
6dda430
Compare
crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work, thank you for the PR.
I only have relatively minor comments, none of which I consider stopping this PR from progressing.
I have 2 big concerns, both of which I consider outside the scope of this PR:
- My main concern is with
confirm_submission
never returning with a successful result and spinning indefinitely. At the moment it will simply block indefinitely without resource. How do we decide if we should finally abort the submission and try again? Does Celestia ever report "didn't include your blobs and never will, try again"? - The fee calculation is risky in that it could return very strange results without us understanding what's happening. This could break silently with Celestia updating their logs in such a way that the parsing logic still returns a number, but one that makes no sense.
crates/astria-sequencer-relayer/src/relayer/celestia_client/builder.rs
Outdated
Show resolved
Hide resolved
crates/astria-sequencer-relayer/src/relayer/celestia_client/builder.rs
Outdated
Show resolved
Hide resolved
crates/astria-sequencer-relayer/src/relayer/celestia_client/builder.rs
Outdated
Show resolved
Hide resolved
/// An error while encoding a Bech32 string. | ||
#[derive(Error, Clone, Debug)] | ||
#[error(transparent)] | ||
pub(in crate::relayer) struct Bech32EncodeError(#[from] bech32::EncodeError); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage of wrapping this as a transparent instead of directly using bech32::EncodeError
(for example, so that BuilderError::EncodeAddress { address: AccountId, source: bech32::EncodeError }
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to avoid having a third party type in our public Rust API. That would make our API vulnerable to version changes in the dependency. I approached the celestia_client
module as if it were a standalone crate, since we spoke about maybe moving in that direction in the future.
crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs
Outdated
Show resolved
Hide resolved
crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs
Outdated
Show resolved
Hide resolved
crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs
Outdated
Show resolved
Hide resolved
crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs
Outdated
Show resolved
Hide resolved
// How long to wait after starting `confirm_submission` before starting to log errors. | ||
const START_LOGGING_DELAY: Duration = Duration::from_secs(15); | ||
// The minimum duration between logging errors. | ||
const LOG_ERROR_INTERVAL: Duration = Duration::from_secs(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: to some extent this requires self.get_tx
to return in time. If it never returns we will never get any logs (however, this is an issue with all of our RPCs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, agreed, but outside the scope of this PR I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approval on behalf of infra / api, have not vetted rust changes.
## Summary A new configuration env var has been added to allow filtering data submission to a subset of rollups. ## Background This allows more flexibility in what is actually submitted to the Celestia chain. ## Changes A new env var was added, mapped to a new field `Config::rollup_id_filter`. This represents a list of rollups whose transactions should be included in blobs submitted to Celestia. Rollups not in the list do not have their data submitted. However, if the list is empty, no filtering is applied; all rollups have their data submitted. The collection of filtered rollup IDs is passed down to the `write::conversion::convert` function where sequencer blocks are converted to blobs. The blobs are filtered inside this function, partly since this is relatively early in the flow, but also so that the `ConversionInfo` struct can carry information about blobs being excluded. This data is logged at the start of `write::submit_blobs` in an info-level log message. It might be worthwhile adding metrics around the number of excluded rollups per submission, but I didn't see a clear benefit to that and it can be added in a follow-up PR if required. ## Testing - Added unit tests covering parsing of the config string to a collection of `RollupId`s. - ~Added a black box test where a sequencer block is constructed with several transactions, half from included rollups and half from excluded ones. Existing black box tests ensure that when the filter is empty, no filtering is done.~ These all got removed in #963, and should be restored as part of #1008. ## Breaking Changelist - Added `ASTRIA_SEQUENCER_RELAYER_ONLY_INCLUDE_ROLLUPS` which can be empty or can be a comma-separated list.
## Summary The black box tests were removed in a recent PR. This PR reinstates them using the new mock gRPC framework `astria-grpc-mock`. ## Background The tests would have needed heavy refactoring in #963, and we wanted to avoid adding a lot more code to that already-large PR. We decided to temporarily delete them and reinstate them using `astria-grpc-mock` to mock responses from the Celestia app. ## Changes The tests removed in #963 and a single test briefly added in #1001 then removed again have been restored. I also added a new test to check the relayer shuts down in a timely manner. Previously the tests leaned heavily on counts of blobs received by the mock Celestia app. The current tests retain these checks, but also query the `/status` endpoint of the sequencer-relayer to confirm its state. There was also a single test previously which checked the state of the postsubmit.json file. I didn't think this was necessary given that we're querying the state in a more "black box" way now (via the http server), but I didn't remove the function to perform this check (`TestSequencerRelayer::assert_state_files_are_as_expected`) pending a decision on whether to reinstate that check or not inside the `later_height_in_state_leads_to_expected_relay` test. As @SuperFluffy predicted, all the new protobuf packages added in #963 had to be added to the `pbjson_build` builder to generate the serde impls required for using them in `astria-grpc-mock`. This accounts for the vast majority of the new LoC in this PR. I made a few small changes to the mock framework: - added `Mock::up_to_n_times` to avoid failures due to a single-use mock response being sent multiple times - added `DynamicResponse` to support constructing mock responses which can be passed the relevant gRPC request - changed `MockGuard::wait_until_satisfied` to take `self` by ref rather than value, since if it consumes self and `wait_until_satisfied` times out, the `MockGuard` panics in its `Drop` impl, meaning we don't get a good indication from e.g. a `tokio::time::timeout` as to what went wrong Most of the new functionality lives in `MockCelestiaAppServer` and `TestSequencerRelayer`. For the former, all gRPCs except for `BroadcastTx` and `GetTx` are set up to always return valid-enough responses - i.e. these don't need to be mounted individually in every test case. In the case of `MockCelestiaAppServer`, the new functionality is a combination of support for mounting mock responses and functions with timeouts to query the `/status`, `/healthz` and `/readyz` http endpoints of the sequencer-relayer. ## Testing These changes are tests. Closes #1008.
Summary
The sequencer-relayer has been updated to submit blobs directly to the Celestia app via its gRPC interface rather than using the Celestia node as a proxy.
Background
There have been issues experienced in the past when using the Celestia node for blob submissions related to sequences and timeouts. We want to avoid such issues by "cutting out the middle man" and controlling the flow of submissions directly inside the relayer.
Changes
This PR adds a lot of new proto files and their corresponding generated Rust sources, although they have been reduced to the minimal set of messages/rpcs required for this new feature.
Aside from these, most of the new code has been isolated to a new module
celestia_client
inside therelayer
module. It should be reasonably self-contained, with a view to possibly extracting it to a separate crate in the future, and/or providing a generic client API should we wish to add handling for more chains in the future.At a high level, the new client is constructed via a builder, which may need multiple attempts to create the client in the case that the Celestia app is unreachable. Retries are controlled via
tryhard
, maxing out at 30s and with infinite attempts. A shutdown signal can cancel this, allowing for graceful exit. The builder retrieves initial info from the Celestia app (e.g. chain ID, gas costs). The immutable chain ID is held as a member of the client, but the other values (being variable) are cached in theState
struct and can be retrieved and cached again should an attempt to submit blobs fail.The
CelestiaClient
only has one public method:try_submit
. This is called in the existingrelayer::write::submit_with_retry
function in much the same way as previously. The main difference here is that the client'stry_submit
method is also passed a tokio watch receiver containing the optional last error. When first submitting a given batch of blobs, the watch channel holdsNone
. On each failed attempt, theon_retry
closure sends the error returned bytry_submit
. On a fresh batch of blobs, a new watch channel is constructed, again initialized withNone
. The purpose of providing the last error to the client is that it can be interrogated for a failure due to the fee being too low. Such an error holds information on the required fee, and on reattempt, the client uses this value in preference to its own calculated fee.The general flow inside
try_submit
is as follows:BroadcastTx
methodGetTx
methodTesting
Most of the new functionality has unit tests. The
CelestiaClient
mainly has member methods calling gRPC requests, however handling the responses is delegated to free functions. As well as code clarity, this allows for comprehensive unit tests of these functions without the need for running/mocking a Celestia server. However, there are still missing tests for thetry_submit
method. That might be best done via the newastria-grpc-mock
crate, or at a higher level by refactoring the existing black box tests (almost all of which have been removed for now).I have run the sequencer against a local network running on k8s and observed the blob submission to behave as expected via the logging. This includes temporarily changing the fee to a low value for the first submission attempt, which caused the retry behaviour to identify and use the correct fee from the error response of the first attempt. I have also had the smoke test pass locally.
Breaking Changelist
ASTRIA_SEQUENCER_RELAYER_CELESTIA_BEARER_TOKEN
has been replaced byASTRIA_SEQUENCER_RELAYER_CELESTIA_APP_KEY_FILE
which expects a path to a hex-encoded secp256k1 secret key corresponding to an account on the Celestia chain.ASTRIA_SEQUENCER_RELAYER_CELESTIA_ENDPOINT
has been replaced byASTRIA_SEQUENCER_RELAYER_CELESTIA_APP_GRPC_ENDPOINT
which represents the endpoint of the gRPC server on the Celestia app.Further Discussion
GetTx
after a successfulBroadcastTx
is hard-coded to one second. Seems like this should be configurable via an env var. [Discussed, and the period is now variable depending upon the result ofGetTx
. Also, given that the Celestia app will likely be managed by us, we can control its rate-limits, meaning there's less likelihood of needing to change the polling period in the relayer.]Related Issues
Closes #921.