feat(sequencer-relayer)!: submit blobs directly to celestia app #963

Fraser999 · 2024-04-16T21:24:32Z

Summary

The sequencer-relayer has been updated to submit blobs directly to the Celestia app via its gRPC interface rather than using the Celestia node as a proxy.

Background

There have been issues experienced in the past when using the Celestia node for blob submissions related to sequences and timeouts. We want to avoid such issues by "cutting out the middle man" and controlling the flow of submissions directly inside the relayer.

Changes

This PR adds a lot of new proto files and their corresponding generated Rust sources, although they have been reduced to the minimal set of messages/rpcs required for this new feature.

Aside from these, most of the new code has been isolated to a new module celestia_client inside the relayer module. It should be reasonably self-contained, with a view to possibly extracting it to a separate crate in the future, and/or providing a generic client API should we wish to add handling for more chains in the future.

At a high level, the new client is constructed via a builder, which may need multiple attempts to create the client in the case that the Celestia app is unreachable. Retries are controlled via tryhard, maxing out at 30s and with infinite attempts. A shutdown signal can cancel this, allowing for graceful exit. The builder retrieves initial info from the Celestia app (e.g. chain ID, gas costs). The immutable chain ID is held as a member of the client, but the other values (being variable) are cached in the State struct and can be retrieved and cached again should an attempt to submit blobs fail.

The CelestiaClient only has one public method: try_submit. This is called in the existing relayer::write::submit_with_retry function in much the same way as previously. The main difference here is that the client's try_submit method is also passed a tokio watch receiver containing the optional last error. When first submitting a given batch of blobs, the watch channel holds None. On each failed attempt, the on_retry closure sends the error returned by try_submit. On a fresh batch of blobs, a new watch channel is constructed, again initialized with None. The purpose of providing the last error to the client is that it can be interrogated for a failure due to the fee being too low. Such an error holds information on the required fee, and on reattempt, the client uses this value in preference to its own calculated fee.

The general flow inside try_submit is as follows:

if the previous submit attempt failed, fetch the cost info from the Celestia app and cache it
fetch our account number and sequence number from the Celestia app
create, sign and send the blob transaction via the Celestia BroadcastTx method
poll the Celestia app once per second via the GetTx method

Testing

Most of the new functionality has unit tests. The CelestiaClient mainly has member methods calling gRPC requests, however handling the responses is delegated to free functions. As well as code clarity, this allows for comprehensive unit tests of these functions without the need for running/mocking a Celestia server. However, there are still missing tests for the try_submit method. That might be best done via the new astria-grpc-mock crate, or at a higher level by refactoring the existing black box tests (almost all of which have been removed for now).

I have run the sequencer against a local network running on k8s and observed the blob submission to behave as expected via the logging. This includes temporarily changing the fee to a low value for the first submission attempt, which caused the retry behaviour to identify and use the correct fee from the error response of the first attempt. I have also had the smoke test pass locally.

Breaking Changelist

ASTRIA_SEQUENCER_RELAYER_CELESTIA_BEARER_TOKEN has been replaced by ASTRIA_SEQUENCER_RELAYER_CELESTIA_APP_KEY_FILE which expects a path to a hex-encoded secp256k1 secret key corresponding to an account on the Celestia chain.
ASTRIA_SEQUENCER_RELAYER_CELESTIA_ENDPOINT has been replaced by ASTRIA_SEQUENCER_RELAYER_CELESTIA_APP_GRPC_ENDPOINT which represents the endpoint of the gRPC server on the Celestia app.

Further Discussion

integration testing/black box testing. [Discussed, agreed to restore them in follow-up. Ticketed at Restore black box tests removed from sequencer-relayer #1008.]
the polling period for GetTx after a successful BroadcastTx is hard-coded to one second. Seems like this should be configurable via an env var. [Discussed, and the period is now variable depending upon the result of GetTx. Also, given that the Celestia app will likely be managed by us, we can control its rate-limits, meaning there's less likelihood of needing to change the polling period in the relayer.]

Related Issues

Closes #921.

crates/astria-sequencer-relayer/src/relayer/celestia_client/builder.rs

Fraser999 · 2024-04-16T21:41:21Z

crates/astria-sequencer-relayer/src/relayer/celestia_client/error.rs

+impl Display for GrpcResponseError {
+    fn fmt(&self, formatter: &mut Formatter<'_>) -> fmt::Result {
+        write!(
+            formatter,
+            "status: {}, message: {}, metadata: {:?}",
+            self.0.code(),
+            self.0.message(),
+            self.0.metadata(),
+        )
+    }
+}
+
+impl From<Status> for GrpcResponseError {
+    fn from(status: Status) -> Self {
+        Self(status)
+    }
+}
+
+impl std::error::Error for GrpcResponseError {
+    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
+        self.0.source()
+    }
+}


This is almost equivalent to just decorating the GrpcResponseError definition with #[error(transparent)]. However, the Display impl for Status delegates to the Debug impls of its member fields including the details: Bytes field and is overly verbose. Hence I just hand-rolled these trait impls.

crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs

SuperFluffy

This is great work, thank you for the PR.

I only have relatively minor comments, none of which I consider stopping this PR from progressing.

I have 2 big concerns, both of which I consider outside the scope of this PR:

My main concern is with confirm_submission never returning with a successful result and spinning indefinitely. At the moment it will simply block indefinitely without resource. How do we decide if we should finally abort the submission and try again? Does Celestia ever report "didn't include your blobs and never will, try again"?
The fee calculation is risky in that it could return very strange results without us understanding what's happening. This could break silently with Celestia updating their logs in such a way that the parsing logic still returns a number, but one that makes no sense.

crates/astria-sequencer-relayer/src/config.rs

crates/astria-sequencer-relayer/src/relayer/celestia_client/builder.rs

SuperFluffy · 2024-04-25T15:18:10Z

crates/astria-sequencer-relayer/src/relayer/celestia_client/builder.rs

+/// An error while encoding a Bech32 string.
+#[derive(Error, Clone, Debug)]
+#[error(transparent)]
+pub(in crate::relayer) struct Bech32EncodeError(#[from] bech32::EncodeError);


What's the advantage of wrapping this as a transparent instead of directly using bech32::EncodeError (for example, so that BuilderError::EncodeAddress { address: AccountId, source: bech32::EncodeError }?

It's to avoid having a third party type in our public Rust API. That would make our API vulnerable to version changes in the dependency. I approached the celestia_client module as if it were a standalone crate, since we spoke about maybe moving in that direction in the future.

crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs

SuperFluffy · 2024-04-25T16:58:13Z

crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs

+        // How long to wait after starting `confirm_submission` before starting to log errors.
+        const START_LOGGING_DELAY: Duration = Duration::from_secs(15);
+        // The minimum duration between logging errors.
+        const LOG_ERROR_INTERVAL: Duration = Duration::from_secs(5);


Just a note: to some extent this requires self.get_tx to return in time. If it never returns we will never get any logs (however, this is an issue with all of our RPCs).

Yup, agreed, but outside the scope of this PR I believe.

joroshiba

approval on behalf of infra / api, have not vetted rust changes.

## Summary A new configuration env var has been added to allow filtering data submission to a subset of rollups. ## Background This allows more flexibility in what is actually submitted to the Celestia chain. ## Changes A new env var was added, mapped to a new field `Config::rollup_id_filter`. This represents a list of rollups whose transactions should be included in blobs submitted to Celestia. Rollups not in the list do not have their data submitted. However, if the list is empty, no filtering is applied; all rollups have their data submitted. The collection of filtered rollup IDs is passed down to the `write::conversion::convert` function where sequencer blocks are converted to blobs. The blobs are filtered inside this function, partly since this is relatively early in the flow, but also so that the `ConversionInfo` struct can carry information about blobs being excluded. This data is logged at the start of `write::submit_blobs` in an info-level log message. It might be worthwhile adding metrics around the number of excluded rollups per submission, but I didn't see a clear benefit to that and it can be added in a follow-up PR if required. ## Testing - Added unit tests covering parsing of the config string to a collection of `RollupId`s. - ~Added a black box test where a sequencer block is constructed with several transactions, half from included rollups and half from excluded ones. Existing black box tests ensure that when the filter is empty, no filtering is done.~ These all got removed in #963, and should be restored as part of #1008. ## Breaking Changelist - Added `ASTRIA_SEQUENCER_RELAYER_ONLY_INCLUDE_ROLLUPS` which can be empty or can be a comma-separated list.

@SuperFluffy

## Summary The black box tests were removed in a recent PR. This PR reinstates them using the new mock gRPC framework `astria-grpc-mock`. ## Background The tests would have needed heavy refactoring in #963, and we wanted to avoid adding a lot more code to that already-large PR. We decided to temporarily delete them and reinstate them using `astria-grpc-mock` to mock responses from the Celestia app. ## Changes The tests removed in #963 and a single test briefly added in #1001 then removed again have been restored. I also added a new test to check the relayer shuts down in a timely manner. Previously the tests leaned heavily on counts of blobs received by the mock Celestia app. The current tests retain these checks, but also query the `/status` endpoint of the sequencer-relayer to confirm its state. There was also a single test previously which checked the state of the postsubmit.json file. I didn't think this was necessary given that we're querying the state in a more "black box" way now (via the http server), but I didn't remove the function to perform this check (`TestSequencerRelayer::assert_state_files_are_as_expected`) pending a decision on whether to reinstate that check or not inside the `later_height_in_state_leads_to_expected_relay` test. As @SuperFluffy predicted, all the new protobuf packages added in #963 had to be added to the `pbjson_build` builder to generate the serde impls required for using them in `astria-grpc-mock`. This accounts for the vast majority of the new LoC in this PR. I made a few small changes to the mock framework: - added `Mock::up_to_n_times` to avoid failures due to a single-use mock response being sent multiple times - added `DynamicResponse` to support constructing mock responses which can be passed the relevant gRPC request - changed `MockGuard::wait_until_satisfied` to take `self` by ref rather than value, since if it consumes self and `wait_until_satisfied` times out, the `MockGuard` panics in its `Drop` impl, meaning we don't get a good indication from e.g. a `tokio::time::timeout` as to what went wrong Most of the new functionality lives in `MockCelestiaAppServer` and `TestSequencerRelayer`. For the former, all gRPCs except for `BroadcastTx` and `GetTx` are set up to always return valid-enough responses - i.e. these don't need to be mounted individually in every test case. In the case of `MockCelestiaAppServer`, the new functionality is a combination of support for mounting mock responses and functions with timeouts to query the `/status`, `/healthz` and `/readyz` http endpoints of the sequencer-relayer. ## Testing These changes are tests. Closes #1008.

github-actions bot added conductor pertaining to the astria-conductor crate proto pertaining to the Astria Protobuf spec sequencer-relayer pertaining to the astria-sequencer-relayer crate cd labels Apr 16, 2024

Fraser999 had a problem deploying to BUF April 16, 2024 21:24 — with GitHub Actions Failure

Fraser999 had a problem deploying to BUF April 16, 2024 21:26 — with GitHub Actions Failure

Fraser999 commented Apr 16, 2024

View reviewed changes

crates/astria-sequencer-relayer/src/relayer/celestia_client/builder.rs Outdated Show resolved Hide resolved

Fraser999 commented Apr 16, 2024

View reviewed changes

Fraser999 force-pushed the 921-write-to-celestia-app branch from e315a03 to 42d3bea Compare April 16, 2024 22:13

Fraser999 had a problem deploying to BUF April 16, 2024 22:13 — with GitHub Actions Failure

Fraser999 had a problem deploying to BUF April 17, 2024 22:22 — with GitHub Actions Failure

Fraser999 force-pushed the 921-write-to-celestia-app branch from 8a09c2b to c00ed38 Compare April 18, 2024 12:15

Fraser999 had a problem deploying to BUF April 18, 2024 12:16 — with GitHub Actions Failure

Fraser999 added 4 commits April 18, 2024 23:59

submit blobs directly to celestia app

22ffdf0

update charts

d92b3a0

update charts (wip) and add buf.lock files

42fbba2

further chart changes

79cee2c

Fraser999 force-pushed the 921-write-to-celestia-app branch from aa3b6ae to 07ab634 Compare April 18, 2024 23:12

Fraser999 had a problem deploying to BUF April 18, 2024 23:13 — with GitHub Actions Failure

Fraser999 force-pushed the 921-write-to-celestia-app branch from 07ab634 to 848972e Compare April 18, 2024 23:22

Fraser999 had a problem deploying to BUF April 18, 2024 23:22 — with GitHub Actions Failure

Fraser999 force-pushed the 921-write-to-celestia-app branch from 848972e to 56858c6 Compare April 18, 2024 23:23

Fraser999 had a problem deploying to BUF April 18, 2024 23:24 — with GitHub Actions Failure

more chart updates

6dda430

Fraser999 force-pushed the 921-write-to-celestia-app branch from 56858c6 to 6dda430 Compare April 18, 2024 23:24

Fraser999 had a problem deploying to BUF April 18, 2024 23:24 — with GitHub Actions Failure

Fraser999 marked this pull request as ready for review April 18, 2024 23:25

Fraser999 requested review from a team as code owners April 18, 2024 23:25

circumvent tracing debug linter for trace-level logs missed on last pass

d87df0e

Fraser999 had a problem deploying to BUF April 25, 2024 01:19 — with GitHub Actions Failure

format proto files

4857cf8

Fraser999 had a problem deploying to BUF April 25, 2024 01:22 — with GitHub Actions Failure

Fraser999 mentioned this pull request Apr 25, 2024

Restore black box tests removed from sequencer-relayer #1008

Closed

SuperFluffy reviewed Apr 25, 2024

View reviewed changes

crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs Outdated Show resolved Hide resolved

Fraser999 added 2 commits April 25, 2024 14:19

remove relayer black box tests

0f13de6

add comments around trace-level logs

97d0f96

Fraser999 had a problem deploying to BUF April 25, 2024 13:22 — with GitHub Actions Failure

Fraser999 mentioned this pull request Apr 25, 2024

Remove validator key from sequencer-relayer #1010

Closed

avoid unchecked casting of block height to u64

9fca78c

Fraser999 had a problem deploying to BUF April 25, 2024 14:03 — with GitHub Actions Failure

SuperFluffy approved these changes Apr 25, 2024

View reviewed changes

bump relayer chart version and exclude vendored protos from CI trigger

0919982

github-actions bot added the ci issues that are related to ci and github workflows label Apr 25, 2024

Fraser999 had a problem deploying to BUF April 25, 2024 19:17 — with GitHub Actions Failure

Fraser999 added 3 commits April 25, 2024 20:26

update chart lockfile

5a23be6

update run_checker

eb27260

update sequencer chart version

92fb0bb

joroshiba approved these changes Apr 25, 2024

View reviewed changes

several cleanups/nitpicks

d4f9895

Fraser999 added this pull request to the merge queue Apr 25, 2024

Merged via the queue into astriaorg:main with commit 65a22ce Apr 25, 2024
36 checks passed

Fraser999 deleted the 921-write-to-celestia-app branch April 25, 2024 22:20

This was referenced Apr 25, 2024

Consider how to regression test fee extraction from raw log message #1013

Open

feat(sequencer-relayer)!: provide filter for rollups #1001

Merged

Fraser999 mentioned this pull request May 1, 2024

test(sequencer-relayer): reinstate black box tests #1033

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sequencer-relayer)!: submit blobs directly to celestia app #963

feat(sequencer-relayer)!: submit blobs directly to celestia app #963

Fraser999 commented Apr 16, 2024 •

edited

Loading

Fraser999 Apr 16, 2024

SuperFluffy left a comment

SuperFluffy Apr 25, 2024

Fraser999 Apr 25, 2024

SuperFluffy Apr 25, 2024

Fraser999 Apr 25, 2024

joroshiba left a comment

feat(sequencer-relayer)!: submit blobs directly to celestia app #963

feat(sequencer-relayer)!: submit blobs directly to celestia app #963

Conversation

Fraser999 commented Apr 16, 2024 • edited Loading

Summary

Background

Changes

Testing

Breaking Changelist

Further Discussion

Related Issues

Fraser999 Apr 16, 2024

Choose a reason for hiding this comment

SuperFluffy left a comment

Choose a reason for hiding this comment

SuperFluffy Apr 25, 2024

Choose a reason for hiding this comment

Fraser999 Apr 25, 2024

Choose a reason for hiding this comment

SuperFluffy Apr 25, 2024

Choose a reason for hiding this comment

Fraser999 Apr 25, 2024

Choose a reason for hiding this comment

joroshiba left a comment

Choose a reason for hiding this comment

Fraser999 commented Apr 16, 2024 •

edited

Loading