Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ethereum block delay too low #265

Open
hannydevelop opened this issue Oct 21, 2021 · 1 comment
Open

Ethereum block delay too low #265

hannydevelop opened this issue Oct 21, 2021 · 1 comment

Comments

@hannydevelop
Copy link
Contributor

hannydevelop commented Oct 21, 2021

Original Isuue

Surfaced from @informalsystems audit of Althea Gravity Bridge at commit 19a4cfe

severity: High
type: Protocol / Implementation bug
difficulty: Unknown

Involved artifacts

Description

In Orchestrator's Ethereum event watcher, function get_block_delay() is used to determine the block processing delay wrt. the latest block observed on Ethereum:

/// Let's make a conservative assumption of 1% chance of an uncle being a two block deep reorg
/// (actual is closer to 0.3%) and assume that continues as we increase the depth.
/// Given an uncle every 2.8 minutes, a 6 deep reorg would be 2.8 minutes * (100^4) or one
/// 6 deep reorg every 53,272 years.
///
pub async fn get_block_delay(web3: &Web3) -Uint256 {
    let net_version = get_net_version_with_retry(web3).await;

    match net_version {
        // Mainline Ethereum, Ethereum classic, or the Ropsten, Kotti, Mordor testnets
        // all POW Chains
        1 | 3 | 6 | 7 =6u8.into(),
        // Dev, our own Gravity Ethereum testnet, and Hardhat respectively
        // all single signer chains with no chance of any reorgs
        2018 | 15 | 31337 =0u8.into(),
        // Rinkeby and Goerli use Clique (POA) Consensus, finality takes
        // up to num validators blocks. Number is higher than Ethereum based
        // on experience with operational issues
        4 | 5 =10u8.into(),
        // assume the safe option (POW) where we don't know
        _ =6u8.into(),
    }
}

It is employed in the function check_for_events() uniformly in the following way (here shown only for deposits):

    let latest_block = get_block_number_with_retry(web3).await;
    let latest_block = latest_block - get_block_delay(web3).await;
    let deposits = web3.check_for_events(
            starting_block.clone(), Some(latest_block.clone()), ...
        ).await;    

As can be seen, the block delay is subtracted from the latest block number, and the events are then retrieved up to the adjusted block number, and processed.

There are three problems with the above approach.

The Ethereum block delay is too low

The above calculation of probabilities is not well-justified: probability theory simply doesn't apply here as Ethereum reorganizations are not random events. The correct estimation of a block delay is a risk assessment, and is a very complicated process. We are not at a position to advice on the correct block delay number, but it makes sense to take a glimpse on what Ethereum block delays are used by crypto exchanges:

As can be seen, the minimal block delay used by crypto exchanges is 12, and the average around 22, which is much larger than 6, used by Gravity Bridge.

No detection / mitigation plan in case of a long reorg

No block delay can guarantee that a reorg longer than that won't happen eventually, as DAO / Ethereum Classic case demonstrates well. While nothing can protect users against long reorgs, those events should be detected and mitigated. On the contrary, Gravity Bridge doesn't include any detection or mitigation mechanism against such events, and will proceed further, possibly causing more damage than necessary.

No testing against reorgs

As can be seen in the above code, testnets do not exercise a case of reorgs at all, even up to the given bound: block delay for testnets is taken to be 0. This doesn't provide any confidence, even a minimal one, that the code will behave correctly when deployed on the Ethereum main net.

Problem Scenarios

  • A reorg happens on the Ethereum main net with the length larger than it is currently assumed by Gravity Bridge (6)

  • This creates a possibility of double spends, or other undesirable consequences;

  • Without a detection / mitigation plan, Gravity Bridge will proceed as if nothing has happened, possibly causing more damage.

  • The testnets use 0 block delay

  • It is not clear how the code will behave even in presence of reorgs up to the currently present block delay.

Recommendation

Short term

  • Undertake risk assessment for the Gravity Bridge. At the very least, orient to the Ethereum block delays used by crypto exchanges, and increase the value used by Gravity Bridge to the average block delay employed there (~ 20 blocks).

  • In the testnets, undertake testing against Ethereum reorgs both up to, and above the assumed block delay. Make sure that up to the assumed block delay the code behaves correctly, and thoroughly understand possible effects of long reorgs.

Long term

  • Implement detection / mitigation mechanisms in case of an Ethereum reorg longer than the assumed block delay happens. The detection mechanism should make sure, at the very least, that this event is detected, and communicated to validators. The mitigation mechanism should probably just stop the Gravity Bridge operation; and it should also implement the method to resume the operation after the damage assessment and repair (e.g. in the form of a special message signed by the quorum of validators).
@EricBolten
Copy link
Contributor

This was updated in #325.

@EricBolten EricBolten self-assigned this Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants