# ChonkyBFT

This folder contains the specification of ChonkyBFT, a new consensus protocol created by Matter Labs. It has both the pseudo-code specification that was used as the basis for the Rust implementation in the rest of this repo and the Quint specification that was used to formally verify the protocol.
ChonkyBFT is a consensus protocol inspired by [FaB Paxos](https://www.cs.cornell.edu/lorenzo/papers/Martin06Fast.pdf), [Fast-HotStuff](https://arxiv.org/abs/2010.11454) and [HotStuff-2](https://eprint.iacr.org/2023/397).
It is committee-based and has only one round of voting, single-slot finality, quadratic communication and _n=5f+1_ fault tolerance. Let's discuss what our objectives were when designing ChonkyBFT.
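To make the _n=5f+1_ requirement concrete, here is a minimal sketch of the committee arithmetic in Rust. It assumes, purely for illustration, a commit quorum of _n - f_ votes with _f = ⌊(n-1)/5⌋_; the exact thresholds ChonkyBFT uses are the ones defined in the spec in this folder.

```rust
// Illustrative arithmetic for an n = 5f + 1 committee. These helpers are a
// sketch for intuition only; the exact thresholds are defined in the spec.

/// Maximum number of faulty replicas tolerated by a committee of `n` replicas
/// under the n = 5f + 1 assumption, i.e. f = floor((n - 1) / 5).
fn max_faulty(n: usize) -> usize {
    n.saturating_sub(1) / 5
}

/// Illustrative commit quorum of n - f votes; for n = 5f + 1 this is 4f + 1,
/// which is enough to finalize in a single voting round.
fn quorum(n: usize) -> usize {
    n - max_faulty(n)
}

fn main() {
    for n in [6usize, 11, 16, 21] {
        println!(
            "n = {n}: tolerated faults = {}, quorum = {}",
            max_faulty(n),
            quorum(n)
        );
    }
}
```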
## Design goals in practice vs. theory

In recent years, research on consensus algorithms has often focused on theoretical optimizations that may not align with the practical challenges of implementing these algorithms. This has led to researchers prioritizing metrics that are less impactful in real-world deployments. For instance, tables comparing algorithms frequently highlight metrics that, while theoretically appealing, offer limited practical significance. Below, we outline commonly emphasized metrics that may not adequately reflect real-world performance considerations.
### What doesn’t matter

- Authenticator complexity. Optimizing to have fewer signatures was a priority decades ago when crypto operations were expensive. Today, digital signatures are fast and small. However, many papers still report this measure and even go as far as suggesting threshold signatures over multisignatures, which introduces a much more complex step of distributed key generation instead of spending some more milliseconds on verifying the signatures.
- Message complexity. In theory, reducing the number of messages exchanged across the network should improve the algorithm's performance. However, in practice, the actual performance gain depends on where the system bottleneck lies. For instance, even with linear communication, if the leader must still send and receive _N_ messages, the improvement may be negligible. Moreover, this approach often oversimplifies the cost of messages, treating all messages equally. In reality, a block proposal can be several megabytes in size, whereas a block commit may only be a few kilobytes.
- Block latency. Block times of, for example, 0.1 seconds are not necessarily a reliable performance indicator if finality requires waiting for 100 blocks. The key metric that matters is the time it takes for a user to see their transaction finalized. Some algorithms claim to achieve only one round of voting; however, they introduce an additional round within the block broadcast mechanism. Others pipeline the voting rounds, thus requiring several blocks to finalize. These approaches can result in higher user-perceived latency, despite achieving shorter block times (see the sketch after this list).
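As a toy illustration of the block latency point above, the following sketch (with hypothetical numbers) compares user-perceived finality latency under a confirmation-depth rule and under single-slot finality:

```rust
use std::time::Duration;

// Toy model: user-perceived finality latency is the block time multiplied by
// how many blocks must be produced before a block counts as final.
fn finality_latency(block_time: Duration, blocks_to_finality: u32) -> Duration {
    block_time * blocks_to_finality
}

fn main() {
    // Fast blocks, but finality only after 100 confirmations.
    let probabilistic = finality_latency(Duration::from_millis(100), 100);
    // Slower blocks, but every block is final as soon as it is produced.
    let single_slot = finality_latency(Duration::from_secs(1), 1);

    println!("100 ms blocks, 100 confirmations: {probabilistic:?}"); // 10s
    println!("1 s blocks, single-slot finality: {single_slot:?}"); // 1s
}
```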
While the above metrics are often highlighted, other aspects of consensus protocol design have a far greater impact in practical settings. Below, we discuss the metrics that are most relevant for real-world consensus algorithm performance.

### What does matter
- Systemic complexity. This aligns with the discussion on [systemic vs. encapsulated complexity](https://vitalik.eth.limo/general/2022/02/28/complexity.html). Consensus algorithms do not operate in isolation; they are designed to support a wide range of applications. A clear example of this issue is the distinction between probabilistic and provable finality. Algorithms that provide probabilistic finality introduce additional complexity for applications, as exchanges, multi-chain dApps, hybrid dApps, block explorers, wallets, and similar systems must determine an appropriate number of confirmations for each chain. In contrast, algorithms with provable finality deliver a clear and deterministic signal that applications can rely on. This distinction is significant enough that even Ethereum is planning to adopt [single-slot finality](https://ethereum.org/en/roadmap/single-slot-finality/#why-aim-for-quicker-finality).
- Simplicity. When designing and implementing an algorithm, it is essential to consider the trade-off between complexity and practicality. While saving one round-trip in an optimistic scenario may seem advantageous, it may not be worthwhile if the algorithm is too complex to formalize and model effectively. Additionally, a complex implementation that requires significant resources, such as multiple engineers and extensive audits, can introduce further challenges. Simple algorithms that are straightforward to implement and can be formally proven offer greater security and reliability. A bug that results in downtime, or worse, safety violations, has a far more detrimental impact on user experience than marginally slower block times.
- Transaction latency. As previously discussed, the only latency that truly matters is the one experienced by the user.
## Lessons learned

For our specific use case, we have gained several insights from researching and implementing previous consensus algorithms:
1. Chained consensus is not beneficial for our use case. It does not improve throughput or latency but adds unnecessary systemic complexity. Instead, we finalize every block.
2. Reducing fault tolerance can minimize the number of voting rounds, as demonstrated by FaB Paxos. By requiring _n=5f+1_ replicas instead of _n=3f+1_, and thus tolerating a smaller fraction of faulty replicas, we are able to finalize consensus in a [single voting round](https://decentralizedthoughts.github.io/2021-03-03-2-round-bft-smr-with-n-equals-4-f-equals-1/).
3. Linear communication is not always advantageous. Quadratic communication among replicas simplifies security, as there are fewer cases where the impact of a malicious leader needs to be considered. It also simplifies implementation by allowing a clear separation of the leader component, as well as view changes, where constant timeouts suffice. For example, [Jolteon/Ditto](https://arxiv.org/abs/2106.10362) adopted this approach after encountering challenges while implementing HotStuff. Additionally, the performance trade-off is likely minimal, as demonstrated by [ParBFT](https://eprint.iacr.org/2023/679.pdf).
4. Re-proposals can be used to ensure that no "rogue" blocks exist, a challenge that has received little attention so far (to the best of our knowledge) and is particularly relevant to public blockchains. In committee-based consensus algorithms, it is possible for a commit quorum certificate (to use HotStuff's terminology) to be formed without being received by enough replicas. This can result in a timeout and the proposal of a new block. Most algorithms address this by declaring the old block invalid. While all honest replicas will agree on which block is canonical, a participant who only sees the single block without knowledge of the timeout may mistakenly believe that it was finalized. This undermines the desirable property of verifying a block's inclusion in the chain by inspecting only the block itself, without needing the entire chain. To address this, we require that block proposals following a timeout (where a commit quorum certificate might have been formed) re-propose the previous block (a sketch of this rule follows the list). This approach guarantees that any block with a valid commit quorum certificate is part of the chain: it may not have been finalized in the current view, but it was certainly finalized.
5. Always justify messages to remove time dependencies, a lesson we learned from Fast-HotStuff. Messages should carry sufficient information to allow any replica to independently verify their validity without relying on additional data (except for previous blocks, which are external to the consensus algorithm). Failing to do so can introduce subtle timing dependencies. For example, Tendermint contained a bug, discovered only years later, whose fix required the leader to wait for the maximum network delay at the end of each round; without that wait, the protocol could deadlock. Interestingly, HotStuff-2 reintroduces this timing dependency to eliminate one round-trip, which significantly increases the complexity of modeling and implementing such a system.
6. Make garbage collection and reconfiguration integral parts of the algorithm. These components will inevitably need to be implemented, and failing to specify and model them upfront may result in awkward and inefficient implementations later.
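The following sketch illustrates points 4 and 5 together: a proposal message that carries its own justification, and the re-proposal check applied after a timeout. All type and field names are hypothetical simplifications; the actual message formats and rules are the ones in the pseudo-code and Quint specifications in this folder.

```rust
// Illustrative sketch only; names and structures are hypothetical.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct BlockHash([u8; 32]);

/// A quorum certificate over commit votes for one block.
#[derive(Debug)]
struct CommitQC {
    block: BlockHash,
}

/// An aggregated timeout for a view. `high_vote`, if present, is a block that
/// enough timeout signers had already sent commit votes for, meaning a
/// CommitQC *could* have been formed for it in the failed view.
#[derive(Debug)]
struct TimeoutQC {
    high_vote: Option<BlockHash>,
}

/// Every proposal carries its own justification (point 5), so a replica can
/// check its validity without any additional context.
#[derive(Debug)]
enum Justification {
    /// The previous block was finalized; the leader proposes a fresh block.
    Commit(CommitQC),
    /// The previous view timed out; the re-proposal rule (point 4) applies.
    Timeout(TimeoutQC),
}

#[derive(Debug)]
struct Proposal {
    block: BlockHash,
    justification: Justification,
}

/// Re-proposal rule: after a timeout in which a commit QC might have been
/// formed, the leader must re-propose that block instead of a new one.
fn is_valid_proposal(p: &Proposal) -> bool {
    match &p.justification {
        Justification::Commit(_) => true,
        Justification::Timeout(tqc) => match tqc.high_vote {
            // A CommitQC may exist for this block: only a re-proposal is valid.
            Some(high_vote) => p.block == high_vote,
            // Nothing could have been finalized in the failed view.
            None => true,
        },
    }
}

fn main() {
    let maybe_finalized = BlockHash([1; 32]);
    // A leader proposing a fresh block after such a timeout is rejected.
    let bad = Proposal {
        block: BlockHash([2; 32]),
        justification: Justification::Timeout(TimeoutQC {
            high_vote: Some(maybe_finalized),
        }),
    };
    assert!(!is_valid_proposal(&bad));
}
```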
FaB Paxos satisfies the first 4 points, and Fast-HotStuff satisfies the 5th. ChonkyBFT is basically FaB Paxos with some ideas from Fast-HotStuff/HotStuff-2.