Special thanks to Justin Drake, Hsiao-wei Wang, @antonttc, Anders Elowsson and Francesco for feedback and review.
Originally, “the Merge” referred to the most important event in the Ethereum protocol’s history since its launch: the long-awaited and hard-earned transition from proof of work to proof of stake. Today, Ethereum has been a stably running proof of stake system for almost exactly two years, and this proof of stake has performed remarkably well in stability, performance and avoiding centralization risks. However, there still remain some important areas in which proof of stake needs to improve.
My roadmap diagram from 2023 separated this out into buckets: improving technical features such as stability, performance, and accessibility to smaller validators, and economic changes to address centralization risks. The former got to take over the heading for “the Merge”, and the latter became part of “the Scourge”.
The Merge, 2023 roadmap edition.
This post will focus on the “Merge” part: what can still be improved in the technical design of proof of stake, and what are some paths to getting there?
This is not meant as an exhaustive list of things that could be done to proof of stake; rather, it is a list of ideas that are actively being considered.
- Single slot finality
- Transaction confirmation and finalization as fast as possible, while preserving decentralization
- Improve staking viability for solo stakers
- Improve robustness
- Improve Ethereum's ability to resist and recover from 51% attacks (including finality reversion, finality blocking, and censorship)
- Single slot finality and staking democratization
- Single secret leader election
- Faster transaction confirmations
- Other research areas
Today, it takes 2-3 epochs (~15 min) to finalize a block, and 32 ETH is required to be a staker. This was originally a compromise meant to balance between three goals:
- Maximizing the number of validators that can participate in staking (this directly implies minimizing the min ETH required to stake)
- Minimizing the time to finality
- Minimizing the overhead of running a node, in this case the cost of downloading, verifying and re-broadcasting all the other validators' signatures
The three goals are in conflict: in order for economic finality to be possible (meaning: an attacker would need to burn a large amount of ETH to revert a finalized block), you need every single validator to sign two messages each time finality happens. And so if you have many validators, either you need a long time to process all their signatures, or you need very beefy nodes to process all the signatures at the same time.
Note that this is all conditional on a key goal of Ethereum: ensuring that even successful attacks have a high cost to the attacker. This is what is meant by the term “economic finality”. If we did not have this goal, then we could solve this problem by randomly selecting a committee to finalize each slot. Chains that do not attempt to achieve economic finality, such as Algorand, often do exactly this. But the problem with this approach is that if an attacker does control 51% of validators, then they can perform an attack (reverting a finalized block, or censoring, or delaying finality) at very low cost: only the portion of their nodes that are in the committee could be detected as participating in the attack and penalized, whether through slashing or socially-coordinated soft fork. This means that an attacker could repeatedly attack the chain many times over, losing only a small portion of their stake during each attack. Hence, if we want economic finality, a naive committee-based approach does not work, and it appears at first glance that we do need the full set of validators to participate.
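To make the cost gap concrete, here is a minimal sketch of the argument above. The total stake, the 51% share and the one-in-a-thousand committee fraction are hypothetical numbers chosen only for illustration, and the model assumes that an attacker's validators land in the committee in proportion to their stake.

```python
from fractions import Fraction


def expected_loss_per_attack(attacker_stake, committee_fraction):
    """Stake that can be identified as attacking, and hence penalized.

    Only attacker validators that actually sign in the attack can be caught,
    so in expectation the loss scales with the share of validators that sit
    in the finalizing committee.
    """
    return attacker_stake * committee_fraction


TOTAL_STAKE_ETH = 30_000_000  # hypothetical total stake
attacker_stake = Fraction(51, 100) * TOTAL_STAKE_ETH

# Every validator signs: the whole attacking stake is exposed.
full_set_loss = expected_loss_per_attack(attacker_stake, Fraction(1))

# Naive random committee containing 1/1000 of all validators (hypothetical).
committee_loss = expected_loss_per_attack(attacker_stake, Fraction(1, 1000))

print(float(full_set_loss))   # 15300000.0
print(float(committee_loss))  # 15300.0
```

Under these assumptions the attacker could repeat the committee-based attack roughly a thousand times before losing what a single attack costs when the full validator set signs.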
Ideally, we want to preserve economic finality, while simultaneously improving on the status quo in two areas:
- Finalize blocks in one slot (ideally, keep or even reduce the current length of 12s), instead of 15 min
- Allow validators to stake with 1 ETH (down from 32 ETH)
The first goal is justified by two considerations, both of which can be viewed as “bringing Ethereum’s properties in line with those of (more centralized) performance-focused L1 chains”.
First, it ensures that all Ethereum users actually benefit from the higher level of security assurances achieved through the finality mechanism. Today, most users do not, because they are not willing to wait 15 minutes; with single-slot finality, users will see their transactions finalized almost as soon as they are confirmed. Second, it simplifies the protocol and surrounding infrastructure if users and applications don’t have to worry about the possibility of the chain reverting except in the relatively rare case of an inactivity leak.
The second goal is justified by a desire to support solo stakers. Poll after poll shows that the main factor preventing more people from solo staking is the 32 ETH minimum. Reducing the minimum to 1 ETH would solve this issue, to the point where other concerns become the dominant factor limiting solo staking.
There is a challenge: the goals of faster finality and more democratized staking both conflict with the goal of minimizing overhead. And indeed, this fact is the entire reason why we did not start with single-slot finality to begin with. However, more recent research presents a few possible paths around the problem.
Single-slot finality involves using a consensus algorithm that finalizes blocks in one slot. This in itself is not a difficult goal: plenty of algorithms, such as Tendermint consensus, already do this with optimal properties. One desired property unique to Ethereum, which Tendermint does not support, is inactivity leaks, which allow the chain to keep going and eventually recover even when more than 1/3 of validators go offline. Fortunately, this desire has already been addressed: there are already proposals that modify Tendermint-style consensus to accommodate inactivity leaks.
_A leading single slot finality proposal_
The harder part of the problem is figuring out how to make single-slot finality work with a very high validator count, without leading to extremely high node-operator overhead. For this, there are a few leading solutions:
- Option 1: Brute force - work hard on implementing better signature aggregation protocols, potentially using ZK-SNARKs, which would actually allow us to process signatures from millions of validators in each slot.
_Horn, one of the proposed designs for a better aggregation protocol._
- Option 2: Orbit committees - a new mechanism which allows a randomly-selected medium-sized committee to be responsible for finalizing the chain, but in a way that preserves the cost-of-attack properties that we are looking for.
One way to think about Orbit SSF is that it opens up a space of compromise options along a spectrum from x=0 (Algorand-style committees, no economic finality) to x=1 (status quo Ethereum), opening up points in the middle where Ethereum still has enough economic finality to be extremely secure, but at the same time we get the efficiency benefits of only needing a medium-sized random sample of validators to participate in each slot.
Orbit takes advantage of pre-existing heterogeneity in validator deposit sizes to get as much economic finality as possible, while still giving small validators a proportionate role. In addition, Orbit uses slow committee rotation to ensure high overlap between adjacent quorums, ensuring that its economic finality still applies at committee-switching boundaries.
- Option 3: two-tiered staking - a mechanism where there are two classes of stakers, one with higher deposit requirements and one with lower deposit requirements. Only the higher-deposit tier would be directly involved in providing economic finality. There are various proposals (eg. see the Rainbow staking post) for exactly what rights and responsibilities the lower-deposit tier has. Common ideas include:
  - the right to delegate stake to a higher-tier staker
  - a random sample of lower-tier stakers attesting to, and being needed to finalize, each block
  - the right to generate inclusion lists
- Paths toward single slot finality (2022): https://notes.ethereum.org/@vbuterin/single_slot_finality
- A concrete proposal for a single slot finality protocol for Ethereum (2023): https://eprint.iacr.org/2023/280
- Orbit SSF: https://ethresear.ch/t/orbit-ssf-solo-staking-friendly-validator-set-management-for-ssf/19928
- Further analysis on Orbit-style mechanisms: https://ethresear.ch/t/vorbit-ssf-with-circular-and-spiral-finality-validator-selection-and-distribution/20464
- Horn, signature aggregation protocol (2022): https://ethresear.ch/t/horn-collecting-signatures-for-faster-finality/14219
- Signature merging for large-scale consensus (2023): https://ethresear.ch/t/signature-merging-for-large-scale-consensus/17386?u=asn
- Signature aggregation protocol proposed by Khovratovich et al: https://hackmd.io/@7dpNYqjKQGeYC7wMlPxHtQ/BykM3ggu0#/
- STARK-based signature aggregation (2022): https://hackmd.io/@vbuterin/stark_aggregation
- Rainbow staking: https://ethresear.ch/t/unbundling-staking-towards-rainbow-staking/18683
There are four major possible paths to take (and we can also take hybrid paths):
1. Maintain status quo
2. Brute-force SSF
3. Orbit SSF
4. SSF with two-tiered staking
(1) means doing no work and leaving staking as is, but it leaves Ethereum’s security experience and staking centralization properties worse than they could be.
(2) brute-forces the problem with high tech. Making this happen requires aggregating a very large number of signatures (1 million+) in a very short period of time (5-10s). One way to think of this approach is that it involves minimizing systemic complexity by going all-out on accepting encapsulated complexity.
(3) avoids “high tech”, and solves the problem with clever rethinking around protocol assumptions: we relax the “economic finality” requirement so that we require attacks to be expensive, but are okay with the cost of attack being perhaps 10x less than today (eg. $2.5 billion cost of attack instead of $25 billion). It’s a common view that Ethereum today has far more economic finality than it needs, and its main security risks are elsewhere, and so this is arguably an okay sacrifice to make.
The main work to do is verifying that the Orbit mechanism is safe and has the properties that we want, and then fully formalizing and implementing it. Additionally, EIP-7251 (increase max effective balance) allows for voluntary validator balance consolidation that immediately reduces the chain verification overhead somewhat, and acts as an effective initial stage for an Orbit rollout.
(4) avoids clever rethinking and high tech, but it does create a two-tiered staking system which still has centralization risks. The risks depend heavily on the specific rights that the lower staking tier gets. For example:
- If a low-tier staker needs to delegate their attesting rights to a high-tier staker, then delegation could centralize and we would thus end up with two highly centralized tiers of staking.
- If a random sample of the lower tier is needed to approve each block, then an attacker could spend a very small amount of ETH to block finality.
- If lower-tier stakers can only make inclusion lists, then the attestation layer may remain centralized, at which point a 51% attack on the attestation layer can censor the inclusion lists themselves.
Multiple strategies can be combined, for example:
(1 + 2): use brute-force techniques to reduce the min deposit size without doing single slot finality. The amount of aggregation required is 64x less than in the pure (2) case, so the problem becomes easier.
(1 + 3): add Orbit without doing single slot finality
(2 + 3): do Orbit SSF with conservative parameters (eg. 128k validator committee instead of 8k or 32k), and use brute-force techniques to make that ultra-efficient.
(1 + 4): add rainbow staking without doing single slot finality
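One way to arrive at the 64x figure quoted for the (1 + 2) combination, as a rough sketch: it assumes that without single slot finality each validator signs once per 32-slot epoch, with signatures spread evenly over the slots, while brute-force SSF has every validator sign two messages in every slot. The validator count is a hypothetical round number.

```python
SLOTS_PER_EPOCH = 32  # epoch length on Ethereum today


def signatures_per_slot_without_ssf(num_validators: int) -> int:
    # Assumption: one signature per validator per epoch, spread evenly.
    return num_validators // SLOTS_PER_EPOCH


def signatures_per_slot_with_ssf(num_validators: int) -> int:
    # Every validator signs two messages each time finality happens,
    # and under SSF finality happens once per slot.
    return 2 * num_validators


n = 2**20  # hypothetical validator count, roughly one million
ratio = signatures_per_slot_with_ssf(n) // signatures_per_slot_without_ssf(n)
print(ratio)  # 64
```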
In addition to its other benefits, single slot finality reduces the risk of certain types of multi-block MEV attacks. Additionally, attester-proposer separation designs and other in-protocol block production pipelines would need to be designed differently in a single-slot finality world.
Brute-force strategies have the weakness that they make it harder to reduce slot times.
Today, which validator is going to propose the next block is known ahead of time. This creates a security vulnerability: an attacker can watch the network, identify which validators correspond to which IP addresses, and DoS attack each validator right when they are about to propose a block.
The best way to fix the DoS issue is to hide the information about which validator is going to produce the next block, at least until the moment when the block is actually produced. Note that this is easy if we remove the “single” requirement: one solution is to let anyone create the next block, but require the randao reveal to be less than 2^256 / N. On average, only one validator would be able to meet this requirement - but sometimes there would be two or more and sometimes there would be zero. Combining the “secrecy” requirement with the “single” requirement has long been the hard problem.
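The relaxed, non-single rule can be simulated in a few lines. This is only a sketch: a SHA-256 hash of the validator index and slot stands in for each validator's randao reveal (an assumption made so the example is self-contained; a real reveal is a signature), and the validator and slot counts are arbitrary.

```python
import hashlib
from collections import Counter


def stand_in_reveal(validator_index: int, slot: int) -> int:
    """Hypothetical stand-in for a randao reveal: a uniform 256-bit number."""
    data = f"{validator_index}:{slot}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def eligible_proposers(num_validators: int, slot: int) -> list:
    """Validators whose reveal falls below the 2^256 / N threshold."""
    threshold = 2**256 // num_validators
    return [
        v for v in range(num_validators)
        if stand_in_reveal(v, slot) < threshold
    ]


def tally(num_validators: int, num_slots: int) -> Counter:
    """Count how many slots had 0, 1, 2, ... eligible proposers."""
    return Counter(
        len(eligible_proposers(num_validators, slot))
        for slot in range(num_slots)
    )


counts = tally(num_validators=100, num_slots=2000)
average = sum(k * c for k, c in counts.items()) / 2000
for k in sorted(counts):
    print(f"{k} eligible proposers: {counts[k]} slots")
print(f"average per slot: {average:.2f}")
```

The average comes out close to one proposer per slot, but a sizeable share of slots have none or several, which is exactly the gap that the “single” requirement is meant to close.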
Single secret leader election protocols solve this by using some cryptographic techniques to create a “blinded” validator ID for each validator, and then giving many proposers the opportunity to shuffle-and-reblind the pool of blinded IDs (this is similar to how a mixnet works). During each slot, a random blinded ID is selected. Only the owner of that blinded ID is able to generate a valid proof to propose the block, but no one else knows which validator that blinded ID corresponds to.
_Whisk SSLE protocol_
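The shuffle-and-reblind idea can be illustrated with a toy example. The sketch below is loosely in the spirit of shuffle-based designs and is not the Whisk construction itself: it uses a deliberately tiny group (insecure, chosen so the arithmetic is easy to check), omits the zero-knowledge proofs that a real protocol needs to show that each shuffle was done honestly, and has the owner check a blinded ID by using the secret directly rather than by producing a proof. All names are hypothetical.

```python
import random

P = 1019  # small safe prime, P = 2 * Q + 1 (toy size, NOT secure)
Q = 509   # prime order of the subgroup of squares modulo P
G = 4     # a square other than 1, so it generates that subgroup


def make_blinded_id(secret: int, rng: random.Random) -> tuple:
    """Blinded ID (a, a^secret) for a fresh random base a = G^r."""
    r = rng.randrange(1, Q)
    a = pow(G, r, P)
    return (a, pow(a, secret, P))


def shuffle_and_reblind(ids: list, rng: random.Random) -> list:
    """Raise both halves of every ID to a fresh exponent, then permute."""
    reblinded = []
    for a, b in ids:
        s = rng.randrange(1, Q)
        reblinded.append((pow(a, s, P), pow(b, s, P)))
    rng.shuffle(reblinded)
    return reblinded


def owns(blinded_id: tuple, secret: int) -> bool:
    """The relation b == a^secret survives every reblinding step."""
    a, b = blinded_id
    return pow(a, secret, P) == b


rng = random.Random(0)
secret_keys = rng.sample(range(1, Q), 8)  # one distinct secret per validator
pool = [make_blinded_id(k, rng) for k in secret_keys]
for _ in range(5):  # five successive shufflers
    pool = shuffle_and_reblind(pool, rng)

selected = rng.choice(pool)  # the slot's randomly selected blinded ID
claimants = [i for i, k in enumerate(secret_keys) if owns(selected, k)]
print(len(claimants))  # 1
```

Because the subgroup has prime order and the secrets are distinct, exactly one validator recognizes the selected ID, while an observer who only sees the reblinded pool cannot link it back to the original entry without knowing a secret.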
- Paper by Dan Boneh (2020): https://eprint.iacr.org/2020/025.pdf
- Whisk (concrete proposal for Ethereum, 2022): https://ethresear.ch/t/whisk-a-practical-shuffle-based-ssle-protocol-for-ethereum/11763
- Single secret leader election tag on ethresear.ch: https://ethresear.ch/tag/single-secret-leader-election
- Simplified SSLE using ring signatures: https://ethresear.ch/t/simplified-ssle/12315
Realistically, what’s left is finding and implementing a protocol that is sufficiently simple that we are comfortable implementing it on mainnet. We highly value Ethereum being a reasonably simple protocol, and we do not want complexity to increase further. SSLE implementations that we’ve seen add hundreds of lines of spec code, and introduce new assumptions in complicated cryptography. Figuring out an efficient-enough quantum-resistant SSLE implementation is also an open problem.
It may end up the case that the extra complexity introduced by SSLE only goes down enough once we take the plunge and introduce the machinery to do general-purpose zero-knowledge proofs into the Ethereum protocol at L1 for other reasons (eg. state trees, ZK-EVM).
An alternative option is to simply not bother with SSLE, and use out-of-protocol mitigations (eg. at the p2p layer) to solve the DoS issues.
If we add an attester-proposer separation (APS) mechanism, eg. execution tickets, then execution blocks (ie. blocks containing Ethereum transactions) will not need SSLE, because we could rely on block builders being specialized. However, we would still benefit from SSLE for consensus blocks (ie. blocks containing protocol messages such as attestations, perhaps pieces of inclusion lists, etc).
There is value in Ethereum’s transaction confirmation time decreasing further, from 12 seconds down to eg. 4 seconds. Doing this would significantly improve the user experience of both the L1 and based rollups, while making defi protocols more efficient. It would also make it easier for L2s to decentralize, because it would allow a large class of L2 applications to work on based rollups, reducing the demand for L2s to build their own committee-based decentralized sequencing.
There are broadly two families of techniques here:
- Reduce slot times, down to eg. 8 seconds or 4 seconds. This does not necessarily have to mean 4-second finality: finality inherently takes three rounds of communication, and so we can make each round of communication be a separate block, which would after 4 seconds get at least a preliminary confirmation.
- Allow proposers to publish pre-confirmations over the course of a slot. In the extreme, a proposer could include transactions that they see into their block in real time, and immediately publish a pre-confirmation message for each transaction (“My first transaction is 0x1234…”, “My second transaction is 0x5678…”). The case of a proposer publishing two conflicting confirmations can be dealt with in two ways: (i) by slashing the proposer, or (ii) by using attesters to vote on which one came earlier.
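Detecting conflicting pre-confirmations, the precondition for the slashing option, could look roughly like the sketch below. The message layout (slot, position within the block, transaction hash) is a hypothetical simplification; a real scheme would also carry and verify the proposer's signature on each message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreConfirmation:
    slot: int
    position: int  # index of the transaction within the proposer's block
    tx_hash: str


def find_conflicts(messages):
    """Return pairs of messages that claim the same position in the same
    slot but name different transactions."""
    first_seen = {}
    conflicts = []
    for msg in messages:
        key = (msg.slot, msg.position)
        earlier = first_seen.setdefault(key, msg)
        if earlier.tx_hash != msg.tx_hash:
            conflicts.append((earlier, msg))
    return conflicts


messages = [
    PreConfirmation(slot=10, position=0, tx_hash="0xaaaa"),
    PreConfirmation(slot=10, position=1, tx_hash="0xbbbb"),
    PreConfirmation(slot=10, position=0, tx_hash="0xcccc"),  # conflicts
    PreConfirmation(slot=10, position=1, tx_hash="0xbbbb"),  # harmless repeat
]
conflicts = find_conflicts(messages)
print(len(conflicts))  # 1
```

A pair returned by `find_conflicts` is the kind of evidence that option (i) would need; option (ii) would instead ask attesters which of the two messages they saw first.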
- Based preconfirmations: https://ethresear.ch/t/based-preconfirmations/17353
- Protocol-enforced proposer commitments (PEPC): https://ethresear.ch/t/unbundling-pbs-towards-protocol-enforced-proposer-commitments-pepc/13879
- Staggered periods across parallel chains (a 2018-era idea for achieving low latency): https://ethresear.ch/t/staggered-periods/1793
It’s far from clear just how practical it is to reduce slot times. Even today, stakers in many regions of the world have a hard time getting attestations included fast enough. Attempting 4-second slot times runs the risk of centralizing the validator set, and making it impractical to be a validator outside of a few privileged geographies due to latency. Specifically, moving to 4-second slot times would require reducing the bound on network latency ("delta") to two seconds.
The proposer preconfirmation approach has the weakness that it can greatly improve average-case inclusion times, but not worst-case: if the current proposer is well-functioning, your transaction will be pre-confirmed in 0.5 seconds instead of being included in (on average) 6 seconds, but if the current proposer is offline or not well-functioning, you would still have to wait up to a full 12 seconds for the next slot to start and provide a new proposer.
Additionally, there is the open question of how pre-confirmations will be incentivized. Proposers have an incentive to maximize their optionality as long as possible. If attesters sign off on timeliness of pre-confirmations, then transaction senders could make a portion of the fee conditional on an immediate pre-confirmation, but this would put an extra burden on attesters, and potentially make it more difficult for attesters to continue functioning as a neutral “dumb pipe”.
On the other hand, if we do not attempt this and keep finality times at 12 seconds (or longer), the ecosystem will put greater weight on pre-confirmation mechanisms made by layer 2s, and cross-layer-2 interaction will take longer.
Proposer-based preconfirmations realistically depend on an attester-proposer separation (APS) mechanism, eg. execution tickets. Otherwise, the pressure to provide real-time preconfirmations may be too centralizing for regular validators.
Exactly how short slot times can be also depends on the slot structure, which depends heavily on what versions of APS, inclusion lists, etc we end up implementing. There are slot structures that contain fewer rounds and are thus more friendly to short slot times, but they make tradeoffs in other places.
There is often an assumption that if a 51% attack happens (including attacks that are not cryptographically provable, such as censorship), the community will come together to implement a minority soft fork that ensures that the good guys win, and the bad guys get inactivity-leaked or slashed. However, this degree of over-reliance on the social layer is arguably unhealthy. We can try to reduce reliance on the social layer, by making the process of recovering as automated as possible.
Full automation is impossible, because if it were, that would count as a >50% fault tolerant consensus algorithm, and we already know the (very restrictive) mathematically provable limitations of those kinds of algorithms. But we can achieve partial automation: for example, a client could automatically refuse to accept a chain as finalized, or even as the head of the fork choice, if it censors transactions that the client has seen for long enough. A key goal would be ensuring that the bad guys in an attack at least cannot get a quick clean victory.
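As a sketch of what such a partially automated check might look like on the client side: the function and field names below are hypothetical, the waiting threshold is an arbitrary parameter, and a real client would also need to confirm that the omitted transactions were valid and paid enough to be included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeenTx:
    tx_hash: str
    first_seen: float  # local time, in seconds, when the client saw it


def overdue_txs(seen, included_hashes, now, max_wait):
    """Transactions the client has known about for at least `max_wait`
    seconds that the candidate chain still has not included."""
    return [
        tx.tx_hash
        for tx in seen
        if tx.tx_hash not in included_hashes
        and now - tx.first_seen >= max_wait
    ]


def acceptable_head(seen, included_hashes, now, max_wait):
    """Refuse a chain that leaves out any long-pending transaction."""
    return not overdue_txs(seen, included_hashes, now, max_wait)


seen = [SeenTx("0xaa", first_seen=0.0), SeenTx("0xbb", first_seen=90.0)]

# Chain A omits 0xaa, which has been pending for 100 seconds: rejected.
print(acceptable_head(seen, {"0xbb"}, now=100.0, max_wait=60.0))  # False

# Chain B omits only 0xbb, which has been pending for 10 seconds: accepted.
print(acceptable_head(seen, {"0xaa"}, now=100.0, max_wait=60.0))  # True
```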
Today, a block finalizes if 67% of stakers support it. There is an argument that this is overly aggressive. There has been only one (very brief) finality failure in all of Ethereum’s history. If this percentage is increased, eg. to 80%, then the added number of non-finality periods will be relatively low, but Ethereum would gain security properties: in particular, many more contentious situations will result in temporary stopping of finality. This seems a much healthier situation than “the wrong side” getting an instant victory, both when the wrong side is an attacker, and when it’s a client that has a bug.
This also gives an answer to the question “what is the point of solo stakers”? Today, most stakers are already staking through pools, and it seems very unlikely to get solo stakers up to 51% of staked ETH. However, getting solo stakers up to a quorum-blocking minority, especially if the quorum is 80% (so a quorum-blocking minority would only need 21%) seems potentially achievable if we work hard at it. As long as solo stakers do not go along with a 51% attack (whether finality-reversion or censorship), such an attack would not get a “clean victory”, and solo stakers would be motivated to help organize a minority soft fork.
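The relationship between the quorum and the quorum-blocking minority is simple arithmetic, sketched here in whole percentage points: finality fails as soon as the stake still willing to sign drops below the quorum.

```python
def min_blocking_percent(quorum_percent: int) -> int:
    """Smallest whole-percent share of stake that can block finality.

    If blockers hold b percent, at most 100 - b percent can sign, which
    misses the quorum exactly when b > 100 - quorum_percent.
    """
    return 100 - quorum_percent + 1


print(min_blocking_percent(67))  # 34
print(min_blocking_percent(80))  # 21
```

Raising the quorum from 67% to 80% therefore lowers the share that solo stakers would need from 34% to 21% of staked ETH.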
Note that there are interactions between quorum thresholds and the Orbit mechanism: if we end up using Orbit, then what exactly "21% of stakers" means will become a more complicated question, and will depend in part on the distribution of validators.
Metaculus currently believes, though with wide error bars, that quantum computers will likely start breaking cryptography some time in the 2030s.
Quantum computing experts such as Scott Aaronson have also recently started taking the possibility of quantum computers actually working in the medium term much more seriously. This has consequences across the entire Ethereum roadmap: it means that each piece of the Ethereum protocol that currently depends on elliptic curves will need to have some hash-based or otherwise quantum-resistant replacement. This particularly means that we cannot assume that we will be able to lean on the excellent properties of BLS aggregation to process signatures from a large validator set forever. This justifies conservatism in the assumptions around performance of proof-of-stake designs, and also is a cause to be more proactive to develop quantum-resistant alternatives.