This post is primarily intended for readers roughly familiar with 2019-era cryptography, especially SNARKs and STARKs. If you are not, I recommend reading those articles first. Special thanks to Justin Drake, Jim Posen, Benjamin Diamond and Radi Cojbasic for feedback and review.
Over the past two years, STARKs have become a crucial and irreplaceable technology for efficiently making easy-to-verify cryptographic proofs of very complicated statements (eg. proving that an Ethereum block is valid). A key reason why is small field sizes: whereas elliptic curve-based SNARKs require you to work over 256-bit integers in order to be secure enough, STARKs let you use much smaller field sizes, which are more efficient: first the Goldilocks field (64-bit integers), and then Mersenne31 and BabyBear (both 31-bit). Thanks to these efficiency gains, Plonky2, which uses Goldilocks, is hundreds of times faster at proving many kinds of computation than its predecessors.
A natural question to ask is: can we take this trend to its logical conclusion, building proof systems that run even faster by operating directly over zeroes and ones? This is exactly what Binius is trying to do, using a number of mathematical tricks that make it very different from the SNARKs and STARKs of three years ago. This post goes through the reasons why small fields make proof generation more efficient, why binary fields are uniquely powerful, and the tricks that Binius uses to make proofs over binary fields work so effectively.
Binius. By the end of this post, you should be able to understand every part of this diagram.
- Recap: finite fields
- Recap: arithmetization
- Plonky2: from 256-bit SNARKs and STARKs to 64-bit... only STARKs
- From small primes to binary
- From univariate polynomials to hypercubes
- Simple Binius - an example
- Binary fields
- Full Binius
- Putting it all together
- What did we not cover?
One of the key tasks of a cryptographic proving system is to operate over huge amounts of data, while keeping the numbers small. If you can compress a statement about a large program into a mathematical equation involving a few numbers, but those numbers are as big as the original program, you have not gained anything.
To do complicated arithmetic while keeping numbers small, cryptographers generally use modular arithmetic. We pick some prime "modulus" $p$. The % operator means "take the remainder of": $15\ \%\ 7 = 1$, $53\ \%\ 10 = 3$, etc (note that the answer is always non-negative, so for example $-1\ \%\ 10 = 9$).
You've probably already seen modular arithmetic, in the context of adding and subtracting time (eg. what time is four hours after 9:00?). But here, we don't just add and subtract modulo some number, we also multiply, divide and take exponents.
We redefine all of our arithmetic operations to work "mod $p$":

- $x + y \rightarrow (x + y)\ \%\ p$
- $x - y \rightarrow (x - y)\ \%\ p$
- $x \cdot y \rightarrow (x \cdot y)\ \%\ p$
- $x / y \rightarrow (x \cdot y^{p-2})\ \%\ p$

The above rules are all self-consistent. For example, if $p = 7$, then:
- $5 + 3 = 1$ (because $8\ \%\ 7 = 1$)
- $1 - 3 = 5$ (because $-2\ \%\ 7 = 5$)
- $2 \cdot 5 = 3$ (because $10\ \%\ 7 = 3$)
- $3 / 5 = 2$ (because $(3 \cdot 5^5)\ \%\ 7 = 9375\ \%\ 7 = 2$)
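To make this concrete, here is a minimal sketch of these operations in Python for $p = 7$ (the helper names are my own, purely for illustration); division uses the inverse $y^{p-2}$, which is why $3 / 5$ above was computed via $3 \cdot 5^5$.

```python
# Minimal sketch of prime field ("modular") arithmetic, with p = 7.
p = 7

def add(x, y): return (x + y) % p
def sub(x, y): return (x - y) % p    # Python's % always returns a non-negative result
def mul(x, y): return (x * y) % p

def div(x, y):
    # Fermat's little theorem: y^(p-1) = 1, so y^(p-2) is the inverse of y
    return (x * pow(y, p - 2, p)) % p

assert add(5, 3) == 1   # 8 % 7
assert sub(1, 3) == 5   # -2 % 7
assert mul(2, 5) == 3   # 10 % 7
assert div(3, 5) == 2   # (3 * 5**5) % 7 = 9375 % 7
```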
A more general term for this kind of structure is a finite field. A finite field is a mathematical structure that obeys the usual laws of arithmetic, but where there's a limited number of possible values, and so each value can be represented in a fixed size.
Modular arithmetic (or prime fields) is the most common type of finite field, but there is also another type: extension fields. You've probably already seen an extension field before: the complex numbers. We "imagine" a new element, which we label $i$, and declare that it satisfies $i^2 = -1$; any combination of regular numbers and $i$ is then a complex number. We can make extensions of prime fields in the same way, and, as we will see below, extension fields become especially important when the base field is small.
The way that SNARKs and STARKs prove things about computer programs is through arithmetization: you convert a statement about a program that you want to prove, into a mathematical equation involving polynomials. A valid solution to the equation corresponds to a valid execution of the program.
To give a simple example, suppose that I computed the 100th Fibonacci number, and I want to prove to you what it is. I create a polynomial $F$ that encodes Fibonacci numbers: so $F(0) = F(1) = 1$, $F(2) = 2$, $F(3) = 3$, $F(4) = 5$, and so on for 100 steps. The condition that I need to prove is that $F(x+2) = F(x) + F(x+1)$ across the range $x = \{0, 1 \ldots 98\}$. I can convince you of this by giving you the quotient:
$$H(x) = \frac{F(x+2) - F(x+1) - F(x)}{Z(x)}$$
Where $Z(x) = (x - 0) * (x - 1) * ... * (x - 98)$. If I can provide valid $F$ and $H$ that satisfy this equation, then $F(x+2) - F(x+1) - F(x)$ must be zero across that range, ie. $F$ must satisfy the Fibonacci recurrence there. If I additionally verify that $F$ satisfies $F(0) = F(1) = 1$, then $F(100)$ must actually be the 100th Fibonacci number.
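As a sanity check on what this equation enforces, here is a small sketch (my own illustration, not actual prover code) that builds the evaluations $F(0) \ldots F(100)$ over an arbitrary example prime field and checks the recurrence and boundary conditions directly; a real prover would go further, interpolating $F$ as a polynomial and exhibiting the quotient $H$.

```python
# Sketch: the constraints the Fibonacci trace must satisfy, checked directly
# on the evaluations (p is just an example modulus for illustration).
p = 2**31 - 1

F = [1, 1]                               # boundary conditions F(0) = F(1) = 1
for x in range(99):
    F.append((F[x + 1] + F[x]) % p)      # fill in F(2) ... F(100)

# The transition constraint F(x+2) - F(x+1) - F(x) = 0 for x in {0 ... 98};
# in the protocol, providing a valid quotient H is what demonstrates this.
for x in range(99):
    assert (F[x + 2] - F[x + 1] - F[x]) % p == 0

print("F(100) mod p =", F[100])
```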
If you want to prove something more complicated, then you replace the "simple" relation $F(x+2) = F(x) + F(x+1)$ with a more complicated equation, which basically says "$F(x+1)$ is the output of initializing a virtual machine with the state $F(x)$, and running one computational step". You can also replace the number 100 with a bigger number, eg. 100000000, to accommodate more steps.
All SNARKs and STARKs are based on this idea of using a simple equation over polynomials (or sometimes vectors and matrices) to represent a large number of relationships between individual values. Not all involve checking equivalence between adjacent computational steps in the same way as above: PLONK does not, for example, and neither does R1CS. But many of the most efficient ones do, because enforcing the same check (or the same few checks) many times makes it easier to minimize overhead.
Five years ago, a reasonable summary of the different types of zero knowledge proof was as follows. There are two types of proofs: (elliptic-curve-based) SNARKs and (hash-based) STARKs. Technically, STARKs are a type of SNARK, but in practice it's common to use "SNARK" to refer to only the elliptic-curve-based variety, and "STARK" to refer to hash-based constructions. SNARKs are small, and so you can verify them very quickly and fit them onchain easily. STARKs are big, but they don't require trusted setups, and they are quantum-resistant.
STARKs work by treating the data as a polynomial, computing evaluations of that polynomial across a large number of points, and using the Merkle root of that extended data as the "polynomial commitment"
A key bit of history here is that elliptic curve-based SNARKs came into widespread use first: it took until roughly 2018 for STARKs to become efficient enough to use, thanks to FRI, and by then Zcash had already been running for over a year. Elliptic curve-based SNARKs have a key limitation: the arithmetic in these equations must be done with integers modulo the number of points on the elliptic curve. This is a big number, usually near $2^{256}$.
Even if your "original" data is made up of "small" numbers, the proving process requires computing quotients, extensions, random linear combinations, and other transformations of the data, which lead to an equal or larger number of objects that are, on average, as large as the full size of your field. This creates a key inefficiency: to prove a computation over $n$ small values, you have to do even more computation over $n$ much bigger values. At first, STARKs inherited the habit of using 256-bit fields from SNARKs, and so suffered the same inefficiency.
A Reed-Solomon extension of some polynomial evaluations. Even though the original values are small, the extra values all blow up to the full size of the field (in this case $2^{31} - 1$).
In 2022, Plonky2 was released. Plonky2's main innovation was doing arithmetic modulo a smaller prime: the "Goldilocks" prime $2^{64} - 2^{32} + 1$, whose elements fit into 64 bits.
To continue to be safe, Plonky2 also needed to introduce extension fields. A key technique in checking arithmetic equations is "sampling at a random point": if you want to check whether an equation like $F(x+2) - F(x+1) - F(x) = Z(x) \cdot H(x)$ actually holds, you can pick a random coordinate $r$, ask for opening proofs of the committed polynomials at (and around) $r$, and verify the equation there. But if the set that $r$ is drawn from only has about $2^{64}$ elements, a cheating prover has too high a chance of committing to polynomials that are wrong but happen to agree at the sampled point.
To stop this, we sample $r$ from an extension field of the small field, so that the set of possible evaluation points becomes astronomically large again while the data itself stays small.
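The underlying soundness fact is worth seeing concretely: two distinct polynomials of degree at most $d$ can agree on at most $d$ points, so a random-point check misses a cheat with probability at most $d$ divided by the number of possible points. Here is a tiny sketch of that (deliberately using a toy-sized field; all names here are mine):

```python
# Sketch: why the evaluation point must be sampled from a large set.
import random

p = 101            # a deliberately tiny field, for illustration only
d = 10             # degree bound

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

honest = [random.randrange(p) for _ in range(d + 1)]
cheat = [(c + random.randrange(1, p)) % p for c in honest]   # a different polynomial

# The two polynomials can agree on at most d of the p possible points,
# so a check at a random point catches the cheat with probability >= 1 - d/p.
agreements = sum(poly_eval(honest, x) == poly_eval(cheat, x) for x in range(p))
print(f"agree on {agreements} of {p} points (at most d = {d})")
```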
Computers do arithmetic by representing larger numbers as sequences of zeroes and ones, and building "circuits" on top of those bits to compute things like addition and multiplication. Computers are particularly optimized for doing computation with 16-bit, 32-bit and 64-bit integers. Moduli like $2^{64} - 2^{32} + 1$ and $2^{31} - 1$ are chosen not only because they fit within those bounds, but also because they align well with them: multiplication and modular reduction can be done with a handful of cheap native integer operations.
What would be even better, however, is doing computation in binary directly. What if addition could be "just" XOR, with no need to worry about "carrying" the overflow from adding 1 + 1 in one bit position to the next bit position? What if multiplication could be more parallelizable in the same way? And these advantages would all come on top of being able to represent True/False values with just one bit.
Capturing these advantages of doing binary computation directly is exactly what Binius is trying to do. A table from the Binius team's zkSummit presentation shows the efficiency gains:
Despite being roughly the same "size", a 32-bit binary field operation takes about 5x fewer computational resources than an operation over the 31-bit Mersenne field.
Suppose that we are convinced by this reasoning, and want to do everything over bits (zeroes and ones). How do we actually commit to a polynomial representing a billion bits?
Here, we face two practical problems:
- For a polynomial to represent a lot of values, those values need to be accessible at evaluations of the polynomial: in our Fibonacci example above, $F(0)$, $F(1)$ ... $F(100)$, and in a bigger computation, the indices would go into the millions. And the field that we use needs to contain numbers going up to that size.
- Proving anything about a value that we're committing to in a Merkle tree (as all STARKs do) requires Reed-Solomon encoding it: extending $n$ values into eg. $8n$ values, using the redundancy to prevent a malicious prover from cheating by faking one value in the middle of the computation. This also requires having a large enough field: to extend a million values to 8 million, you need 8 million different points at which to evaluate the polynomial.
A key idea in Binius is solving these two problems separately, and doing so by representing the same data in two different ways. First, the polynomial itself. Elliptic curve-based SNARKs, 2019-era STARKs, Plonky2 and other systems generally deal with polynomials over one variable: $F(x)$. Binius, by contrast, works with multivariate polynomials: it treats the data as the evaluations of a multilinear polynomial $P(x_0, x_1 \ldots x_{k-1})$ over the "hypercube" of points where each coordinate is either 0 or 1.
That is, instead of reading the data off of a univariate polynomial's evaluations at successive points, we read it off of the $2^k$ corners of a $k$-dimensional cube. For example, we could store the first sixteen Fibonacci numbers on the corners of a four-dimensional hypercube.
This example is of course just for illustration: in practice, the whole point of going to a hypercube is to let us work with individual bits. The "Binius-native" way to count Fibonacci numbers would be to use a higher-dimensional cube, using each set of eg. 16 bits to store a number. This requires some cleverness to implement integer addition on top of the bits, but with Binius it's not too difficult.
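To make the hypercube picture concrete, here is a small sketch of what it means to "evaluate" such a hypercube at an arbitrary point: the multilinear extension of the stored values, computed as a dot product with the tensor product of $(1 - r_i, r_i)$ over the coordinates. (The prime field and the index conventions here are my own, purely for illustration; Binius itself works over binary fields.)

```python
# Sketch: a 4-variable multilinear polynomial defined by its 16 evaluations
# on the hypercube {0,1}^4, evaluated at an arbitrary point r.
p = 2**31 - 1

def tensor(point):
    # All 2^k products of one factor per coordinate: bit i of the index
    # selects r_i (if 1) or 1 - r_i (if 0).
    out = [1]
    for r in point:
        out = [(v * (1 - r)) % p for v in out] + [(v * r) % p for v in out]
    return out

def eval_multilinear(evals, point):
    return sum(e * c for e, c in zip(evals, tensor(point))) % p

# The first sixteen Fibonacci numbers, stored on the corners of the hypercube
evals = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]

# At a corner of the hypercube, the evaluation is just the stored value...
assert eval_multilinear(evals, [1, 0, 1, 0]) == evals[0b0101]
# ...and at any other point it is some field element determined by all of them.
print(eval_multilinear(evals, [3, 14, 15, 92]))
```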
Now, we get to the erasure coding. The way STARKs work is: you take $n$ values, Reed-Solomon extend them to a larger number of values (eg. $8n$), and then randomly sample Merkle branches from the extension and run checks against them. A hypercube is only 2 long in each dimension, so there is no room to extend it directly. Instead, we re-interpret the same hypercube as a square, and Reed-Solomon extend the rows of that square.
See here for a python implementation of this protocol.
Let's go through an example, using regular integers as our field for convenience (in a real implementation this will be binary field elements). First, we take the hypercube we want to commit to, and encode it as a square:
Now, we Reed-Solomon extend the square. That is, we treat each row as being a degree-3 polynomial evaluated at $x = \{0, 1, 2, 3\}$, and evaluate the same polynomial at $x = \{4, 5, 6, 7\}$:
Notice that the numbers blow up quickly! This is why in a real implementation, we always use a finite field for this, instead of regular integers: if we used integers modulo 11, for example, the extension of the first row would just be [3, 10, 0, 6].
If you want to play around with extending and verify the numbers here for yourself, you can use my simple Reed-Solomon extension code here.
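For reference, the extension being described is essentially the following (a standalone sketch using mod-11 arithmetic to match the aside above; the row values here are arbitrary placeholders, not the ones in the figure):

```python
# Sketch: Reed-Solomon extending a row of 4 values to 8 values over F_11,
# by Lagrange-interpolating the degree-3 polynomial through x = 0..3
# and evaluating it at x = 4..7.
p = 11

def lagrange_eval(ys, x):
    total = 0
    for i, y in enumerate(ys):
        num, den = 1, 1
        for j in range(len(ys)):
            if j != i:
                num = num * (x - j) % p
                den = den * (i - j) % p
        total += y * num * pow(den, p - 2, p)   # division via Fermat inversion
    return total % p

def extend(row):
    return [lagrange_eval(row, x) for x in range(4, 8)]

row = [3, 1, 4, 1]        # placeholder values
print(row + extend(row))  # 8 evaluations of the same degree-3 polynomial
```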
Next, we treat this extension as columns, and make a Merkle tree of the columns. The root of the Merkle tree is our commitment.
Now, let's suppose that the prover wants to prove an evaluation of this polynomial at some point $r = (r_0, r_1, r_2, r_3)$.
Suppose we pick a specific such $r$. To express the evaluation, we split the coordinates into a "column part" and a "row part", and compute a tensor product from each half. For the column part, that is the tensor product of the pairs $(1 - r_i, r_i)$ over the column coordinates:
And for the row part:
What this means is: a list of all possible products of one value from each set. In the row case, we get:
Using these coefficients, the evaluation we are trying to prove can be computed in two stages: first combine the rows, then combine the entries of the resulting row.
Now, we compute a new "row" $t'$: the linear combination of the rows of the square, using the row-part tensor coefficients as the weights.
You can view what's going on here as a partial evaluation. If we were to multiply the full tensor product (over all the coordinates) by the full vector of values in the hypercube, we would get the evaluation $P(r)$ directly. Here, we multiply by only the row part, collapsing the grid into a single row; that row still carries everything we need to finish the evaluation with the column part later.
The prover provides the verifier with this new row, $t'$, along with the Merkle proofs of a handful of randomly sampled columns of the extended square.
Now, we take advantage of the linearity of Reed-Solomon codes. The key property that we use is: taking a linear combination of a Reed-Solomon extension gives the same result as a Reed-Solomon extension of a linear combination. This kind of "order independence" often happens when you have two operations that are both linear.
The verifier does exactly this. They compute the Reed-Solomon extension of $t'$ (the row combination the prover provided), compute the same row combination of each of the columns the prover opened, and check that the two agree at those columns.
In this case, extending $t'$, we get:
But the verifier still needs to check one more thing: the evaluation of the polynomial at $r$ that the prover is actually claiming. So far, none of the checks have depended on that claimed value; this is where the column part of the tensor product comes in.
In our example, where the square is 4x4, the column part is a set of four coefficients, one for each position within a row.
So now we take this linear combination of the entries of $t'$, using the column-part coefficients as the weights:
Which exactly equals the answer you get if you evaluate the polynomial directly.
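Here is that claim in isolation: reshaping the hypercube into a square, doing the row combination with the row-part tensor coefficients and then the column combination with the column-part coefficients gives exactly the direct multilinear evaluation. (Again a prime field and my own index conventions, purely for illustration.)

```python
# Sketch: "row combination, then column combination" equals direct evaluation.
p = 2**31 - 1

def tensor(point):
    out = [1]
    for r in point:
        out = [(v * (1 - r)) % p for v in out] + [(v * r) % p for v in out]
    return out

def eval_multilinear(evals, point):
    return sum(e * c for e, c in zip(evals, tensor(point))) % p

evals = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
matrix = [evals[i:i + 4] for i in range(0, 16, 4)]   # the 4x4 square, row-major

r = [3, 14, 15, 92]
col_coeffs = tensor(r[:2])   # column part: selects the position within a row
row_coeffs = tensor(r[2:])   # row part: weights for combining the rows

# The new "row" t' that the prover sends: a linear combination of the rows
t_prime = [sum(row_coeffs[i] * matrix[i][j] for i in range(4)) % p
           for j in range(4)]

# The column combination of t' recovers the evaluation the prover is claiming
value = sum(col_coeffs[j] * t_prime[j] for j in range(4)) % p
assert value == eval_multilinear(evals, r)
print(value)
```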
The above is pretty close to a complete description of the "simple" Binius protocol. This already has some interesting advantages: for example, because the data is split into rows and columns, you only need a field half the size. But this doesn't come close to realizing the full benefits of doing computation in binary. For this, we will need the full Binius protocol. But first, let's get a deeper understanding of binary fields.
The smallest possible field is arithmetic modulo 2, which is so small that we can write out its addition and multiplication tables:
| + | 0 | 1 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |

| × | 0 | 1 |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
We can make larger binary fields by taking extensions: if we start with $F_2$ (integers modulo 2) and then define $x$ where $x^2 = x + 1$, we get the following addition and multiplication tables:
| + | 0 | 1 | x | x+1 |
|---|---|---|---|-----|
| 0 | 0 | 1 | x | x+1 |
| 1 | 1 | 0 | x+1 | x |
| x | x | x+1 | 0 | 1 |
| x+1 | x+1 | x | 1 | 0 |

| × | 0 | 1 | x | x+1 |
|---|---|---|---|-----|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | x | x+1 |
| x | 0 | x | x+1 | 1 |
| x+1 | 0 | x+1 | 1 | x |
It turns out that we can expand the binary field to arbitrarily large sizes by repeating this construction. Unlike with complex numbers over the reals, where you can add one new element $i$ and then you are done (no further algebraic extensions are possible, because the complex numbers are algebraically closed), with binary fields we can keep adding new extensions forever. Specifically, we define a sequence of new elements as follows:
- $x_0$ satisfies $x_0^2 = x_0 + 1$
- $x_1$ satisfies $x_1^2 = x_1 x_0 + 1$
- $x_2$ satisfies $x_2^2 = x_2 x_1 + 1$
- $x_3$ satisfies $x_3^2 = x_3 x_2 + 1$
And so on. This is often called the tower construction, because of how each successive extension can be viewed as adding a new layer to a tower. This is not the only way to construct binary fields of arbitrary size, but it has some unique advantages that Binius makes use of.
We can represent these numbers as lists of bits, where each bit position corresponds to one of the products of the $x_i$: the first bit is the constant term, the second is $x_0$, the third is $x_1$, the fourth is $x_1 x_0$, the fifth is $x_2$, and so on.
This is a relatively uncommon notation, but I like representing binary field elements as integers, taking the bit representation where more-significant bits are to the right. That is, $1 = 1$, $x_0 = 2$, $x_1 = 4$, $x_1 x_0 = 8$, $x_2 = 16$, and so forth; for example, the element $x_1 x_0 + x_0 + 1$ is the bit list $1101$, or the integer $11$.
Addition in binary fields is just XOR (and, incidentally, so is subtraction); note that this implies that $x + x = 0$ for any $x$, ie. every element is its own additive inverse.
To multiply two elements, write each one out as a sum of these basis terms and use the distributive law. Then, split up the multiplication: most of the resulting products are easy, because multiplying distinct $x_i$ just gives you a longer product term. The last piece is the only slightly tricky one, because whenever some $x_i$ gets squared you have to apply the reduction rule, and replace $x_i^2$ with $x_i x_{i-1} + 1$ (or, for $x_0$, with $x_0 + 1$), possibly triggering further reductions.
Division in binary fields is done by combining multiplication and inversion: because the field at level $k$ of the tower has $2^{2^k}$ elements, a generalization of Fermat's little theorem gives $\frac{1}{x} = x^{2^{2^k} - 2}$, so $\frac{a}{b} = a \cdot b^{2^{2^k} - 2}$.
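Here is a minimal, unoptimized sketch of this arithmetic (the function names and exact recursion layout are mine, not the Binius codebase): XOR for addition, the recursive split with the reduction rule $x_k^2 = x_k x_{k-1} + 1$ for multiplication, and Fermat-style exponentiation for inversion.

```python
# Sketch: arithmetic in the tower of binary fields. An element of level k
# is a 2^k-bit integer; its high and low halves are level-(k-1) elements.
def mul_tower(a, b, k):
    if k == 0:
        return a & b                       # F_2: multiplication is AND
    half = 1 << (k - 1)
    a_lo, a_hi = a & ((1 << half) - 1), a >> half
    b_lo, b_hi = b & ((1 << half) - 1), b >> half
    lolo = mul_tower(a_lo, b_lo, k - 1)
    hihi = mul_tower(a_hi, b_hi, k - 1)
    # Karatsuba: a_hi*b_lo + a_lo*b_hi = (a_lo+a_hi)(b_lo+b_hi) + lolo + hihi
    cross = mul_tower(a_lo ^ a_hi, b_lo ^ b_hi, k - 1) ^ lolo ^ hihi
    # Reduction rule x_{k-1}^2 = x_{k-1} * x_{k-2} + 1  (x_{-1} taken to be 1)
    x_prev = (1 << (1 << (k - 2))) if k >= 2 else 1
    return ((cross ^ mul_tower(hihi, x_prev, k - 1)) << half) | (lolo ^ hihi)

def pow_tower(a, n, k):
    result = 1
    while n:
        if n & 1:
            result = mul_tower(result, a, k)
        a = mul_tower(a, a, k)
        n >>= 1
    return result

def inv_tower(a, k):
    # The field at level k has 2^(2^k) elements, so a^(2^(2^k) - 2) = 1/a
    return pow_tower(a, (1 << (1 << k)) - 2, k)

# x_0 = 2 and x_1 = 4 in the integer notation above
assert mul_tower(2, 2, 1) == 3                 # x_0^2 = x_0 + 1
assert mul_tower(4, 4, 2) == 9                 # x_1^2 = x_1*x_0 + 1 = 8 + 1
assert mul_tower(7, inv_tower(7, 3), 3) == 1   # division works
assert pow_tower(0b10011101, 255, 3) == 1      # nonzero 8-bit elements satisfy a^255 = 1
```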
Left: addition table for four-bit binary field elements (ie. elements made up only of combinations of $1$, $x_0$, $x_1$ and $x_1 x_0$). Right: multiplication table for the same elements.
The beautiful thing about this type of binary field is that it combines some of the best parts of "regular" integers and modular arithmetic. Like regular integers, binary field elements are unbounded: you can keep extending as far as you want. But like modular arithmetic, if you do operations over values within a certain size limit, all of your answers also stay within the same bound. For example, if you take successive powers of an 8-bit element, every power you compute is still an 8-bit element.
And after 255 steps, you get right back to the element you started with: the nonzero elements of the 8-bit binary field form a multiplicative group of size 255, so $x^{256} = x$.
And finally, binary fields work conveniently with bits: if you do math with numbers that fit into $k$ bits, then all of your outputs also fit into $k$ bits. This avoids awkwardness: in Ethereum's EIP-4844, for example, the individual "chunks" of a blob must be numbers modulo 52435875175126190479447740508185965837690552500527637822603658699938581184513, and so encoding binary data involves throwing away a bit of space and doing extra checks at the application layer to make sure that each element is storing a value less than $2^{254}$.
This all means that we can do things like the Reed-Solomon encoding that we did above, in a way that completely avoids integers "blowing up" like we saw in our example, and in a way that is extremely "native" to the kind of calculation that computers are good at. The "splitting" property of binary fields - how a $2^k$-bit element can be viewed as a pair of $2^{k-1}$-bit elements, and split further down to individual bits as needed - is also crucial for a lot of the flexibility that Binius relies on.
See here for a python implementation of this protocol.
Now, we can get to "full Binius", which adjusts "simple Binius" to (i) work over binary fields, and (ii) let us commit to individual bits. This protocol is tricky to understand, because it keeps going back and forth between different ways of looking at a matrix of bits; it certainly took me longer to understand than it usually takes me to understand a cryptographic protocol. But once you understand binary fields, the good news is that there isn't any "harder math" that Binius depends on. This is not elliptic curve pairings, where there are deeper and deeper rabbit holes of algebraic geometry to go down; here, binary fields are all you need.
Let's look again at the full diagram:
By now, you should be familiar with most of the components. The idea of "flattening" a hypercube into a grid, the idea of computing a row combination and a column combination as tensor products of the evaluation point, and the idea of checking equivalence between "Reed-Solomon extending then computing the row combination", and "computing the row combination then Reed-Solomon extending", were all in simple Binius.
What's new in "full Binius"? Basically three things:
- The individual values in the hypercube, and in the square, have to be bits (0 or 1)
- The extension process extends bits into more bits, by grouping bits into columns and temporarily pretending that they are larger field elements
- After the row combination step, there's an element-wise "decompose into bits" step, which converts the extension back into bits
We will go through these in turn. First, the new extension procedure. A Reed-Solomon code has the fundamental limitation that if you are extending $n$ values into $k \cdot n$ values, you need to be working in a field that has at least $k \cdot n$ distinct values that can be used as evaluation points - and $F_2$, with only two elements, does not. So what we do is pack adjacent bits together into elements of a larger binary field, do the Reed-Solomon extension over those packed elements, and then view the result as bits again.
Now, the row combination. To make "evaluate at a random point" checks cryptographically secure, we need that point to be sampled from a pretty large space, much larger than the hypercube itself. Hence, while the points within the hypercube are bits, evaluations outside the hypercube will be much larger. In our example above, the "row combination" ends up being a row of these much larger field elements, rather than a row of single bits.
This presents a problem: we know how to combine pairs of bits into a larger value, and then do a Reed-Solomon extension on that, but how do you do the same to pairs of much larger values?
The trick in Binius is to do it bitwise: we look at the individual bits of each value (eg. for what we labeled as "11", that's $[1, 1, 0, 1]$, since $11 = x_1 x_0 + x_0 + 1$), and then treat each bit position as its own row of bits, which we can handle with the same grouping-and-extending machinery that we already use for rows of bits.
Recapping:
- We take the bits in the hypercube, and convert them into a grid
- Then, we treat adjacent groups of bits on each row as larger field elements, and do arithmetic on them to Reed-Solomon extend the rows
- Then, we take a row combination of each column of bits, and get a (for squares larger than 4x4, much smaller) column of bits for each column as the output
- Then, we look at the output as a matrix, and treat the bits of that as rows again
Why does this work? In "normal" math, the ability to (often) do linear operations in either order and get the same result stops working if you start slicing a number up by digits. For example, if I start with the number 345, and I multiply it by 8 and then by 3, I get 8280, and if I do those two operations in reverse, I also get 8280. But if I insert a "split by digit" operation in between the two steps, it breaks down. If you do 8x, then split by digit, then 3x, you get:

$$345 \rightarrow 2760 \rightarrow [2, 7, 6, 0] \rightarrow [6, 21, 18, 0]$$

But if you do 3x, then split by digit, then 8x, you get:

$$345 \rightarrow 1035 \rightarrow [1, 0, 3, 5] \rightarrow [8, 0, 24, 40]$$
But in binary fields built with the tower construction, this kind of thing does work. The reason why has to do with their separability: if you multiply a big value by a small value, what happens in each segment, stays in each segment. If we multiply a big element (viewed as a concatenation of smaller segments) by an element that lives in one of the lower levels of the tower, the result is simply each segment multiplied by that small element, concatenated back together. This is exactly what lets us decompose values into bits, operate on the pieces, and recombine them, with the order of operations not mattering.
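A quick self-contained check of this separability claim (repeating the tower multiplication routine from the earlier sketch so that this runs on its own; the particular values are arbitrary):

```python
def mul(a, b, k):
    # Recursive tower-field multiplication, as in the earlier sketch
    if k == 0:
        return a & b
    half = 1 << (k - 1)
    a_lo, a_hi = a & ((1 << half) - 1), a >> half
    b_lo, b_hi = b & ((1 << half) - 1), b >> half
    lolo, hihi = mul(a_lo, b_lo, k - 1), mul(a_hi, b_hi, k - 1)
    cross = mul(a_lo ^ a_hi, b_lo ^ b_hi, k - 1) ^ lolo ^ hihi
    x_prev = (1 << (1 << (k - 2))) if k >= 2 else 1
    return ((cross ^ mul(hihi, x_prev, k - 1)) << half) | (lolo ^ hihi)

big = 0b1100101010001111      # a 16-bit tower-field element (level 4)
small = 0b1001                # a 4-bit element (level 2), i.e. a "small" value

# Multiply each 4-bit segment of the big element by the small element...
segments = [(big >> (4 * i)) & 15 for i in range(4)]
piecewise = sum(mul(seg, small, 2) << (4 * i) for i, seg in enumerate(segments))

# ...and the concatenation of the results equals multiplying the whole thing.
assert piecewise == mul(big, small, 4)
```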
Generally, zero knowledge proof systems work by making statements about polynomials that simultaneously represent statements about the underlying evaluations: just like we saw in the Fibonacci example, $F(X+2) - F(X+1) - F(X) = Z(X) * H(X)$ simultaneously checks all steps of the Fibonacci computation. We check statements about polynomials by proving evaluations at a random point: given a commitment to $F$, you might randomly choose eg. 1892470, demand proofs of evaluations of $F$, $Z$ and $H$ at that point (and $F$ at adjacent points), check those proofs, and then check if $F(1892472) - F(1892471) - F(1892470) = Z(1892470) * H(1892470)$. This check at a random point stands in for checking the whole polynomial: if the polynomial equation doesn't match, the chance that it matches at a specific random coordinate is tiny.
In practice, a major source of inefficiency comes from the fact that in real programs, most of the numbers we are working with are tiny: indices in for loops, True/False values, counters, and similar things. But when we "extend" the data using Reed-Solomon encoding to give it the redundancy needed to make Merkle proof-based checks safe, most of the "extra" values end up taking up the full size of a field, even if the original values are small.
To get around this, we want to make the field as small as possible. Plonky2 brought us down from 256-bit numbers to 64-bit numbers, and then Plonky3 went further to 31 bits. But even this is sub-optimal. With binary fields, we can work over individual bits. This makes the encoding "dense": if your actual underlying data has $n$ bits, then your encoding will have $n$ bits, and the extension will have $8n$ bits, with no extra overhead.
Now, let's look at the diagram a third time:
In Binius, we are committing to a multilinear polynomial: a hypercube $P(x_0, x_1 ... x_k)$, where the individual evaluations $P(0, 0 ... 0)$, $P(0, 0 ... 1)$ up to $P(1, 1, ... 1)$ are holding the data that we care about. To prove an evaluation at a point, we "re-interpret" the same data as a square. We then extend each row, using Reed-Solomon encoding over groups of bits, to give the data the redundancy needed for random Merkle branch queries to be secure. We then compute a random linear combination of rows, with coefficients designed so that the new combined row actually holds the evaluation that we care about. Both this newly-created row (which gets re-interpreted as 128 rows of bits), and a few randomly-selected columns with Merkle branches, get passed to the verifier. This is $O(\sqrt{N})$ data: the new row has $O(\sqrt{N})$ size, and each of the (constant number of) columns that get passed has $O(\sqrt{N})$ size.
The verifier then does a "row combination of the extension" (or rather, a few columns of the extension), and an "extension of the row combination", and verifies that the two match. They then compute a column combination, and check that it returns the value that the prover is claiming. And there's our proof system (or rather, the polynomial commitment scheme, which is the key building block of a proof system).
- Efficient algorithms to extend the rows, which are needed to actually make the computational efficiency of the verifier $O(\sqrt{N})$. With naive Lagrange interpolation, we can only get $O(N^{\frac{2}{3}})$. For this, we use Fast Fourier transforms over binary fields, described here (though the exact implementation will be different, because this post uses a less efficient construction not based on recursive extension).
- Arithmetization. Univariate polynomials are convenient because you can do things like $F(X+2) - F(X+1) - F(X) = Z(X) * H(X)$ to relate adjacent steps in the computation. In a hypercube, the interpretation of "the next step" is not nearly as clean as "$X + 1$". You can do $X * k$ and jump around powers of $k$, but this jumping-around behavior would sacrifice many of the key advantages of Binius. The Binius paper introduces solutions to this (eg. see Section 4.3), but this is a "deep rabbit hole" in its own right.
- How to actually safely do specific-value checks. The Fibonacci example required checking key boundary conditions: $F(0) = F(1) = 1$, and the value of $F(100)$. But with "raw" Binius, checking at pre-known evaluation points is insecure. There are fairly simple ways to convert a known-evaluation check into an unknown-evaluation check, using what are called sum-check protocols; but we did not get into those here.
- Lookup protocols, another technology which has been recently gaining usage as a way to make ultra-efficient proving systems. Binius can be combined with lookup protocols for many applications.
- Going beyond square-root verification time. Square root is expensive: a Binius proof of $2^{32}$ bits is about 11 MB long. You can remedy this using some other proof system to make a "proof of a Binius proof", thus gaining both Binius's efficiency in proving the main statement and a small proof size. Another option is the much more complicated FRI-Binius protocol, which creates a poly-logarithmic-sized proof (like regular FRI).
- How Binius affects what counts as "SNARK-friendly". The basic summary is that, if you use Binius, you no longer need to care much about making computation "arithmetic-friendly": "regular" hashes are no longer more efficient than traditional arithmetic hashes, multiplication modulo $2^{32}$ or modulo $2^{256}$ is no longer a big headache compared to multiplication modulo $p$, and so forth. But this is a complicated topic; lots of things change when everything is done in binary.
I expect many more improvements in binary-field-based proving techniques in the months ahead.