Fixing typos #533

Open · wants to merge 13 commits into base: main
6 changes: 3 additions & 3 deletions book/src/background/binius/multiplicative-generator.md
@@ -17,7 +17,7 @@ This algorithm ensures that the element $x$ does not reside in any proper subgro
## Background:
- **Group Structure**: $\text{GF}(2^k)^*$ is the multiplicative group of the field $\text{GF}(2^k)$, containing all field elements except 0. Its order is $2^k - 1$, which is prime to 2.
- **Maximal Proper Divisors**: These divisors are crucial as testing powers for these values ensures that $x$ is not a member of any smaller cyclic subgroup. The divisors of $2^k - 1$ are used to verify the generator property.
-- **Probabilistic Success Rate**: The likelihood of a random element being a generator is significant, given the structure of the group. This can be supported by the [[chinese-remainder-theorem]], which suggests that the intersection of several smaller groups (subgroups corresponding to divisors) is likely non-trivial only when all subgroup conditions are simultaneously satisfied.
+- **Probabilistic Success Rate**: The likelihood of a random element being a generator is significant, given the group's structure. This can be supported by the [[chinese-remainder-theorem]], which suggests that the intersection of several smaller groups (subgroups corresponding to divisors) is likely non-trivial only when all subgroup conditions are simultaneously satisfied.
## Example Divisors:
- For $\text{GF}(2^{16})^*$, $|\mathbb{F}^*| = 2^{16} - 1 = 3 \times 5 \times 17 \times 257$
- For $\text{GF}(2^{32})^*$, $|\mathbb{F}^*| = 2^{32} - 1 = 3 \times 5 \times 17 \times 257 \times 65537$
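
To make the test concrete, here is a minimal Rust sketch of the generator check for $\text{GF}(2^{16})^*$ (illustrative only, not Jolt's implementation). It assumes the irreducible reduction polynomial $x^{16} + x^5 + x^3 + x + 1$, and the helper names are our own; a candidate $x$ generates the group exactly when $x^d \neq 1$ for every maximal proper divisor $d = (2^{16}-1)/p$ with $p \in \{3, 5, 17, 257\}$.

```rust
// Illustrative sketch (not Jolt's implementation) of the generator test for
// GF(2^16)^*: x generates the group iff x^d != 1 for every maximal proper
// divisor d of 2^16 - 1. Field elements are polynomials over GF(2) modulo an
// assumed irreducible polynomial, here x^16 + x^5 + x^3 + x + 1 (0x1002B).
const MODULUS: u32 = 0x1002B;

/// Carry-less "peasant" multiplication in GF(2^16), reducing on overflow.
fn gf_mul(mut a: u32, mut b: u32) -> u32 {
    let mut acc = 0u32;
    while b != 0 {
        if b & 1 == 1 {
            acc ^= a; // addition in GF(2^16) is XOR
        }
        a <<= 1;
        if a & 0x10000 != 0 {
            a ^= MODULUS; // reduce modulo the irreducible polynomial
        }
        b >>= 1;
    }
    acc
}

/// Square-and-multiply exponentiation in GF(2^16).
fn gf_pow(mut base: u32, mut exp: u32) -> u32 {
    let mut acc = 1u32; // multiplicative identity
    while exp != 0 {
        if exp & 1 == 1 {
            acc = gf_mul(acc, base);
        }
        base = gf_mul(base, base);
        exp >>= 1;
    }
    acc
}

/// x is a generator iff it survives the power test for every maximal proper
/// divisor (2^16 - 1)/p, with p ranging over the primes 3, 5, 17, 257.
fn is_generator(x: u32) -> bool {
    let order = (1u32 << 16) - 1; // 65535 = 3 * 5 * 17 * 257
    [3u32, 5, 17, 257].iter().all(|p| gf_pow(x, order / p) != 1)
}

fn main() {
    // A large fraction of elements are generators, so a scan ends quickly.
    let g = (2u32..0x10000).find(|&x| is_generator(x)).unwrap();
    println!("found generator: {g:#x}");
}
```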
@@ -27,7 +27,7 @@ This algorithm ensures that the element $x$ does not reside in any proper subgro
# Maximal Proper Divisors
A maximal proper divisor of a number $n$ is a divisor $d$ of $n$ which is neither $1$ nor $n$ itself, and there are no other divisors of $n$ that divide $d$ except $1$ and $d$. Essentially, $d$ is a divisor that is not a multiple of any smaller divisor other than $1$ and itself, making it 'maximal' under the set of proper divisors.
## Algorithm for Finding
-The algorithm to find the maximal proper divisors of a given number $n$ involves identifying all divisors of $n$ and then selecting those which do not have other divisors besides $1$ and themselves. The steps are as follows:
+The algorithm to find the maximal proper divisors of a given number $n$ involves identifying all divisors of $n$ and then selecting those that do not have other divisors besides $1$ and themselves. The steps are as follows:
1. **Find All Divisors**: First, list all divisors of $n$ by checking for every integer $i$ from $1$ to $\sqrt{n}$ if $i$ divides $n$. If $i$ divides $n$, then both $i$ and $n/i$ are divisors.
2. **Filter Maximal Divisors**: From this list of divisors, exclude $1$ and $n$. For each remaining divisor, check if it can be expressed as a multiple of any other divisor from the list (other than $1$ and itself). If it cannot, then it is a maximal proper divisor.

@@ -52,4 +52,4 @@ function findMaximalProperDivisors(n):
if is_maximal:
maximal_proper_divisors.append(d)

-return maximal_proper_divisors
+return maximal_proper_divisors
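
Since every maximal proper divisor of $n$ is $n/p$ for a distinct prime factor $p$ of $n$ (any divisor strictly between $n/p$ and $n$ would be a proper multiple of $n/p$ dividing $n$, which is impossible), the routine can also be written compactly with trial-division factoring. A Rust sketch under that observation (ours, not a transcription of the pseudocode above):

```rust
/// The maximal proper divisors of n are exactly n / p for each distinct
/// prime factor p of n, so it suffices to factor n by trial division and
/// emit those quotients.
fn maximal_proper_divisors(n: u64) -> Vec<u64> {
    let mut divisors = Vec::new();
    let mut m = n;
    let mut p = 2u64;
    while p * p <= m {
        if m % p == 0 {
            divisors.push(n / p); // one maximal proper divisor per prime factor
            while m % p == 0 {
                m /= p; // strip repeated factors so each prime is used once
            }
        }
        p += 1;
    }
    if m > 1 {
        divisors.push(n / m); // leftover m is the last (largest) prime factor
    }
    divisors
}

fn main() {
    // 2^16 - 1 = 3 * 5 * 17 * 257, so we expect [21845, 13107, 3855, 255].
    println!("{:?}", maximal_proper_divisors(65535));
}
```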
2 changes: 1 addition & 1 deletion book/src/background/multilinear-extensions.md
@@ -29,4 +29,4 @@ for i in 0..half {
```

### Multi Variable Binding
-Another common algorithm is to take the MLE $\tilde{f}(x_1, ... x_v)$ and compute its evaluation at a single $v$-variate point outside the boolean hypercube $x \in \mathbb{F}^v$. This algorithm can be performed in $O(n)$ time by preforming the single variable binding algorithm $\log(n)$ times. The time spent on $i$'th variable binding is $O(n/2^i)$, so the total time across all $\log n$ bindings is proportional to $\sum_{i=1}^{\log n} n/2^i = O(n)$.
+Another common algorithm is to take the MLE $\tilde{f}(x_1, ... x_v)$ and compute its evaluation at a single $v$-variate point outside the boolean hypercube $x \in \mathbb{F}^v$. This algorithm can be performed in $O(n)$ time by performing the single variable binding algorithm $\log(n)$ times. The time spent on $i$'th variable binding is $O(n/2^i)$, so the total time across all $\log n$ bindings is proportional to $\sum_{i=1}^{\log n} n/2^i = O(n)$.
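
As a concrete illustration, here is a self-contained Rust sketch of the multi-variable binding loop over a toy prime field ($p = 2^{61} - 1$, chosen purely for demonstration; the names are ours, not Jolt's API):

```rust
/// Toy prime field for demonstration only.
const P: u128 = (1 << 61) - 1;

fn add(a: u128, b: u128) -> u128 { (a + b) % P }
fn sub(a: u128, b: u128) -> u128 { (a + P - b) % P }
fn mul(a: u128, b: u128) -> u128 { (a * b) % P }

/// Bind the first variable of an MLE (given by its evaluations on the
/// hypercube) to r: f'(x_2..x_v) = (1 - r) * f(0,..) + r * f(1,..).
fn bind_variable(evals: &[u128], r: u128) -> Vec<u128> {
    let half = evals.len() / 2;
    (0..half)
        .map(|i| add(evals[i], mul(r, sub(evals[i + half], evals[i]))))
        .collect()
}

/// Evaluate the MLE at an arbitrary point by binding one variable at a time;
/// total work is n + n/2 + n/4 + ... = O(n).
fn evaluate_mle(mut evals: Vec<u128>, point: &[u128]) -> u128 {
    for &r in point {
        evals = bind_variable(&evals, r);
    }
    evals[0]
}

fn main() {
    // f over {0,1}^2 with f(0,0)=1, f(0,1)=2, f(1,0)=3, f(1,1)=4.
    let evals = vec![1, 2, 3, 4];
    // At boolean points the MLE agrees with the table; e.g. (1,0) -> 3.
    assert_eq!(evaluate_mle(evals.clone(), &[1, 0]), 3);
    println!("f~(2,3) = {}", evaluate_mle(evals, &[2, 3])); // prints 8
}
```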
2 changes: 1 addition & 1 deletion book/src/background/risc-v.md
@@ -44,7 +44,7 @@ For detailed instruction formats and encoding, refer to the __chapter 2__ of [sp

- Maintains the simple register-based architecture of RV32I

-- Results always written to a single 32-bit register (for upper/lower multiplication results, two separate instructions are used)
+- Results are always written to a single 32-bit register (for upper/lower multiplication results, two separate instructions are used)

- All instructions in this extension are encoded in the standard 32-bit RISC-V format
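
As an illustration of the upper/lower split, the following Rust sketch mirrors the RV32M semantics in which `MUL` writes the low 32 bits of the 64-bit signed product and `MULH` the high 32 bits (a model of the ISA behavior, not Jolt's code):

```rust
/// Model of RV32M: a 32x32-bit multiply spans two instructions, with MUL
/// producing the low word and MULH the high word of the signed product.
fn mul(rs1: i32, rs2: i32) -> i32 {
    ((rs1 as i64).wrapping_mul(rs2 as i64)) as i32 // low 32 bits
}

fn mulh(rs1: i32, rs2: i32) -> i32 {
    (((rs1 as i64).wrapping_mul(rs2 as i64)) >> 32) as i32 // high 32 bits
}

fn main() {
    let (a, b) = (0x1234_5678i32, -7i32);
    let full = (a as i64) * (b as i64);
    // The two destination registers together reconstruct the full product.
    assert_eq!(full, ((mulh(a, b) as i64) << 32) | (mul(a, b) as u32 as i64));
    println!("low = {:#x}, high = {:#x}", mul(a, b), mulh(a, b));
}
```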

2 changes: 1 addition & 1 deletion book/src/future/continuations.md
@@ -10,7 +10,7 @@ Jolt will pursue both approaches to prover space control. Below, we provide more

Continuations work by breaking a large computation into “chunks”, proving each chunk (almost) independently, and recursively aggregating the proofs (i.e., proving one knows the proofs for each chunk).

-Continuations come in two flavors: "brute-force recursion" and folding. In brute-force recursion, the proofs for different chunks are aggregated by having the prover prove it knows each proof. Roughly speaking, the verifier of each proof is repesented as a circuit, and a SNARK is used to prove that the prover knows a satisfying assignment for each circuit.
+Continuations come in two flavors: "brute-force recursion" and folding. In brute-force recursion, the proofs for different chunks are aggregated by having the prover prove it knows each proof. Roughly speaking, the verifier of each proof is represented as a circuit, and a SNARK is used to prove that the prover knows a satisfying assignment for each circuit.

In folding schemes, the "proof" for each chunk is actually only a "partial proof", in particular omitting any evaluation proofs for any committed polynomials. The partial proofs for each chunk are not explicitly checked by anyone. Rather, they are "aggregated" into a single partial proof, and that partial proof is then "completed" into a full SNARK proof. In folding schemes, the prover winds up recursively proving that it correctly aggregated the partial proofs. This has major performance benefits relative to "brute force recursion", because aggregating proofs is much simpler than checking them. Hence, proving aggregation was done correctly is much cheaper than proving full-fledged proof verification was done correctly.

4 changes: 2 additions & 2 deletions book/src/future/folding.md
@@ -3,7 +3,7 @@
The plan to implement folding is simple, with a (very) sketchy overview provided below.

<OL>
-<LI> Verifying Jolt proofs involves two procedures: verifiying sum-check proofs, and folding
+<LI> Verifying Jolt proofs involves two procedures: verifying sum-check proofs, and folding
polynomial evaluation claims for committed polynomials. </LI>

<LI> Running Nova with BN254 as the primary curve, we can simply verify sum-check proofs natively. </LI>
@@ -21,7 +21,7 @@ and the details will be fully fleshed out in an upcoming e-print.</LI>
</OL>

Note that this plan does not require "non-uniform folding". The fact that there are many different primitive RISC-V
-instructions is handled by "monolithic Jolt". Folding is merely applied to accumluate many copies of the same claim,
+instructions is handled by "monolithic Jolt". Folding is merely applied to accumulate many copies of the same claim,
namely that a Jolt proof (minus any HyperKZG evaluation proof) was correctly verified.

# Space cost estimates
2 changes: 1 addition & 1 deletion book/src/future/groth-16.md
@@ -22,7 +22,7 @@ are now in progress).
Each scalar multiplication costs about 400 group operations, each of which costs about 10 field multiplications,
so that's about $150 \cdot 400 \cdot 10=600k$ field multiplications.
The real killer is that these field multiplications must be done non-natively in constraints, due to the fact
-that that BN254 does not have a pairing-friendly "sister curve" (i.e., a curve whose scalar field matches the BN254 base field).
+that BN254 does not have a pairing-friendly "sister curve" (i.e., a curve whose scalar field matches the BN254 base field).
This means that each of the $600k$ field multiplications costs thousands of constraints.

On top of the above, the two pairings done by the HyperKZG verifier, implemented non-natively in constraints,
2 changes: 1 addition & 1 deletion book/src/future/proof-size-breakdown.md
@@ -47,7 +47,7 @@ and one attests to the validity of initialization of memory plus a final pass ov
The reason we do not run these grand products "together as one big grand product" is they are
each potentially of different sizes,
and it is annoying (though possible) to "batch prove" differently-sized grand products together.
-However, a relatively easy way to get down to 3 grand prodcuts is to set the memory size
+However, a relatively easy way to get down to 3 grand products is to set the memory size
in each of the three categories above to equal the number of reads/writes. This simply involves
padding the memory with zeros to make it equal in size to
the number of reads/writes into the memory (i.e., NUM_CYCLES). Doing this will not substantially increase
2 changes: 1 addition & 1 deletion book/src/future/zk.md
@@ -3,7 +3,7 @@ One way to achieve zero-knowledge is to simply compose Jolt with a zero-knowledg
A second way to achieve zero-knowledge is to combine Jolt with folding, which we will do regardless, in order to make the prover space independent of the number of RISC-V cycles being proven. As described in Section 7 of the latest version of the [HyperNova paper](https://eprint.iacr.org/2023/573),
one can straightforwardly obtain zero-knowledge directly from folding, without composition with a zkSNARK like Groth16.

-There are also ways to make Jolt zero-knowledge without invoking SNARK composition. For example, rendering sum-check-based SNARKs zero-knowledge without using composition was exactly the motivation for [Zeromorph](https://eprint.iacr.org/2023/917.pdf), which introduces a very efficienct zero-knowledge variant of KZG commitments for multilinear polynomials.
+There are also ways to make Jolt zero-knowledge without invoking SNARK composition. For example, rendering sum-check-based SNARKs zero-knowledge without using composition was exactly the motivation for [Zeromorph](https://eprint.iacr.org/2023/917.pdf), which introduces a very efficient zero-knowledge variant of KZG commitments for multilinear polynomials.

If we use the Zeromorph polynomial commitment scheme, the commitment and any evaluation proof are hiding (they reveal nothing about the committed polynomial, and still give the verifier a commitment to the requested evaluation of the committed polynomial). One still needs to ensure that the various applications of the sum-check protocol in Jolt also do not leak any information about the witness. Here, techniques based on masking polynomials apply (see Section 13.3 of [Proofs, Arguments, and Zero-Knowledge](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.html) for a sketchy overview). However, the use of masking polynomials requires the prover to be able to commit to non-multilinear polynomials and hence introduce significant (but surmountable) issues.

2 changes: 1 addition & 1 deletion book/src/how/instruction_lookups.md
@@ -10,7 +10,7 @@ Lookup arguments allow the prover to convince the verifier that for a committed
vector of indices $a$, and lookup table $T$, $T[a_i]=v_i$ for all $i$.
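
Stated concretely, the relation being attested to is the following (a sketch of the claim itself, not of the Lasso protocol; the names are ours):

```rust
/// The relation a lookup argument attests to: every looked-up value matches
/// the table entry at the corresponding index.
fn lookup_relation_holds(table: &[u64], indices: &[usize], values: &[u64]) -> bool {
    indices.len() == values.len()
        && indices
            .iter()
            .zip(values)
            .all(|(&a_i, &v_i)| table.get(a_i) == Some(&v_i))
}

fn main() {
    // e.g. a small squaring table T[x] = x^2
    let table: Vec<u64> = (0..8).map(|x| x * x).collect();
    assert!(lookup_relation_holds(&table, &[2, 5, 7], &[4, 25, 49]));
}
```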

Lasso is a special lookup argument with highly desirable asymptotic costs largely correlated to the number of lookups (the length of the vectors $a$ and $v$),
-rather than the length of of the table $T$.
+rather than the length of the table $T$.

A conversational background on lookups can be found [here](https://a16zcrypto.com/posts/article/building-on-lasso-and-jolt/). In short: lookups are great for zkVMs as they allow constant cost / developer complexity for the prover algorithm per VM instruction.

2 changes: 1 addition & 1 deletion book/src/how/m-extension.md
@@ -113,5 +113,5 @@ If the current instruction is virtual, we can constrain the next instruction in
next instruction in the bytecode.
We observe that the virtual sequences used in the M extension don't involve jumps or branches,
so this should always hold, *except* if we encounter a virtual instruction followed by a padding instruction.
-But that should never happend because an execution trace should always end with some return handling,
+But that should never happened because an execution trace should always end with some return handling,
which shouldn't involve a virtual sequence.
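
Schematically, with hypothetical variable names, such a constraint has the form $\mathtt{virtual}_t \cdot \big(\mathtt{bytecode\_row}_{t+1} - (\mathtt{bytecode\_row}_t + 1)\big) = 0$: it is vacuous at non-virtual steps, and at virtual steps it forces the next trace row to read the next row of the bytecode.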
6 changes: 3 additions & 3 deletions book/src/how/r1cs_constraints.md
@@ -21,7 +21,7 @@ The inputs required for the constraint system for a single CPU step are:

#### Pertaining to read-write memory
* The (starting) RAM address read by the instruction: if the instruction is not a load/store, this is 0.
-* The bytes written to or read from memory.
+* The bytes are written to or read from memory.

#### Pertaining to instruction lookups
* The chunks of the instruction's operands `x` and `y`.
@@ -44,7 +44,7 @@ the preprocessed bytecode in Jolt.
1. `ConcatLookupQueryChunks`: Indicates whether the instruction performs a concat-type lookup.
1. `Virtual`: 1 if the instruction is "virtual", as defined in Section 6.1 of the Jolt paper.
1. `Assert`: 1 if the instruction is an assert, as defined in Section 6.1.1 of the Jolt paper.
-1. `DoNotUpdatePC`: Used in virtual sequences; the program counter should be the same for the full seqeuence.
+1. `DoNotUpdatePC`: Used in virtual sequences; the program counter should be the same for the full sequence.
* Instruction flags: these are the unary bits used to indicate instruction is executed at a given step.
There are as many per step as the number of unique instruction lookup tables in Jolt.

@@ -71,7 +71,7 @@ The main changes involved in making this happen are:
- Spartan is modified to only take in the constraint matrices a single step, and the total number of steps.
Using this, the prover and verifier can efficiently calculate the multilinear extensions of the full R1CS matrices.
- The commitment format of the witness values is changed to reflect uniformity.
-All versions of a variable corresponding to each time step is committed together.
+All versions of a variable corresponding to each time step are committed together.
This affects nearly all variables committed to in Jolt.
- The inputs and witnesses are provided to the constraint system as segments.
- Additional constraints are used to enforce consistency of the state transferred between CPU steps.
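
As an illustration of the commitment layout described above (all versions of a variable across time steps committed together), the witness can be pictured as a struct-of-arrays in which each variable owns one segment spanning every step. The sketch below is hypothetical, with our own names rather than Jolt's actual types:

```rust
/// Hypothetical sketch (our names, not Jolt's types) of the uniform witness
/// layout: each variable owns one segment holding its value at every CPU
/// step, and each segment is committed as a single column.
struct WitnessSegments {
    program_counter: Vec<u64>, // PC at steps 0..num_steps
    rs1_value: Vec<u64>,       // first operand value at each step
    rd_value: Vec<u64>,        // value written to the destination register
}

impl WitnessSegments {
    fn new(num_steps: usize) -> Self {
        WitnessSegments {
            program_counter: vec![0; num_steps],
            rs1_value: vec![0; num_steps],
            rd_value: vec![0; num_steps],
        }
    }

    /// "All versions of a variable" land in one contiguous slice, so each
    /// variable is committed (and opened) together across all time steps.
    fn columns(&self) -> [&[u64]; 3] {
        [&self.program_counter, &self.rs1_value, &self.rd_value]
    }
}

fn main() {
    let w = WitnessSegments::new(4); // a 4-step trace
    for col in w.columns() {
        assert_eq!(col.len(), 4); // one entry per CPU step, per variable
    }
}
```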
4 changes: 2 additions & 2 deletions book/src/how/sparse-constraint-systems.md
@@ -50,7 +50,7 @@ we can "switch over" to the standard "dense" linear-time sum-check proving algor
so that $n/2^i \approx m$. In Jolt, we expect this "switchover" to happen by round $4$ or $5$.
In the end, the amount of extra field work done by the prover owing to the sparsity will only be a factor of $2$ or so.
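
A sketch of what the switchover can look like in code (illustrative Rust over a toy prime field; the names, threshold, and challenges are ours, not Jolt's):

```rust
// Illustrative sketch of proving-time "switchover": while the evaluation
// table is sparse we bind variables on (index, value) pairs, and once the
// nonzero count is comparable to the table length we convert to a dense
// table and continue with the standard linear-time algorithm.
const P: u128 = (1 << 61) - 1; // toy prime field for demonstration

fn addp(a: u128, b: u128) -> u128 { (a + b) % P }
fn subp(a: u128, b: u128) -> u128 { (a + P - b) % P }
fn mulp(a: u128, b: u128) -> u128 { (a * b) % P }

/// Bind the lowest variable to r on a sparse table (entries sorted by index):
/// new[j] = (1 - r) * old[2j] + r * old[2j + 1], keeping only nonzero entries.
fn bind_sparse(entries: &[(usize, u128)], r: u128) -> Vec<(usize, u128)> {
    let mut out: Vec<(usize, u128)> = Vec::new();
    for &(i, v) in entries {
        // contribution of old[i] to new[i / 2]
        let coeff = if i % 2 == 0 { subp(1, r) } else { r };
        let c = mulp(coeff, v);
        if let Some(last) = out.last_mut() {
            if last.0 == i / 2 {
                last.1 = addp(last.1, c); // merge the even/odd pair
                continue;
            }
        }
        out.push((i / 2, c));
    }
    out.retain(|&(_, v)| v != 0); // drop entries that cancelled to zero
    out
}

fn to_dense(entries: &[(usize, u128)], len: usize) -> Vec<u128> {
    let mut dense = vec![0u128; len];
    for &(i, v) in entries {
        dense[i] = v;
    }
    dense
}

fn main() {
    // A length-16 table with only 3 nonzero entries.
    let mut sparse = vec![(1usize, 5u128), (6, 7), (7, 2)];
    let mut len = 16usize;
    for &r in &[9u128, 11, 13, 17] {
        if sparse.len() * 2 >= len {
            // Density threshold reached: switch to the dense representation.
            let dense = to_dense(&sparse, len);
            println!("switched over to dense at len {}: {:?}", len, dense);
            break;
        }
        sparse = bind_sparse(&sparse, r);
        len /= 2;
    }
}
```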

-Jolt uses this approach within Lasso as well. Across all of the primtive RISC-V instructions,
+Jolt uses this approach within Lasso as well. Across all of the primitive RISC-V instructions,
there are about 80 "subtables" that get used. Any particular primitive instruction only needs
to access between 4 and 10 of these subtables. We "pretend" that every primitive instruction
actually accesses all 80 of the subtables, but use binary flags to "turn off" any subtable
@@ -63,7 +63,7 @@ commitment time, and that our grand product prover does not pay any field work f
There are alternative approaches we could take to achieve "a la carte" prover costs, e.g., [vRAM](https://web.eecs.umich.edu/~genkin/papers/vram.pdf)'s approach
of having the prover sort all cycles by which primitive operation or pre-compile was executed at that cycle
(see also the much more recent work [Ceno](https://eprint.iacr.org/2024/387)).
-But the above approach is compatable with a streaming prover, avoids committing to the same data multiple times,
+But the above approach is compatible with a streaming prover, avoids committing to the same data multiple times,
and has other benefits.

We call this technique (fast proving for) "sparse constraint systems". Note that the term sparse here