-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the merkle tree implementation perform better. #2273
Conversation
It can be easy to look at small average step and hash times and miss that the total time is what we're really trying to reduce.
I definitely had some incorrect assumptions about this data structure which made it more difficult to learn. So, I'm documenting how it works and adding some tests. The simple_merkle test is currently failing because the `set` method doesn't allow setting an index larger than the largest currently set leaf's index. There is some debate as to whether or not this is the correct behavior. To run the test, use: ``` $> cargo test -- --include-ignored ```
At this point, the new root hash is eagerly calculated after each call to `extend`.
If this happened frequently, it should really improve the perfomance of the machine. However, it looks like it doesn't happen at all with the benchmark inputs.
This change introduces internal mutability to the merkle tree implementation in the form of a dirty_indices HashSet<usize> which keeps track of the indices of the parent nodes for leaves which have changed since the last time the hashes for all of the non-leaf nodes in the tree were calculated. The `set` method was changed to only modify the leaf's value and add the parent node to the set of dirty indices. The hashes of all the non-leaf nodes which are dirty and all their ancestors up to the root node are recalcualted just before returning the value of the root. This should allow us to reuse previously allocated merkle trees rather than having to throw them out and recaculate the hashes of all the nodes every time a leaf's value is changed.
Previously, it could hit an index out of bounds if the new leafs caused any parent layer to grow beyond its current size.
Hopefully, this will allow us to compare this branch's implementation of a merkle tree to the one on merkle-perf-a.
This gives us ordered traversal of the dirty indices. The performance benchmarks show that this severely regresses over the implmenetation on merkle-perf.
The previous implementation was growing the same layers and dirty_indices arrays because the clone isn't deep (I guess.)
This was just for troubleshooting.
Specifically, this pre-allocates the dirty HashSets for each layer instead of having to allocate new ones each time rehash is called.
There are a few different things going on in this commit. 1. I've added some counters for when methods get called on the Merkle tree. 2. I've added integration with gperftools for profiling specific areas of the code.
This allows me to profile CPU and Heap independently, and to enable and disable the call counters independently.
This part of the code is obviously slow. Let's see if we can improve it.
This nets a pretty variable 4%-7% savings in the micro bnechmark.
This saves every instance of a Merkle Tree from having to initialize its own vector of empty hashes.
This part of the code is obviously slow. Let's see if we can improve it.
This allows the most efficient lookup possible for empty layer hashes.
Yay! Tests caught typos.
This should always be the case because the dirt is cleaned up in order. But, this asserttion can catch it if we get it wrong.
Keeping track of the layers_i wasn't really making the code easier to read.
This makes the code a little cleaner, and it performs the same or slightly better. Before: ``` Running benchmark with always merkleize feature on avg hash time 226.487µs, avg step time 226ns, step size 1, num_iters 200, total time 45.356958ms avg hash time 285.669µs, avg step time 3.454µs, step size 1024, num_iters 200, total time 57.836958ms avg hash time 673.086µs, avg step time 141.763µs, step size 32768, num_iters 200, total time 162.983ms avg hash time 800.159µs, avg step time 3.333175ms, step size 1048576, num_iters 200, total time 826.676667ms avg hash time 2.394079ms, avg step time 54.744735ms, step size 16777216, num_iters 134, total time 7.711356917s avg hash time 6.981576ms, avg step time 221.284475ms, step size 67108864, num_iters 33, total time 7.754069s avg hash time 23.845249ms, avg step time 826.907254ms, step size 268435456, num_iters 8, total time 7.632934667s ``` After: ``` Running benchmark with always merkleize feature on avg hash time 223.077µs, avg step time 196ns, step size 1, num_iters 200, total time 44.670042ms avg hash time 283.239µs, avg step time 4.477µs, step size 1024, num_iters 200, total time 57.556875ms avg hash time 631.034µs, avg step time 139.475µs, step size 32768, num_iters 200, total time 154.115083ms avg hash time 813.914µs, avg step time 3.357342ms, step size 1048576, num_iters 200, total time 834.265ms avg hash time 2.39359ms, avg step time 55.253016ms, step size 16777216, num_iters 134, total time 7.779911875s avg hash time 6.768607ms, avg step time 222.297451ms, step size 67108864, num_iters 33, total time 7.781483917s avg hash time 25.057057ms, avg step time 840.610754ms, step size 268435456, num_iters 8, total time 7.765957833s ```
This commit also caught an error in the logic of the resize() impleentation, and fixes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressing a slew of review comments. Hot-potato back to @PlasmaPower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I have a couple of follow-up optimizations but these can be done in a new PR
arbitrator/prover/src/merkle.rs
Outdated
// Replace the dirty leaf parents with clean ones. | ||
let mut dirt = std::mem::replace( | ||
&mut layers.dirty_leaf_parents, | ||
bitvec![0; (layers.data[0].len() + 1) >> 1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires reallocating a new bitvec. Instead, I think you can just iterate over layers.dirty_leaf_parents.iter_ones()
and then at the end of the function call layers.dirty_leaf_parents.fill(false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably just too much of a rust newbie. But, I cannot figure out how to do this without angering the borrow-checker.
The problem is that I want to have a local variable that I can use in the inner for loop (i.e. dirt.iter_ones
) which I can assign the value of the new_dirt
at the end of that inner loop without affecting the layers.dirty_leaf_parents
values.
But, if I simply:
let mut dirt = &mut layers.dirty_leaf_parents
Then, when I try to:
dirt = new_dirt
the compiler tells me I've got mismatched types because dirt
is a &mut BitVec
and new_dirt
is just a BitVec
.
But, if I try to dereference dirt
for the assignment (*dirt = new_dirt
), then it overwrites the value of layers.dirty_leaf_parents
which means when I go to fill(false)
at the end of the outter for loop, the BitVec
is not big enough.
If I try to take a mutable reference to new_dirt
(dirt = &mut new_dirt
) the compiler is unhappy because the new_dirt variable is about to drop from scope (or maybe it's because of duplicate mutable borrows, I don't remmember.)
I suppose I could capture the size of layers.dirty_leaf_parents
before the outter for loop, and then use layers.dirty_leaf_parents.fill(false)
, layers.dirty_leaf_parents.resize(orig_len, false)
at the end of the outter loop.
Another idea I had, but discardded because the code was really ugly was to conditionally iterate over either layers.dirty_leaf_parents.iter_ones()
or dirt.iter_ones
depending on whether or not layer_i == 1
but that feels sloppy.
For now, I'm just using let mut dirt = layers.dirty_leaf_parents.clone()
, but doesn't that also require allocating a new bitvec?
Looks like the clippy linter has some concerns https://github.com/OffchainLabs/nitro/actions/runs/10022178302/job/27701676765?pr=2273#step:20:462 I don't like the last lint, let's either disable it (as we disable others in |
This way online documentation can be generated correctly.
This way, it's clearer what the end-state of the `dirty_leaf_parents` is after `rehash` is called?
Cleaned up the lint. |
@PlasmaPower, I'm going to need one more review pass from you. Also, I don't quite follow if you are suggesting to change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Here are the main high-level ideas that this PR implements:
Memory::resize()
calls.Merkle
have internal mutability. This means usingArc<Mutex<>>>
to wrap the mutable internal data structures such that they are thread-safe and only change atomically from the point of view of the public API for the package.Merkel
tree have been made "dirty" (because ofset
calls.) Then, the root hash of the tree (and all dirty branches) are recalculated on-demand just before calculating the proof.Put it all together, and we're starting to see real performance improvements. Those are largely documented in the Linear issue linked below.
Related to https://linear.app/offchain-labs/issue/NIT-2411/arbitrator-optimizations