Implements concurrent Smt::compute_mutations
#365
base: next
Conversation
Looks great to me! I think the logic itself looks good. My comments are mostly about naming, docs and deduplication. I might have to take another look anyway, since I first had to understand how the Smt is implemented in sequential code 😅, so I'll just comment for now.
In general, I think adding comments to code parts that are not easy to understand would improve readability.
Regarding the approach, please correct me if I have misunderstandings, but my understanding of the approach is the following.

Assuming a tree of depth 64 with subtrees of depth 8 and mutations of just two (for example's sake) leaves at indices 0 and 65536, `compute_mutations` would do this, on a high level and making some simple assumptions about how rayon assigns threads:

- Compute subtrees that were modified. This happens in `sorted_pairs_to_mutated_leaves`. This would yield two subtrees, covering the column ranges `0..256` and `65536..65792`.
- Then in `build_subtree_mutations`, the subtrees are updated in parallel.
  - 1st iteration:
    - Thread 0: Compute updates for leaves with indices `0..256` at depth 64. Then updates for leaves at depth 63 within this subtree, and so on, until it eventually results in a new root at depth 56, column 0.
    - Thread 1: Compute updates for leaves with indices `65536..65792` at depth 64. Then updates for leaves at depth 63 within this subtree, and so on, until it eventually results in a new root at depth 56, column 256 (= 65536 >> 8).
  - 2nd iteration:
    - Thread 0: Compute updates for leaves with indices `0..256` at depth 56 (only root 0 has changed). Eventually this results in a new root at depth 48, column 0.
    - Thread 1: Compute updates for leaves with indices `256..512` at depth 56 (only root 256 has changed). Eventually this results in a new root at depth 48, column 1.
  - 3rd iteration:
    - Thread 0: Compute updates for leaves with indices `0..256` at depth 48 (only root 0 has changed). Eventually this results in a new root at depth 40, column 0.
  - More iterations like the 3rd until the root at depth 0 has been reached.

Is this accurate? Would it make sense to add something like this as a doc comment to `compute_mutations_subtree` (with corrections if it's inaccurate)?
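To make sure I understand the shape of the iteration, here is a rough, self-contained sketch of what I have in mind. Everything in it (the `Hash` alias, the toy `hash_subtree`, the `(column, hash)` representation) is my own simplification for illustration, not the actual API of this PR:

```rust
use std::collections::BTreeMap;

use rayon::prelude::*;

type Hash = u64; // stand-in for a real digest type

const SUBTREE_DEPTH: u8 = 8;
const TREE_DEPTH: u8 = 64;

/// Stand-in for hashing one depth-8 subtree from its (column, hash) leaves up to its root.
fn hash_subtree(leaves: &[(u64, Hash)]) -> Hash {
    leaves.iter().fold(0, |acc, (col, h)| acc ^ col.wrapping_mul(31) ^ h)
}

/// Walks from depth 64 towards depth 0 in steps of 8. In every pass, each mutated
/// subtree is hashed independently (rayon distributes subtrees across threads), and
/// the resulting subtree roots become the leaves of the parent subtrees, 8 levels up.
fn compute_new_root(mut subtrees: Vec<Vec<(u64, Hash)>>) -> Hash {
    let mut depth = TREE_DEPTH;
    while depth > 0 {
        // 1st iteration: depth 64 -> 56, 2nd: 56 -> 48, and so on.
        let roots: Vec<(u64, Hash)> = subtrees
            .into_par_iter()
            .map(|leaves| {
                // Column of this subtree's root, 8 levels above its leaves.
                let root_col = leaves[0].0 >> SUBTREE_DEPTH;
                (root_col, hash_subtree(&leaves))
            })
            .collect();
        depth -= SUBTREE_DEPTH;

        // Regroup the new roots by their parent subtree for the next pass.
        let mut grouped: BTreeMap<u64, Vec<(u64, Hash)>> = BTreeMap::new();
        for (col, hash) in roots {
            grouped.entry(col >> SUBTREE_DEPTH).or_default().push((col, hash));
        }
        subtrees = grouped.into_values().collect();
    }
    // After the final pass exactly one entry remains: the new tree root.
    subtrees[0][0].1
}
```

With the two-leaf example above, the first pass would hand one subtree (columns `0..256`) to one worker and the other (`65536..65792`) to another, and the loop would then shrink towards a single root exactly as in the iteration list.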
Benchmark results on a 10M-entry tree: batch insertions (10k inserts) and batch updates (10k updates).
Hey @krushimir, quick question: Is this still Work-In-Progress or can it be marked as ready for review?
Hi @PhilippGackstatter, I'll push some more changes today and then I'll mark it ready.
Side note: I'm surprised we don't use `criterion` and Rust's built-in benchmark support for this.
Some of these benchmarks take a while to run (many minutes). Also, I believe @polydez tried to use `criterion` but found that there is a pretty significant discrepancy in results.
@@ -233,7 +233,7 @@ impl<const DEPTH: u8> SimpleSmt<DEPTH> {
         &self,
         kv_pairs: impl IntoIterator<Item = (LeafIndex<DEPTH>, Word)>,
     ) -> MutationSet<DEPTH, LeafIndex<DEPTH>, Word> {
-        <Self as SparseMerkleTree<DEPTH>>::compute_mutations(self, kv_pairs)
+        <Self as SparseMerkleTree<DEPTH>>::compute_mutations_sequential(self, kv_pairs)
Why is this sequential?
The parallel implementation works only with trees whose depth is a multiple of 8 - some context here
Ah thanks. I would add a comment explaining that 👍
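For instance, the comment could read something like this (just a suggestion for wording; the exact phrasing and placement are of course up to the author):

```rust
) -> MutationSet<DEPTH, LeafIndex<DEPTH>, Word> {
    // NOTE: `SimpleSmt` supports arbitrary depths, while the concurrent implementation
    // only handles trees whose depth is a multiple of 8 (it operates on fixed-size
    // depth-8 subtrees), so we always dispatch to the sequential algorithm here.
    <Self as SparseMerkleTree<DEPTH>>::compute_mutations_sequential(self, kv_pairs)
}
```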
#[cfg(feature = "concurrent")]
{
    self.compute_mutations_concurrent(kv_pairs)
}
#[cfg(not(feature = "concurrent"))]
{
    self.compute_mutations_sequential(kv_pairs)
}
Do we actually ever want the sequential version outside of test purposes? Can we not just have no feature split?
I think the reason is that this also needs to work in a `no_std` setting.
Looks good to me!
This PR introduces a concurrent implementation of `Smt::compute_mutations`, leveraging an approach similar to the existing parallel construction logic.

Benchmark results were collected on a 64-core (128-thread) AMD EPYC 7662 processor, with Rayon's thread pool explicitly limited to the specified thread counts. For context, construction benchmarks are also included for performance comparison.
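For anyone reproducing these numbers: one way to pin the thread count is to configure Rayon's global pool before any parallel work runs. Below is a minimal sketch assuming the benchmark binary takes the thread count as its first argument; the actual harness in this repo may configure this differently (e.g. via the `RAYON_NUM_THREADS` environment variable):

```rust
use std::time::Instant;

fn main() {
    // Pin Rayon's global pool to a fixed number of worker threads. This must run
    // before the first parallel call, otherwise the default pool is already in use.
    let num_threads: usize = std::env::args()
        .nth(1)
        .and_then(|arg| arg.parse().ok())
        .unwrap_or(64);
    rayon::ThreadPoolBuilder::new()
        .num_threads(num_threads)
        .build_global()
        .expect("failed to configure the global rayon thread pool");

    let start = Instant::now();
    // ... build the 10M-entry tree and call Smt::compute_mutations here ...
    println!("{num_threads} threads: {:?}", start.elapsed());
}
```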
1. Construction Benchmark
10k key-value pairs
2. Batched Insertion Benchmark
10k key-value pairs
3. Batched Update Benchmark
10k key-value pairs