[DO NOT MERGE] benchmarks for hashing single subtrees recursively #334

Closed
wants to merge 4 commits into from

Contributor

@Qyriad Qyriad commented Oct 16, 2024

This code is a benchmark for hashing a single depth-8 subtree of a sparse Merkle tree by recursively establishing child hashes. It is adapted from work-in-progress code that uses this method to compute subtrees in parallel.

Raw results; the number on the left is the number of new key-value pairs:

subtree8/32             time:   [72.467 µs 75.138 µs 77.577 µs]
subtree8/128            time:   [76.635 µs 78.540 µs 80.337 µs]
subtree8/512            time:   [104.26 µs 107.99 µs 111.24 µs]
subtree8/1024           time:   [137.81 µs 142.94 µs 148.87 µs]
subtree8/8192           time:   [629.44 µs 708.05 µs 797.79 µs]

The time it takes to hash a subtree increases linearly with the number of key-value pairs being added to the tree as a whole.

@bobbinth
Contributor

Not a review yet, but looking briefly through the code I think I had something way simpler in mind. What I think we need as a basic building block of the algorithm is constructing a logical SMT (i.e., not our specific Smt) of depth 8 from a given set of nodes. This can probably be just a single function that looks something like this:

/// Builds a set of nodes for a Merkle tree of depth 8 from the specified set of leaves. The nodes are
/// appended to the `inner_nodes` map. The leaves are assumed to be located at the specified depth.
pub fn build_subtree(
    leaves: impl IntoIterator<Item = (u64, Digest)>,
    leaf_depth: u8,
    inner_nodes: &mut BTreeMap<NodeIndex, InnerNode>,
)

Once we have this working (and assuming it is efficient), we can use it to build various levels of the actual SMTs. For example, for our Smt, the process could look like so:

  1. Compute and sort all the leaves (we should be able to do most of this in parallel).
  2. Call build_subtree() for each set of leaves forming a depth 8 subtree (here, leaf_depth = 64).
  3. Use the results of the previous step to get a new set of leaves and call build_subtree() on them again (now, leaf_depth = 48).
  4. Use the results of the previous step to get a new set of leaves and call build_subtree() on them again (now, leaf_depth = 32).
  5. Use the results of the previous step to get a new set of leaves and call build_subtree() on them again (now, leaf_depth = 16).
  6. Use the results of the previous step to get a new set of leaves and call build_subtree() on them again (now, leaf_depth = 8).
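
Step 2 above depends on splitting the sorted leaves into runs that fall under the same depth-8 subtree. The following is a minimal sketch of that grouping, not code from this PR: `group_by_subtree` is a hypothetical helper, `Digest` is stood in for by a plain `u64`, and it assumes the subtree a leaf belongs to is identified by dropping the low 8 bits of its index.

```rust
/// Placeholder for the real hash output type (an assumption for this sketch).
type Digest = u64;

/// Splits leaves that are already sorted by index into runs that share the
/// same depth-8 subtree. The subtree is identified by `index >> 8`.
fn group_by_subtree(sorted: &[(u64, Digest)]) -> Vec<(u64, &[(u64, Digest)])> {
    let mut groups = Vec::new();
    let mut start = 0;
    while start < sorted.len() {
        let root = sorted[start].0 >> 8;
        // Count how many consecutive leaves fall under the same subtree root.
        let len = sorted[start..]
            .iter()
            .take_while(|entry| entry.0 >> 8 == root)
            .count();
        groups.push((root, &sorted[start..start + len]));
        start += len;
    }
    groups
}

fn main() {
    let mut leaves: Vec<(u64, Digest)> =
        vec![(701, 14), (3, 11), (256, 12), (1, 10), (700, 13)];
    // Sort once up front, as in step 1.
    leaves.sort_by_key(|&(index, _)| index);
    for (root, group) in group_by_subtree(&leaves) {
        println!("subtree {root}: {} leaves", group.len());
    }
}
```

Because the groups are borrowed slices of the sorted input, each one could be handed to a separate worker without copying.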

@bobbinth
Contributor

/// Builds a set of nodes for a Merkle tree of depth 8 from the specified set of leaves. The nodes are
/// appended to the `inner_nodes` map. The leaves are assumed to be located at the specified depth.
pub fn build_subtree(
    leaves: impl IntoIterator<Item = (u64, Digest)>,
    leaf_depth: u8,
    inner_nodes: &mut BTreeMap<NodeIndex, InnerNode>,
)

Actually, this may not be very parallelizable since BTreeMap cannot be mutated in parallel. An alternative could look something like this:

pub fn build_subtree(
    leaves: impl IntoIterator<Item = (u64, Digest)>,
    leaf_depth: u8,
) -> BTreeMap<NodeIndex, InnerNode>

And then we can merge the resulting BTreeMaps in a single thread (assuming this is a relatively fast process).
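
To make the map-returning variant concrete, here is a rough sketch of how it might look, with two subtrees built on scoped threads and merged afterwards. It is not the implementation from this PR. `Digest`, `NodeIndex` and `InnerNode` are simplified local stand-ins, `merge` uses the standard library's `DefaultHasher` in place of the real hash function, and the empty-node convention (`empty_hash`, with an all-zero empty leaf and a tree depth of 64) is an assumption made only for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::thread;

const TREE_DEPTH: u8 = 64; // assumed depth of the full tree
type Digest = u64; // stand-in for the real digest type

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct NodeIndex { depth: u8, value: u64 }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct InnerNode { left: Digest, right: Digest }

/// Toy two-to-one hash; the real code would use the tree's hash function.
fn merge(left: Digest, right: Digest) -> Digest {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// Hash of an empty subtree `height` levels above the bottom of the tree.
fn empty_hash(height: u8) -> Digest {
    (0..height).fold(0, |hash, _| merge(hash, hash))
}

/// Builds the inner nodes of depth-8 subtrees above leaves at `leaf_depth`.
fn build_subtree(
    leaves: impl IntoIterator<Item = (u64, Digest)>,
    leaf_depth: u8,
) -> BTreeMap<NodeIndex, InnerNode> {
    assert!((8..=TREE_DEPTH).contains(&leaf_depth));
    let mut inner_nodes = BTreeMap::new();
    let mut level: BTreeMap<u64, Digest> = leaves.into_iter().collect();
    for step in 0..8u8 {
        let child_depth = leaf_depth - step;
        let empty = empty_hash(TREE_DEPTH - child_depth);
        let mut parents = BTreeMap::new();
        for &index in level.keys() {
            let parent = index >> 1;
            if parents.contains_key(&parent) {
                continue; // already handled via the sibling
            }
            // Missing children are treated as empty subtrees.
            let left = level.get(&(parent << 1)).copied().unwrap_or(empty);
            let right = level.get(&((parent << 1) | 1)).copied().unwrap_or(empty);
            let index = NodeIndex { depth: child_depth - 1, value: parent };
            inner_nodes.insert(index, InnerNode { left, right });
            parents.insert(parent, merge(left, right));
        }
        level = parents;
    }
    inner_nodes
}

fn main() {
    let first = vec![(0u64, 1u64), (255, 2)];
    let second = vec![(256u64, 3u64)];
    // Build the two subtrees in parallel, then merge on the calling thread.
    let (mut all_nodes, other) = thread::scope(|scope| {
        let a = scope.spawn(|| build_subtree(first.iter().copied(), 64));
        let b = scope.spawn(|| build_subtree(second.iter().copied(), 64));
        (a.join().unwrap(), b.join().unwrap())
    });
    all_nodes.extend(other);
    println!("{} inner nodes after merging", all_nodes.len());
}
```

Since distinct subtrees produce disjoint `NodeIndex` keys, the single-threaded merge here is a plain `extend`. Whether that is fast enough at scale is exactly the open question raised above.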

@Qyriad Qyriad force-pushed the wip/qyriad/bench-subtree branch from 56087c7 to 9638969 Compare October 24, 2024 01:30
@Qyriad
Contributor Author

Qyriad commented Oct 24, 2024

Alright, I've pushed a simpler implementation much closer to what you suggested, and micro-benchmarks for it. This implementation takes the leaves pre-sorted, since presumably we'll want to only sort at the beginning. The benchmarks don't include the sort time, though I can easily change that. The benchmarks look like this:

subtree8-even/64        time:   [2.6322 ms 2.6329 ms 2.6338 ms]
Found 4 outliers among 60 measurements (6.67%)
  2 (3.33%) low mild
  2 (3.33%) high mild
subtree8-even/128       time:   [5.1247 ms 5.1529 ms 5.1772 ms]
Found 10 outliers among 60 measurements (16.67%)
  10 (16.67%) high severe
subtree8-even/192       time:   [7.8698 ms 7.8721 ms 7.8750 ms]
subtree8-even/256       time:   [10.500 ms 10.504 ms 10.507 ms]

subtree8-rand/64        time:   [1.0604 ms 1.0683 ms 1.0765 ms]
Found 9 outliers among 60 measurements (15.00%)
  6 (10.00%) low severe
  2 (3.33%) high mild
  1 (1.67%) high severe
subtree8-rand/128       time:   [2.1299 ms 2.1355 ms 2.1397 ms]
subtree8-rand/192       time:   [3.4609 ms 3.4727 ms 3.4815 ms]
subtree8-rand/256       time:   [5.0838 ms 5.0890 ms 5.0942 ms]
Found 2 outliers among 60 measurements (3.33%)
  1 (1.67%) low mild
  1 (1.67%) high mild

There always seem to be several outliers, no matter how quiet I make my system. Here's the output without the outlier diagnostic noise, for easier reading:

subtree8-even/64        time:   [2.6322 ms 2.6329 ms 2.6338 ms]
subtree8-even/128       time:   [5.1247 ms 5.1529 ms 5.1772 ms]
subtree8-even/192       time:   [7.8698 ms 7.8721 ms 7.8750 ms]
subtree8-even/256       time:   [10.500 ms 10.504 ms 10.507 ms]

subtree8-rand/64        time:   [1.0604 ms 1.0683 ms 1.0765 ms]
subtree8-rand/128       time:   [2.1299 ms 2.1355 ms 2.1397 ms]
subtree8-rand/192       time:   [3.4609 ms 3.4727 ms 3.4815 ms]
subtree8-rand/256       time:   [5.0838 ms 5.0890 ms 5.0942 ms]

It also turns out that I did the math for roughly evenly distributed leaves incorrectly in the benchmarks this PR originally had. I made the same mistake in this new benchmark at first, and was astonished to see the performance jump from microsecond figures to millisecond figures when going from supposedly evenly distributed data to random data. I was accidentally generating far too many leaves with the same index, which were then getting de-duplicated. After fixing that, the even benchmarks are now in the same order of magnitude as the random ones.

@bobbinth
Contributor

Thank you! A couple of follow-up questions:

How much time does it take to build a tree for a single leaf? The reason I'm asking is that the vast majority of the time we'd be building trees that have just one leaf in them. For example, assuming the leaves are randomly distributed, if we have 100M leaves, the subtrees up until depth 24 are very likely to be just single-leaf trees.

How does the timing for building a tree from 256 leaves compare to the timing for building a fully balanced MerkleTree with 256 leaves? I'm curious because the fully-balanced case should give us the lower bound on performance as most of the time there should be spent hashing.
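
As a back-of-the-envelope check on the single-leaf question, the sketch below estimates how many leaves land under one subtree root. It assumes leaf indices are uniformly random over a depth-64 tree, so a root at depth d is expected to cover N / 2^d of the N leaves. `expected_leaves_under` is a hypothetical helper, not part of the codebase.

```rust
/// Expected number of leaves below one node at `root_depth`, assuming
/// `num_leaves` indices drawn uniformly at random (an assumption).
fn expected_leaves_under(num_leaves: u64, root_depth: u32) -> f64 {
    num_leaves as f64 / 2f64.powi(root_depth as i32)
}

fn main() {
    let num_leaves = 100_000_000u64;
    for root_depth in [56u32, 48, 40, 32, 24, 16, 8] {
        println!(
            "root depth {root_depth:>2}: {:.3e} leaves expected per subtree",
            expected_leaves_under(num_leaves, root_depth)
        );
    }
}
```

Under this model, with 100M leaves the expected count is below one for root depths of 27 or more and about six at depth 24, so most of the lower passes would be dealing with subtrees that hold a single leaf.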

@Qyriad
Contributor Author

Qyriad commented Oct 25, 2024

Good questions! I'll find out!

@Qyriad
Contributor Author

Qyriad commented Oct 25, 2024

And here are the results:

balanced-merkle-even    time:   [1.2906 ms 1.2907 ms 1.2909 ms]
balanced-merkle-rand    time:   [1.2834 ms 1.2859 ms 1.2877 ms]

subtree8-even/1         time:   [40.945 µs 40.948 µs 40.951 µs]
subtree8-even/64        time:   [2.6020 ms 2.6023 ms 2.6026 ms]
subtree8-even/128       time:   [5.1623 ms 5.1783 ms 5.1898 ms]
subtree8-even/192       time:   [7.7192 ms 7.7501 ms 7.7729 ms]
subtree8-even/256       time:   [10.127 ms 10.180 ms 10.233 ms]

subtree8-rand/1         time:   [40.610 µs 40.733 µs 40.820 µs]
subtree8-rand/64        time:   [1.0627 ms 1.0637 ms 1.0647 ms]
subtree8-rand/128       time:   [2.1227 ms 2.1256 ms 2.1283 ms]
subtree8-rand/192       time:   [3.4584 ms 3.4629 ms 3.4672 ms]
subtree8-rand/256       time:   [5.0221 ms 5.0341 ms 5.0430 ms]

@bobbinth
Contributor

41 microseconds for a single-leaf case is pretty good!

A bit surprising though that hashing a fully-balanced 256-leaf tree is about 4x more efficient than building a subtree with 256 leaves (I was thinking it'd be closer to 2x). I think this is fine for now and we can definitely optimize this more in the future (let's create an issue for this).

The next step would be to use this method as a building block for building a full tree.
