mmr: added partial mmr #195

Merged
merged 1 commit into from
Oct 19, 2023

hackaugusto
Contributor

@hackaugusto hackaugusto commented Oct 17, 2023

Describe your changes

Note: I'm adding a few more tests. A preliminary review can be useful, but I will push some additional code today.

This differs significantly from #86. That PR implemented a data structure that could track a single element, i.e. a single authentication path from a Mmr. This one implements a structure that can track an arbitrary number of elements (from just the peaks/0 leaves, up to every leaf in the Mmr).

Design choices

  • The first iteration was based on Vec<MerklePath>. Each element in the vector was one authentication path to be updated in isolation. Issues:
    • Being a vector of vectors, to figure out the peaks of the PartialMmr one would need to iterate over all the MerklePaths
    • Updates would potentially result in dozens of allocations (the higher end for this structure is in the low thousands, which would cause a large spike of allocations on updates that reached the Vec's default capacity)
    • It would use more memory because none of the elements were shared.
  • The second iteration was based on a Vec<RpoDigest> with separate parent/child links as Vec<usize> pointers. Notes:
    • It reduced the storage to a single vector, removing the issues with allocation
    • The approach allowed nodes to be shared, reducing memory usage
    • Removals became way too complicated, because there were links to both the parent and the children, which on removal required a complete walk over the elements and a rewrite of the indices
  • This implementation is based on a stable index for the Mmr nodes. The index is computed based on an in-order tree walk.
    • It fixes the storage issue by storing all the elements in a BTree, so a single structure is used, eliminating the additional allocations
    • It reduces the memory usage by sharing elements
    • Removals became simple: because the index is stable and independent of the entries, one just needs to remove from the map the elements which are no longer necessary
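
To make the third design concrete, here is a minimal sketch of node storage keyed by the in-order index. It is an illustration only, not the code in this PR: `NodeStore`, `add_path`, `prune` and the `Digest` alias are hypothetical names, and `[u8; 32]` stands in for `RpoDigest`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Placeholder for the real digest type (assumption, stands in for `RpoDigest`).
type Digest = [u8; 32];

/// Hypothetical, simplified stand-in for the partial Mmr's node storage.
#[derive(Default)]
struct NodeStore {
    /// Authentication nodes keyed by their stable in-order index.
    nodes: BTreeMap<usize, Digest>,
}

impl NodeStore {
    /// Stores the nodes of one authentication path. A node shared with a
    /// previously added path occupies a single map entry.
    fn add_path(&mut self, path: &[(usize, Digest)]) {
        for (idx, digest) in path {
            self.nodes.insert(*idx, *digest);
        }
    }

    /// Drops every node whose index is not in `still_needed`. The keys of the
    /// remaining entries do not change, so nothing has to be re-indexed.
    fn prune(&mut self, still_needed: &BTreeSet<usize>) {
        self.nodes.retain(|idx, _| still_needed.contains(idx));
    }
}
```

With the paths for the leaves at ids 1 and 5 (nodes 3, 6, 12 and 7, 2, 12), the map ends up with five entries rather than six, because node 12 is shared.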

Some additional notes about this implementation:

  • Updates include the minimal number of elements needed to update the authentication paths for every tree merge. However, this can include elements that the partial Mmr does not need. Pruning them would increase the request size: the client would have to send the indices of the nodes it is tracking, allowing the server to prune its response more aggressively. (Maybe it makes sense to do this, since those are just a few usize elements, but I would leave this optimization for the future if we really need it.)
  • The result of the tree merge is not sent, meaning the user cannot validate its computation. This should be okay. If we need to add checks for debugging purposes, we can just download the Mmr accumulator and compare the values.

@hackaugusto
Contributor Author

hackaugusto commented Oct 17, 2023

It is not obvious how the code works at first glance, so here is a visualization:

  • nodes with a dashed border do not exist yet. They do, however, have an "allocated id", represented by the values inside the graph's nodes. That is to say, the ids are computed based on an in-order tree walk (starting at 1 for the first element and incrementing by 1 on every new node visited), so there is a one-to-one mapping from id to element in the tree. These positions are used as the Mmr grows and elements are added.
    • This index is used instead of NodeIndex because it does not have the depth in it. Using NodeIndex would cause a complete rewrite of the indices on tree merges (because the depth increases by 1)
    • Because the Mmr trees are perfect binary trees, the total number of nodes in each branch can be computed with a simple bit shift. This in turn allows for very efficient/easy computation to go to left/right/up.
  • nodes with a white background exist but are not tracked by the partial Mmr (they are present in the full Mmr)
  • nodes inside a square have their authentication path tracked
  • nodes with a blue background are part of an authentication path, their presence also signals which nodes should be saved on updates
  • nodes with a yellow background are the peaks of the Mmr, they are used in the partial Mmr to authenticate elements that are tracked
  • the tree grows to the right, it can contain $2^{usize::BITS-1}$ leaves
flowchart BT
    id1[1]
    id2((2))
    id3((3))
    id4((4))
    id5[5]
    id6((6))
    id7((7))
    id8((8))
    id9((9))
    id10((10))
    id11((11))
    id12((12))
    id13((13))
    id14((14))
    id15((15))
    id16((16))
    id17[17]
    id18((18))
    id19((19))
    id20((20))
    id21((21))
    id22((22))
    id23((23))
    id24((24))
    id25((25))
    id26((26))
    id27((27))
    id28((28))
    id29((29))
    id30((30))
    id31((31))

    id1---id2
    id3---id2
    id5---id6
    id7---id6
    id9---id10
    id11---id10
    id13---id14
    id15---id14
    id17---id18
    id19---id18
    id21---id22
    id23---id22
    id25---id26
    id27---id26
    id29---id30
    id31---id30

    id2---id4
    id6---id4
    id10---id12
    id14---id12
    id18---id20
    id22---id20
    id26---id28
    id30---id28

    id4---id8
    id12---id8
    id20---id24
    id28---id24

    id8---id16
    id24---id16

    classDef unused stroke-width:4px,stroke-dasharray:5,fill:white;
    classDef peak fill:yellow;
    classDef untracked fill:white;

    class id27,id29,id31 unused;
    class id26,id30 unused;
    class id28 unused;
    class id24 unused;
    class id16 unused;
   
    class id17,id18,id21,id23 untracked;
    class id4,id9,id10,id11,id13,id14,id15 untracked;
    class id1,id5 untracked;

    class id25,id20,id8 peak;
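
The left/right/up navigation on in-order ids can be sketched as below. This is a hedged illustration based on the description above, not the contents of `src/merkle/mmr/inorder.rs`: the `InOrder` type, its methods and `auth_path` are hypothetical names. It assumes ids start at 1 and that the level of a node equals the number of trailing zero bits of its id.

```rust
/// Hypothetical wrapper around an in-order id (ids start at 1).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct InOrder(usize);

impl InOrder {
    /// In-order id of the leaf at 0-based position `pos`: leaves are the odd ids.
    fn from_leaf_pos(pos: usize) -> Self {
        InOrder(2 * pos + 1)
    }

    /// Height above the leaves (leaves are at level 0).
    fn level(self) -> u32 {
        self.0.trailing_zeros()
    }

    /// A node is a left child when the bit just above its level bit is clear.
    fn is_left_child(self) -> bool {
        self.0 & (2usize << self.level()) == 0
    }

    /// The sibling is one full subtree span (2^(level + 1) ids) away.
    fn sibling(self) -> Self {
        let span = 2usize << self.level();
        if self.is_left_child() {
            InOrder(self.0 + span)
        } else {
            InOrder(self.0 - span)
        }
    }

    /// The parent sits halfway between a node and its sibling.
    fn parent(self) -> Self {
        let step = 1usize << self.level();
        if self.is_left_child() {
            InOrder(self.0 + step)
        } else {
            InOrder(self.0 - step)
        }
    }
}

/// Ids of the siblings needed to recompute `peak` from `leaf`.
/// Assumes `peak` is an ancestor of `leaf`; otherwise the walk never stops.
fn auth_path(leaf: InOrder, peak: InOrder) -> Vec<InOrder> {
    let mut path = Vec::new();
    let mut node = leaf;
    while node != peak {
        path.push(node.sibling());
        node = node.parent();
    }
    path
}
```

For the diagram above, `auth_path(InOrder(17), InOrder(20))` yields the ids 19 and 22, and the paths for ids 1 and 5 under the peak at 8 are 3, 6, 12 and 7, 2, 12 respectively.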

Updates to the above structure depend on the state of the new full Mmr (not represented). The update procedure works as follows:

  • The state of any Mmr, including the partial Mmr above, can be represented by its leaf count. In the example above there are 13 leaves (the odd elements from 1 to 25)
    • The binary representation of 13 is 0b1101. The bits represent exactly the size of each tree in the Mmr, left to right from biggest to smallest. This particular Mmr has one tree of size 8 rooted at element 8, another one of size 4 rooted at element 20, and one of size 1 rooted at 25
  • It is possible to compute the necessary elements for a Mmr update based exclusively on the number of leaves.
    • A partial Mmr cannot reference unknown elements, so the updates are only tree merges + new peaks
    • The first step of the update is to find the trees that have been merged
    • Once these trees have been identified, the procedure collects the new right siblings that are unknown to the partial Mmr
    • The procedure accounts for values that can be computed locally, and only collects the values that are new. This is equivalent to tree merges, and happens when the partial Mmr has trees of consecutive powers of two (all trees have a power-of-two size; here the merge is based on a tree of size $2^n$ and another of size $2^{n+1}$)
    • The elements mentioned above are all that is required to update the partial Mmr's authentication paths. It may be the case that none of these elements are required. This implementation does not try to send the optimal number of nodes, since that would increase the request size.
    • The last piece for the Mmr update are the new peaks
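
As a worked version of the leaf-count decomposition, the sketch below maps a leaf count to the in-order ids of its peaks. `peak_indices` is a hypothetical helper, not a function from this PR. It relies on one observation: a tree with `size` leaves that starts after `leaves_before` leaves occupies the ids `2 * leaves_before + 1` through `2 * leaves_before + 2 * size - 1`, so its root is the middle id `2 * leaves_before + size`.

```rust
/// In-order ids of the peaks of an Mmr with `num_leaves` leaves, from the
/// largest tree to the smallest.
///
/// Assumes `num_leaves <= 2^(usize::BITS - 1)`, so the ids fit in a `usize`.
fn peak_indices(num_leaves: usize) -> Vec<usize> {
    let mut peaks = Vec::new();
    // Number of leaves in the trees to the left of the current one.
    let mut leaves_before = 0usize;
    for bit in (0..usize::BITS).rev() {
        let size = 1usize << bit;
        if num_leaves & size != 0 {
            // Each set bit is one perfect tree with `size` leaves.
            peaks.push(2 * leaves_before + size);
            leaves_before += size;
        }
    }
    peaks
}
```

For 13 leaves (0b1101) this gives the peaks 8, 20 and 25, matching the yellow nodes in the diagram.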

Examples:

  • When requesting an update from the above Mmr with version 13 to an Mmr with version 14:
    • There is a single new block, block 14 at index 27. That value is sent and inserted as the authentication path for the node at index 25. The partial Mmr can compute node 26, which is the new peak
  • When requesting an update to an Mmr with version 16: the values at indices 27 and 30 are sent, because these are the authentication path updates. The partial Mmr computes the new root at index 16 from these values
  • When requesting an update to an Mmr with version >16: the same values as in the previous point are sent, plus the new peaks.
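
The examples above can be reproduced with a small sketch that works only on leaf counts and returns the in-order ids of the right siblings needed for the tree merges. It is written from the description in this comment and may differ from the merged code. `merge_sibling_indices` is a hypothetical name, and the new peaks are deliberately left out.

```rust
/// In-order ids of the right siblings a partial Mmr at `from` leaves needs in
/// order to perform the tree merges leading to an Mmr with `to` leaves.
/// New peaks are not included. Panics unless `from < to`.
fn merge_sibling_indices(from: usize, to: usize) -> Vec<usize> {
    assert!(from < to, "the target Mmr must have more leaves");
    let mut result = Vec::new();

    // Largest tree of `to` that does not exist in `from`.
    let changed = from ^ to;
    let new_high = 1usize << (usize::BITS - 1 - changed.leading_zeros());
    // Trees of `from` that end up merged into `new_high`.
    let mut merges = from & (new_high - 1);
    // Trees of `from` that are left untouched by the update.
    let common = from ^ merges;
    if merges == 0 {
        return result; // nothing to merge, only new peaks are needed
    }

    // Start from the smallest tree known to the partial Mmr.
    let mut target = 1usize << merges.trailing_zeros();
    while target < new_high {
        // The sibling tree has `target` leaves and starts right after the
        // leaves already covered; its root is the middle id of its span.
        let leaves_before = common | merges | target;
        result.push(2 * leaves_before + target);

        // The merged tree is twice as large. Any known tree of that size is
        // merged locally, without new data, doubling the size again.
        target <<= 1;
        while merges & target != 0 {
            target <<= 1;
        }
        // Forget the trees that have been merged so far.
        merges &= !(target - 1);
    }
    result
}
```

Going from 13 to 14 leaves yields the single id 27. Going from 13 to 16, or to any larger count such as 20, yields 27 and 30.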

@hackaugusto hackaugusto force-pushed the hacka-partial-mmr2 branch 2 times, most recently from a487c68 to 4495590 Compare October 17, 2023 21:04
@bobbinth bobbinth mentioned this pull request Oct 18, 2023
12 tasks
Contributor

@bobbinth bobbinth left a comment


This looks great! Thank you! The trick with in-order indexing is especially cool.

This is not a full review yet (I still need to internalize a few things), but I left some comments inline. Most of them are nits/typos.

@hackaugusto hackaugusto force-pushed the hacka-partial-mmr2 branch 3 times, most recently from f3e20b1 to 39f0a0f Compare October 18, 2023 14:45
@hackaugusto hackaugusto marked this pull request as ready for review October 18, 2023 14:46
@hackaugusto hackaugusto force-pushed the hacka-partial-mmr2 branch 2 times, most recently from 4061285 to 1471309 Compare October 18, 2023 19:23
@hackaugusto hackaugusto dismissed bobbinth’s stale review October 18, 2023 19:24

applied requested changes

@hackaugusto
Contributor Author

hackaugusto commented Oct 18, 2023

One note: a thing that annoyed me while working on this is that I have used usize for most of the values, whereas the rest of the codebase uses u64, so there are quite a few casts in the code. In hindsight I think using usize was the wrong idea, and we should probably update the code of the MMRs to start using u64 too (that not only makes things consistent, but also makes it easier to track how many elements we can store in these structures).

Contributor

@bobbinth bobbinth left a comment


Looks good! Thank you! I added a few more comments inline - again mostly nits.

One thing that I think would be great to do to improve code clarity/readability is to create a wrapper type for Forest - but this is probably best left for a subsequent PR (but let's create an issue for this).

@hackaugusto hackaugusto force-pushed the hacka-partial-mmr2 branch 2 times, most recently from 5daa919 to 2b74429 Compare October 19, 2023 13:27
Contributor

@bobbinth bobbinth left a comment


All looks good! Thank you! One last thing that we should do is update the changelog. Once this is done, we can merge.

@hackaugusto hackaugusto merged commit 012ad5a into main Oct 19, 2023
9 checks passed
@hackaugusto hackaugusto deleted the hacka-partial-mmr2 branch October 19, 2023 18:30