[WIP] full-chain membership proof++ integration #9436
base: master
Conversation
```cmake
else ()
  set(CARGO_CMD cargo build --target "${RUST_TARGET}" --release)
  set(TARGET_DIR "release")
endif ()
```
After a quick skim through this CMakeLists.txt, it might be a good idea to put all general Rust FFI stuff (i.e. before this line) into a file by itself (perhaps in the /cmake/ folder), and open a PR specifically for that. I have a sneaking suspicion that we will be seeing more Rust FFI stuff in the future. The people who are good at all the build stuff (e.g. @selsta, @tobtoht, etc.) could give this portion a nice read-through.
Definitely planning to open a PR specifically for the Rust FFI first once it's ready. If/once we do see Rust FFI used for other portions of the code, we can think about how to restructure this portion once it's clear what new Rust code we want. That could avoid a premature debate, since it might end up looking different in the future.
```
# If a developer runs cargo build inside this sub-directory to only work with
# the Rust side of things, they'll create this target directory which shouldn't
# be committed
target
```
Is there a way to move the `target` dir to the main `build` directory? Could be helpful if testing out different branches in development?
Yep, it's an argument for the target dir with adjusted paths everywhere relevant.
```diff
@@ -31,36 +31,47 @@ jobs:
       toolchain:
         - name: "RISCV 64bit"
           host: "riscv64-linux-gnu"
           rust_host: "riscv64gc-unknown-linux-gnu"
```
Yeah this stuff too should go into a "General Rust FFI" PR IMO
```cpp
SelenePoint selene_hash_init_point();

uint8_t *helios_scalar_to_bytes(HeliosScalar helios_scalar);
```
C linkage with C++ types? I guess this is valid, but I'm not too sure about the rules on this quirk.
What type here is being considered C++? The scalars are from Rust with their layout defined in C code above.
`HeliosScalar` has an internal type `Residue<32 / sizeof(uint32_t)>`, which is a C++ template. This probably works because no fancy "decorators" are needed for the linkage of the outer type, but I'm not certain whether the internal C++ type changes anything. At least on the platforms tested so far, it hasn't mattered.

Presumably, since C++ has strict layout rules, the `static_assert` on `sizeof` catches the only possible issue - internal padding.
Curious if you might have an alternative approach in mind for this code
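For reference, a minimal sketch of the layout-guard pattern under discussion. The struct body here is an illustrative stand-in, not the generated header's real definition:

```cpp
#include <cstdint>

extern "C" {
// Illustrative stand-in: the generated HeliosScalar actually wraps
// Residue<32 / sizeof(uint32_t)>, a C++ template of identical size.
struct HeliosScalar { uint8_t limbs[32]; };
uint8_t *helios_scalar_to_bytes(HeliosScalar helios_scalar);
}

// The guard in question: with C++'s strict layout rules, checking the size
// catches the one silent mismatch possible here - internal padding.
static_assert(sizeof(HeliosScalar) == 32, "unexpected padding in HeliosScalar");
```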
```rust
#[no_mangle]
pub extern "C" fn helios_point_from_bytes(helios_point: *const u8) -> HeliosPoint {
    let mut helios_point = unsafe { core::slice::from_raw_parts(helios_point, 32) };
    // TODO: Return an error here (instead of unwrapping)
```
Another thing to mark as TODO is how to handle Rust stack unwinds. An unwind has undefined behavior once it hits an FFI boundary. There's a hidden (and very un-Rust) utility called `catch_unwind` that we may have to deploy in some of these functions to keep information details from leaking.

Unfortunately, I'm not really certain what you do once you catch the unwind. The Rust code is in an invalid state, which means - immediate abort? Some quick Google searches didn't yield much insight as to what the standard practice is at the FFI boundary.
Does `libunwind` allow "pausing" the unwinding such that it won't be undefined to call the FFI, but just loses the information?
Rust will abort the process here if it were to panic: https://doc.rust-lang.org/nomicon/ffi.html#panic-can-be-stopped-at-an-abi-boundary

We could use `catch_unwind` to prevent the abort; however, we are also setting `panic = "abort"` in the `Cargo.toml`, which will just immediately abort on any panic as well.
@Boog900 I missed that. That's probably the only thing that can be done on panic. I don't even know how/why `catch_unwind` is allowed; it seems like it ruins tons of invariants if used to do anything (other than abort).
I figure it makes sense to catch and handle errors, while panics should reasonably abort (and panics should never trigger unless the code is broken). Sounds like you'd agree?
Yes. If there's an error returned on the call stack, you can probably bubble that up through the FFI, otherwise you basically have to abort. Rust code really isn't written to be "exception-safe", so lots of invariants would break if an unwind occurs.
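A sketch of the resulting convention on the C++ side, assuming a result struct shaped roughly like the one the generated header uses (field names and the elided signature are illustrative):

```cpp
#include <stdexcept>

extern "C" {
// Illustrative result type: Rust returns errors as data instead of unwinding.
struct CResult { void *value; void *err; };
CResult hash_grow(); // args omitted for brevity in this sketch
}

void call_hash_grow()
{
    const CResult result = hash_grow();
    // Errors bubble up through the FFI and are re-raised as C++ exceptions.
    // A Rust panic never crosses the boundary: panic = "abort" in Cargo.toml
    // ends the process first.
    if (result.err != nullptr)
        throw std::runtime_error("failed to hash grow");
}
```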
```cpp
if (result.err != nullptr)
{
    free(result.err);
    throw std::runtime_error("failed to hash grow");
}
```
Rust won't allocate for zero-sized types: https://doc.rust-lang.org/std/boxed/struct.Box.html#method.new

This means the pointer you are freeing here is invalid, as it is in the other places using a ptr to `()`.
Shouldn't the pointer be null in that case (making this obviously broken for that reason, as a returned error won't be identified as an error)?
https://doc.rust-lang.org/std/boxed/struct.Box.html#method.into_raw

It will be non-null. I'm unsure of the free behavior here 0_o
Added a TODO for this
```rust
// https://github.com/rust-lang/rust/issues/79609
#[cfg(all(target_os = "windows", target_arch = "x86"))]
#[no_mangle]
pub extern "C" fn _Unwind_Resume() {}
```
If mingw-w64 was compiled with dwarf2 exceptions enabled, this will cause a multiple-definition error. I do this in #9440, because if you cross-compile the Rust standard library with sjlj mingw-w64, you run into the same undefined-references-to-unwind issue. Building std without unwinding should be possible, but I ran into problems. Removing these lines here breaks CI, because distros package sjlj mingw-w64.
I'll try to come up with the least hacky way to work around this. IMO, we could consider dropping the 32-bit Windows target. Windows 11 requires 64-bit and I can't imagine anyone using a >18 year old CPU to run Windows 10.
Highlighting two items proposed by @jeffro256 that I intend to implement:
I think these tasks are OK to keep on the back-burner for now, but noting it for prospective reviewers and/or discussion. Note that the second task would reduce the usefulness of the first (since deep reorgs are expected to be very unlikely, hence the 10 block lock in the first place); however, I would argue the first is still worth doing, especially because it would avoid exposing another RPC route where the daemon needs to do expensive computation to serve the request. Plus, reducing the complexity of this PR is a major win. I intend to implement both of these tasks before marking code from this PR ready for review.
Hey man, I just want to say what you are doing here is absolutely incredible. If this is successful and integrated into Monero, it would move it into a new stratosphere of privacy, incomparable to anything else really. I can't wait to see where this goes!
It's still not working, but the scaffolding to get it working is there and clear.
- Added torsion check tests
- Fixed sync issue with restoring wallet from 0
- Still debugging failing verify
This is a WIP draft PR for the full-chain membership proof (FCMP++) integration. It's roughly following section 6 of the specification written by @kayabaNerve (paper, commit).
Checklist of items expected in this PR:

- `grow_tree` algorithm
- `trim_tree` algorithm

The above checklist does not include all items required to complete the integration.
I plan to divide the code into commits where each subsequent commit builds off the prior commit. I could eventually close this PR in favor of smaller PRs that can be reviewed sequentially and in isolation.

This PR description can function as living documentation for the code as work on the integration progresses (and audits/FCMP++ research progress in parallel). In this description, I highlight the most critical components from the code, aiming to make the PR as a whole easier to understand. Thoughts/feedback are welcome at any time.
A. Rust FFI
Since much of the full-chain membership proof++ code is written in Rust, this PR implements a Foreign Function Interface (FFI) to call the Rust code from C++. Using cmake, the Rust code is compiled into a static lib (`libfcmp_pp_rust.a`) when you run `make` from the root of the monero repo. The static lib's functions are exposed via the C++ `src/fcmp_pp/fcmp++.h` header file (generated with the help of cbindgen and modified slightly). The heavy lifting on the Rust side is done in @kayabaNerve's `full-chain-membership-proofs` Rust crate; the Rust handles the math on the Helios and Selene curves, as well as FCMP++ construction and verification.

Here is what the structure looks like at time of writing:
B. Curve trees merkle tree
The curve trees merkle tree is a new store for spendable transaction outputs in the chain. FCMP++'s work by proving you own (and can spend) an output in the tree, without revealing which output is yours. All existing valid cryptonote outputs will be inserted into the tree as soon as the outputs unlock. Once an output is in the tree, users can construct FCMP++'s with that output. Thus, the anon set will roughly be the entire chain since genesis.
The leaves in the tree are composed of output tuples `{O.x, I.x, C.x}`, and each layer after the leaf layer is composed of hashes of chunks of the preceding layer, as follows. Each layer is composed of points alternating between two curves (@tevador's proposed Selene and Helios curves):

- The leaves are Selene scalars (we convert ed25519 points to Selene scalars).
- The layer after the leaves is composed of points on the Selene curve (we hash chunks of Selene scalars from the leaf layer to get this layer's Selene points).
- The following layer is composed of points on the Helios curve (we convert the prior layer's Selene points to Helios scalars, and hash chunks of those Helios scalars to get this layer's Helios points).
- The following layer is composed of points on the Selene curve (we convert the prior layer's Helios points to Selene scalars, and hash chunks of those Selene scalars to get this layer's Selene points).
- And so on. We continue until there is just one chunk in a layer to hash, leaving us with the tree root.
Each curve has a defined chunk width used when hashing the children in the preceding layer. The final layer has a single element in it: the root.
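As a toy illustration (chunk widths of 2 here are purely for exposition; the real Selene and Helios chunk widths differ), eight leaf-layer scalars would hash up the alternating curves like so:

```
leaves  (Selene scalars): s0 s1 s2 s3 s4 s5 s6 s7
layer 0 (Selene points) : H(s0,s1)   H(s2,s3)   H(s4,s5)   H(s6,s7)
layer 1 (Helios points) : H(h0,h1)   H(h2,h3)    <- layer 0 points converted to Helios scalars first
layer 2 (Selene point)  : H(s'0,s'1)             <- single chunk remains: the tree root
```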
There are 3 critical steps to growing the tree:
a. Curve trees merkle tree: Preparing locked outputs for insertion to the tree upon unlock
We first need to determine the block in which outputs unlock. We keep track of locked outputs by last locked block in the database so we can grow the tree with unlocked outputs once their last locked block enters the chain.
Take note of the function `get_outs_by_last_locked_block`. Upon adding a block, we iterate over all the block's tx outputs in order, and place the outputs in the container `OutputsByLastLockedBlock = std::unordered_map<uint64_t, std::vector<OutputContext>>`. The `uint64_t` key is the output's last locked block index, which is calculated using the new `get_last_locked_block_index` function (documented further below). The `std::vector<OutputContext>` for each last locked block should be sorted in the order outputs appear in the chain.

Upon adding a block, we'll add those outputs to the database here:
LMDB table changes are documented further below in section B.d.
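A sketch of the grouping step described above (the container name and `output_id` field are from the PR; the loop body and the `get_last_locked_block_index` signature are illustrative):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Trimmed-down OutputContext; the PR's real struct also carries the output
// pub key and commitment.
struct OutputContext { uint64_t output_id; };

using OutputsByLastLockedBlock =
    std::unordered_map<uint64_t /*last locked block idx*/, std::vector<OutputContext>>;

// Signature is illustrative; the PR's get_last_locked_block_index is documented below.
uint64_t get_last_locked_block_index(const OutputContext &o);

// Group a block's outputs by the block at which they finish unlocking, keeping
// chain order within each group (outputs are visited in chain order).
OutputsByLastLockedBlock group_outputs(const std::vector<OutputContext> &outs_in_block)
{
    OutputsByLastLockedBlock grouped;
    for (const auto &o : outs_in_block)
        grouped[get_last_locked_block_index(o)].push_back(o);
    return grouped;
}
```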
get_last_locked_block_index
The idea behind this function is to have a deterministic and efficient method of growing the tree when outputs unlock.
Most outputs in the chain don't include an `unlock_time`; those outputs unlock 10 blocks after they are included in the chain.

Some outputs include an `unlock_time`, which should be interpreted either as the height at which an output should unlock, or the time at which an output should unlock. When the `unlock_time` should be interpreted as height, the response of `get_last_locked_block_index` is trivial. When interpreted as time, the logic is less straightforward. In this PR, as proposed by @kayabaNerve, I use the prior hard fork's block and time as an anchor point, and determine the unlock block from that anchor point. By converting a timestamped `unlock_time` to a deterministic unlock block, we avoid needing to search for outputs that unlock by timestamp.

Note it is possible (likely) for the returned `last_locked_block_index` to be distinct from current consensus' enforced unlock block for timestamp-based locked outputs only. The proposal is for consensus to enforce this new rule for FCMP++'s (users won't be able to construct fcmp's until outputs unlock according to the rules of `get_last_locked_block_index`).
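A sketch of the anchor-point idea, assuming illustrative names and Monero's 120-second target block time (`DIFFICULTY_TARGET_V2`); the PR's actual calculation differs in details it still needs to account for (see the note below):

```cpp
#include <cstdint>

constexpr uint64_t DIFFICULTY_TARGET_V2 = 120; // seconds per block

// Convert a timestamp-based unlock_time into a deterministic block index,
// measuring forward from the prior hard fork's block height and timestamp.
uint64_t timestamp_to_unlock_block(uint64_t unlock_timestamp,
                                   uint64_t hf_block_index,
                                   uint64_t hf_block_timestamp)
{
    if (unlock_timestamp <= hf_block_timestamp)
        return hf_block_index; // unlock time already passed at the anchor point
    const uint64_t seconds_locked = unlock_timestamp - hf_block_timestamp;
    return hf_block_index + seconds_locked / DIFFICULTY_TARGET_V2;
}
```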
Note: `get_last_locked_block_index` from `unlock_time` is not in production form as is. The calculation should account for:

b. Curve trees merkle tree: `grow_tree`
This function takes a set of new outputs and uses them to grow the tree.
It has 3 core steps:
Steps 1 and 3 are fairly straightforward. Step 2 carries the most weight and is the most complex. It's implemented in the `CurveTrees` class `get_tree_extension` function documented further below.

This step-wise approach enables clean separation of the db logic (steps 1 and 3) from the grow logic (step 2). In my view, this separation enables cleaner, more efficient code, and stronger testing. It also enables reusable tree building code for wallet scanning.
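A compilable sketch of the pieces involved (type definitions here are simplified stand-ins for the PR's `CurveTrees<Selene, Helios>` types; the db helpers named in the comments are hypothetical):

```cpp
#include <array>
#include <cstdint>
#include <vector>

using SeleneScalar = std::array<uint8_t, 32>; // stand-in for the real scalar type
struct LeafTuple { SeleneScalar O_x, I_x, C_x; };

// Step 1c of get_tree_extension (next section): flatten leaf tuples into the
// [O.x, I.x, C.x, O.x, I.x, C.x, ...] layout that chunk hashing consumes.
std::vector<SeleneScalar> flatten_leaves(const std::vector<LeafTuple> &leaves)
{
    std::vector<SeleneScalar> flat;
    flat.reserve(leaves.size() * 3);
    for (const auto &l : leaves)
    {
        flat.push_back(l.O_x);
        flat.push_back(l.I_x);
        flat.push_back(l.C_x);
    }
    return flat;
}

// The surrounding 3-step grow flow, in outline:
//   1. read the current last chunk/hash of each layer from the db
//   2. extension = curve_trees.get_tree_extension(last_hashes, new_outputs)  // pure, in-memory
//   3. write the extension's new leaves and layer hashes back to the db
```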
get_tree_extension

`get_tree_extension` has 2 core steps:

1. Prepare new leaves for insertion into the tree.

   a. Sort new outputs by the order they appear in the chain (guarantees consistent insertion order in the tree).

   b. Convert valid outputs to leaf tuples (from the form `{output_pubkey,commitment}` to `{O,I,C}` to `{O.x,I.x,C.x}`).
      - Outputs with an `output_pubkey` or `commitment` that are not on the ed25519 curve, or are equal to identity after clearing torsion, are excluded.
      - See the `CurveTrees<Selene, Helios>::leaf_tuple` function for the code.

   c. Place all leaf tuple members in a flat vector (`[{output 0 output pubkey and commitment}, {output 1 output pubkey and commitment},...]` becomes `[O.x,I.x,C.x,O.x,I.x,C.x,...]`).

2. Go layer by layer, hashing chunks of the preceding layer, and place results in the `TreeExtension` struct.

   a. Get `GrowLayerInstructions` for the current layer.
      - The `GrowLayerInstructions` for the layer after the leaf layer is distinct from all other layers after it.
      - Using `old_total_children`, `new_total_children`, `parent_chunk_width`, and a bool for whether or not the `last_child_will_change`, we can determine how exactly we expect a layer to grow.

   b. Get the `LayerExtension` for the current layer to add to the `TreeExtension` struct.
      - Uses the `GrowLayerInstructions` to determine correct values when hashing the preceding "child" layer.

c. Curve trees merkle tree: `trim_tree`
This function trims the provided number of leaf tuples from the tree.
IMPORTANT NOTE: this section describes how trim is implemented in this WIP PR as of this writing; however, the implementation will undergo significant change, in line with this comment.
The function has 5 core steps, among them: getting the `TrimLayerInstructions`, which we can use to know how to trim each layer in the tree; getting the `TreeReduction` struct, which we can use to trim the tree; and using the `TreeReduction` struct to trim the tree.

Step 1 is straightforward.
Step 2 carries the most weight and is the most complex. It's implemented in the `CurveTrees` class `get_trim_instructions` function documented further below.

In step 3, the "new last chunk in each layer" refers to what will become the new last chunk in a layer after trimming that layer. We need values from those existing chunks in order to correctly and efficiently trim the chunk.
Step 4 is also complex, and is implemented in the `CurveTrees` class `get_tree_reduction` function documented further below.

In step 5, we also make sure to re-add any trimmed outputs back to the locked outputs table. We only trim the tree 1 block at a time; therefore, any trimmed outputs must necessarily be re-locked upon removal from the tree.
As with `grow_tree`, this step-wise approach enables clean separation of the db logic (steps 1, 3, and 5) from the trim logic (steps 2 and 4).

get_trim_instructions
This function first gets instructions for trimming the leaf layer, then continues getting instructions for each subsequent layer until reaching the root.
The function doing the heavy lifting is:
Similar to growing a layer, there are edge cases to watch out for when trimming a layer:
This function captures these edge cases and outputs a struct that tells the caller how exactly to handle them.
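A hypothetical outline of that struct's shape (the field names are guesses for illustration; the PR's real `TrimLayerInstructions` carries more than shown here):

```cpp
#include <cstdint>

struct TrimLayerInstructions
{
    uint64_t new_total_children;  // children this layer keeps after the trim
    uint64_t new_total_parents;   // hashes remaining in the parent layer
    // Edge-case handling: whether the new last chunk's hash can be updated by
    // removing the trimmed children from the existing hash, or must instead be
    // regrown from the children that remain in the chunk.
    bool need_last_chunk_children_to_trim;
    bool need_last_chunk_remaining_children;
};
```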
Note: I'm holding off on documenting `always_regrow_with_remaining` in this writeup, since it will not be needed once this comment is implemented.

get_tree_reduction
This function iterates over all layers, outputting a `LayerReduction` struct for each layer, which is a very simple struct we can use to trim a layer in the tree:

It uses each layer's `TrimLayerInstructions` from above as a guide, dictating exactly what data to use to calculate a new last hash for each layer.
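For illustration, the struct could look roughly like this (hedged: names and fields are a guess at the shape described, not the PR's exact definition):

```cpp
#include <cstdint>

template <typename C> // C = Selene or Helios
struct LayerReduction
{
    uint64_t new_total_parents;         // how many hashes this layer keeps
    bool     update_existing_last_hash; // whether the last hash was re-computed
    typename C::Point new_last_hash;    // the new last hash, when updated
};
```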
The following changes to the db are necessary in order to store and update the curve trees merkle tree.
NEW:
locked_outputs
tablePotential outputs to be inserted into the merkle tree, indexed by each outputs' last locked block ID.
We store the ouput ID to guarantee outputs are inserted into the tree in the order they appear in the chain.
This table stores the output pub key and commitment (64 bytes) instead of
{O.x,I.x,C.x}
, since{O.x,I.x,C.x}
(96 bytes) can be derived from the output pub key and commitment, saving 32 bytes per output. Note that we should theoretically be able to stop storing the output public key and commitment in theoutput_amounts
table at the hard fork, since that table should only be useful to construct and verify pre-FCMP++ txs.NEW:
leaves
tableLeaves in the tree.
We store the output ID so that when we trim the tree, we know where to place the output back into the locked outputs table.
Same as above: this table stores the output pub key and commitment (64 bytes) instead of
{O.x,I.x,C.x}
, since{O.x,I.x,C.x}
(96 bytes) can be derived from the output pub key and commitment, saving 32 bytes per output.Note that we must save the output pub key for outputs in the chain before the fork that includes FCMP++, since we need to derive
I
from the pre-torsion cleared points. After the fork, we can store torsion cleared valid{O,C}
pairs instead if we ban torsioned outputs and commitments at consensus, or if we redefine hash to point to use torsion clearedO.x
as its input.Note we also use the dummy zerokval key optimization for this table as explained in this comment:
NEW:
layers
tableEach record is a 32 byte hash of a chunk of children, as well as that hash's position in the tree.
The
layer_idx
is indexed starting at the layer after the leaf layer (i.e.layer_idx=0
corresponds to the layer after the leaf layer).Example:
{layer_idx=0, child_chunk_idx=4, child_chunk_hash=<31fa...>}
means that thechild_chunk_hash=<31fa...>
is a hash of the 5th chunk of leaves, and is a Selene point. Another example:{layer_idx=1, child_chunk_idx=36, child_chunk_hash=<a2b5...>}
means that thechild_chunk_hash=<a2b5...>
is a hash of the 37th chunk of elements fromlayer_idx=0
, and is a Helios point.An even
layer_idx
corresponds to Selene points. An oddlayer_idx
corresponds to Helios points.The element with the highest
layer_idx
is the root (which should also be the last element in the table). There should only be a single element with the highestlayer_idx
(i.e. only one data item with key == maxlayer_idx
).UPDATED:
block_info
tableNew fields:
bi_n_leaf_tuples
- the number of leaf tuples in the tree at that height.bi_tree_root
- the root hash of the tree at that height. It is a (compressed) Helios point or Selene point, which can be determined from the number of leaf tuples in the tree.e. Curve trees merkle tree: Growing the tree as the node syncs
At each block, the tree must grow with (valid) outputs that are spendable once the block is added to the chain. In the `add_block` function in `db_lmdb.cpp`, note the following:

Then when adding the block, we get the number of leaf tuples in the tree and the tree root, and store them on each block info record:

Finally, we use the container mentioned above to place the locked outputs from that block in a "staging" `locked_outputs` table, ready to be used to grow the tree once the locked outputs' last locked block enters the chain.

Comments

f. Curve trees merkle tree: Migrating cryptonote outputs into the tree

All existing cryptonote outputs need to be migrated into the merkle tree.

`locked_outputs` table.

g. Curve trees merkle tree: Key image migration
Removing the sign bit from key images enables an optimization for fcmp's (refer to the specification paper for further details on the optimization). If an fcmp includes a key image with sign bit cleared, while the same key image with sign bit set already exists in the chain via a ring signature, then the fcmp would be a double spend attempt, and the daemon must be able to detect and reject it. In order for the daemon to detect such double spends, upon booting the daemon, we clear the sign bit from all key images already in the db. All key images inserted into the db have their sign bit cleared before insertion, and the db prevents duplicates. We also make sure that all key images held in memory by the pool have sign bit cleared (see `key_images_container`). Transactions must have unique key images with sign bit cleared too (see `check_tx_inputs_keyimages_diff`). Key images with sign bit cleared are a new type: `crypto::key_image_y`. The sign bit can be cleared via `crypto::key_image_to_y`. The `_y` denotes that the encoded point is now the point's y coordinate.

This PR aims to avoid a breaking change to the `COMMAND_RPC_GET_TRANSACTION_POOL` endpoint, which currently serves key images in the pool via the `spent_key_image_info::id_hash` response field. The PR does this by making sure the pool keeps track of the sign bit for each `crypto::key_image_y` held in the pool. The daemon still prevents duplicate `crypto::key_image_y` from entering the pool (except in the case of reorgs, as is currently the case), but upon serving the response to `COMMAND_RPC_GET_TRANSACTION_POOL`, the daemon re-derives the `crypto::key_image` using `crypto::key_image_y` and the sign bit, and serves this original `crypto::key_image` via `spent_key_image_info::id_hash`. Note that it is possible for two distinct `id_hash` of the same `key_image_y` to exist, where the `key_image` has sign bit set for one `id_hash` and sign bit cleared for the other `id_hash` (thus 2 distinct `id_hash`'s). This would be possible if, during a grace period that allows both fcmp's and ring signatures, there exists an alternate chain where a user constructs an fcmp spending an output, and an alternate chain where a user constructs a ring signature spending the same output with the key image sign bit set.

TODO: tests for this grace period scenario.
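A minimal sketch of the sign-bit handling described above. `crypto::key_image_to_y` is the PR's conversion; the byte-level view here relies on the standard ed25519 compressed encoding (sign bit in the top bit of the final byte) and is illustrative:

```cpp
#include <cstdint>
#include <cstring>

// Simplified stand-ins for crypto::key_image and crypto::key_image_y.
struct key_image   { uint8_t data[32]; };
struct key_image_y { uint8_t data[32]; };

// Clear the sign bit, keeping only the y coordinate; return the sign bit so
// the pool can re-derive the original key image for the RPC response.
bool key_image_to_y(const key_image &ki, key_image_y &ki_y)
{
    std::memcpy(ki_y.data, ki.data, 32);
    const bool sign = (ki.data[31] & 0x80) != 0;
    ki_y.data[31] &= 0x7F; // drop the sign bit
    return sign;
}

// Re-derive the original key image from the y coordinate plus the stored sign
// bit (as when serving COMMAND_RPC_GET_TRANSACTION_POOL).
key_image key_image_from_y(const key_image_y &ki_y, bool sign)
{
    key_image ki;
    std::memcpy(ki.data, ki_y.data, 32);
    if (sign)
        ki.data[31] |= 0x80;
    return ki;
}
```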
h. Curve trees merkle tree: Trim the tree on reorg and on pop blocks
- `BlockchainLMDB::remove_block()`.
- In `BlockchainLMDB::remove_block()`, after removing the block from the block info table, we call `BlockchainLMDB::trim_tree` with the number of leaves to trim and the block id which we're trimming.
- We use the `output_id` to re-insert the output into the locked outputs table in the correct order.
- After `BlockchainLMDB::remove_block()`, the daemon removes all of the block's transactions from the db via `BlockchainLMDB::remove_transaction`.
- Within `BlockchainLMDB::remove_transaction` is `BlockchainLMDB::remove_output`, which is called for all of a tx's outputs.
- In `BlockchainLMDB::remove_output` we remove the output from the locked outputs table if it's present.
- `BlockchainLMDB::trim_tree`.

C. Transaction struct changes for FCMP++
cryptonote::transaction::rctSig

rctSigBase

Added a new `RCTType` enum usable in the `type` member of `rctSigBase`: `RCTTypeFcmpPlusPlus = 7`.

FCMP++ txs are expected to use this `RCTType` instead of `RCTTypeBulletproofPlus` (even though FCMP++ txs are still expected to have a bp+ range proof).

Added a new member to `rctSigBase`:

```cpp
crypto::hash referenceBlock; // block containing the merkle tree root used for the tx's FCMP++
```

This member is only expected to be present on txs of `rctSigBase.type == RCTTypeFcmpPlusPlus`.

rctSigPrunable

Added 2 new members:

Note there is a single opaque FCMP++ struct per tx. The `FcmpPpProof` type is simply a `std::vector<uint8_t>`. The length of the `FcmpPpProof` is deterministic from the number of inputs in the tx and the curve trees merkle tree depth. Thus, when serializing and de-serializing, we don't need to store the vector length, and can expect a deterministic number of bytes for the `FcmpPpProof` by calling `fcmp_pp::proof_len(inputs, curve_trees_tree_depth)`.
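To illustrate the deterministic-length idea (a sketch only: `fcmp_pp::proof_len` is the PR's function, but the length formula and reader below are hypothetical stand-ins, not Monero's actual serialization code):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using FcmpPpProof = std::vector<uint8_t>;

// Hypothetical stand-in for fcmp_pp::proof_len(inputs, tree_depth); the real
// formula lives on the Rust side.
std::size_t proof_len(std::size_t inputs, std::size_t tree_depth)
{
    return 32 * (4 + inputs * (2 + tree_depth)); // illustrative formula only
}

// Because the length is implied by tx data we already have, deserialization
// can read exactly proof_len(...) bytes with no explicit length prefix.
FcmpPpProof read_fcmp_pp(const uint8_t *&in, std::size_t inputs, std::size_t tree_depth)
{
    const std::size_t n = proof_len(inputs, tree_depth);
    FcmpPpProof proof(in, in + n);
    in += n;
    return proof;
}
```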
Comments

The `tx_fcmp_pp` serialization test demonstrates what an expected dummy `transaction` struct looks like with dummy data.

D. Constructing FCMP++ transactions
TODO
E. Verifying FCMP++ transactions
TODO
F. Consensus changes for FCMP++
TODO
G. Wallet sync
TODO