rpc: sync state api #51
Conversation
339f16d to 400d0e6
@@ -4,24 +4,24 @@ package block_header;
import "digest.proto";

message BlockHeader {
    /// the hash of the previous blocks header.
protobuf's comments are just two forward slashes. This was translated by prost as:
/// / the hash of the previous blocks header.
because the extra forward slash was considered part of the comment.
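For illustration, a minimal sketch of what prost generates when the .proto comment starts with three slashes instead of two (the field name and type below are placeholders, not the actual generated code):

```rust
// Hypothetical output, for illustration only: the extra slash from the .proto file
// survives as literal text inside the generated Rust doc comment.
pub struct BlockHeader {
    /// / the hash of the previous blocks header.
    pub prev_hash: Vec<u8>, // placeholder field name and type
}
```

With only two slashes in the .proto file, prost emits a clean `/// the hash of the previous blocks header.` doc comment.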
@@ -0,0 +1,29 @@
syntax = "proto3";
I moved the requests/responses to a separate module. The reason is that for the time being the rpc and store are using the same definitions. Having them defined once means they are the same type to the Rust compiler, and there is no need to perform additional casting.
In the long run this will probably change; for example, if we add distributed tracing the RPC will need to forward a token to the Store, and there will probably be more reasons to change the messages. But for now it is probably best to keep this simple.
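A minimal sketch of the idea (the module path and message fields below are hypothetical, not the actual crate layout): because the RPC and the store compile the same .proto definitions, the generated structs are a single Rust type and can be passed through without any conversion.

```rust
// Hypothetical module standing in for the shared, generated protobuf code.
mod proto {
    pub mod requests {
        #[derive(Clone, PartialEq, ::prost::Message)]
        pub struct SyncStateRequest {
            #[prost(uint32, tag = "1")]
            pub block_num: u32,
        }
    }
}

// Because both components use the same generated type, the RPC can hand the request
// to the store as-is - no casting or field-by-field copying required.
fn forward_to_store(request: proto::requests::SyncStateRequest) -> proto::requests::SyncStateRequest {
    request
}
```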
@@ -27,6 +27,7 @@ miden-node-utils = { path = "../utils" }
miden_objects = { workspace = true }
once_cell = { version = "1.18.0" }
prost = { version = "0.12" }
rusqlite = { version = "0.29", features = ["array", "buildtime_bindgen"] }
`array` is needed to support passing multiple values to the cursor. This is used to allow querying for tag/nullifier prefixes and account ids.

The `rusqlite` library supports multiple distributions of `sqlite`: it can use the system's library, it can statically link `sqlite` into the final binary, and it can use a patched version of `sqlite` with additional encryption features. To speed up compilation, the library ships with FFI bindings generated for a conservative version of `sqlite` that works with any of the aforementioned distributions. Unfortunately those pre-generated bindings don't support the `array` extension, so we have to generate the FFI at build time (hence the `buildtime_bindgen` feature).
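As a concrete, hedged sketch of what the `array` feature enables - binding a whole vector of values to a single parameter via `rarray()` - the table and column names below follow the `notes` schema discussed later in this PR, but the function itself is only an illustration:

```rust
use std::rc::Rc;

use rusqlite::{types::Value, vtab::array, Connection, Result};

// Returns the block numbers of notes whose high 16 tag bits match any of the given
// prefixes. The whole prefix list is bound as one parameter through the carray
// virtual table exposed by the `array` feature.
fn blocks_with_matching_tags(conn: &Connection, tag_prefixes: &[u64]) -> Result<Vec<u32>> {
    // The extension must be loaded once per connection before rarray() can be used.
    array::load_module(conn)?;

    let prefixes: Rc<Vec<Value>> =
        Rc::new(tag_prefixes.iter().map(|p| Value::Integer(*p as i64)).collect());

    let mut stmt =
        conn.prepare("SELECT block_num FROM notes WHERE (tag >> 48) IN rarray(?1)")?;
    let rows = stmt.query_map([prefixes], |row| row.get::<_, u32>(0))?;
    rows.collect()
}
```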
store/src/db.rs
Outdated
conn.interact(|conn| array::load_module(conn))
    .await
    .map_err(|_| anyhow!("Loading carray module failed"))??;
This is loading the `array` extension, needed to perform the queries using a vector of values from the Rust side.
An alternative implementation would use the `post_hook`. I have to look at that later on, but for now this seems to be working.
(
    block_num INTEGER NOT NULL,
    block_header BLOB NOT NULL,

    PRIMARY KEY (block_num),
    CONSTRAINT block_header_block_num_positive CHECK (block_num >= 0)
    CONSTRAINT block_header_block_num_is_u32 CHECK (block_num >= 0 AND block_num < 4294967296)
The previous store PR requested using `u32` as the block number. This PR does some tidying up to add that to the DB constraints and the type system.
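A small sketch of the type-system side of this (the function name is an assumption): SQLite stores INTEGER columns as signed 64-bit values, so the read path narrows to `u32`, which the CHECK constraint above (block_num < 4294967296, i.e. 2^32) guarantees cannot fail for rows written through this schema.

```rust
use rusqlite::{Result, Row};

// Reads a block number column and narrows it to u32. The CHECK constraint on the
// table guarantees the stored value is in [0, 2^32), so the conversion cannot fail.
fn block_num_from_row(row: &Row<'_>) -> Result<u32> {
    let raw: i64 = row.get("block_num")?;
    Ok(u32::try_from(raw).expect("block_num constraint guarantees the value fits in a u32"))
}
```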
store/src/server/api.rs
Outdated
mmr_delta: todo!(),
block_path: todo!(),
The `mmr` only supports constructing a delta against its latest version, so this can't be done yet. I'm looking into a fix for this to add in the `crypto` repo.
Opened 0xPolygonMiden/crypto#205, 0xPolygonMiden/crypto#206, and 0xPolygonMiden/crypto#207.
With these 3 PRs merged this is a trivial change.
19db790 to affe3ef
Looks good! Thank you! I left a few comments inline - they all are fairly small. I do have one broader comment/question:

If I understood the code correctly, when the store fulfills the `sync_state` request it makes multiple sequential requests to the database (i.e., first `notes_since_block_by_tag`, then `get_block_header`, `get_account_hash_by_block_range`, etc.). All these requests are independent of each other - i.e., executed in different transactions (and may be executed against different versions of the database).

While I don't think this results in consistency issues for this specific endpoint, I wonder if a better approach would be to execute all these individual requests in a single transaction. One way to do this would be to have a single method on the `Db` struct - something like `get_state_sync_info` - and then to make individual queries to the DB inside this method.
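A hedged sketch of that suggestion using `rusqlite` directly (the table/column names, the `StateSyncInfo` shape, and the exact queries are assumptions for illustration, not the actual store schema): the point is simply that all reads happen inside one transaction and therefore observe a single snapshot of the database.

```rust
use rusqlite::{Connection, Result};

// Simplified, hypothetical container for the data returned by the sync endpoint.
struct StateSyncInfo {
    block_num: u32,
    account_hashes: Vec<(i64, Vec<u8>)>,
    nullifiers: Vec<Vec<u8>>,
}

// All queries run inside a single transaction, so they see the same database state.
fn get_state_sync_info(
    conn: &mut Connection,
    block_start: u32,
    block_end: u32,
) -> Result<StateSyncInfo> {
    let tx = conn.transaction()?;

    let block_num: u32 = tx.query_row(
        "SELECT block_num FROM block_headers WHERE block_num <= ?1 ORDER BY block_num DESC LIMIT 1",
        [block_end],
        |row| row.get(0),
    )?;

    let account_hashes = {
        let mut stmt = tx.prepare(
            "SELECT account_id, account_hash FROM accounts WHERE block_num > ?1 AND block_num <= ?2",
        )?;
        let rows = stmt.query_map([block_start, block_end], |row| {
            Ok((row.get::<_, i64>(0)?, row.get::<_, Vec<u8>>(1)?))
        })?;
        rows.collect::<Result<Vec<_>>>()?
    };

    let nullifiers = {
        let mut stmt = tx.prepare(
            "SELECT nullifier FROM nullifiers WHERE block_number > ?1 AND block_number <= ?2",
        )?;
        let rows = stmt.query_map([block_start, block_end], |row| row.get::<_, Vec<u8>>(0))?;
        rows.collect::<Result<Vec<_>>>()?
    };

    tx.commit()?;

    Ok(StateSyncInfo { block_num, account_hashes, nullifiers })
}
```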
store/src/db.rs
Outdated
/// Inserts a new nullifier to the DB.
///
/// This method may be called multiple times with the same nullifier.
pub async fn add_nullifier(
Is this method (and other similar methods) used primarily for testing purposes? If so, we should probably indicate this in the comments and maybe group them together in a "testing" section or something similar.
Eventually, there will be only a single method for updating the data in the store - `apply_block`. This method will lock the database for update and perform all required updates atomically.
In this PR it was added primarily for testing purposes; I assume this would be used in the future for other endpoints too (e.g. `apply_block` can call these methods).

- Put the methods behind a `cfg(test)`

There should be no consistency issue as long as we don't have reorgs. I wrote the endpoint to first define the start/end blocks, and all the other loads are for the same range. The data in there would only change with a reorg.
Yes - as I mentioned in my comment I don't think there is a consistency issue here. But it is still probably better to retrieve all required data in a single transaction. A few reasons:
Are there any reasons to prefer multiple independent requests?
affe3ef to b502ece
I'm not sure I follow. You mean a single request would return the complete dataset? Each request has a different return type; I'm not sure how to do that in a SQL DB. Perhaps you're planning on switching to a NoSQL DB in the future (something document based)?

Assuming we are using a SQL DB, we would need multiple queries anyway. To say the above from a different perspective: the result format is different for each table, so different queries are needed.

There are some other issues with transactions: not only are they a hit on performance, they are also a huge bottleneck for migrations. IMO we should not rely on them as you're suggesting. Edit: Well, I guess that for MySQL and PostgreSQL the isolation level …
b502ece to e7df884
Moved the logic to load the state sync into a method in the DB.
This is the behavior of SQLite:
Thank you! Looks good! I left a few comments inline - all except for one are very minor. The main comment is to update how we select a list of relevant notes - a part of the criteria there should be based on the `sender` column.
A couple of other things, for which I think we should create separate issues:
- Regarding transaction isolation level: if I understood how SQLite works, the behavior I was going for may be provided out of the box in WAL mode (i.e., before a write transaction is committed, all reads - whether in a transaction or not - will see the database in the state prior to write transaction). So, we may not need to do anything extra except for enabling WAL mode - but I think we should discuss this in an issue.
- We should figure out how we want to do indexing. For example, do we need to add an index on the first 16 bits of note tag and on note sender? Or should we consider some other approach to quickly finding the "anchor block" for the `sync_state` endpoint.
For the second point, what worries me the most is the efficiency of doing this:
SELECT
block_num
FROM
notes
WHERE
((tag >> 48) IN rarray(?1) OR sender IN rarray(?2)) AND
block_num > ?3
ORDER BY
block_num ASC
LIMIT
1
Oh, so the client is expected to send its account_ids also as part of the request's …

It is provided by both WAL and rollback modes. The difference is that with WAL a transaction can fail and needs to be retried, while with rollback there is an exclusive lock for the writer, so it never fails, at the cost of increased latency.
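For reference, a minimal sketch of enabling WAL mode with `rusqlite` (the function name is hypothetical, and whether this belongs in the connection-pool setup or somewhere else is left open):

```rust
use rusqlite::{Connection, Result};

// Switches the database to WAL mode. PRAGMA journal_mode returns the resulting mode
// as a row, so it is read back and checked rather than executed blindly.
fn enable_wal(conn: &Connection) -> Result<()> {
    let mode: String = conn.query_row("PRAGMA journal_mode = WAL", [], |row| row.get(0))?;
    assert_eq!(mode.to_lowercase(), "wal");
    Ok(())
}
```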
I was hoping that we would not have to do that right now. We don't have all the tables defined, nor the queries, and the ones we have may even change in the future (say, if we decide to change the number of bits in the tag, or something like that). I'm not sure if optimizing that right now would be the best approach. With that said, SQLite has very comprehensive documentation on indexes and the query planner.
So here are two proposals:

-- covering indexes
CREATE INDEX
    idx_notes_tag_high_16bits
ON
    notes
    ( tag >> 48, block_num );

CREATE INDEX
    idx_notes_sender
ON
    notes
    ( sender, block_num );

Or:

CREATE TABLE
    notes_idx
(
    tag INTEGER NOT NULL,
    block_num INTEGER NOT NULL,

    PRIMARY KEY (tag, block_num),
    CONSTRAINT notes_idx_tag_is_felt CHECK (tag >= 0 AND tag <= 18446744069414584321),
    CONSTRAINT notes_idx_block_num_is_u32 CHECK (block_num >= 0 AND block_num < 4294967296)
) STRICT, WITHOUT ROWID;

The table above forces the pair (tag, block_num) to be unique. The table could be populated via a trigger, and we could also have a foreign key to the notes table.
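For the second proposal, a hedged sketch of the "populated via a trigger" idea (the trigger name and the `INSERT OR IGNORE` conflict handling are assumptions; the column names follow the schemas above):

```rust
use rusqlite::{Connection, Result};

// Keeps notes_idx in sync with notes. OR IGNORE handles several notes sharing the
// same (tag, block_num) pair, which the primary key would otherwise reject.
fn create_notes_idx_trigger(conn: &Connection) -> Result<()> {
    conn.execute_batch(
        "CREATE TRIGGER IF NOT EXISTS notes_idx_after_insert
         AFTER INSERT ON notes
         BEGIN
             INSERT OR IGNORE INTO notes_idx (tag, block_num)
             VALUES (new.tag, new.block_num);
         END;",
    )
}
```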
Account IDs are already a part of the request - we just use them in two places (to look up account states and to look up notes by sender). Account IDs are not part of the …

Yes, as I mentioned in my comment, we don't need to do this right now (or even in the near future) - let's just create an issue to discuss this.
e7df884 to 2c0ea42
All looks good! Thank you! I left one small nit inline - after this, we can merge.
Also, let's create the two issues: one for transaction consistency mode and another one for indexes in the notes table.
store/src/db/mod.rs
Outdated
/// Searches a block after `block_num` which contains note(s) matching [tag]s, and returns all
/// matching notes.
///
/// # Returns
///
/// An empty vector if the blocks after `block_num` don't contain notes matching `tags`.
/// Otherwise the matching notes from the next block are returned. Note that this method returns
/// notes from a single block.
pub async fn get_notes_since_block_by_tag_and_sender(
nit: I'd probably mention `sender` in the comments here.
changed
2c0ea42 to ba67f9b
    block_number INTEGER NOT NULL,

    PRIMARY KEY (nullifier),
    CONSTRAINT nullifiers_nullifier_valid_digest CHECK (length(nullifier) = 32),
    CONSTRAINT nullifiers_block_number_positive CHECK (block_number >= 0),
    CONSTRAINT nullifiers_nullifier_is_digest CHECK (length(nullifier) = 32),
One thing that I forgot to mention: the desire to avoid encoding/decoding data when reading/writing to the DB is leaking. This table can perform a constraint check on the nullifier size because it uses the fixed encoding defined via winterfell's traits.
The tables above, on the other hand, are using protobuf's serialization format. I skipped the constraint checks there because my first byte count was wrong, and in some situations the encoding is variable.
To clarify: does this mean that some `BLOB` fields in the `accounts`, `notes`, and `block_headers` tables use the protobuf serialization format?
Looking through these tables, it wasn't immediately clear to me which fields these would be (maybe `block_headers.block_header` and `notes.merkle_path`?)
Basically everything except the nullifier. Maybe I should change that too; it is only using our binary format to create the nullifier tree, but that is done once during initialization, and the other uses actually need the protobuf format.
> Basically everything except the nullifier.

Does this include non-blob fields too? I would think integers would be recorded as integers - but maybe that's not the case?

Regarding blob fields: things like nullifiers, hashes etc. are just 32 bytes which cannot be compressed - so, the format should be the same. But maybe I'm missing something?
> I would think integers would be recorded as integers - but maybe that's not the case?

Integers should be fine.

> Regarding blob fields: things like nullifiers, hashes etc. are just 32 bytes which cannot be compressed - so, the format should be the same. But maybe I'm missing something?

The encoding format is documented here. There is some additional metadata in protobuf's wire format that we don't use. I haven't spent a lot of time trying to fully digest the format, so I have an idea of how many bytes it should be, but I didn't want to add guesswork to the constraints.
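To make the "additional metadata" point concrete, here is a small, hedged illustration (the `Digest` message below is a stand-in, not the actual generated type from digest.proto): protobuf's wire format prepends a field tag and a length prefix to a bytes field, and varint-encoded integer fields are variable-length, which is why a fixed byte-count CHECK is not straightforward for these columns.

```rust
use prost::Message;

// Stand-in message: a single 32-byte `bytes` field.
#[derive(Clone, PartialEq, prost::Message)]
struct Digest {
    #[prost(bytes = "vec", tag = "1")]
    data: Vec<u8>,
}

fn main() {
    let digest = Digest { data: vec![0u8; 32] };
    let encoded = digest.encode_to_vec();
    // 1 byte of field tag + 1 byte of length prefix + 32 bytes of payload.
    assert_eq!(encoded.len(), 34);
}
```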
Let's create an issue to discuss this. On the one hand, it would be nice to reduce the number of times we encode/decode things. But on the other, storing protobuf formats in the DB doesn't feel right. Maybe there are options which can balance these somehow.
closes: #43