
rpc: sync state api #51

Merged: 1 commit into main from hacka-sync-state-api on Oct 31, 2023

Conversation

hackaugusto
Contributor

closes: #43

@hackaugusto force-pushed the hacka-sync-state-api branch 4 times, most recently from 339f16d to 400d0e6, on October 27, 2023 14:47
@@ -4,24 +4,24 @@ package block_header;
import "digest.proto";

message BlockHeader {
/// the hash of the previous blocks header.
Contributor Author

Protobuf comments use just two forward slashes. prost translated the `///` comment above as:

/// / the hash of the previous blocks header.

because the extra forward slash was considered part of the comment text.

@@ -0,0 +1,29 @@
syntax = "proto3";
Contributor Author

I moved the requests/responses to a separate module. The reason is that, for the time being, the RPC and the store use the same definitions. Having them defined once means they are the same type to the Rust compiler, so there is no need to perform additional casting.

In the long run this will probably change; for example, if we add distributed tracing, the RPC will need to forward a token to the Store, and there will likely be other reasons for the messages to diverge. But for now it is best to keep things simple.
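A minimal sketch of the point about type identity, with hypothetical module and type names standing in for the prost-generated code (the real field names and layout may differ):

mod generated {
    // Hypothetical stand-ins for the prost-generated types.
    #[derive(Debug, Default, Clone)]
    pub struct SyncStateRequest {
        pub block_num: u32,
    }

    #[derive(Debug, Default, Clone)]
    pub struct SyncStateResponse {
        pub chain_tip: u32,
    }
}

// Because the RPC and the store both use `generated::*`, the RPC can pass the
// request through and return the store's response without any conversion.
fn rpc_sync_state(request: generated::SyncStateRequest) -> generated::SyncStateResponse {
    store_sync_state(request)
}

fn store_sync_state(request: generated::SyncStateRequest) -> generated::SyncStateResponse {
    generated::SyncStateResponse { chain_tip: request.block_num }
}

fn main() {
    let response = rpc_sync_state(generated::SyncStateRequest { block_num: 0 });
    println!("{response:?}");
}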

@@ -27,6 +27,7 @@ miden-node-utils = { path = "../utils" }
miden_objects = { workspace = true }
once_cell = { version = "1.18.0" }
prost = { version = "0.12" }
rusqlite = { version = "0.29", features = ["array", "buildtime_bindgen"] }
Contributor Author

array is needed to support passing multiple values to the cursor. This is used to allow querying for tag/nullifier prefixes and account IDs.

The rusqlite library supports multiple SQLite distributions: it can use the system's library, statically link SQLite into the final binary, or use a patched SQLite with additional encryption features. To speed up compilation, the library ships with FFI bindings generated for a conservative SQLite version that works with any of the aforementioned distributions. Unfortunately, those pre-generated bindings don't support the array extension, so we have to generate the FFI bindings ourselves (hence the buildtime_bindgen feature).

store/src/db.rs Outdated
Comment on lines 38 to 54
conn.interact(|conn| array::load_module(conn))
.await
.map_err(|_| anyhow!("Loading carray module failed"))??;
Contributor Author

@hackaugusto Oct 27, 2023

This loads the array extension, which is needed to run queries that take a vector of values from the Rust side.

An alternative implementation would use the post_hook. I'll look into that later, but for now this seems to be working.
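For reference, a minimal self-contained sketch of what the array extension enables (it requires the rusqlite "array" feature and closely follows the example in the rusqlite documentation); the notes table and column names here are illustrative, not the ones from this PR:

use std::rc::Rc;

use rusqlite::{types::Value, vtab::array, Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    // Load the array (rarray/carray) extension once per connection.
    array::load_module(&conn)?;

    conn.execute_batch(
        "CREATE TABLE notes (block_num INTEGER NOT NULL, tag INTEGER NOT NULL);
         INSERT INTO notes VALUES (1, 10), (2, 20), (3, 30);",
    )?;

    // A vector of values built on the Rust side, bound as a single rarray(?1) parameter.
    let tags: Rc<Vec<Value>> = Rc::new(vec![Value::from(10i64), Value::from(30i64)]);

    let mut stmt = conn.prepare("SELECT block_num FROM notes WHERE tag IN rarray(?1)")?;
    let blocks: Vec<i64> = stmt
        .query_map([tags], |row| row.get(0))?
        .collect::<Result<_>>()?;

    println!("{blocks:?}"); // [1, 3]
    Ok(())
}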

(
block_num INTEGER NOT NULL,
block_header BLOB NOT NULL,

PRIMARY KEY (block_num),
CONSTRAINT block_header_block_num_positive CHECK (block_num >= 0)
CONSTRAINT block_header_block_num_is_u32 CHECK (block_num >= 0 AND block_num < 4294967296)
Contributor Author

The previous store PR requested using u32 for the block number. This PR does some tidying up to encode that both in the DB constraints and in the type system.
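A small illustrative sketch of the two layers (simplified table, not the actual schema from this PR): the CHECK constraint rejects out-of-range values at the DB level, and the Rust side converts the stored integer back into a u32:

use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch(
        "CREATE TABLE block_headers (
             block_num INTEGER NOT NULL,
             PRIMARY KEY (block_num),
             CONSTRAINT block_num_is_u32 CHECK (block_num >= 0 AND block_num < 4294967296)
         );",
    )?;

    // In range: accepted by the CHECK constraint.
    conn.execute("INSERT INTO block_headers (block_num) VALUES (?1)", [u32::MAX as i64])?;

    // Out of range: rejected by the CHECK constraint.
    let too_big = u32::MAX as i64 + 1;
    assert!(conn
        .execute("INSERT INTO block_headers (block_num) VALUES (?1)", [too_big])
        .is_err());

    // SQLite hands integers back as i64, so the Rust side converts explicitly.
    let stored: i64 =
        conn.query_row("SELECT block_num FROM block_headers", [], |row| row.get(0))?;
    let block_num = u32::try_from(stored).expect("guaranteed by the CHECK constraint");
    println!("{block_num}");
    Ok(())
}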

Comment on lines 166 to 174
mmr_delta: todo!(),
block_path: todo!(),
Contributor Author

The MMR only supports constructing a delta against its latest version, so this can't be done yet. I'm looking into a fix to add in the crypto repo.

Contributor Author

@hackaugusto Oct 27, 2023

Opened 0xPolygonMiden/crypto#205, 0xPolygonMiden/crypto#206, and 0xPolygonMiden/crypto#207.

With these 3 PRs merged this becomes a trivial change.

@hackaugusto force-pushed the hacka-sync-state-api branch 3 times, most recently from 19db790 to affe3ef, on October 27, 2023 19:39
@hackaugusto requested a review from bobbinth on October 27, 2023 19:39
@hackaugusto marked this pull request as ready for review on October 27, 2023 19:39
Contributor

@bobbinth left a comment

Looks good! Thank you! I left a few comments inline - they are all fairly small. I do have one broader comment/question:

If I understood the code correctly, when the store fulfills the sync_state request it makes multiple sequential requests to the database (i.e., first notes_since_block_by_tag, then get_block_header, get_account_hash_by_block_range etc.). All these requests are independent of each other - i.e., executed in different transactions (and may be executed against different versions of the database).

While I don't think this results in consistency issues for this specific endpoint, I wonder if a better approach would be to execute all these individual requests in a single transaction. One way to do this would be to have a single method on the Db struct - something like get_state_sync_info - and then to make individual queries to the DB inside this method.
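For illustration, one possible shape of such a method, as a sketch only: the queries and tables below are made up, and the real store goes through a connection pool (the interact closure) rather than a bare rusqlite Connection.

use rusqlite::{Connection, Result};

// Hypothetical container for the data needed by the sync_state response.
#[derive(Debug)]
struct StateSyncInfo {
    chain_tip: i64,
    note_count: i64,
    nullifier_count: i64,
}

// Sketch of a get_state_sync_info-style method: every SELECT runs inside the
// same transaction, so they all observe one snapshot of the database.
fn get_state_sync_info(conn: &mut Connection, block_start: i64) -> Result<StateSyncInfo> {
    let tx = conn.transaction()?;

    let chain_tip: i64 = tx.query_row(
        "SELECT COALESCE(MAX(block_num), 0) FROM block_headers",
        [],
        |row| row.get(0),
    )?;
    let note_count: i64 = tx.query_row(
        "SELECT COUNT(*) FROM notes WHERE block_num > ?1",
        [block_start],
        |row| row.get(0),
    )?;
    let nullifier_count: i64 = tx.query_row(
        "SELECT COUNT(*) FROM nullifiers WHERE block_number > ?1",
        [block_start],
        |row| row.get(0),
    )?;

    // Read-only work, so committing just ends the transaction.
    tx.commit()?;
    Ok(StateSyncInfo { chain_tip, note_count, nullifier_count })
}

fn main() -> Result<()> {
    let mut conn = Connection::open_in_memory()?;
    conn.execute_batch(
        "CREATE TABLE block_headers (block_num INTEGER NOT NULL);
         CREATE TABLE notes (block_num INTEGER NOT NULL);
         CREATE TABLE nullifiers (block_number INTEGER NOT NULL);",
    )?;
    println!("{:?}", get_state_sync_info(&mut conn, 0)?);
    Ok(())
}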

store/src/db.rs Outdated
Comment on lines 49 to 67
/// Inserts a new nullifier to the DB.
///
/// This method may be called multiple times with the same nullifier.
pub async fn add_nullifier(
Contributor

Is this method (and other similar methods) used primarily for testing purposes? If so, we should probably indicate this in the comments and maybe group them together in a "testing" section or something similar.

Eventually, there will be only a single method for updating the data in the store - apply_block. This method will lock the database for update and perform all required updates atomically.

Contributor Author

In this PR it was added primarily for testing purposes; I assume it would also be used for other endpoints in the future (e.g., apply_block can call these methods).

  • Put the methods behind a cfg(test)

@hackaugusto
Contributor Author

hackaugusto commented Oct 29, 2023

There should be no consistency issue as long as we don't have reorgs. I wrote the endpoint to first define the start/end blocks, and all the other loads are for the same range. The data in there would only change with a reorg.

@bobbinth
Contributor

There should be no consistency issue as long as we don't have reorgs. I wrote the endpoint to first define the start/end blocks, and all the other loads are for the same range. The data in there would only change with a reorg.

Yes - as I mentioned in my comment I don't think there is a consistency issue here. But it is still probably better to retrieve all required data in a single transaction. A few reasons:

  • We push enforcement of consistency to the DB rather than leaving it in our code. This will mean less mental overhead in figuring out if the endpoint works as expected. This could prevent weird edge cases arising if we change how the endpoint works in the future.
  • We will have other endpoints which will need to get data from multiple tables in a single transaction, and it is better to use the same approach across all endpoints.
  • This will give us more flexibility in the future if we decide to switch to a different database. For example, in databases supporting stored procedures, the whole request could be fulfilled by the DB itself.

Are there any reasons to prefer multiple independent requests?

@hackaugusto
Contributor Author

hackaugusto commented Oct 30, 2023

For example, in databases supporting stored procedures, the whole request could be fulfilled by the DB itself.

I'm not sure I follow. You mean a single request would return the complete dataset? Each request has a different return type, and I'm not sure how to do that in a SQL DB. Perhaps you're planning on switching to a NoSQL database in the future (something document-based)?

Are there any reasons to prefer multiple independent requests?

Assuming we are using a SQL DB, we would need multiple SELECTs inside a single transaction (one for each data type, which is to say one per table, e.g. nullifiers / accounts). To have the consistency level you're talking about with multiple queries, we would need to set the DB isolation level to SERIALIZABLE, which is the worst case for performance.

To put it differently: the result format is different for each table, so separate SELECTs are necessary, and the default isolation level is READ COMMITTED for PG 1 / REPEATABLE READ for MySQL 2. This means each SELECT statement is internally consistent, but it does not mean that two consecutive SELECTs are consistent with each other in PG. From the PG docs:

[..] two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts

There are some other issues with transactions: not only are they a performance hit, they are also a huge bottleneck for migrations. IMO we should not rely on them the way you're suggesting.

Edit: Well, I guess that for MySQL and PostgreSQL the REPEATABLE READ isolation level provides more than the SQL standard requires, i.e. they have SNAPSHOT guarantees, which would be sufficient in this case but is not portable. The point I'm trying to make is that transactions alone are not sufficient to ensure consistency: the behavior of the queries, the server code, and the server/connection configuration (like the isolation level) must be looked at too. Trying to reduce complexity by wrapping everything in a transaction will probably cause subtle bugs.

Footnotes

  1. https://www.postgresql.org/docs/16/transaction-iso.html

  2. https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html

@hackaugusto
Contributor Author

Moved the logic to load the state sync data into a method on the Db struct.

@hackaugusto
Contributor Author

This is the behavior of SQLite:

Except in the case of shared cache database connections with PRAGMA read_uncommitted turned on, all transactions in SQLite show "serializable" isolation. SQLite implements serializable transactions by actually serializing the writes. There can only be a single writer at a time to an SQLite database. There can be multiple database connections open at the same time, and all of those database connections can write to the database file, but they have to take turns. SQLite uses locks to serialize the writes automatically; this is not something that the applications using SQLite need to worry about.

Contributor

@bobbinth left a comment

Thank you! Looks good! I left a few comments inline - all except for one are very minor. The main comment is to update how we select a list of relevant notes - a part of the criteria there should be based on the sender column.

A couple of other things, for which I think we should create separate issues:

  1. Regarding transaction isolation level: if I understood how SQLite works, the behavior I was going for may be provided out of the box in WAL mode (i.e., before a write transaction is committed, all reads - whether in a transaction or not - will see the database in the state prior to the write transaction). So, we may not need to do anything extra except for enabling WAL mode - but I think we should discuss this in an issue.
  2. We should figure out how we want to do indexing. For example, do we need to add an index on the first 16 bits of the note tag and on the note sender? Or should we consider some other approach to quickly finding the "anchor block" for the sync_state endpoint?

For the second point, what worries me the most is the efficiency of doing this:

SELECT
    block_num
FROM
    notes
WHERE
    ((tag >> 48) IN rarray(?1) OR sender IN rarray(?2)) AND
    block_num > ?3
ORDER BY
    block_num ASC
LIMIT
    1
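One way to check this concern empirically, not something this PR does, is to ask SQLite for its query plan and see whether it reports a full table scan; a sketch with a simplified stand-in table:

use std::rc::Rc;

use rusqlite::{types::Value, vtab::array, Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    array::load_module(&conn)?;
    conn.execute_batch(
        "CREATE TABLE notes (
             block_num INTEGER NOT NULL,
             tag INTEGER NOT NULL,
             sender INTEGER NOT NULL
         );",
    )?;

    let tags: Rc<Vec<Value>> = Rc::new(vec![Value::from(1i64)]);
    let senders: Rc<Vec<Value>> = Rc::new(vec![Value::from(2i64)]);

    // EXPLAIN QUERY PLAN reports how SQLite intends to execute the statement,
    // e.g. `SCAN notes` (full scan) vs. `SEARCH notes USING INDEX ...`.
    let mut stmt = conn.prepare(
        "EXPLAIN QUERY PLAN
         SELECT block_num FROM notes
         WHERE ((tag >> 48) IN rarray(?1) OR sender IN rarray(?2)) AND block_num > ?3
         ORDER BY block_num ASC LIMIT 1",
    )?;
    // The fourth column of the EXPLAIN QUERY PLAN output is the plan detail.
    let plan: Vec<String> = stmt
        .query_map((tags, senders, 0i64), |row| row.get(3))?
        .collect::<Result<_>>()?;
    for step in plan {
        println!("{step}");
    }
    Ok(())
}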

@hackaugusto
Contributor Author

hackaugusto commented Oct 31, 2023

a part of the criteria there should be based on the sender column.

Oh, so the client is expected to also send its account_ids as part of the request's tags? And is the idea of searching notes by sender that the client gets confirmation of when its notes are created?

if I understood how SQLite works, the behavior I was going for may be provided out of the box in WAL mode

It is provided by both the WAL and rollback journal modes. The difference is that with WAL a transaction can fail and needs to be retried, while with the rollback journal there is an exclusive lock for the writer, so it never fails, at the cost of increased latency.
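For context, enabling WAL mode from rusqlite is a one-liner; a sketch (the database file name is made up, and since the PRAGMA returns the resulting journal mode, it is read back with query_row):

use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    // File name is illustrative; WAL mode requires an on-disk database.
    let conn = Connection::open("store.sqlite3")?;

    // PRAGMA journal_mode=WAL returns the journal mode now in effect,
    // so read it back with query_row instead of using execute.
    let mode: String = conn.query_row("PRAGMA journal_mode=WAL", [], |row| row.get(0))?;
    assert_eq!(mode.to_lowercase(), "wal");

    // From here on, readers see a consistent snapshot taken at the start of
    // their own read, even while a writer is committing.
    Ok(())
}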

We should figure out how we want to do indexing

I was hoping that we would not have to do that right now. We don't have all the tables defined, nor all the queries, and the ones we have may even change in the future (say, if we decide to change the number of bits in the tag, or something like that). I'm not sure optimizing that right now would be the best approach.

With that said, SQLite has very comprehensive documentation on indexes and the query planner 1 2 3. For the query you mentioned, we have a few things to consider:

  • Which columns would filter the result the most.
    • The > block_num filter is hard to predict: for the time being users will use this endpoint starting from genesis, and later on to track the chain tip. So this filter can potentially scan the whole table or just read one block's notes. To me this means it won't be a good index candidate.
    • The OR by tag and sender are much better candidates.
      • We can very confidently assume a user will send a small fraction of the total number of notes, so filtering by sender is a very good strategy.
      • The tag is more complicated: as it is, we are doing a bitshift operation (>> 48). SQLite supports indexes over expressions 4, which we can take advantage of, but they can be error-prone since indexes are chosen based on the syntactic form of the expression.
      • The two fields above require one index each, and the DB has to filter by each and then perform a union of the results. An alternative approach would be to have a separate table, say CREATE TABLE notes_idx (tag INTEGER, block_num INTEGER), with just the tag data. The idea is that each note would produce two rows in the index table, one row with the value of sender and another with the value of tag >> 48, and we would do a single IN query, which needs a single index and eliminates the need to merge results.

So here are two proposals:

-- covering indexes
CREATE INDEX
  idx_notes_tag_high_16bits
ON
  notes
( tag >> 48, block_num );

CREATE INDEX
  idx_notes_sender
ON
  notes
( sender, block_num )

Or:

CREATE TABLE
  notes_idx
(
  tag INTEGER NOT NULL,
  block_num INTEGER NOT NULL,
  
  PRIMARY KEY (tag, block_num),
  CONSTRAINT notes_idx_tag_is_felt CHECK (tag >= 0 AND tag <= 18446744069414584321),
  CONSTRAINT notes_idx_block_num_is_u32 CHECK (block_num >= 0 AND block_num < 4294967296)
) STRICT, WITHOUT ROWID;

The table above forces the pair (tag, block_num) to be unique, which is not true of the underlying data (e.g. if a user produces two notes in the same block, there are two rows with the same sender and block_num). The thing is that we don't care about that: we only want to learn the block height at which the user produced something, so that is okay. The benefit is that the table's clustered index is exactly what we want to search on, so no extra index is needed.

The table could be populated via a trigger, and we could also add a foreign key to block_num and clear the entries when blocks are deleted (once we implement pruning in the future).
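A sketch of that trigger idea, using the simplified column set from the query above and skipping the STRICT/CHECK details; INSERT OR IGNORE handles the duplicate (tag, block_num) pairs:

use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch(
        "CREATE TABLE notes (
             block_num INTEGER NOT NULL,
             tag INTEGER NOT NULL,
             sender INTEGER NOT NULL
         );

         CREATE TABLE notes_idx (
             tag INTEGER NOT NULL,
             block_num INTEGER NOT NULL,
             PRIMARY KEY (tag, block_num)
         ) WITHOUT ROWID;

         -- Each inserted note produces up to two index rows: one keyed by the
         -- high 16 bits of the tag and one keyed by the sender.
         CREATE TRIGGER notes_idx_populate AFTER INSERT ON notes
         BEGIN
             INSERT OR IGNORE INTO notes_idx (tag, block_num)
                 VALUES (NEW.tag >> 48, NEW.block_num);
             INSERT OR IGNORE INTO notes_idx (tag, block_num)
                 VALUES (NEW.sender, NEW.block_num);
         END;",
    )?;

    conn.execute("INSERT INTO notes (block_num, tag, sender) VALUES (1, ?1, 42)", [7i64 << 48])?;

    let rows: i64 = conn.query_row("SELECT COUNT(*) FROM notes_idx", [], |row| row.get(0))?;
    assert_eq!(rows, 2); // one row for the tag prefix (7), one for the sender (42)
    Ok(())
}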

Footnotes

  1. https://www.sqlite.org/queryplanner.html

  2. https://www.sqlite.org/queryplanner-ng.html

  3. https://www.sqlite.org/optoverview.html

  4. https://www.sqlite.org/expridx.html

@bobbinth
Contributor

so the client is expected to send its account_ids also as part of the request's tags?

Account IDs are already a part of the request - we just use them in two places (to look up account states and to look up notes by sender). Account IDs are not part of the tags.

We should figure out how we want to do indexing

I was hoping that we would not have to do that right now.

Yes, as I mentioned in my comment, we don't need to do this right now (or even in near future) - let's just create an issue to discuss this.

@hackaugusto dismissed bobbinth's stale review on October 31, 2023 16:01

applied requested changes

Contributor

@bobbinth left a comment

All looks good! Thank you! I left one small nit inline - after this, we can merge.

Also, let's create the two issues: one for transaction consistency mode and another one for indexes in the notes table.

Comment on lines 291 to 303
/// Searches for a block after `block_num` which contains note(s) matching [tag]s, and returns all
/// matching notes.
///
/// # Returns
///
/// An empty vector if the blocks after `block_num` don't contain notes matching `tags`.
/// Otherwise the matching notes from the next block are returned. Note that this method returns
/// notes from a single block.
pub async fn get_notes_since_block_by_tag_and_sender(
Contributor

nit: I'd probably mention sender in the comments here.

Contributor Author

changed

@hackaugusto merged commit 0a484fe into main on Oct 31, 2023
@hackaugusto deleted the hacka-sync-state-api branch on October 31, 2023 21:17
block_number INTEGER NOT NULL,

PRIMARY KEY (nullifier),
CONSTRAINT nullifiers_nullifier_valid_digest CHECK (length(nullifier) = 32),
CONSTRAINT nullifiers_block_number_positive CHECK (block_number >= 0),
CONSTRAINT nullifiers_nullifier_is_digest CHECK (length(nullifier) = 32),
Contributor Author

One thing that I forgot to mention: the desire to avoid encoding/decoding data when reading/writing to the DB is leaking. This table can perform a constraint check on the nullifier size because it uses the fixed encoding defined via winterfell's traits.

The tables above, on the other hand, use protobuf's serialization format. I skipped the constraint checks there because my first byte count was wrong, and in some situations the encoding is variable.

Contributor

To clarify: does this mean that some BLOB fields in accounts, notes, and block_headers tables use protobuf serialization format?

Looking through these tables, it wasn't immediately clear to me which fields these would be (maybe block_headers.block_header and notes.merkle_path?).

Contributor Author

Basically everything except the nullifier. Maybe I should change that too; it is only using our binary format to create the nullifier tree, but that is done once during initialization, and the other uses actually need the protobuf format.

Contributor

Basically everything except the nullifier.

Does this include non-blob fields too?

I would think integers would be recorded as integers - but maybe that's not the case?

Regarding blob fields: things like nullifiers, hashes etc. are just 32 bytes which cannot be compressed - so, the format should be the same. But maybe I'm missing something?

Contributor Author

@hackaugusto Nov 8, 2023

I would think integers would be recorded as integers - but maybe that's not the case?

Integers should be fine

Regarding blob fields: things like nullifiers, hashes etc. are just 32 bytes which cannot be compressed - so, the format should be the same. But maybe I'm missing something?

The encoding format is documented here. There is some additional metadata in protobuf's wire format that we don't use. I haven't spent a lot of time trying to fully digest the format, so I have an idea of how many bytes it should be, but I didn't want to add guesswork to the constraints.
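To illustrate the wire-format overhead with a made-up message (not the actual digest.proto definition, which may be structured differently): prost adds a field key and a length prefix on top of the raw bytes, so the encoded size is not simply the in-memory size.

use prost::Message;

// Hypothetical message with a single 32-byte `bytes` field.
#[derive(Clone, PartialEq, Message)]
struct RawDigest {
    #[prost(bytes = "vec", tag = "1")]
    data: Vec<u8>,
}

fn main() {
    let digest = RawDigest { data: vec![0u8; 32] };

    let mut buf = Vec::new();
    digest.encode(&mut buf).expect("Vec<u8> grows as needed");
    // One byte for the field key plus one byte for the length prefix,
    // on top of the 32 payload bytes.
    assert_eq!(buf.len(), digest.encoded_len()); // 34 bytes for this shape, not 32
    println!("encoded_len = {}", digest.encoded_len());
}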

Contributor

Let's create an issue to discuss this. On the one hand, it would be nice to reduce the number of times we encode/decode things. But on the other, storing protobuf formats in the DB doesn't feel right. Maybe there are options which can balance these somehow.


Successfully merging this pull request may close these issues.

Endpoint to perform client state sync