distributed sequencer algorithm #463
base: main
Conversation
Signed-off-by: John Hosie <[email protected]>
### Coordinator selection
This is a form of leader election but is achieved through deterministic calculation for a given point in time (block height), given predefined members of the group, and a common (majority) awareness of the current availability of each member. So there is no actual "election" term needed.
It's a pedantic words thing, but leader election is the outcome of the distributed algorithm, rather than the thing that is performed directly by Coordinator selection on each given node.
e.g. leader election is not performed using coordinator selection on a single node in isolation. Rather leader election is a/the property that emerges from every node following the full specification in this document, and one important component of that specification is a deterministic decision made independently by each node to come to the same conclusion about a set of data (including a block number that changes over time inconsistently across the nodes, and static configuration including the committee that is agreed deterministically by recording it to the blockchain at contract deployment time).
Each node is allocated positions on a hash ring. These positions are a function of the node name. Given that the entire committee agrees on the names of all nodes, all nodes independently arrive at the same decision about the positions of all nodes.
See note above about needing to be really specific about whether it is node name or committee member identifier.
For each Pente domain instance (smart contract API) the coordinator is selected by choosing one of the nodes in the configured privacy group of that contract.
- the selector is a pure function where the inputs are the node names + the current block number and the output is the name of one of the nodes
- the function will return the same output for a range of `n` consecutive blocks (where `n` is a configuration parameter of the policy)
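The pure-function selector described in these bullets can be sketched as follows. This is an illustrative sketch only: the node names, the range size `n`, and the use of SHA-256 are assumptions for the example, not the actual Paladin selector.

```python
import hashlib

def select_coordinator(node_names, block_number, n=5):
    """Pure function: identical inputs yield the identical coordinator on
    every node, with a stable answer for each range of n consecutive blocks."""
    range_index = block_number // n  # same value for n consecutive blocks

    def score(name):
        # Deterministic score derived from the range index and the node name
        digest = hashlib.sha256(f"{range_index}:{name}".encode()).digest()
        return int.from_bytes(digest, "big")

    # Highest score wins; sort first so the result is deterministic even in
    # the (practically impossible) event of a score tie
    return max(sorted(node_names), key=score)
```

Because `range_index` is constant across each run of `n` consecutive blocks, every committee member independently computes the same coordinator for the whole range.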
I don't think we have to solve for it in the first version of this specification, but I note that the ability to change `n` over time is something that might be necessary for tuning the algorithm on long-running privacy groups.
It probably fits into a wider note on "changing configuration".
When a node starts acting as the coordinator, it periodically sends a heartbeat message to all nodes for which it has an active delegation, and will accept delegation requests from any other node in the group.
All sender nodes keep track of the currently selected coordinator (either by reevaluating the deterministic function periodically or caching the result until block height exceeds the current range boundary). If they fail to receive a heartbeat message from the current coordinator, then they will choose the next closest node in the hashring (e.g. by reevaluating the hashring function with the unavailable nodes removed).
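The fallback described here can be sketched as a re-evaluation of the hash ring with the unavailable nodes filtered out. The hashing scheme, node names, and distance metric below are illustrative assumptions, not the spec's mandated scheme.

```python
import hashlib

def ring_position(name):
    """Position of a node on the hash ring, derived only from its name, so
    every committee member computes the same positions."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")

def choose_coordinator(node_names, preferred_point, unavailable=()):
    """Pick the node whose ring position is closest to `preferred_point`,
    re-evaluating with unavailable nodes removed."""
    candidates = [m for m in sorted(node_names) if m not in set(unavailable)]
    # Closest remaining position wins; when the preferred node is removed,
    # the answer falls through to the next closest node on the ring
    return min(candidates, key=lambda m: abs(ring_position(m) - preferred_point))
```

When the preferred coordinator misses heartbeats, a sender calls the same function again with that node listed as unavailable, deterministically arriving at the next closest node.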
What is the intended behavior of the algorithm in this scenario?
- Highest ranked coordinator is B
- D delegates transaction to B
- B stops for a period
  - Longer than the heartbeat interval
  - Shorter than it takes for the block-range to change
- A is the second-highest scored coordinator
- D delegates transaction to A (because it gives up on B)
- B becomes available again
- C delegates transaction to B
At this point D has delegated a TX to A, but C has delegated to B.
I haven't (yet at this point in the spec) understood what would cause A and B to notice that they are both active (because at the time A became the coordinator, B was not available to ask).
Ah, ok found this statement below:

> If Node A comes back online, then it will begin transmitting heartbeat messages again because it has selected itself as coordinator given the current block height.

This seems complex to implement - what would cause A (B in my above, sorry) to evaluate that it selected itself as the coordinator? It does not have any transactions after the restart. Why would this contract (of the infinite potential contracts) be one after restart that it would know to select itself as a coordinator for?
![image](../images/distributed_sequencer_switchover_frame1.svg){ width="500" }
As all nodes in the group (including node A) receive receipts of block 3, they will recognize node B as the new coordinator.
I did not understand this sentence.
This implies some future inference model and/or state tracking that seems very complex to achieve, and it's unclear to me the justification for this.
I note I believe this different statement is true, but it's currently the only thing I understand to be true with respect to evaluation of block height:

> When each node in the group is notified that the block height of the chain has reached 4 (happens at different times on each node) they will recognize node B as the highest scoring coordinator when they have reason to run the scoring algorithm
Ah, ok - I think I understand.
You are saying that the combination of these two things will cause the scoring algorithm to occur:
- Detection of block height moving to 4
- The fact that transactions are unresolved for this contract address

This implies the algorithm re-evaluates the local "sender state" on detection of each block.
However, I'm worried I'm missing some detail as you say:

> receive receipts of block 3

I'm not clear what receiving receipts means in this case, how it relates to block confirmations, or why it's block 3 (rather than 4) that's relevant here.
The block numbers in the diagram start at 0, therefore block 3 is the last block in the first range. Early versions of these diagrams had `x+0`, `x+1` etc. where `x` was the block height when the privacy group was created, but I felt that the `x+..` was adding noise to the diagrams with no added value, so I simplified it to an absolute numbering and it happened to start at 0.
So when we see that block 3 has been mined, we can assume that any further transactions should come into the second range. Given that block 0 is the genesis block, it is probably not very helpful for me to be including it in the first range so I'll change the numbering system to start at 1.
The indexing wasn't the problem. Starting from zero was fine for me - we were referring to the same block.
I think you just spelt "Detection of block height moving to 4" as "receive receipts of block 3".
If instead you had said "confirmation of block 3 being mined, so the next block is now block 4" I'd have been all square.
The final transmission from node A, as coordinator, is a message to node B with details of the speculative domain context predicting confirmation of transactions `A-2`, `B-2`, `C-2` and `D-2`. This includes the state IDs for any new states created by those transactions and any states that are spent by those transactions. This message also includes the transaction hash for the final transaction that is submitted on the sequence coordinated by node A. Node B needs to receive this message so that it can proceed with maximum efficiency. Therefore this message is sent with an assured delivery quality of service (similar to state distribution).
Meanwhile, all nodes delegate transactions `A-3`, `B-3`, `C-3` and `D-3` to node B.
I'm struggling to understand the assumptions for this "meanwhile".
What makes it safe to send anything to B, if you are not absolutely certain that A has transferred your previous transactions to B?
The message from C->B that is sent by C because `DelegationReturned` was received by C could (in fact in the above sequence is reasonably likely to) reach B before the message from A->B that provides information about B's previous transactions.
I wonder, given these facts:
- `DelegationReturned` will be sent by A->C after an "I've taken ownership of submission" message (sorry couldn't see the name) for `C-2` is sent from A->C
- The sender is responsible for their transactions
- The sender thus will know at the point it sends C-3 to B, that C-2 has reached the point of confirmed delegation on A

... whether C should just package up the information about which confirmed delegations it has on A, up to B.
TODO

- need more detail on precisely _how_ nodes B, C and D know that transactions `A-2`, `B-2`, `C-2` and `D-2` are past the point of no return and that transactions `A-3`, `B-3`, `C-3` and `D-3` do need to be re-delegated.
- need more detail on precisely _how_ node B can continue to coordinate the assembly of transactions `A-3`, `B-3`, `C-3` and `D-3` in a domain context that is aware of the speculative state locks from transactions `A-2`, `B-2`, `C-2` and `D-2`
If we start to use `domain context` as a specification level thing (vs. just an implementation detail to meet the requirements of the spec) we will need to define what it is and how it works.
- It might be more productive for the next level of detail on these points to come in the form of a proposal in code.
Some code to experiment with behavior is great... we will need to come back and formally address the behavior of the specification afterwards.
- If the sender node is ahead, it continues to retry the delegation until the delegate node finally catches up and accepts the delegation
- If the sender node is behind, it waits until its block indexer catches up and then selects the coordinator for the new range
- Coordinator node will continue to coordinate (send endorsement requests and submit endorsed transactions to base ledger) until its block indexer has reached a block number that causes the coordinator selector to select a different node.
  - at that time, it waits until all dispatched transactions are confirmed on chain, then delegates all current inflight transactions to the new coordinator.
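The coordinator-side handover rules in these bullets can be reduced to a small decision function. This is a minimal sketch under stated assumptions: the function name, return values, and state arguments are hypothetical, not Paladin API.

```python
def coordinator_action(self_name, selected_name, dispatched_unconfirmed, inflight):
    """Decide what the currently-active coordinator should do when the
    selector output may have changed with block height."""
    if selected_name == self_name:
        return "continue-coordinating"
    if dispatched_unconfirmed:
        # Already-dispatched transactions must confirm on chain first
        return "wait-for-confirmations"
    if inflight:
        # Then the remaining inflight transactions move to the new coordinator
        return ("delegate-inflight-to", selected_name)
    return "stand-down"
```

The key ordering this captures is that delegation of inflight work only happens after every dispatched transaction has been confirmed on chain.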
Some nesting of sub-bullets would help readability here
Also, see comments above on my questions around "delegates all current inflight transactions to the new coordinator" being between coordinators. e.g. in the algorithm as described sometimes it's the sender that delegates, and sometimes it's another coordinator, and I'm not sure we've worked through that fully yet (or why we've chosen that approach vs. it's always the sender that delegates).
- if the new coordinator is not up to date with block indexing, then it will reject and the delegation will be retried until it catches up.
- while a node is the currently selected coordinator, it sends endorsement requests to every other node for every transaction that it is coordinating
This seems like an important part of the algorithm for sure, but I'm not sure why the spec for it is under "variation in block height".
Ok - understood now with the below bullet:

> if not, then it rejects the endorsement and includes its view of the current block number in the rejection message

Maybe I'm wondering if this fact should have been mentioned earlier.
I did a search and endorsement is mentioned a lot earlier as something that comes out of the spec, but not really in these concrete terms.
#### Simple happy path
*(`mermaid` sequence diagram, content not shown in this excerpt)*
Very cool 🚀
### Sender's responsibility
The sender node for any given transaction remains ultimately responsible for ensuring that transaction is successfully confirmed on chain or finalized as failed if it is not possible to complete the processing for any reason. While the coordination of assembly and endorsement is delegated to another node, the sender continues to monitor the progress and is responsible for initiating retries or re-delegation to other coordinator nodes as appropriate.
> or re-delegation

Again we have the inconsistency here. This paragraph states that re-delegation is the sender's responsibility, but as discussed in comments above we seem to have split this responsibility between sender and coordinator.
As I've mentioned above, I think it would be a simpler-to-understand spec if it was always the sender's responsibility as this paragraph indicates.
Feedback available to the sender node that can be monitored to track the progress or otherwise of the transaction submission:
- when the sender node is choosing the coordinator, it may have recently received a heartbeat message from the preferred coordinator or an alternative coordinator
- when sending the delegation request to the coordinator, the sender node expects to receive an acknowledgement that the request has been received. This is not a guarantee that the transaction will be completed. At this point, the coordinator has only an in-memory record of that delegated transaction
I'm not clear so far whether, if a node has A-1 and A-2, it is responsible for waiting for the ack of A-1 being delegated before delegating A-2. This is important in edge cases such as failover, as discussed above.
Yes. I need to add some details generally to explain how explicit dependencies are handled.
- coordinator heartbeat messages. The payload of these messages contains a list of transaction IDs that the coordinator is actively coordinating
- transaction confirmation request. Once the coordinator has fulfilled the attestation plan, it sends a message to the transaction sender requesting permission to dispatch. If, for any reason, the sender has already re-delegated to another coordinator, then it will reject this request; otherwise, it will accept.
This seems to answer some of my earlier questions, but it also seems inconsistent again. This implies it's the sender that re-delegates (whereas the above diagrams implied some direct old-coordinator <-> new-coordinator comms)
Decisions and actions that need to be taken by the sender node:
- When a user sends a transaction intent (`ptx_sendTransaction` or `ptx_prepareTransaction`), the sender node needs to choose which coordinator to delegate to.
A detail point - but no transaction can be sent to a coordinator, until any referenced dependencies have been confirmed.
- If the coordinator node seems to have forgotten about the transaction, then the sender node needs to decide to re-delegate it
- If the preferred coordinator node becomes unavailable then the sender node needs to decide which alternative coordinator to delegate to
I don't understand the difference between detection of "becomes unavailable" and "seems to have forgotten about it". I need a bit of help to understand.
I know in the spec, when a coordinator has a transaction it is responsible for regularly sending an "I'm still coordinating transaction X for contract Y" (referred to as "heartbeat") message. So one clear thing a sender can do is detect that it hasn't received an expected "heartbeat". But in that situation how would it know the difference between the coordinator having restarted and forgotten, vs. being unavailable?
- If the block height changes and there is a new preferred coordinator as per the selection algorithm then the sender node needs to decide whether to delegate the transaction to it. This will be dependent on whether the transaction has been dispatched or not.
- If a transaction has been delegated to an alternative coordinator and the preferred coordinator becomes available again, then the sender needs to decide to re-delegate to the preferred coordinator
I covered this exact scenario in a detail question earlier. It was unclear to me what would cause a sender to find out "the preferred coordinator becomes available again".
```proto
google.protobuf.Timestamp timestamp = 2;
string idempotency_key = 3;
string contract_address = 4;
repeated string transaction_ids = 5;
```
This is great. I assume this is only the transaction IDs that have been delegated by the sender this heartbeat is targeted to?
I am proposing that it may include other transactions and those should be ignored by the receiver of the message.
##### <a name="message-transaction-delegation-rejected"></a>Transaction delegation rejected
The handling of a delegation rejected message depends on the reason for rejection.
- if the reason is `MismatchedBlockHeight` and the target delegate is ahead then a new delegation request is sent once the sender has reached a compatible block height. Compatible block height is defined as a block height in the same block range.
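The "compatible block height" definition reduces to a same-range check. A sketch, assuming ranges of `n` consecutive blocks as described earlier (the value of `n` here is illustrative, not the configured policy value):

```python
def block_range(height, n=5):
    # Every run of n consecutive blocks maps to one range index
    return height // n

def compatible(sender_height, delegate_height, n=5):
    # Compatible = both heights fall in the same block range
    return block_range(sender_height, n) == block_range(delegate_height, n)
```

So two nodes can be at different heights and still be "compatible", as long as neither has crossed into the next range.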
Ok - think this is confirmation of my above point - the algorithm above is working to "is compatible block height?" (not "same") semantics.
##### <a name="message-transaction-delegation-accepted"></a>Transaction delegation accepted
If a node receives a `DelegationAccepted` message then it should start to monitor the continued acceptance of that delegation. It can expect to receive `HeartbeatNotification` messages from the delegate node and for those messages to include the id of the delegated transaction. The `sender` node cannot assume that the `coordinator` node will persist the delegation request. If the heartbeat messages stop or if the received heartbeat messages do not contain the expected transaction ids, then the sender should retrigger the `HandleTransaction` process to cause the transaction to be re-delegated either to the same delegate, a new delegate, or to be coordinated by the sender node itself, whichever is appropriate for the current point in time.
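The sender-side monitoring described here can be sketched as a small watchdog: refresh a timer only when a heartbeat actually mentions our transaction id, and flag re-delegation when it goes quiet. Class, method names, and the timeout value are assumptions for illustration.

```python
class DelegationMonitor:
    def __init__(self, tx_id, timeout_seconds=15.0, now=0.0):
        self.tx_id = tx_id
        self.timeout = timeout_seconds
        self.last_seen = now  # time we last saw our tx id in a heartbeat

    def on_heartbeat(self, tx_ids, now):
        # Heartbeats that omit our transaction id do NOT refresh the timer:
        # that absence is exactly the "coordinator forgot" signal
        if self.tx_id in tx_ids:
            self.last_seen = now

    def should_redelegate(self, now):
        # True once the coordinator has gone quiet (or forgetful) for longer
        # than the allowed window
        return (now - self.last_seen) > self.timeout
```

Note this treats "heartbeats stopped" and "heartbeats no longer mention my transaction" identically, which matches the paragraph above: either way the sender retriggers `HandleTransaction`.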
Digressing to an implementation point here... how do we think code could reasonably be constructed for this?
It seems like a sender node will have:
- A database with zero-or-more (maybe 1 million) transactions it wishes to get submitted
- Each transaction will be on the same or different contract address (maybe 1 million contract addresses)
- Each contract address has a different coordinator committee
- Each transaction might be in one of a number of states (a state machine diagram would be awesome for this)
  - Initial state - candidate to be delegated to someone
  - Delegated to self - I'm the coordinator so I don't need to healthcheck myself
  - Delegated to remote, but not yet requested submission
  - Delegated to remote, submission requested, but submission not occurred
  - Delegated to remote, submission requested, submission approved, but not yet confirmed
  - Delegated to remote, submission confirmed
  - While in any of these states, we might detect that the block range changing means this is the "wrong" coordinator
  - Done. No need to continue tracking
- The decision on whether an action should be performed for any transaction in this DB could be triggered by:
  - A message from the active coordinator for that txn (we have specs for these starting to finalize in this doc)
  - The lack of a message from the active coordinator for that txn (the subject of this section)
  - The arrival of a new confirmed block
  - The arrival of an on-chain event (from a confirmed block) confirming a transaction
  - The completion of an earlier transaction that directly (through dependency) or indirectly (through memory-management of "in-flight" transactions for that contract) unlocks processing
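One way the state list above could be encoded, with the legal transitions made explicit. This is a hypothetical sketch of the implied state machine, not the Paladin implementation; the state names and allowed-transition table are assumptions.

```python
from enum import Enum, auto

class TxState(Enum):
    PENDING = auto()               # candidate to be delegated to someone
    DELEGATED_SELF = auto()        # this node is the coordinator
    DELEGATED_REMOTE = auto()      # delegated, submission not yet requested
    SUBMISSION_REQUESTED = auto()  # coordinator asked permission to dispatch
    SUBMISSION_APPROVED = auto()   # approved but not yet confirmed on chain
    CONFIRMED = auto()             # done; no need to continue tracking

# A block-range change can send any undispatched transaction back to PENDING
ALLOWED = {
    TxState.PENDING: {TxState.DELEGATED_SELF, TxState.DELEGATED_REMOTE},
    TxState.DELEGATED_SELF: {TxState.SUBMISSION_APPROVED, TxState.PENDING},
    TxState.DELEGATED_REMOTE: {TxState.SUBMISSION_REQUESTED, TxState.PENDING},
    TxState.SUBMISSION_REQUESTED: {TxState.SUBMISSION_APPROVED, TxState.PENDING},
    TxState.SUBMISSION_APPROVED: {TxState.CONFIRMED},
    TxState.CONFIRMED: set(),
}

def transition(state, new_state):
    """Apply a transition, rejecting anything the table does not allow."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

Pinning down a table like this would also make the triggers listed above (heartbeat, missing heartbeat, new block, on-chain event, dependency completion) easy to express as events that attempt particular transitions.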
In our code in Paladin we've chosen to have evaluation loops against the DB and in-memory state, working at various levels. I guess I'm writing this question because I'm not 100% sure how to correlate those code modules to the state machine(s) implied by this specification, such that I can see how it fully scales to "mempool management" of a large sea of transactions with efficient DB query and in-memory management.
Potentially this whole question should be moved out of this issue on refinement of a formal specification, into a separate one that is about updating our code to be a complete high-performance implementation of the spec.
Going to pause my review here @hosie because I think there's a lot of outstanding conversations in what I've done so far that is affecting my ability to understand some of the more detailed items like the flow charts.
Thanks @peterbroadhurst these are great points. I feel bad that you had to trudge through some basic inconsistencies in the writing resulting from in-flight course adjustment, but hopefully that does eventually help to share the journey in the thinking rather than just the final destination. I'll focus on fixing the inconsistencies first so that the current draft is at least a coherent strawman within itself, and then we can pick up the following key conversation points to decide what significant changes are needed:

- **Identity Vs Node as committee members.** I think this is a short conversation. I don't have any strong objections to moving completely to …
- **How much responsibility is actually delegated.** At one point I was of the thinking that when a … A consequence of this decision is that it creates a possibility where the … Another consequence of this decision is that it is now the responsibility of the sender to detect when the preferred coordinator for the current range comes back online. That hasn't been worked through in the write-up yet and I think we need a better strawman before we can have a useful convo on this.
- **Message ID / Correl ID etc**
- **Feasibility of efficient, scalable, implementation**
- **Hot reconfiguration**
Co-authored-by: Peter Broadhurst <[email protected]> Signed-off-by: John Hosie <[email protected]>
This PR is stale because it has been open 30 days with no activity.
This PR introduces the detail documentation of the distributed sequencer protocol.
In domains (such as Pente) where the spending rules for states allow any one of a group of parties to spend the state, we need to coordinate the assembly of transactions across multiple nodes so that we can maximize throughput by speculatively spending new states and avoid transactions being reverted due to concurrent double-spending / state contention. The distributed sequencer protocol described in this PR is the formal specification of how that operates and the responsibilities and expectations of each node involved.
This goes beyond describing what happens to be implemented in code at this moment in time and aims to be comprehensive in the specification of the algorithm. Further code-change PRs will be opened to bring the code in line with this architecture once it has been agreed.
Includes an explicit (but concise) write-up of the key architectural decisions made and the consequences of those compared to the alternatives.
Remaining TODOs before moving out of draft:

Tweaks to protocol
- `dispatchNotification` protocol point: replace with heartbeat messages from submitter
- `dispatchConfirmation` has little value being a persistence point on the sender node, so update the diagrams to not assume persistence here
- `DelegationReturned` message: possibly include it in a "potential future optimizations" section. The protocol should not rely on this.

Fill in gaps / corrections in writeup
- `TransactionDelegation` and `Endorsement` message exchanges
- `REQUEST DISPATCH CONFIRMATION`
- `Check availability` sub process
- `Endorsement request`: add flowchart and include where the startup message fits in
- `Gather endorsement`: add flowchart
- `Select available coordinator`: fix diagram and complete description
- `message-exchange-transaction-confirmation`

Formatting