Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP for dynamic coordinator selector #333

Merged
merged 94 commits into from
Nov 21, 2024
Merged

Conversation

hosie
Copy link
Contributor

@hosie hosie commented Oct 24, 2024

First steps towards implementing the dynamic coordinator selection algorithm for #329
and a bunch of other fixes required to achieve the concurrency and longevity needed to test this behaviour.

This PR introduces the coordinatorSelector component which is called before and then again after a new private transaction has been assembled and decides whether the assembling node should be responsible for coordinating the endorsement flow and submission to base ledger for this transaction or whether it should be delegated.

The decision is based on the configuration of the domain ( e.g. zeto is always coordinated by sender node, noto is always coordinated by notary node and pente choses one of the nodes in the privacy group).

Ultimately, the intent is for pente ( and other future similar domains) to dynamically rotate the coordinator responsibility in the interest of fairness and sharing the load. However, there are currently some gaps that need to be filled in order to reliably switch coordinators and those fixes are outwith the scope of this PR. So, for now, a static coordinator is chosen through a deterministic function - for each pente privacy group.

Other fixes that are included in here that were needed to get us to the point of being able to testing that this PR achieves its actual goal ( of enabling multiple concurrent nodes submitting multiple successive transactions for a period of tim)

  • Assembly is single threaded across all nodes for a given contract address
  • the domain context of the coordinator is shared with assembling nodes so that transactions can speculative spend unconfirmed states and can avoid attempting to double spend unconfirmed spends.
  • removed persistent syncpoint on delegation. This was just causing problems with foreign key constraints and race condition and in any case. the assembling node would always reassembly the transaction from scratch if it was restarted so there is no point in making this a persistence syncpoint handover. Instead, we will evolve toward a strategy where the sender node should send a reminder periodically if a delegated transaction is not confirmed. This latter point remains a TODO for follow up PR.
  • improved logging and status report ( debug_getTransactionStatus output) for private transaction manager

Also includes everything from #441

Still need to solve: ( for follow up PRS)

  • Restart recovery. On restart, load all pending private transactions from the TxManager's DB tables into the sequencer's in memory graph ( this could be a case of a periodic , idempotent, push of all pending private transactions from TxManger to PrivateTxManger.HandleNewTx
  • Queue full recovery. In cases where TxManager is unable to deliver a new transaction to PrivateTxManager because the target sequencer's event loop channel is full, then when it drains to a certain level, load those missed transactions into the sequencer
    • TBD: for all other events going into the event loop, it should be ok to lose them because the retry / reliable delivery handshakes will eventually kick in
  • Flush on range boundary handover
    • when coordinator switches from one node to another on a range boundary we should have a controlled mechanism for flushing through the transactions that are past the dispatch stage and transferring ownership of transactions that have not yet been dispatched. Otherwise, we have potential wasteful processing where some transactions are reverted and need to go through the re-assembled retry loop
      • TBD: need to do some analysis to determine if the gap here is non functional or is there a case where valid transactions become unprocessable and/or domain contexts get corrupted?
  • Don’t throw away the speculative assemble - in case of noto, at least, that is probably a good one
    • when we delegate to a remote coordinator, it discards the previous PostAssembly which is potentially wasteful
  • Signing key rotation. Currently we use the same signing key for all transactions submitted by a given coordinator. For anonymity, we should rotate this key. this means that transactions of a given sequence that are submitted on the new key must be blocked until we receive confirmation that the last transaction on the old key has been included in a block.

hosie added 23 commits October 24, 2024 02:25
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
@hosie hosie mentioned this pull request Nov 5, 2024
hosie added 6 commits November 8, 2024 08:53
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: John Hosie <[email protected]>
Signed-off-by: Peter Broadhurst <[email protected]>
Signed-off-by: Peter Broadhurst <[email protected]>
peterbroadhurst and others added 21 commits November 18, 2024 16:18
Signed-off-by: Peter Broadhurst <[email protected]>
Signed-off-by: Peter Broadhurst <[email protected]>
Signed-off-by: Peter Broadhurst <[email protected]>
Signed-off-by: Peter Broadhurst <[email protected]>
Signed-off-by: Peter Broadhurst <[email protected]>
…-dbtx

Refactor nonce assignment to post-submit of public transactions
@hosie hosie marked this pull request as ready for review November 21, 2024 18:58
Copy link
Contributor

@peterbroadhurst peterbroadhurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hosie - I've gone back through the change set in this PR to validate it just contains the overall set of changes that have been under significant test and enhancement through the past weeks.

I believe there are some tweaks you would like to make on the overall PR summary before it merges, but I'm leaving you the approval here as I don't think there are any code changes needed before merge.

Looking forward to seeing this great set of improvements merge 👍

@hosie hosie merged commit d18833d into main Nov 21, 2024
7 checks passed
@hosie hosie deleted the dynamic-coorinator-selection branch November 21, 2024 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants