Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi: explicitly acknowledge successful HTLC dispatch #14

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

calvinrzachman
Copy link
Owner

@calvinrzachman calvinrzachman commented Jan 5, 2025

Change Description

We expand the ChannelRouter to handle new failure modes when SendHTLC is implemented using RPCs over a network - as is the case when Router and Switch run in separate processes.

The ChannelRouter needs to distinguish between an attempt that was initiated successfully and an attempt for which it is not known whether it initiated successfully. It must attempt to track the result only for attempts which are known to have been initiated successfully. Otherwise, since we base retry decision of the result of attempt tracking, we risk duplicate attempts being made if SendHTLC is implemented via RPC between two processes communicating over async network.

  • Here “initiated successfully” means an explicit response/acknowledgement from the backend server processing the request. Network/gRPC Errors alone do not appear sufficient to make this determination.
  • The persisting of the acknowledgement from the Switch server (local or remote) can be done on a best effort basis.
  • NOTE: This requires an lnd change. It allows us to handle a class of failure mode that is not present when Router + Switch run in the same process. The acknowledgement is not needed for correct function when the Router + Switch run in the same process - because if SendHTLC errors, the HTLC is guaranteed to not have been nor will it be in-flight.

This sequence—RegisterAttempt, SendHTLC, and AcknowledgeAttempt—is helpful in a distributed environment to:

  1. Persist the intent to send HTLCs (RegisterAttempt).
  2. Execute the actual HTLC delivery (SendHTLC).
  3. Resolve ambiguity about the HTLC’s state (AcknowledgeAttempt) in a manner that survives restarts.

On restart, query the ControlTower for in-flight attempts:

  1. Acknowledged Attempts: Launch resultCollector (calls GetAttemptResult) to track these attempts.
  2. Non-Acknowledged Attempts: Retry SendHTLC to resolve ambiguity.
  • NOTE: GetAttemptResult CANNOT be used to resolve the ambiguity when communication happens over the network!

TODO

  • Determine whether the Switch's SendHTLC method is idempotent! If it is not, then this new startup procedure could risk sending a duplicate.
    • Current lnd logic (in CircuitMap) only protects against attempt ID reuse for in-flight HTLCs.
    • After an HTLC is resolved, attempt ID reuse is technically possible - at least as far as the CircuitMap is concerned.
    • The SendHTLC method will not stop attempt ID re-use!
  • Defaulting all attempts to ACKNOWLEDGED is safe for local router + switch, but unsafe for remote router + switch.
  • Defaulting all attempts to UNACKD is safe for remote router + switch, but unsafe for local router + switch.
    • Could try to run alternate router startup only when lnd is built/configured to be used with remote ChannelRouter.
  • Any migration needed?

This allows an attempt to be marked as successfully
sent by the caller and is useful for resolving ambiguity
on startup as to whether an HTLC was succesfully sent
or whether it still needs to be sent.

The ambiguity arises when our HTLC dispatching process
plays out via RPC over the netowrk rather than in-process.
The ChannelRouter needs to distinguish on startup
between an attempt that was initiated successfully
and an attempt for which it is not known whether it
initiated successfully. It must attempt to track the
result only for attempts which are known to have been
initiated successfully. Otherwise, since we base retry
decision of the result of attempt tracking, we risk
duplicate attempts being made if SendHTLC is implemented
via RPC between two processes communicating over async
network.

This persisting of the acknowledgement from the Switch
server (local or remote) can be done on a best effort basis.

NOTE: The acknowledgement is not needed for correct function
when the Router + Switch run in the same process.
Copy link

github-actions bot commented Jan 5, 2025

Pull reviewers stats

Stats of the last 30 days for lnd:

User Total reviews Time to review Total comments

}

// Generate the raw encoded sphinx packet to include with the HTLC.
onionBlob, _, err := GenerateSphinxPacket(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be generateSphinxPacket?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants