[custom channels]: channel arbitrator getting stuck on startup #9323

Closed
guggero opened this issue Nov 29, 2024 · 3 comments · Fixed by #9324
Labels: bug (Unintended code behaviour), custom chans, init (Issues related to LND startup), taproot

Comments

@guggero
Collaborator

guggero commented Nov 29, 2024

Similar to the issue fixed in #9253, but in a slightly different state:

2024-11-29 10:15:07.184 [DBG] CNCT: Starting ChannelArbitrator(ce3ab677a419e40558d095faf7165ba143467c1dc5691632ad511224473eabc9:0), htlc_set=(map[contractcourt.HtlcSetKey]contractcourt.htlcSet) (len=2) {
 (contractcourt.HtlcSetKey) LocalHtlcSet: (contractcourt.htlcSet) {
  incomingHTLCs: (map[uint64]channeldb.HTLC) {
  },
  outgoingHTLCs: (map[uint64]channeldb.HTLC) {
  }
 },
 (contractcourt.HtlcSetKey) RemoteHtlcSet: (contractcourt.htlcSet) {
  incomingHTLCs: (map[uint64]channeldb.HTLC) {
  },
  outgoingHTLCs: (map[uint64]channeldb.HTLC) {
  }
 }
}
, state=StateCommitmentBroadcasted
2024-11-29 10:15:07.185 [INF] NTFN: New confirmation subscription: conf_id=3, txid=e4a01568a0cd879df94cd6c9ae8143e1304bf1731ddee7003c6f12df4f7baf13, num_confs=1 height_hint=872448
2024-11-29 10:15:07.185 [DBG] NTFN: Dispatching historical confirmation rescan for txid=e4a01568a0cd879df94cd6c9ae8143e1304bf1731ddee7003c6f12df4f7baf13
2024-11-29 10:15:07.184 [DBG] CNCT: ChannelArbitrator(a96d2f5e6152efd9571a1888013866cfe962de885bd41c1d60c1cb7baf9ec3f9:0): attempting to resolve *contractcourt.anchorResolver
2024-11-29 10:15:07.185 [DBG] CNCT: ChannelArbitrator(a96d2f5e6152efd9571a1888013866cfe962de885bd41c1d60c1cb7baf9ec3f9:0): contract *contractcourt.anchorResolver not yet resolved
2024-11-29 10:15:07.185 [INF] SWPR: Sweep request received: out_point=e4a01568a0cd879df94cd6c9ae8143e1304bf1731ddee7003c6f12df4f7baf13:0, witness_type=TaprootAnchorSweepSpend, relative_time_lock=0, absolute_time_lock=0, amount=0.00000330 BTC, parent=(<nil>), params=(startingFeeRate={false 0}, immediate=false, exclusive_group=none, budget=0.00000330 BTC, deadline=none)
2024-11-29 10:15:07.186 [INF] CNCT: ChannelArbitrator(ce3ab677a419e40558d095faf7165ba143467c1dc5691632ad511224473eabc9:0): starting state=StateCommitmentBroadcasted, trigger=chainTrigger, triggerHeight=872455
2024-11-29 10:15:07.186 [DBG] CNCT: ChannelArbitrator(ce3ab677a419e40558d095faf7165ba143467c1dc5691632ad511224473eabc9:0): attempting state step with trigger=chainTrigger from state=StateCommitmentBroadcasted
2024-11-29 10:15:07.188 [DBG] TSVR: FetchLeavesFromCommit called, ourBalance=71778000 mSAT, theirBalance=26196000 mSAT, numHtlcs=0
2024-11-29 10:15:07.190 [DBG] TSVR: FetchLeavesFromCommit called, ourBalance=71778000 mSAT, theirBalance=26196000 mSAT, numHtlcs=0
2024-11-29 10:15:07.190 [DBG] LNWL: ChannelPoint(ce3ab677a419e40558d095faf7165ba143467c1dc5691632ad511224473eabc9:0): Restoring 0 dangling remote updates
2024-11-29 10:15:07.190 [DBG] LNWL: ChannelPoint(ce3ab677a419e40558d095faf7165ba143467c1dc5691632ad511224473eabc9:0): Restoring 0 local updates that the peer should sign
2024-11-29 10:15:07.192 [INF] CNCT: ChannelArbitrator(ce3ab677a419e40558d095faf7165ba143467c1dc5691632ad511224473eabc9:0): no HTLCs at stake, sweeping anchor with default deadline
2024-11-29 10:15:07.192 [INF] CNCT: ChannelArbitrator(ce3ab677a419e40558d095faf7165ba143467c1dc5691632ad511224473eabc9:0): offering anchor from local commitment 7a4c66554f168a903202ed29e2a13e1061b8c0f484a4be7a2ef8d69ceac382c5:1 to sweeper with deadline=None, budget=0.00000330 BTC
2024-11-29 10:15:07.192 [INF] SWPR: Sweep request received: out_point=7a4c66554f168a903202ed29e2a13e1061b8c0f484a4be7a2ef8d69ceac382c5:1, witness_type=TaprootAnchorSweepSpend, relative_time_lock=0, absolute_time_lock=0, amount=0.00000330 BTC, parent=(fee=0.00001366 BTC, weight=958 wu), params=(startingFeeRate={false 0}, immediate=false, exclusive_group=948947804208955392, budget=0.00000330 BTC, deadline=none)

Stuck goroutine:

goroutine 1288 [select, 110 minutes]:
github.com/lightningnetwork/lnd/sweep.(*UtxoSweeper).SweepInput(0xc00143a500, {0x3b64b20, 0xc002c36900}, {0xc0048023b0, {0x0, 0x0}, 0x14a, 0x0, {0x0, 0x0}})
	github.com/lightningnetwork/[email protected]/sweep/sweeper.go:519 +0x4b2
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).sweepAnchors.func1(0xc000a4c960, {0xc002ba5230?, 0xc002ba5200?}, {0x26f7cb3, 0x5})
	github.com/lightningnetwork/[email protected]/contractcourt/channel_arbitrator.go:1385 +0x582
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).sweepAnchors(0xc0008c3c08, 0xc002e0e348, 0xd5007)
	github.com/lightningnetwork/[email protected]/contractcourt/channel_arbitrator.go:1409 +0x152
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).stateStep(0xc0008c3c08, 0xd5007, 0x0, 0xc0000a4f60?)
	github.com/lightningnetwork/[email protected]/contractcourt/channel_arbitrator.go:1152 +0xeb0
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).advanceState(0xc0008c3c08, 0xd5007, 0x0, 0x0)
	github.com/lightningnetwork/[email protected]/contractcourt/channel_arbitrator.go:1615 +0x165
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).Start(0x226b4c0?, 0xc0005d72b0)
	github.com/lightningnetwork/[email protected]/contractcourt/channel_arbitrator.go:529 +0x4cf
github.com/lightningnetwork/lnd/contractcourt.(*ChainArbitrator).Start(0xc002af4cf0?)
	github.com/lightningnetwork/[email protected]/contractcourt/chain_arbitrator.go:785 +0x12b9
github.com/lightningnetwork/lnd.(*server).Start.func1()
	github.com/lightningnetwork/[email protected]/server.go:2146 +0x1bd5
sync.(*Once).doSlow(0x3134353530373432?, 0x6434222c22366662?)
	sync/once.go:74 +0xc2
sync.(*Once).Do(...)
	sync/once.go:65
github.com/lightningnetwork/lnd.(*server).Start(0xc004d1dfd0?)
	github.com/lightningnetwork/[email protected]/server.go:2008 +0x72
github.com/lightningnetwork/lnd.Main.func11()
	github.com/lightningnetwork/[email protected]/lnd.go:705 +0x25
created by github.com/lightningnetwork/lnd.Main in goroutine 565
	github.com/lightningnetwork/[email protected]/lnd.go:704 +0x3df4

The sweeper is stuck waiting for the custom channel hook to become ready (which is itself blocked on lnd startup):

goroutine 1350 [select, 110 minutes]:
github.com/lightninglabs/taproot-assets.(*Server).waitForReady(0xc0000ccbe0)
	github.com/lightninglabs/[email protected]/server.go:729 +0x6a
github.com/lightninglabs/taproot-assets.(*Server).ExtraBudgetForInputs(0xc0000ccbe0, {0xc0024d6e20, 0x2, 0x2})
	github.com/lightninglabs/[email protected]/server.go:1150 +0xed
github.com/lightningnetwork/lnd/sweep.(*BudgetInputSet).attachExtraBudget.func1({0x3b48c60, 0xc0000ccbe0})
	github.com/lightningnetwork/[email protected]/sweep/tx_input_set.go:212 +0x10c
github.com/lightningnetwork/lnd/fn.MapOptionZ[...](...)
	github.com/lightningnetwork/lnd/[email protected]/option.go:174
github.com/lightningnetwork/lnd/sweep.(*BudgetInputSet).attachExtraBudget(0xc002ca6780, {0x5a?, {0x3b48c60?, 0xc0000ccbe0?}})
	github.com/lightningnetwork/[email protected]/sweep/tx_input_set.go:210 +0x59
github.com/lightningnetwork/lnd/sweep.NewBudgetInputSet({0xc00244bb00, 0x2, 0x0?}, 0xd53f7, {0x4a?, {0x3b48c60?, 0xc0000ccbe0?}})
	github.com/lightningnetwork/[email protected]/sweep/tx_input_set.go:200 +0x2f5
github.com/lightningnetwork/lnd/sweep.(*BudgetAggregator).createInputSets(0xc0023e9b30, {0xc00244b980?, 0x2, 0x454630?}, 0xd53f7)
	github.com/lightningnetwork/[email protected]/sweep/aggregator.go:181 +0x391
github.com/lightningnetwork/lnd/sweep.(*BudgetAggregator).ClusterInputs(0xc0023e9b30, 0x0?)
	github.com/lightningnetwork/[email protected]/sweep/aggregator.go:119 +0x945
github.com/lightningnetwork/lnd/sweep.(*UtxoSweeper).sweepPendingInputs(0xc00143a500, 0xc0018881e0?)
	github.com/lightningnetwork/[email protected]/sweep/sweeper.go:1537 +0x2f
github.com/lightningnetwork/lnd/sweep.(*UtxoSweeper).collector(0xc00143a500, 0xc0064a4060)
	github.com/lightningnetwork/[email protected]/sweep/sweeper.go:650 +0x3e5
github.com/lightningnetwork/lnd/sweep.(*UtxoSweeper).Start.func1()
	github.com/lightningnetwork/[email protected]/sweep/sweeper.go:428 +0x8c
created by github.com/lightningnetwork/lnd/sweep.(*UtxoSweeper).Start in goroutine 1288
	github.com/lightningnetwork/[email protected]/sweep/sweeper.go:424 +0x157

Sounds like we do need something like #9262 after all.
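
The circular wait in the two traces above can be reproduced in miniature. What follows is only a rough sketch, not lnd or tapd code: the `sweeper` type, its `collector` and `sweepInput` methods, `startNode`, and the timeout are all hypothetical stand-ins, and the timeout exists purely so the demo terminates instead of hanging for 110 minutes. The startup path tries to hand a second input to a collector that is busy waiting for a readiness signal, and that signal would only be sent once startup has finished.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// demoTimeout bounds every wait so that this demo terminates; the goroutines
// in the traces above have no such bound.
const demoTimeout = 50 * time.Millisecond

var errSweeperBusy = errors.New("sweeper did not accept the input")

// sweeper is a hypothetical stand-in for the real sweeper.
type sweeper struct {
	newInputs chan string   // unbuffered: a send needs an idle collector
	hookReady chan struct{} // closed once node startup has finished
}

// collector handles one input at a time and consults the hook for each one.
func (s *sweeper) collector() {
	for range s.newInputs {
		<-s.hookReady // stand-in for the hook waiting on readiness
	}
}

// sweepInput hands an input to the collector, giving up after timeout.
func (s *sweeper) sweepInput(in string, timeout time.Duration) error {
	select {
	case s.newInputs <- in:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("%s: %w", in, errSweeperBusy)
	}
}

// startNode offers two anchors during startup and would mark the hook ready
// only afterwards.
func startNode(timeout time.Duration) error {
	s := &sweeper{
		newInputs: make(chan string),
		hookReady: make(chan struct{}),
	}
	go s.collector()

	// Deferred calls run in reverse order: release the collector first,
	// then let its loop end, so the demo leaks no goroutine.
	defer close(s.newInputs)
	defer close(s.hookReady)

	for _, anchor := range []string{"anchor-a", "anchor-b"} {
		if err := s.sweepInput(anchor, timeout); err != nil {
			return fmt.Errorf("startup stalled: %w", err)
		}
	}
	return nil
}

func main() {
	fmt.Println(startNode(demoTimeout))
}
```

Running it prints a "startup stalled" error rather than `<nil>`: the first input is accepted, the collector then waits on `hookReady`, and nothing can accept the second input.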

@guggero guggero added bug Unintended code behaviour taproot init Issues related to LND startup custom chans labels Nov 29, 2024
@ziggie1984 ziggie1984 self-assigned this Nov 29, 2024
@yyforyongyu
Member

In lnd, execution is blocked here:

lnd/sweep/tx_input_set.go

Lines 200 to 202 in 0c9b655

if err := bi.attachExtraBudget(auxSweeper); err != nil {
	return nil, err
}

This calls the following method in tapd:
https://github.com/lightninglabs/taproot-assets/blob/42055562ef09ee7e359c4ad470b3c67caa6a62ae/server.go#L1150-L1154

Why do we need to call waitForReady if this is an auxiliary method? I think all it does is attach a piece of info to the input, so there's no need to wait for the server to be ready?

@guggero
Collaborator Author

guggero commented Dec 2, 2024

Hmm, okay, you're right, that specific call (ExtraBudgetForInputs) doesn't need to block on waitForReady().
But are we sure that, in every state the channel arbitrator can be in, we never call any of the other aux sweeper functions, such as DeriveSweepAddr or ResolveContract, which definitely require lnd interaction?
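
To make the distinction concrete, here is a minimal sketch of a hook in which only the methods that need the node check readiness. It is not tapd's implementation: `auxHook`, `input`, `perBlobBudget` and the lowercase method names are hypothetical. It also differs from waitForReady in one deliberate way: `resolveContract` returns an error instead of blocking, so the example runs to completion.

```go
package main

import (
	"errors"
	"fmt"
)

// perBlobBudget is an arbitrary illustrative amount, not a real tapd value.
const perBlobBudget int64 = 1000

var errNotReady = errors.New("hook not ready")

// input is a hypothetical sweep input that may carry a resolution blob.
type input struct {
	resolutionBlob []byte
}

// auxHook is a hypothetical hook; ready is closed once startup has finished.
type auxHook struct {
	ready chan struct{}
}

// extraBudgetForInputs only inspects data already attached to the inputs, so
// it never consults the ready channel.
func (h *auxHook) extraBudgetForInputs(inputs []input) int64 {
	var budget int64
	for _, in := range inputs {
		if len(in.resolutionBlob) > 0 {
			budget += perBlobBudget
		}
	}
	return budget
}

// resolveContract needs the node, so it checks readiness first. Assumption:
// it fails fast here, whereas a waitForReady-style call would block.
func (h *auxHook) resolveContract() error {
	select {
	case <-h.ready:
		return nil
	default:
		return errNotReady
	}
}

func main() {
	h := &auxHook{ready: make(chan struct{})}
	inputs := []input{{resolutionBlob: []byte{0x01}}, {}}

	fmt.Println(h.extraBudgetForInputs(inputs)) // 1000
	fmt.Println(h.resolveContract())            // hook not ready
	close(h.ready)
	fmt.Println(h.resolveContract()) // <nil>
}
```

This shape only helps if every method on the startup path falls into the first category, which is exactly the open question here.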

@yyforyongyu
Member

During startup, when NewAnchorResolutions is reached via Start -> advanceState -> stateStep, we call NewLightningChannel, which in turn calls multiple fetchers from tapd. I'm not familiar with tapd, but I noticed this:
https://github.com/lightninglabs/taproot-assets/blob/42055562ef09ee7e359c4ad470b3c67caa6a62ae/server.go#L762-L775

All the fetchers appear to be stateless, which is nice because it means creating the LightningChannel is non-blocking. We then call chainMachine.NewAnchorResolutions, which creates the AnchorResolution. However, it seems WithResolutionBlob is never called on anchor resolvers, so anchor inputs have a nil resolutionBlob?

Then we move on to relaunchResolvers -> FetchContractResolutions -> decodeTapRootAuxData, which reads all the ResolutionBlobs from disk and attaches them to the resolutions.

By the time we call NewBudgetInputSet -> attachExtraBudget -> ExtraBudgetForInputs, what happens in tapd is that ExtraBudgetForInputs reads i.ResolutionBlob(), which is handled by lnd and already populated at this point.

Another place I found that needs waitForReady is ResolveContract, which is called inside the resolvers, e.g. by newOutgoingHtlcResolution inside NewLocalForceCloseSummary, which is called in ForceClose. This could also block startup, as it's called inside stateStep.

So unfortunately we cannot assume the aux methods are non-blocking. Instead, we need to make all handling of pending states (pending force close channels, pending open channels, and others) async, to make sure startup won't be blocked.
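
One way to picture the async approach is the sketch below. It is an illustration under assumptions, not the change made in lnd: the `arbitrator` type, its `start` and `wait` methods, the `hookReady` channel and the state names "pending" and "advanced" are all hypothetical. The point is that `start` returns immediately and the potentially blocking step runs in its own goroutine, so startup can go on to signal readiness.

```go
package main

import (
	"fmt"
	"sync"
)

// arbitrator is a hypothetical stand-in for a per-channel arbitrator.
type arbitrator struct {
	mu    sync.Mutex
	state string
	wg    sync.WaitGroup
}

// start launches the pending-state handling in the background and returns
// right away, even though the handling may block on the hook.
func (a *arbitrator) start(hookReady <-chan struct{}) {
	a.wg.Add(1)
	go func() {
		defer a.wg.Done()

		<-hookReady // stand-in for an aux call that waits for readiness

		a.mu.Lock()
		a.state = "advanced"
		a.mu.Unlock()
	}()
}

// wait blocks until the background handling is done and returns the state.
func (a *arbitrator) wait() string {
	a.wg.Wait()

	a.mu.Lock()
	defer a.mu.Unlock()
	return a.state
}

// runStartup mimics node startup: start the arbitrator, finish startup, and
// only then does the arbitrator make progress.
func runStartup() string {
	hookReady := make(chan struct{})
	a := &arbitrator{state: "pending"}

	a.start(hookReady) // does not block startup
	close(hookReady)   // startup completes, the hook becomes ready

	return a.wait()
}

func main() {
	fmt.Println(runStartup()) // advanced
}
```

A real implementation would also need a quit signal so the background goroutine can exit on shutdown; that is left out here for brevity.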

@saubyk saubyk added this to the v0.18.4 milestone Dec 2, 2024
@dstadulis dstadulis moved this to 🏗 In progress in Taproot-Assets Project Board Dec 3, 2024
@dstadulis dstadulis moved this from 🏗 In progress to 👀 In review in Taproot-Assets Project Board Dec 5, 2024
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Taproot-Assets Project Board Dec 5, 2024