Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unit test flake: neutrino panic when restarting rescan #6985

Closed
guggero opened this issue Oct 5, 2022 · 4 comments · Fixed by #7049
Closed

unit test flake: neutrino panic when restarting rescan #6985

guggero opened this issue Oct 5, 2022 · 4 comments · Fixed by #7049
Labels
bug Unintended code behaviour neutrino Lightweight neutrino backend-type test flake

Comments

@guggero
Copy link
Collaborator

guggero commented Oct 5, 2022

Found in unit test run:

--- FAIL: TestLightningWallet (7.74s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x7d47b7]

goroutine 6 [running]:
testing.tRunner.func1.2({0x9c34e0, 0xfdbfe0})
	/opt/hostedtoolcache/go/1.19.0/x64/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.19.0/x64/src/testing/testing.go:1399 +0x39f
panic({0x9c34e0, 0xfdbfe0})
	/opt/hostedtoolcache/go/1.19.0/x64/src/runtime/panic.go:884 +0x212
github.com/lightninglabs/neutrino.(*Rescan).Update(0x0, {0xc00064a248, 0x1, 0xc00129a4e0?})
	/home/runner/work/go/pkg/mod/github.com/lightninglabs/[email protected]/rescan.go:1391 +0x57
github.com/btcsuite/btcwallet/chain.(*NeutrinoClient).NotifyReceived(0xc0000fa120, {0xc0012bc060?, 0x1, 0x1})
	/home/runner/work/go/pkg/mod/github.com/btcsuite/[email protected]/chain/neutrino.go:457 +0x159
github.com/btcsuite/btcwallet/wallet.(*Wallet).NewAddress(0xc0001ce480, 0x0, {0x0?, 0x9f16e0?})
	/home/runner/work/go/pkg/mod/github.com/btcsuite/[email protected]/wallet/wallet.go:3020 +0x17a
github.com/lightningnetwork/lnd/lnwallet/btcwallet.(*BtcWallet).NewAddress(0xc0005a8180, 0x50?, 0x0, {0xa88a60?, 0x7?})
	/home/runner/work/lnd/lnd/lnwallet/btcwallet/btcwallet.go:556 +0x10c
github.com/lightningnetwork/lnd/lnwallet/test.loadTestCredits(0xc000078720, 0xc000340380, 0x14, 0x0?)
	/home/runner/work/lnd/lnd/lnwallet/test/test_interface.go:275 +0x287
github.com/lightningnetwork/lnd/lnwallet/test.createTestWallet({0xc00031ff80?, 0xa87ad1?}, 0x5?, 0xfe8080, {0xc43560, 0xc0001ca180}, {0xc48820, 0xc0005a8180}, {0xc43b00, 0xc0011c41c8}, ...)
	/home/runner/work/lnd/lnd/lnwallet/test/test_interface.go:378 +0x4a9
github.com/lightningnetwork/lnd/lnwallet/test.runTests(_, _, {_, _}, _, {{0xc00002eba0, 0xe}, {0xa873bc, 0x2}, {0xa879c6, ...}, ...}, ...)
	/home/runner/work/lnd/lnd/lnwallet/test/test_interface.go:3398 +0x2633
github.com/lightningnetwork/lnd/lnwallet/test.TestLightningWallet(0xc000007ba0, {0xa89361, 0x8})
	/home/runner/work/lnd/lnd/lnwallet/test/test_interface.go:3064 +0x8ef
github.com/lightningnetwork/lnd/lnwallet/test/neutrino_test.TestLightningWallet(0x0?)
	/home/runner/work/lnd/lnd/lnwallet/test/neutrino/neutrino_test.go:12 +0x25
testing.tRunner(0xc000007ba0, 0xb9f9b8)
	/opt/hostedtoolcache/go/1.19.0/x64/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	/opt/hostedtoolcache/go/1.19.0/x64/src/testing/testing.go:1493 +0x35f
FAIL	github.com/lightningnetwork/lnd/lnwallet/test/neutrino	7.770s
FAIL

Probably happens because the mutex is unlocked here: https://github.com/btcsuite/btcwallet/blob/master/chain/neutrino.go#L351
And afterward the s.rescan is set to nil but s.scanning is still true.

@guggero guggero added bug Unintended code behaviour neutrino Lightweight neutrino backend-type test flake labels Oct 5, 2022
@saubyk saubyk added this to the v0.17.0 milestone Oct 12, 2022
@MStreet3
Copy link
Contributor

MStreet3 commented Oct 13, 2022

Should this be opened as an issue against btcwallet? I don't see a fix in lnd that would address the root cause. It seems that the state management in the chain package for a NeutrinoClient in btcwallet is to blame.

Problem

The NeturinoClient Rescan method locks, sets rescan to nil then unlocks itself before resetting the value.
As the client state is scanning = true yet rescan = nil when the lock is released there is a race between the executing Rescan method and any NotifyReceived calls that are waiting to obtain the lock. If a NotifyReceived call picks up the lock in this state, it will panic when it tries to run rescan.Update as scanning is true.

Solution 1

Make a change against btcwallet to address the state management issues:

  1. make the entire Rescan method a critical path (i.e., only unlock via a single defer call)
  2. make the entire NotifyReceived method a critical path
  3. remove the locking in the notificationHandler to prevent a deadlock caused by making Rescan completely critical. This locking isn't necessary as it attempts to read the state of the rescanErr channel. But if rescanErr is nil, then there is no Rescan object created yet so no errors to read. Also, reading from a nil channel will simply block but the read is wrapped in a select statement with other exits.

Risks

  • there are no unit tests on the neutrino client in btcwallet and even a targeted change like this is hard to know if it introduced any regressions

@guggero
Copy link
Collaborator Author

guggero commented Oct 14, 2022

Yes, we should probably open the same issue in Neutrino. I just thought it would get more visibility when opening it in lnd (and also because the actual test that fails is in lnd, I don't think this is covered by unit tests in the Neutrino repo).

@MStreet3
Copy link
Contributor

MStreet3 commented Oct 15, 2022

made changes to btcwallet here: btcsuite/btcwallet#820
see btcsuite/btcwallet#822

@guggero guggero linked a pull request Oct 17, 2022 that will close this issue
@MStreet3
Copy link
Contributor

the btcwallet change is now here: btcsuite/btcwallet#822

and the lnd temporary change that runs itests with the above branch from mstreet3 repo is here: #7049

@saubyk saubyk removed this from the v0.18.0 milestone Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unintended code behaviour neutrino Lightweight neutrino backend-type test flake
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants