fix(p2p): adjust backoff seconds to increase reconnect retries close to 24 hours (cometbft#4377)

close: cometbft#3519 

Adjust `reconnectBackOffBaseSeconds` to extend reconnect retries to up to
1 day (~24 hours).

The new value can be validated here: https://go.dev/play/p/k8F5rS-i24p,
which will show that the total time is increased to almost 24 hours.

Initial reconnecting time: 2m8.493s
Total reconnecting time:   23h55m56.249s

`reconnectBackOffBaseSeconds` is increased by about 13% (from 3.0 to
3.4 seconds), so this should not affect reconnection retries too
much.

#### PR checklist

- [ ] ~~Tests written/updated~~
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Daniel <[email protected]>
3 people authored Nov 4, 2024
1 parent 1a68ad4 commit 81f2763
Showing 2 changed files with 6 additions and 2 deletions.
@@ -0,0 +1,2 @@
+- `[p2p]` adjust backoff seconds to increase reconnect retries close to 24 hours
+  ([\#3519](https://github.com/cometbft/cometbft/issues/3519))
6 changes: 4 additions & 2 deletions p2p/switch.go
@@ -387,7 +387,8 @@ func (sw *Switch) stopAndRemovePeer(p Peer, reason any) {
 }

 // reconnectToPeer tries to reconnect to the addr, first repeatedly
-// with a fixed interval, then with exponential backoff.
+// with a fixed interval (approximately 2 minutes), then with
+// exponential backoff (approximately close to 24 hours).
 // If no success after all that, it stops trying, and leaves it
 // to the PEX/Addrbook to find the peer with the addr again
 // NOTE: this will keep trying even if the handshake or auth fails.
@@ -403,6 +404,7 @@ func (sw *Switch) reconnectToPeer(addr *na.NetAddr) {

 	start := time.Now()
 	sw.Logger.Info("Reconnecting to peer", "addr", addr)
+
 	for i := 0; i < reconnectAttempts; i++ {
 		if !sw.IsRunning() {
 			return
@@ -423,7 +425,7 @@ func (sw *Switch) reconnectToPeer(addr *na.NetAddr) {

 	sw.Logger.Error("Failed to reconnect to peer. Beginning exponential backoff",
 		"addr", addr, "elapsed", time.Since(start))
-	for i := 0; i < reconnectBackOffAttempts; i++ {
+	for i := 1; i <= reconnectBackOffAttempts; i++ {
 		if !sw.IsRunning() {
 			return
 		}
