EVPN traffic not re-routed when breaking the active link #14355

Closed
1 task
sbrs3 opened this issue Sep 5, 2023 · 5 comments
Labels
triage Needs further investigation

Comments

sbrs3 commented Sep 5, 2023

We are implementing L3 VPN instances with EVPN and VXLAN. eBGP sessions terminate on physical interfaces. VXLAN tunnels terminate on router loopbacks. Loopbacks are advertised through BGP.

See topology.jpg

In this scenario, a link break brings down EVPN traffic, even though there is a remaining path in the topology to reach the destination loopback. The destination loopback can be pinged, but ping through the VPN fails.

Breaking the link between site3-3 and site3-4 (10.21.33.1 is a loopback in the yellow VRF on the site3-3 router): VRF traffic stops working although the remote VTEP loopback is still reachable through the backup path.

root@site3-4:~# ip vrf exec yellow ping 10.21.33.1
PING 10.21.33.1 (10.21.33.1) 56(84) bytes of data.
From 10.21.34.1 icmp_seq=58 Destination Host Unreachable

root@site3-4:~# ping -I 10.0.3.4 10.0.3.3
PING 10.0.3.3 (10.0.3.3) from 10.0.3.4 : 56(84) bytes of data.
64 bytes from 10.0.3.3: icmp_seq=1 ttl=63 time=1.34 ms

VPN and GRD routes look good (see routes.txt).

It appears that FRR removes the neighbor entry for the remote loopback 10.0.3.3 when the link goes down, even though the loopback is still reachable via site3-5. The first output below was taken before the link break, the second after it:

root@site3-4:~# ip neigh show
192.168.122.1 dev enp1s0 lladdr 52:54:00:05:6b:fa REACHABLE
10.0.3.3 dev br-red lladdr 02:77:bd:6a:03:a5 extern_learn  NOARP proto zebra
192.168.245.2 dev enp3s0 lladdr 52:54:00:d6:a6:96 REACHABLE
10.0.3.3 dev br-yellow lladdr 02:cf:ce:74:7b:73 extern_learn  NOARP proto zebra
192.168.234.1 dev enp2s0 lladdr 52:54:00:70:5a:18 REACHABLE

root@site3-4:~# ip neigh show
192.168.122.1 dev enp1s0 lladdr 52:54:00:05:6b:fa REACHABLE
192.168.245.2 dev enp3s0 lladdr 52:54:00:d6:a6:96 REACHABLE
10.0.3.3 dev br-yellow  INCOMPLETE
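
To make the difference between the two outputs explicit, here is a minimal Python sketch that compares two `ip neigh show` snapshots and lists the devices on which a given IP lost its resolved (lladdr-bearing) entry. The function names `resolved_devices` and `lost_entries` are hypothetical helpers, not part of FRR or iproute2, and the parsing assumes the plain `ip neigh show` layout quoted above.

```python
# Snapshots copied from the two `ip neigh show` outputs in this issue.
BEFORE = """\
192.168.122.1 dev enp1s0 lladdr 52:54:00:05:6b:fa REACHABLE
10.0.3.3 dev br-red lladdr 02:77:bd:6a:03:a5 extern_learn  NOARP proto zebra
192.168.245.2 dev enp3s0 lladdr 52:54:00:d6:a6:96 REACHABLE
10.0.3.3 dev br-yellow lladdr 02:cf:ce:74:7b:73 extern_learn  NOARP proto zebra
192.168.234.1 dev enp2s0 lladdr 52:54:00:70:5a:18 REACHABLE
"""

AFTER = """\
192.168.122.1 dev enp1s0 lladdr 52:54:00:05:6b:fa REACHABLE
192.168.245.2 dev enp3s0 lladdr 52:54:00:d6:a6:96 REACHABLE
10.0.3.3 dev br-yellow  INCOMPLETE
"""


def resolved_devices(ip_neigh_output, ip):
    """Return the set of devices on which `ip` has an entry with a link-layer address."""
    devices = set()
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        # Expected layout (assumption): "<ip> dev <ifname> [lladdr <mac>] ... <STATE>"
        if len(fields) >= 3 and fields[0] == ip and fields[1] == "dev":
            if "lladdr" in fields:
                devices.add(fields[2])
    return devices


def lost_entries(before, after, ip):
    """Devices where `ip` was resolved in `before` but is missing or unresolved in `after`."""
    return sorted(resolved_devices(before, ip) - resolved_devices(after, ip))


if __name__ == "__main__":
    print(lost_entries(BEFORE, AFTER, "10.0.3.3"))
```

With the snapshots from this issue, the remote VTEP 10.0.3.3 loses its entry on both br-red and br-yellow, which matches the observed loss of VRF traffic.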

Running clear bgp * restores the VPN traffic, but this should not be necessary.

  • [x] Did you check if this is a duplicate issue?
  • Did you test it on the latest FRRouting/frr master branch?

Versions

  • FRR version: 8.5.2
  • OS version: Debian GNU/Linux 11
  • Linux 5.10.0-25-amd64
sbrs3 added the triage (Needs further investigation) label Sep 5, 2023

sbrs3 commented Sep 5, 2023

sbrs3 changed the title from "EVPN/VXLAN resilience not working" to "EVPN/VXLAN not re-routed when breaking the active link" Sep 7, 2023
sbrs3 changed the title from "EVPN/VXLAN not re-routed when breaking the active link" to "EVPN traffic not re-routed when breaking the active link" Sep 7, 2023

sbrs3 commented Sep 13, 2023

I am wondering why FRR removes the Linux neighbor entry for the remote VTEP when the direct link goes down, even though there is still a second path in the topology to reach that remote VTEP (loopback; 10.0.3.3)?

This is what it looks like in the logs:

2023-09-05 13:14:03.479 [DEBG] zebra: [YK42S-VD2K1] Rx RTM_DELNEIGH family ipv4 IF br-yellow(11) vrf yellow(9) IP 10.0.3.3
2023-09-05 13:14:03.479 [DEBG] zebra: [NR6MZ-KY8YF] zebra neigh del if br-yellow/11 10.0.3.3
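
For anyone trying to correlate these deletions with link events, a small Python sketch like the following could pull the RTM_DELNEIGH events out of a zebra debug log. The regular expression is derived only from the line quoted above, so the exact log layout in other FRR versions is an assumption, and `find_delneigh_events` is a hypothetical helper name.

```python
import re

# Pattern built from the single RTM_DELNEIGH line quoted in this issue (assumed layout).
DELNEIGH_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) .*Rx RTM_DELNEIGH family (?P<family>\S+) "
    r"IF (?P<ifname>[^\s(]+)\((?P<ifindex>\d+)\) "
    r"vrf (?P<vrf>[^\s(]+)\((?P<vrf_id>\d+)\) "
    r"IP (?P<ip>\S+)"
)

# The two log lines from this issue.
SAMPLE_LOG = """\
2023-09-05 13:14:03.479 [DEBG] zebra: [YK42S-VD2K1] Rx RTM_DELNEIGH family ipv4 IF br-yellow(11) vrf yellow(9) IP 10.0.3.3
2023-09-05 13:14:03.479 [DEBG] zebra: [NR6MZ-KY8YF] zebra neigh del if br-yellow/11 10.0.3.3
"""


def find_delneigh_events(log_text, ip=None):
    """Return one dict per RTM_DELNEIGH line, optionally filtered by neighbor IP."""
    events = []
    for line in log_text.splitlines():
        match = DELNEIGH_RE.match(line)
        if match is None:
            continue  # not an RTM_DELNEIGH line
        event = match.groupdict()
        if ip is None or event["ip"] == ip:
            events.append(event)
    return events


if __name__ == "__main__":
    for event in find_delneigh_events(SAMPLE_LOG, ip="10.0.3.3"):
        print(event["timestamp"], event["vrf"], event["ifname"], event["ip"])
```

Filtering on the remote VTEP address (10.0.3.3 here) shows at which timestamps, and in which VRF, the kernel reported the neighbor deletion.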

The same also happens in a topology with only two routers connected by two redundant links. When one of the two links goes down, the neighbor entry for the remote VTEP (loopback) is removed. Shouldn't the entry remain there as long as there is a path left to reach it?

Any comments?

sbrs3 commented Oct 9, 2023

How can we proceed with this?

chdxD1 commented Oct 10, 2023

This is probably a duplicate of #12391. The MR was backported to 8.5 as well, so can you try 8.5.3 instead of 8.5.2 (or 9.0.1, which also has the fix)?
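
As a rough aid for checking an installed release against the versions named in this comment, the following Python sketch encodes only what the thread states: 8.5.2 shows the problem, while 8.5.3 and 9.0.1 carry the fix. The helper name `fix_status` is hypothetical, treating earlier 8.5.x releases as unfixed is an assumption, and any version not mentioned here is reported as unknown rather than guessed.

```python
import re


def fix_status(version):
    """Return True (fix reported), False (reported or assumed unfixed), or None (unknown).

    Only the versions named in this thread are encoded; everything else is None.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        return None  # not a plain major.minor.patch string
    major, minor, patch = (int(group) for group in match.groups())
    if (major, minor) == (8, 5):
        # Fix backported in 8.5.3; 8.5.2 was reported broken.
        # Assumption: 8.5.0 and 8.5.1 lack the fix as well.
        return patch >= 3
    if (major, minor) == (9, 0) and patch >= 1:
        return True  # 9.0.1 is stated to have the fix
    return None  # the thread does not cover this version


if __name__ == "__main__":
    for candidate in ("8.5.2", "8.5.3", "9.0.1"):
        print(candidate, fix_status(candidate))
```

The version string itself can be taken from the output of `vtysh -c "show version"` on the router.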

sbrs3 commented Oct 18, 2023

Re-tested with 8.5.3 and it works. Problem solved, apparently.

@sbrs3 sbrs3 closed this as completed Oct 18, 2023