BGP container crashes after BGP established with 500/1K IPv4 and IPv6 peers #14143
Comments
We need a full backtrace to see where it crashes, otherwise this information is useless. Can you test with a vanilla FRR (latest) and verify the crash?
Tested with FRR 8.5.1 and it reproduces.
Here, we can see that the origin is the FRR SNMP agentx integration code. Specifically, we crash at line 146:
So it crashed because we pass an fd > 1023 to the FD_ISSET macro:
The maxfd is returned by snmp_select_info.
I see the snmp lib internally uses a large FD set data structure and provides snmp_select_info2:
So, maybe the fix would be to use netsnmp_large_fd_set and the corresponding NETSNMP_LARGE_FD_* macros?
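To illustrate the reasoning in this comment: a plain fd_set is a fixed-size bitmap, and FD_SET/FD_ISSET do no bounds checking, so a descriptor numbered FD_SETSIZE or above indexes past the end of it. The sketch below shows the range check and a growable bitmap that captures the general idea of a "large" fd set. The names fd_fits_in_fd_set and big_fd_set_* are hypothetical, made up for this example; this is not net-snmp's netsnmp_large_fd_set and not the actual FRR fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/select.h>

/* An fd_set has room for FD_SETSIZE descriptors (1024 on typical
 * Linux/glibc builds). FD_SET/FD_ISSET do not check the range, so the
 * caller has to. */
bool fd_fits_in_fd_set(int fd)
{
	return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical growable bitmap, sized at run time instead of compile time.
 * Illustration only; NOT the net-snmp large fd set implementation. */
struct big_fd_set {
	unsigned char *bits;
	size_t nbits;
};

bool big_fd_set_init(struct big_fd_set *s, size_t nbits)
{
	assert(s != NULL);
	s->bits = NULL;
	s->nbits = 0;
	if (nbits == 0)
		return false;
	s->bits = calloc((nbits + 7) / 8, 1);
	if (s->bits == NULL)
		return false;
	s->nbits = nbits;
	return true;
}

void big_fd_set_free(struct big_fd_set *s)
{
	assert(s != NULL);
	free(s->bits);
	s->bits = NULL;
	s->nbits = 0;
}

/* Returns false instead of writing out of bounds. */
bool big_fd_set_add(struct big_fd_set *s, int fd)
{
	if (fd < 0 || (size_t)fd >= s->nbits)
		return false;
	s->bits[fd / 8] |= (unsigned char)(1u << (fd % 8));
	return true;
}

bool big_fd_set_has(const struct big_fd_set *s, int fd)
{
	if (fd < 0 || (size_t)fd >= s->nbits)
		return false;
	return ((s->bits[fd / 8] >> (fd % 8)) & 1u) != 0;
}
```

With 2k BGP sessions open, descriptor numbers above 1023 are expected, so fd_fits_in_fd_set would return false for them, while a bitmap sized from the actual descriptor count can still track them.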
In /etc/frr/daemons, what do you have MAX_FDS set to?
@donaldsharp I think we don't use /etc/frr/daemons. FRR daemons are started by another program. BTW, on the SONiC system: UPD: inside the BGP container:
FRR reads the incoming MAX_FDS and uses that. Please ensure that a value can be set in SONiC and that it is respected. In any event, I believe this is a side issue to the actual problem.
Why I did it
Upgrading FRR to 8.5.4 to include latest fixes.
Work item tracking
Microsoft ADO (number only):
How I did it
New patches that were added:

| Patch | FRR Pull request | Issue fixed |
| --- | --- | --- |
| 0024-lib-use-snmp-s-large-fd-sets-for-agentx.patch | FRRouting/frr#13396 | FRRouting/frr#14143 |
| 0025-bgp-community-memory-leak-fix.patch | FRRouting/frr#15466 | FRRouting/frr#15459 |
| 0026-bgp-fib-suppress-announce-fix.patch | FRRouting/frr#15634 | FRRouting/frr#15626 |
| 0027-lib-Do-not-convert-EVPN-prefixes-into-IPv4-IPv6-if-n.patch | FRRouting/frr#15418 | FRRouting/frr#14419 |

Removed patches:

| Patch | Upstream FRR commit that is present in 8.5.4 |
| --- | --- |
| 0019-zebra-Abstract-dplane_ctx_route_init-to-init-route-w.patch | FRRouting/frr@3f01977 |
| 0020-zebra-Fix-crash-when-dplane_fpm_nl-fails-to-process-.patch | FRRouting/frr@fe5f624 |
| 0022-bgpd-Don-t-read-the-first-byte-of-ORF-header-if-we-a.patch | FRRouting/frr@3515178 |
| 0023-bgpd-Make-sure-we-have-enough-data-to-read-two-bytes.patch | FRRouting/frr@460ee93 |
| 0024-bgpd-Do-not-process-NLRIs-if-the-attribute-length-is.patch | FRRouting/frr@f291f1e |
| 0025-bgpd-Use-treat-as-withdraw-for-tunnel-encapsulation-.patch | FRRouting/frr@8a4a88c |
| 0026-zebra-Add-encap-type-when-building-packet-for-FPM.patch | FRRouting/frr@f0f7b28 |
| 0028-bgpd-Check-mandatory-attributes-more-carefully-for-U.patch | FRRouting/frr@21418d6 |
| 0029-bgpd-Handle-MP_REACH_NLRI-malformed-packets-with-ses.patch | FRRouting/frr@30b5c2a |
| 0030-bgpd-Treat-EOR-as-withdrawn-to-avoid-unwanted-handli.patch | FRRouting/frr@01f232c |
| 0031-bgpd-Ignore-handling-NLRIs-if-we-received-MP_UNREACH.patch | FRRouting/frr@a0c4ec2 |
| 0032-zebra-Fix-fpm-multipath-encap-addition.patch | FRRouting/frr@10a9a5f |

Realigned patches:

| Old patch | New patch |
| --- | --- |
| 0005-Add-support-of-bgp-l3vni-evpn.patch | 0005-Add-support-of-bgp-l3vni-evpn.patch |
| 0021-zebra-remove-duplicated-nexthops-when-sending-fpm-msg.patch | 0019-zebra-remove-duplicated-nexthops-when-sending-fpm-msg.patch |
| 0027-zebra-Fix-non-notification-of-better-admin-won.patch | 0020-zebra-Fix-non-notification-of-better-admin-won.patch |
| Disable-ipv6-src-address-test-in-pceplib.patch | 0021-Disable-ipv6-src-address-test-in-pceplib.patch |
| cross-compile-changes.patch | 0022-cross-compile-changes.patch |
| 0033-zebra-The-dplane_fpm_nl-return-path-leaks-memory.patch | 0023-zebra-The-dplane_fpm_nl-return-path-leaks-memory.patch |

How to verify it
Running sonic-mgmt test suite.
Describe the bug
Establish 2k (1k IPv4, 1k IPv6) dynamic BGP sessions and wait for a minute. Observe the bgpd crash. The crash is observed only when using the SNMP module, i.e. the -M snmp option on the bgpd command line.
The log:
To Reproduce
For example: the -M snmp option with SNMP agentx.
Expected behavior
Expect it to work, no crash.
Screenshots
Versions
Additional context
The issue happens on SONiC OS.
Core:
bgpd.1691068968.51.core.zip
running-config.txt