Disable Flapping BGP Neighbors

Friday, August 28, 2009 06:35 +0200… updated on Tuesday, December 29, 2020 09:51 UTC

It looks like we’re bound to experience a widespread BGP failure once every few months. They all follow the same pattern:

A “somewhat” undertested BGP implementation starts advertising paths with an “unexpected” set of attributes.
A specific downstream BGP implementation (and it could be a different implementation every time) a few hops down the road hiccups and sends a BGP notification message to its upstream neighbor.
The BGP session must be reset following a notification message; the routes advertised over it are lost and withdrawn, causing widespread ripples across the Internet.
The offending session is reestablished seconds later and the same set of routes is sent again, causing the same failure and a session reset. If the session stays up long enough, some of the newly received routes might get propagated and will flap again when the session is reset.
The cyclical behavior continues until a manual intervention.

I don’t understand why network OS vendors allow the cyclical BGP session behavior (as it’s pretty obvious things will not get better by themselves once the same session is reset a few times), but there’s at least something we can do on Cisco IOS with Embedded Event Manager: shut down the offending neighbor after seeing the BGP-3-NOTIFICATION syslog message often enough in a short period of time.

Update 2020-12-29: Many modern network operating systems provide similar functionality; worst case, you can deploy a Python script on most of them.

The following EEM applet is triggered when the BGP-3-NOTIFICATION syslog message occurs three times within 60 seconds. It extracts the BGP neighbor ID from the syslog message, fetches the local AS number from an SNMP variable, and shuts down the offending neighbor using the neighbor shutdown BGP router configuration command.

EEM 3.0 applet source code

event manager applet BGPNotification
 event syslog occurs 3 pattern "BGP-3-NOTIFICATION" period 60
 action 100 regexp "neighbor\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)" "$_syslog_msg" match id
 action 200 if $_regexp_result eq 1
 action 300  info type snmp oid bgp.2.0 get-type exact
 action 400  cli command "enable"
 action 410  cli command "configure terminal"
 action 420  cli command "router bgp $_info_snmp_value"
 action 430  cli command "neighbor $id shutdown"
 action 500  syslog msg "Shut down BGP neighbor $id"
 action 510  info type routername
 action 520  mail server $_mail_smtp to $_mail_rcpt from "$_info_routername@$_mail_domain" subject "ALERT: BGP neighbor $id shutdown due to excessive notifications" body "\n$_syslog_msg"
 action 999 end
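
If you have to do the same thing on a platform without EEM, the applet's logic is easy to mirror in a Python script. What follows is a minimal sketch, not a tested production tool: the NotificationWatcher class, its feed method, and the way the local AS number is passed in (the applet reads it via SNMP instead) are assumptions made for illustration, and you still have to get the syslog lines into the script and the resulting commands onto the device yourself. Clearing the counter after the trigger fires is a choice made in this sketch, not documented EEM behavior.

```python
import re
from collections import deque

# Same pattern the applet uses to pull the neighbor address out of the message.
NEIGHBOR_RE = re.compile(r"neighbor\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)")


class NotificationWatcher:
    """Counts BGP-3-NOTIFICATION messages in a sliding time window."""

    def __init__(self, local_as, threshold=3, period=60.0):
        self.local_as = local_as    # assumption: supplied by the caller
        self.threshold = threshold  # mirrors "occurs 3"
        self.period = period        # mirrors "period 60"
        self.seen = deque()         # timestamps of matching messages

    def feed(self, timestamp, message):
        """Return the config commands to apply, or None if nothing to do."""
        if "BGP-3-NOTIFICATION" not in message:
            return None
        self.seen.append(timestamp)
        # Forget messages that have fallen out of the time window.
        while self.seen and timestamp - self.seen[0] >= self.period:
            self.seen.popleft()
        if len(self.seen) < self.threshold:
            return None
        self.seen.clear()  # sketch-only choice: start counting afresh
        match = NEIGHBOR_RE.search(message)
        if match is None:
            return None
        return [
            "configure terminal",
            f"router bgp {self.local_as}",
            f"neighbor {match.group(1)} shutdown",
            "end",
        ]
```

Like the applet, this counts notifications across all neighbors and shuts down whichever neighbor is named in the message that trips the threshold.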

Notes

The EEM applet does not track session failures per neighbor. It might hit a random BGP neighbor that just happens to have a bad-hair microsecond while another neighbor is flapping.
There is no recovery logic: once the neighbor is shut down, you have to reenable it manually.
You'd better have pretty good monitoring in place to be alerted when this hammer-of-Thor strikes.
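
The first two limitations are more a matter of applet simplicity than anything else. As a rough sketch of what a script could do instead, the following Python class keeps a separate notification window per neighbor and proposes a hold time that doubles with every repeat offence up to a cap. Everything here is hypothetical (the PerNeighborTracker name, the base_hold and max_hold parameters, the default values); it only decides who to shut down and for how long, and leaves the actual shutdown and the later no neighbor shutdown to the caller.

```python
import re
from collections import defaultdict, deque

NEIGHBOR_RE = re.compile(r"neighbor\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)")


class PerNeighborTracker:
    """Per-neighbor notification counting with a capped, doubling hold time."""

    def __init__(self, threshold=3, period=60.0, base_hold=60.0, max_hold=3600.0):
        self.threshold = threshold
        self.period = period
        self.base_hold = base_hold       # hypothetical first hold time (seconds)
        self.max_hold = max_hold         # hypothetical upper limit (seconds)
        self.seen = defaultdict(deque)   # neighbor -> recent notification times
        self.strikes = defaultdict(int)  # neighbor -> shutdowns so far

    def feed(self, timestamp, message):
        """Return (neighbor, hold_seconds) when a neighbor should be shut down."""
        if "BGP-3-NOTIFICATION" not in message:
            return None
        match = NEIGHBOR_RE.search(message)
        if match is None:
            return None
        neighbor = match.group(1)
        times = self.seen[neighbor]
        times.append(timestamp)
        # Only notifications for this neighbor inside the window count.
        while times and timestamp - times[0] >= self.period:
            times.popleft()
        if len(times) < self.threshold:
            return None
        times.clear()
        self.strikes[neighbor] += 1
        # Double the hold time on every repeat offence, up to max_hold.
        hold = min(self.base_hold * 2 ** (self.strikes[neighbor] - 1), self.max_hold)
        return neighbor, hold
```

The caller would shut the neighbor down, wait for the returned number of seconds, and then reenable it; a neighbor that keeps misbehaving stays down longer each time instead of flapping at a constant rate.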

8 comments:

Guest 28 August 2009 17:02

Maybe BGP dampening can help?

shivlu jain 29 August 2009 10:54

When the neighbor is shut down after matching the criteria, is there any way to unshut it programmatically after some time?

regards
shivlu jain

Ivan Pepelnjak 29 August 2009 17:24

Not directly. EBGP session loss does not count as a BGP route flap. The BGP flap dampening would only kick in in the downstream routers.

Ivan Pepelnjak 29 August 2009 17:25

Sure. Just some more programming ;)

Guest 03 September 2009 08:44

Cyclical BGP session behavior exists for one simple reason: if you are working after hours (usually at night) and you screw something up, you don't need to call the upstream technician every time you change something.

Ivan Pepelnjak 03 September 2009 14:41

Sounds reasonable, but why does it have to flap every 10 seconds for eternity? Built-in exponential backoff with a cap would make a lot of sense.

sami 28 October 2009 16:38

I would like to thank everyone who contributes to this website.

thanks

Ivan Pepelnjak 28 October 2009 16:51

You're welcome ;) It's nice to receive feedback like this 8-)
