BGP: the Tragedy of the Commons

Every now and then someone looks at a few recent BGP incidents (from fat fingers to more dubious ones) and says “we need a better BGP”.

It’s like being unable to cope with your kids or your team members because you don’t have the guts to tell them NO, and trying to solve the problem with new procedures and rules instead.

Like anything designed on a few napkins, BGP has its limits. They’re well known, and most of them have to do with trusting your neighbors instead of checking what they tell you.

The solutions to the problem are pretty simple and have been known for decades (BCP38 was published in May 2000). In a nutshell, you have to:

  • Build a global repository of who owns what address space;
  • Document who connects to whom and what their peering policies are;
  • Filter the updates received from your customers and peers based on the information from those repositories;
  • Filter the traffic from addresses that are obviously spoofed.
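The last two steps are mechanical once the registry data exists. Here's a minimal Python sketch, assuming the registry has already been compiled into per-neighbor prefix lists. `ALLOWED`, `CUSTOMER_PREFIXES` and both functions are hypothetical, populated with documentation ASNs and prefixes; in practice you'd turn registry data into prefix lists and ACLs on the routers rather than filter in Python.

```python
import ipaddress

# HYPOTHETICAL filter data compiled from a routing registry: for each
# neighbor ASN, the prefixes it may announce and the longest prefix length
# we accept. Documentation ASNs (RFC 5398) and prefixes (RFC 5737) only.
ALLOWED = {
    64500: [("192.0.2.0/24", 24)],
    64501: [("198.51.100.0/24", 24), ("203.0.113.0/24", 25)],
}

# HYPOTHETICAL address space assigned to each customer-facing interface.
CUSTOMER_PREFIXES = {
    "cust-a": ["192.0.2.0/24"],
}


def accept_update(neighbor_asn: int, prefix: str) -> bool:
    """Accept a BGP update only if the registry lists a covering prefix
    for this neighbor and the announcement is not more specific than allowed."""
    announced = ipaddress.ip_network(prefix)
    for allowed, max_len in ALLOWED.get(neighbor_asn, []):
        allowed_net = ipaddress.ip_network(allowed)
        if (announced.version == allowed_net.version
                and announced.subnet_of(allowed_net)
                and announced.prefixlen <= max_len):
            return True
    # Default deny: unknown neighbor or prefix not registered for it.
    return False


def source_is_valid(interface: str, src: str) -> bool:
    """BCP 38-style ingress check: a packet's source address must fall
    within the prefixes assigned to the customer on that interface."""
    addr = ipaddress.ip_address(src)
    return any(addr in ipaddress.ip_network(p)
               for p in CUSTOMER_PREFIXES.get(interface, []))


if __name__ == "__main__":
    print(accept_update(64500, "192.0.2.0/24"))       # True
    print(accept_update(64500, "192.0.2.128/25"))     # False: too specific
    print(accept_update(64500, "198.51.100.0/24"))    # False: not registered for AS64500
    print(source_is_valid("cust-a", "198.51.100.7"))  # False: spoofed source
```

Both checks default to deny: a neighbor or interface missing from the registry gets nothing through, which is exactly the part that upsets sloppy customers.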

We have most of the tools we need to get the job done; you’ll find them described in Best Current Practice (BCP) 194. It’s also not impossible to get the job done from the operational perspective. NTT has been doing it for quite a while; Job Snijders described their approach to practical BGP filtering in a NANOG67 presentation.

Unfortunately, you’ll always find ISPs (including some so-called Tier-1 providers) who couldn’t care less about fixing things and making the global Internet a better place, because implementing those rules might impact their sloppy customers, and it’s always easier to give in to your customer’s (or your kid’s) screaming than to tell them “you can’t have the candy because you haven’t followed the rules”.

The “only” problem with getting things done is that, like in any dysfunctional family, the kids (= customers) can go shopping for someone more permissive, and they’ll always find another ISP with lower prices, more relaxed rules, and connectivity to a dysfunctional transit provider.

Even worse than individual sloppy ISPs are Internet Exchange Points running route servers with no filters. Job Snijders got so sick and tired of them that he added a public column of shame to his IXP overview spreadsheet. Not that it would help much; Geoff Huston has been producing deaggregation and excessive BGP updates reports for years with absolutely no visible effect.

Being good engineers who hate confrontations, we’re trying to sneak our way around those problems with various cryptographic tools (like RPKI) instead of fixing the source of the problem: chaotic (or non-existent) operational practices of some major players.
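To see how narrow the cryptographic fix is, consider what RPKI route origin validation actually checks: whether a route’s origin ASN and prefix length match a ROA. Below is a minimal Python sketch of the three RFC 6811 validation states, assuming a hypothetical in-memory `ROAS` list (real routers receive validated ROA payloads from an RPKI validator over the RTR protocol) and ignoring AS_SET origins and AS0 ROAs. It says nothing about the rest of the AS path, and an Invalid result only helps if someone actually drops the route.

```python
import ipaddress
from enum import Enum


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"


# HYPOTHETICAL validated ROA payloads: (prefix, maxLength, origin ASN).
ROAS = [
    ("192.0.2.0/24", 24, 64500),
    ("203.0.113.0/24", 25, 64501),
]


def validate_origin(prefix: str, origin_asn: int) -> Validity:
    """Classify a route per RFC 6811 (simplified: no AS_SET or AS0 handling)."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, roa_asn in ROAS:
        roa_net = ipaddress.ip_network(roa_prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this ROA does not cover the route
        covered = True
        # A covering ROA "matches" if the origin ASN is right and the
        # route is not more specific than the ROA's maxLength.
        if origin_asn == roa_asn and route.prefixlen <= max_length:
            return Validity.VALID
    # Covered by some ROA but matched by none: Invalid. No covering ROA: NotFound.
    return Validity.INVALID if covered else Validity.NOT_FOUND


if __name__ == "__main__":
    print(validate_origin("192.0.2.0/24", 64500).value)     # valid
    print(validate_origin("192.0.2.0/24", 64511).value)     # invalid: wrong origin
    print(validate_origin("198.51.100.0/24", 64500).value)  # not-found
```

A hijacker who prepends the legitimate origin ASN to a forged path still gets a Valid result, which is why origin validation complements, rather than replaces, the operational filtering described earlier.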

Unfortunately, you can never solve people or process problems with new technology; you can only make them more convoluted and harder to troubleshoot. What we’d really need are driving licenses for ISPs, and some of them should be banned for good due to repeated drunk driving. Alas, I don’t see that happening in my lifetime.

For more details, watch the Network Security Fallacies part of the How Networks Really Work webinar, and the Internet Routing Security webinar.


2 comments:

  1. "Alas, I don’t see that happening in my lifetime." - where does this negativity come from? How about ...

    "Route Bazaar: Automatic Interdomain Contract Negotiation"
    http://www.h2020-endeavour.eu/sites/www.h2020-endeavour.eu/files/u57/Route_Bazaar_HotOS_2015.pdf
  2. BGP at the edge, BGP in the core [not in a BGP-free MPLS core], BGP in data centers,
    BGP at the ISP edge [transit, settlement-free, peer, lateral, or as a customer].

    Policies at the edge require control over where routes come from: can I reach this source via this ASN? Filter all routes except those originated by the peer and by its customers (if it has any).
    Policies should also reject bogon ASNs, bogon/martian prefixes, and deprecated prefixes.

    Implementing BGP policies requires a MOP (Method of Procedure) validated by at least three reviewers: two internal ones and, for inbound and outbound peering policies, one from the peering neighbor.

    Every ISP has standard filters at its edge, and standard ISP core filtering even includes RTBH (Remotely Triggered Black Hole) communities.

    BGP Graceful Shutdown, the BGP equivalent of the IS-IS overload bit and the OSPF max-metric router LSA, is implemented by almost all vendors. It makes BGP reroute traffic (match the community and set local preference to 0) before the route is withdrawn, draining traffic gracefully and avoiding a blackhole while routers compute the next-best path.

    Nowadays BGP policies are mostly built on communities: there is a community for each specific event, and we now also have support for communities that can carry 32-bit (4-byte) ASNs.

    BGP policies have to be peer-reviewed and deployed by scripts rather than typed in by hand; the script should just push the reviewed configuration to the device. Various checkpoints also have to be implemented: Adj-RIB-In and Adj-RIB-Out have to be validated after each addition, modification, and removal.

    BGP ORF (Outbound Route Filtering), carried in the Route Refresh message, can be used to update policies.