BGP Configuration Made Simple with Cumulus Linux

BGP is without doubt the most scalable routing protocol, which made it a popular choice for large-scale deployments from service provider networks to enterprise WAN/VPN networks and even data centers. Its only significant drawback is the tedious configuration process (which almost reminds me of writing COBOL programs decades ago).

The Cumulus Networks routing team decided to change that and added numerous BGP configuration enhancements to Quagga (now FRR), the routing daemon used by Cumulus Linux. You might want to watch the Data Center Architectures video from Cumulus’ Network Field Day 9 presentation for the introduction to the topic before delving into the details.

Configure BGP neighbor on an interface

Here’s how you’d usually configure a BGP neighbor:

router bgp <as-number>
  neighbor <address> remote-as <as-number>

This is how you can do it with Quagga in Cumulus Linux:

router bgp <as-number>
  neighbor <interface> remote-as <as-number>

Do I have to mention how easy it is to create a template configuration for a ToR switch running BGP on four uplinks? Now try doing that with traditional BGP configuration.
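
To make the template argument concrete, here is a minimal Python sketch that renders the interface-based stanza for a ToR switch with four uplinks. The function name, the AS numbers and the swp1..swp4 interface names are made up for illustration, and the output follows the simplified syntax shown above; the exact keywords may differ between Quagga/FRR releases.

```python
def render_bgp_config(local_as: int, uplinks: list[str], remote_as: str) -> str:
    """Render a BGP stanza with one interface-based neighbor per uplink.

    remote_as is passed as a string so the same template also works
    with keywords instead of a number.
    """
    lines = [f"router bgp {local_as}"]
    for ifname in uplinks:
        # No per-link IP address appears anywhere in the template.
        lines.append(f"  neighbor {ifname} remote-as {remote_as}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values: AS 65001 on the ToR switch, AS 65000 on the spines.
    print(render_bgp_config(65001, ["swp1", "swp2", "swp3", "swp4"], "65000"))
```

The only per-switch inputs are the local AS number and the list of uplink names, which is what makes the configuration easy to generate.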

After configuring an interface neighbor, the router tries to figure out the neighbor’s IP address. That’s easy to do for IPv6 (use link-local addresses), and for interfaces with /30 or /31 IPv4 subnets… but Cumulus Linux allows you to run BGP (IBGP or EBGP) over unnumbered interfaces.
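
The /30 and /31 case is simple arithmetic: there is exactly one other usable address on the link. The following Python sketch illustrates that reasoning; it is not the Cumulus implementation, and the function name is hypothetical.

```python
import ipaddress


def infer_peer_address(local_cidr: str):
    """Return the only other usable IPv4 address on a /30 or /31 link.

    Returns None when the peer address cannot be inferred this way.
    """
    iface = ipaddress.ip_interface(local_cidr)
    if iface.version != 4 or iface.network.prefixlen not in (30, 31):
        return None
    addrs = list(iface.network)
    if iface.network.prefixlen == 30:
        # Drop the network and broadcast addresses of a /30.
        addrs = addrs[1:-1]
    others = [a for a in addrs if a != iface.ip]
    # Exactly one candidate must remain; otherwise the local address
    # was not a usable host address on this subnet.
    return others[0] if len(others) == 1 else None


if __name__ == "__main__":
    for cidr in ("10.0.0.1/30", "10.0.0.0/31", "10.0.0.1/24"):
        print(cidr, "->", infer_peer_address(cidr))
```

On anything wider than a /30 the function gives up, which is where learning the neighbor's link-local IPv6 address comes in.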

You still need a numbered interface on every box – preferably a loopback interface – to access the box and to give the box an IPv[46] address to use in ICMP replies and other switch-generated traffic (syslog, SNMP…).

It was always possible to run IBGP over unnumbered interfaces: configure an IGP, run IBGP between loopback interfaces, and let IGP sort out how to exchange traffic between the BGP speakers. But how do you get EBGP to run over unnumbered interfaces without using IGP?

The Cumulus routing team used one of the more obscure BGP-related RFCs to get the job done. RFC 5549 documents how you exchange IPv4 BGP prefixes with IPv6 next hops (for more details see the RIPE presentation by AMS-IX), and when you combine that with the ability to run EBGP across link-local IPv6 addresses, you get a one-line BGP neighbor configuration that is totally independent from the IPv4/IPv6 addressing in your network and thus easy to turn into a template and automate.

Final question: how do you get the LLA of your neighbor? Easy: listen to its ND or RA messages.

Notes and caveats:

  • You can use the interface-based BGP neighbors with devices from other vendors if you use a /30 or /31 subnet mask on the interface (in that case Cumulus Linux uses standard BGP);
  • You don’t have to run IPv6 in your network to get this feature to work on unnumbered interfaces – just make sure you haven’t disabled IPv6 on the interfaces you want to use in BGP configuration;
  • You can use this feature for IPv6 BGP sessions with any third-party box that supports BGP over LLA (Cisco’s IPv6/security guru Eric Vyncke published an RFC describing this idea… but unfortunately that had no impact on the messy configuration you have to use in Cisco IOS to get it to work);
  • If you want to use interface-based BGP neighbors over unnumbered IPv4 interfaces, the BGP neighbor has to support RFC 5549, and I’m not aware of any major vendor doing that;
  • Obviously this feature works only over point-to-point interfaces. You’d need something like dynamic BGP neighbors from Cisco IOS for multi-access (example: mGRE) scenarios.

Getting rid of AS numbers

The other major pain of BGP configuration syntax (particularly when you’re trying to generate standard configuration templates) is the requirement to specify the neighbor AS number in the neighbor statement. Here’s how the Cumulus routing team solved that problem:

router bgp <as-number>
  neighbor <interface> remote-as internal
  neighbor <interface> remote-as external

The remote-as internal part of the neighbor statement is obvious – use my own BGP AS number. One has to wonder why nobody else is using this syntax; maybe they’re too busy copying industry-standard CLI.

The remote-as external feature is where Cumulus engineers got slightly creative. BGP speakers advertise their AS number in the BGP OPEN message, and the usual behavior is to close the session if the AS number in the incoming OPEN message doesn’t match the number configured in the BGP neighbor statement. Instead of that, the new code they wrote for Quagga skips the strict AS check, accepts the AS number advertised by the neighbor, and uses it later on as if it had been configured explicitly in the router configuration.
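
The difference between the three forms of the neighbor statement can be sketched as a small decision function. This is an illustration of the logic described above, not the Quagga code; the function name is hypothetical, and the rule that external rejects a neighbor announcing our own AS number is an assumption about how the keyword is interpreted.

```python
def accept_open(local_as: int, configured_remote_as, open_as: int):
    """Decide what to do with the AS number received in a BGP OPEN message.

    configured_remote_as is an AS number, "internal" or "external".
    Returns the AS number to use for the session, or None to reject it.
    """
    if configured_remote_as == "internal":
        # IBGP: the neighbor must be in our own AS.
        return open_as if open_as == local_as else None
    if configured_remote_as == "external":
        # Assumption: any AS other than our own is accepted as-is.
        return open_as if open_as != local_as else None
    # Traditional behavior: strict match against the configured number.
    return open_as if open_as == configured_remote_as else None


if __name__ == "__main__":
    print(accept_open(65001, 65000, 65002))       # strict check fails
    print(accept_open(65001, "external", 65002))  # learned from the OPEN message
    print(accept_open(65001, "internal", 65001))  # same AS, IBGP session
```

Note that the local AS number is still configured in every case, so the router always knows what to put in its own OPEN message.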

Do I have to tell you not to use this feature with untrusted neighbors? It’s OK to trust an adjacent switch in your data center (or use Prescriptive Topology Manager if you want to be sure your data center is wired correctly), but definitely NOT OK to trust your customers or peering partners. See RFC 7454 for more details on what not to do with BGP.

Summary

Regardless of what I think about the whole concept of whitebox switching, I love creative solutions;) It’s refreshing to see how startups with no legacy codebase to protect solve annoyances that have been bothering us for decades. Great job!

Disclosure: Cumulus Networks was indirectly covering some of the costs of my attendance at the Network Field Day 9 event.

14 comments:

  1. is the "here's how you do it in quagga" code section a misprint? looks the same as the "normal bgp" code section
    Replies
    1. nm. /me opens his eyes a little wider and notices "interface" vs "address"
  2. In regard to the "remote-as internal", Juniper has that. You just set the bgp group to "type internal" and it will use your local AS.

    Replies
    1. Ericsson (ex. Redback) also has "neighbor zzzzz internal" in the non-vpn contexts, which does not accept "remote-as" in the neighbor definition.
  3. COBOL. Now I am curious if someone for a laugh mapped a BGP config to a COBOL program. Identification, Data,Environment, Procedure Divisions with sections mapping to BGP sections etc. LOL
  4. Hello Ivan (or any one else willing to make a recommendation):
    What book(s) or other resource would you recommend to help learn BGP?

    Thanks.
    Replies
    1. This one was really good a decade ago: http://www.ciscopress.com/store/internet-routing-architectures-9781578702336

      Not sure what to recommend these days, but then the basics of BGP haven't changed that much.
  5. "accepts the AS number advertised by the neighbor"
    What if the other side has the same "external" configuration with no AS configured?
    Secondly, does this mean that quagga will only listen to open messages but never send them? Because what would happen if it sent an open message w/o ASN?
    Replies
    1. Local AS number is always configured, so the AS number a router has to send in the BGP OPEN message is known. The only change is that the router doesn't check the AS number in the incoming BGP OPEN message but accepts whatever is there as the remote AS.
    2. Ah yes, it makes perfect sense, thanks Ivan!
  6. Ivan
    Any comments on the number of EBGP routes a ToR whitebox should hold in a medium-to-large data centre? I agree it will vary with the design.
    Replies
    1. Obviously it depends heavily on your design.

      Best case: one subnet per ToR switch.
      Worst case: one host route per VM.
  7. Are the keywords "external" or "internal" even necessary ?
  8. Intrigued by the documented simplicity, I wanted to find out how far this can be reproduced in Junos and what is going on at the IETF about learning remote peer AS numbers. There is an I-D https://datatracker.ietf.org/doc/draft-acee-idr-lldp-peer-discovery/ that uses LLDP. But no implementation in Junos yet. Long story short, I ended up using IPv6 RA to cook up a working prototype using on box automation and wrote a blog post about it: https://marcelwiget.wordpress.com/2018/06/21/bgp-over-unnumbered-interfaces-automated/