Dissecting IBGP+EBGP Junos Configuration

Network engineers familiar with Junos love to tell me how easy it is to configure and operate an IBGP EVPN overlay on top of an EBGP IP underlay. Krzysztof Szarkowicz was kind enough to send me the (probably) simplest possible configuration (here’s another one by Alexander Grigorenko).

To learn more about EVPN technology and its use in data center fabrics, watch the EVPN Technical Deep Dive webinar.
routing-options {
 router-id 192.168.0.1;
 autonomous-system 65000;  # global AS used for the BGP overlay (EVPN) and for RT autogeneration
}
protocols {
 bgp {
  group IBGP-OVERLAY {
   type internal;  # IBGP (uses the local AS from the routing-options section)
   local-address 192.168.0.1;
   family evpn {
    signaling;
   }
   neighbor 192.168.0.4 {
    description Spine-4-Loopback;
   }
   neighbor 192.168.0.5 {
    description Spine-5-Loopback;
   }
  }
  group EBGP-UNDERLAY {
   type external;                        # EBGP
   local-as 65001 no-prepend-global-as;  # local AS used for the EBGP underlay
   neighbor 10.0.0.1 {                   # Spine 4 physical interface
    peer-as 65004;
   }
   neighbor 10.0.0.3 {                   # Spine 5 physical interface
    peer-as 65005;
   }
  }
 }
}

Most BGP implementations (including Junos) have a single BGP routing process with a single BGP AS number, so it’s hard to see how you could easily use IBGP and EBGP with the same neighbor (spine switch). Let’s dissect the configuration to see how it’s done:

  • The overlay group of BGP neighbors is easy to understand: we’re running IBGP with them (assuming their AS number matches our AS number). All leaf switches use the same AS number, configured in the routing-options section. Currently you cannot use different AS numbers for leaf switches with Junos if you want to use automatic route targets.
See Alexander’s example for more details on what needs to happen on the spine switches.
  • Underlay sessions use EBGP: the remote AS number is supposed to be different from the local AS number. That’s somewhat hard to do when you want all switches to be in the same autonomous system (for EVPN reasons).
At this point you should probably ask yourself: “Why don’t they just use IBGP everywhere if they’re so keen on having IBGP for EVPN?” Yep, good question…
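The two bullets above rest on two rules. First, a BGP session is internal only when both ends use the same AS number. Second, the automatically generated route target includes the global AS number, so leaves in different autonomous systems end up with different route targets. The following is a minimal Python model of both rules, not Junos code. It assumes the auto-derived RT follows the RFC 8365 (section 5.1.2.1) layout (2-octet AS, then A bit, 3-bit Type = 1 for VXLAN, 4-bit domain ID, 24-bit VNI). The function names are hypothetical.

```python
# Simplified model (not Junos code) of the two rules discussed above.

def session_type(local_as: int, peer_as: int) -> str:
    """A BGP session is internal (IBGP) when both ends use the same AS number."""
    return "IBGP" if local_as == peer_as else "EBGP"


def auto_route_target(asn: int, vni: int, domain_id: int = 0) -> str:
    """Auto-derived EVPN route target, assuming the RFC 8365 section 5.1.2.1 layout.

    Global administrator: 2-octet AS number.
    Local administrator: A bit (0 = auto-derived), 3-bit Type (1 = VXLAN),
    4-bit domain ID, 24-bit VNI.
    """
    if not 0 < asn < 2**16:
        raise ValueError("this sketch only handles 2-octet AS numbers")
    if not 0 <= domain_id < 2**4:
        raise ValueError("domain ID must fit in 4 bits")
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    vxlan_type = 1  # RFC 8365 Type value for VXLAN
    local_admin = (vxlan_type << 28) | (domain_id << 24) | vni
    return f"target:{asn}:{local_admin}"


# Overlay session: leaf and spine both use the global AS 65000.
print(session_type(65000, 65000))  # IBGP
# Underlay session: leaf local-as 65001 toward spine AS 65004.
print(session_type(65001, 65004))  # EBGP
# Same global AS on every leaf -> identical auto-derived RT for VNI 100.
print(auto_route_target(65000, 100))  # target:65000:268435556
# Different AS per leaf -> the RTs no longer match.
print(auto_route_target(65001, 100) == auto_route_target(65002, 100))  # False
```

The last line shows why automatic route targets push you toward a single AS number across all leaves. By default, two leaves that generate different RTs for the same VNI would not import each other’s EVPN routes.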

There are a few tricks you can use to make this work:

  • If you don’t run EVPN on the spine switches but use virtual routers as IBGP route reflectors, you could use the same AS number on all leaf switches and disable the AS-path checks. If I got it right, you’d need to use loops on the leaf switches and advertise-peer-as on the spine switches.
  • Alternatively, the leaf switches could pretend to be in different autonomous systems. You could use local-as to achieve that… but that would result in weird AS-paths containing both the local AS number (65001) and the global AS number (65000). Needless to say, you’d need loops configured on the remote leaf switches to get the BGP path accepted (because their own AS number, 65000, is already in the AS-path).
  • Krzysztof used another trick: no-prepend-global-as removes the global AS number (65000) from the advertised AS-path, resulting in what seems to be a perfect EBGP underlay… until you have to troubleshoot it at 2AM on a Sunday morning (see also: troubleshooting models by Russ White).
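The tricks above can be traced with a simplified Python model of AS-path construction and loop detection. It follows this post’s description: plain local-as prepends the local AS before the global AS, and no-prepend-global-as strips the global AS. The Junos documentation quoted in the comments says locally originated routes get only the local AS, so real behavior depends on where a route came from.

Several parts of the model are assumptions:
- The `loops` field reflects our reading of the Junos `loops` option: a path is rejected when the receiver’s AS appears `loops` or more times, with a default of 1.
- The loop check looks only at the receiver’s global AS.
- The class and function names are hypothetical.

```python
# Simplified model (not Junos internals) of the AS-path tricks listed above.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BgpSpeaker:
    global_as: int                   # routing-options autonomous-system
    local_as: Optional[int] = None   # local-as on the EBGP group, if configured
    no_prepend_global_as: bool = False
    loops: int = 1                   # assumed: reject when own AS appears this many times or more


def advertise(sender: BgpSpeaker, as_path: List[int]) -> List[int]:
    """AS path the sender puts into an EBGP update, per the description in this post."""
    if sender.local_as is None:
        return [sender.global_as] + as_path
    if sender.no_prepend_global_as:
        return [sender.local_as] + as_path
    # Plain local-as: local AS prepended before the global AS.
    return [sender.local_as, sender.global_as] + as_path


def accepts(receiver: BgpSpeaker, as_path: List[int]) -> bool:
    """Loop check against the receiver's global AS only (a simplification)."""
    return as_path.count(receiver.global_as) < receiver.loops


leaf1 = BgpSpeaker(global_as=65000, local_as=65001)
spine4 = BgpSpeaker(global_as=65004)
leaf2 = BgpSpeaker(global_as=65000, local_as=65002)

# leaf1 -> spine4 -> leaf2 with plain local-as: leaf2 sees its own AS and drops the path.
path = advertise(spine4, advertise(leaf1, []))
print(path, accepts(leaf2, path))  # [65004, 65001, 65000] False

# The same path is accepted if leaf2 tolerates one occurrence of its own AS.
leaf2_loops = BgpSpeaker(global_as=65000, local_as=65002, loops=2)
print(accepts(leaf2_loops, path))  # True

# Krzysztof's trick: leaf1 strips the global AS, so the path looks like clean EBGP.
leaf1_trick = BgpSpeaker(global_as=65000, local_as=65001, no_prepend_global_as=True)
path = advertise(spine4, advertise(leaf1_trick, []))
print(path, accepts(leaf2, path))  # [65004, 65001] True
```

In the last case the advertised path no longer shows that both leaves share AS 65000. That is exactly what makes it look “perfect” and harder to reason about when troubleshooting.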

Last question: is this capability unique to Junos? Of course not. On Nexus OS you’d use local-as <asn> no-prepend replace-as (and something very similar on Cisco IOS) to get exactly the same behavior.

Is the industry-standard CLI (read: Cisco IOS) equivalent of the Junos configuration as confusing as some people claim? Of course not: use BGP neighbor templates. The only difference between Junos and Cisco IOS (or Nexus OS) configuration is that in Junos you split neighbors into groups and define group parameters next to the list of neighbors, whereas in Cisco IOS you define neighbor templates and then apply them to individual neighbors.
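One way to see that the difference is only syntactic is to model both styles as data and check that they resolve to the same per-neighbor settings. This is a hypothetical Python sketch. The setting keys (`type`, `local_as`, `peer_as`) and function names are illustrative, not real CLI keywords or APIs. The values are taken from the underlay group in the configuration above.

```python
# Hypothetical data model: group-style and template-style configuration
# resolve to the same effective per-neighbor settings.
from typing import Dict, Tuple

Settings = Dict[str, object]


def resolve_junos_group(group: Settings,
                        neighbors: Dict[str, Settings]) -> Dict[str, Settings]:
    """Junos style: neighbors listed in a group inherit the group parameters."""
    return {addr: {**group, **own} for addr, own in neighbors.items()}


def resolve_ios_templates(templates: Dict[str, Settings],
                          neighbors: Dict[str, Tuple[str, Settings]]) -> Dict[str, Settings]:
    """IOS style: each neighbor applies a named template plus its own settings."""
    return {addr: {**templates[name], **own} for addr, (name, own) in neighbors.items()}


underlay = {"type": "external", "local_as": 65001}

junos = resolve_junos_group(underlay, {
    "10.0.0.1": {"peer_as": 65004},
    "10.0.0.3": {"peer_as": 65005},
})
ios = resolve_ios_templates({"EBGP-UNDERLAY": underlay}, {
    "10.0.0.1": ("EBGP-UNDERLAY", {"peer_as": 65004}),
    "10.0.0.3": ("EBGP-UNDERLAY", {"peer_as": 65005}),
})
print(junos == ios)       # True
print(junos["10.0.0.1"])  # {'type': 'external', 'local_as': 65001, 'peer_as': 65004}
```

In both cases, per-neighbor settings override the shared ones. The styles differ only in where the shared parameters are written down.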

What you should really ask yourself, though, is: Why the **** are we discussing this pile of ****? Why don’t we use a simple design that everyone has a chance of understanding and that works reasonably well at the scale we need… like IBGP-over-IGP? Glad you asked. I get called all sorts of names when saying that out loud, or when pointing out that an IGP is probably good enough for what you need. Maybe it's time to step back and ask "What problem are we trying to solve?"

Master EVPN and Data Center Fabrics

You can use these ipSpace.net webinars (all of them available with standard ipSpace.net subscription) to learn more about EVPN and data center fabrics:

Finally, if you’re looking for a guided and mentored tour with plenty of peer- and instructor support, check out the Building Next-Generation Data Center online course.


17 comments:

  1. For me the Junos configuration is very intuitive. The main reason why we use EBGP in the underlay is because we want to be as cool as FANG.
    1. Agree. Being as cool as FANG is really motivating.

      But I have a more concrete example. Let's imagine you've recently built your perfect shiny L3-only IP fabric, and you used EBGP as the only routing protocol because of a number of factors:
      - size of the fabric
      - expected number of client prefixes
      - routing on the host (FRR guys would be happy)
      - fashion, coolness, etc.
      No problems with that? Could anybody in 2018 say that you're introducing unnecessary complexity and should use OSPF+IBGP?

      But now you have a very important project that needs L2 connectivity across your fabric. It needs it NOW, because it's already past its deadline.
      What could you do? Argue that this is not right, that your shiny fabric cannot do that?

      And now think about the complexity of introducing EVPN into your already running fabric for these two design choices (EBGP+EBGP or IBGP+EBGP).
    2. The answer to that is simple. We don't allow applications with layer 2 requirement. If an application has a strict layer 2 requirement then it's probably the wrong application.
    3. @Alex: Your scenario might make sense if you use virtual router as IBGP RR, but not if you run IBGP with the spines on top of EBGP with the spines (which is what every single Junos example thrown at me does).

      Please don't try to tell me it's less risky to add convoluted IBGP-over-EBGP setup than to enable another address family on EBGP. I had higher opinion of Junos software quality.
    4. @Anonymous: Try telling that to your CxO after they've spent so much money on your shiny new fabric.

      @Ivan: They tend to grossly oversimplify things in whitepapers for the sake of simplicity/volume/number of devices/etc.
      Of course there is no reason to use spines as the RR - the most crucial point of IBGP+EBGP design is to separate overlay from underlay (to not bother spines with customer routes).
      RRs SHOULD (if not MUST) be placed on border leafs or on separate virtual routers.
      Spines could be as dumb as possible in this design.

      To introduce EVPN in your fabric with this design, you just need to configure a new, separate BGP session to the RR, without affecting production traffic, the spines, or any other leafs that don't need EVPN.
      If you need to add another address family to an existing EBGP session, the devices at least need to tear it down and reestablish it after negotiating the new capabilities.
  2. Just to confirm, this is all because Junos doesn’t allow you to run EBGP underlay with EBGP overlay (because automatic route targets break), or did I miss something?
    1. I haven't figured out what the actual Junos limitation is yet, but there must be a reason they're so adamant about using IBGP for EVPN.
    2. This is exactly the cornerstone of our discussion.
      Let me repeat once again - JunOS supports ALL of the aforementioned options:
      - iBGP overlay + IGP underlay
      - eBGP on interface addresses (2 AFI/SAFI for each session)
      - iBGP overlay + eBGP underlay
      - eBGP overlay (between loopbacks) + eBGP underlay

      The iBGP overlay + eBGP underlay design is recommended, but NOT required.

      They promote this design for a number of reasons:
      - this is the most scalable solution
      - this design provides clear and logical separation of overlay/underlay (at least it looks so in configuration)
      - you can use any device you like for the spine role - it is not participating in EVPN at all
      - they can simply do this complex BGP stuff because this is JunOS :)
  3. I'm sorry, but it seems that Krzysztof didn't bother to check his example config in the lab.
    The no-prepend-global-as knob is not necessary here. Here's why:
    "1. If the route is received from an internal BGP (IBGP) peer, the AS path includes the local AS number prepended before the global AS number.
    2. The local AS number is used instead of the global AS number if the route is an external route, such as a static route or an interior gateway protocol (IGP) route that is imported into BGP."
    https://www.juniper.net/documentation/en_US/junos/topics/concept/bgp-local-as-introduction.html
    What routes does a leaf switch need to advertise over the EBGP session? Only local routes: the loopback and directly attached interfaces.
    Therefore, see item 2 above.
    So, half of the "complexity" (= strange words that the average enterprise engineer doesn't understand) of this config is already gone.

    As regards the complexity of the internal implementation of all this stuff: why do you think it is more complex than rewriting the RT on every hop for every EVPN route in the EBGP-only auto-RT case? And why should this internal BGP process complexity worry the average network engineer?

    Can't JunOS support other design options? Of course it can.
    "Currently you cannot use different AS numbers for leaf switches with Junos if you want to use automatic route targets." - this is not exactly true. If you don't configure any AS number under routing-options stanza, and use local-as for both BGP groups - auto-RT works, just like magic (without any RT rewriting, it just doesn't include AS number in autogenerated RT).
    Trick question - what happens under the curtains of BGP process in this case? I think only JunOS developers could say. Does it matter to me - not really, if it works fine.

    Every design has its pros and cons. And I can't understand why you keep telling that EBGP-only cons are acceptable, but IBGP cons are not. (Because FRR can't do IBGP design yet? - I hope this is not the primary reason :) )

    Should average enterprise engineer bother about all that? Of course not - they simply use OSPF+IBGP, and this is perfect choice.
    But if you have to use EBGP in the underlay (for whatever reason; see my reply above for details), you should definitely consider both options and weigh the pros and cons of each.
    1. Do you want to say that Ivan is wrong with his assumptions? Well, that would be lèse-majesté.
    2. Everyone frequenting this blog has to deal with anonymous douchebags lately. Having a technical discussion with people who have the guts to attach their names to their comments is refreshing compared to that drivel.
    3. @Alex:

      "And I can't understand why you keep telling that EBGP-only cons are acceptable, but IBGP cons are not." << Please go and reread what I'm saying. IBGP+IGP is perfectly fine. EBGP-only is fine. IBGP-over-EBGP is stretching things too far (in my opinion) and there's absolutely no technical reason to do it (apart from potential vendor-specific implementation challenges).

      "Should average enterprise engineer bother about all that? Of course not - they simply use OSPF+IBGP, and this is perfect choice." << then maybe you should consider that the top 0.1% don't need (and probably don't read) my blog posts.
    4. "and there's absolutely no technical reason to do it" << of course there is not, if you plan to use spines as the RRs. But this is just bad implementation of good design.
      If you use separate RRs and don't bother spines with customer routes - then you might see the target point of all that complexity.
  4. We have an evpn-vxlan fabric with auto RT, ebgp underlay/ibgp overlay and it is working just fine. We chose ebgp for the underlay due to concerns with scaling to beyond 100 racks with 2 leafs per rack.

    Juniper Professional Services designed and implemented it. Juniper PS designed it as a unique ASN per leaf and unique ASN per spine as well. Working just fine without using the 'no-prepend-global-as' knob.


    leaf:

    /* route for out of band mgmt out em0 interface */
    set routing-options static route 0.0.0.0/0 next-hop 10.225.50.1
    set routing-options static route 0.0.0.0/0 no-readvertise
    set routing-options router-id 10.228.0.73
    set routing-options autonomous-system 65200
    set routing-options forwarding-table export PL-LOAD-BALANCE
    set protocols bgp log-updown
    set protocols bgp graceful-restart
    set protocols bgp group UNDERLAY-IPFABRIC type external
    set protocols bgp group UNDERLAY-IPFABRIC mtu-discovery
    set protocols bgp group UNDERLAY-IPFABRIC import PL-IPFABRIC-IN
    set protocols bgp group UNDERLAY-IPFABRIC export PL-IPFABRIC-OUT
    set protocols bgp group UNDERLAY-IPFABRIC local-as 65305
    set protocols bgp group UNDERLAY-IPFABRIC bfd-liveness-detection minimum-interval 350
    set protocols bgp group UNDERLAY-IPFABRIC bfd-liveness-detection multiplier 3
    set protocols bgp group UNDERLAY-IPFABRIC bfd-liveness-detection session-mode single-hop
    set protocols bgp group UNDERLAY-IPFABRIC multipath multiple-as
    set protocols bgp group UNDERLAY-IPFABRIC neighbor 10.228.2.8 peer-as 65201 # <-- spine1
    set protocols bgp group UNDERLAY-IPFABRIC neighbor 10.228.3.8 peer-as 65202 # <-- spine2
    set protocols bgp group OVERLAY-EVPN type internal
    set protocols bgp group OVERLAY-EVPN local-address 10.228.0.73
    set protocols bgp group OVERLAY-EVPN family evpn signaling
    set protocols bgp group OVERLAY-EVPN local-as 65200
    set protocols bgp group OVERLAY-EVPN multipath
    set protocols bgp group OVERLAY-EVPN neighbor 10.228.0.65 # <-- spine1 lo0.0 acting as a RR
    set protocols bgp group OVERLAY-EVPN neighbor 10.228.0.66 # <-- spine2 lo0.0 acting as a RR
    set policy-options community COMM_ESI members target:65200:9999
    set switch-options vtep-source-interface lo0.0
    set switch-options route-distinguisher 10.228.0.73:1
    set switch-options vrf-import PL-EVPN-IN
    set switch-options vrf-target target:65200:9999
    set switch-options vrf-target auto


    set policy-options policy-statement PL-EVPN-IN term COMMON-ESI from community COMM_ESI
    set policy-options policy-statement PL-EVPN-IN term COMMON-ESI then accept
    set policy-options policy-statement PL-IPFABRIC-IN term LOOPBACKS from route-filter 10.228.0.0/24 prefix-length-range /32-/32
    set policy-options policy-statement PL-IPFABRIC-IN term LOOPBACKS then accept
    set policy-options policy-statement PL-IPFABRIC-IN term REJECT then reject
    set policy-options policy-statement PL-IPFABRIC-OUT term LOOPBACKS from route-filter 10.228.0.0/24 prefix-length-range /32-/32
    set policy-options policy-statement PL-IPFABRIC-OUT term LOOPBACKS then accept
    set policy-options policy-statement PL-IPFABRIC-OUT term REJECT then reject
  5. Who cares about Juniper (Junos) with their 1% market share? I don't.
  6. said Barnes & Noble about Amazon, said Blockbuster about Netflix, etc. In the network space, in 2018, this is not a very enlightened comment.
    1. Trolling comments (like the one you replied to) are rarely enlightened ;)