Dissecting IBGP+EBGP Junos Configuration
Networking engineers familiar with Junos love to tell me how easy it is to configure and operate an IBGP EVPN overlay on top of an EBGP IP underlay. Krzysztof Szarkowicz was kind enough to send me the (probably) simplest possible configuration (here’s another one by Alexander Grigorenko):
routing-options {
    router-id 192.168.0.1;
    autonomous-system 65000;    # global AS used for BGP overlay (EVPN) and for RT autogeneration
}
protocols {
    bgp {
        group IBGP-OVERLAY {
            type internal;    # IBGP (local AS taken from the routing-options section)
            local-address 192.168.0.1;
            family evpn {
                signaling;
            }
            neighbor 192.168.0.4 {
                description Spine-4-Loopback;
            }
            neighbor 192.168.0.5 {
                description Spine-5-Loopback;
            }
        }
        group EBGP-UNDERLAY {
            type external;    # EBGP
            local-as 65001 no-prepend-global-as;    # local AS used for EBGP underlay
            neighbor 10.0.0.1 {    # Spine 4 physical interface
                peer-as 65004;
            }
            neighbor 10.0.0.3 {    # Spine 5 physical interface
                peer-as 65005;
            }
        }
    }
}
Most BGP implementations (including Junos) have a single BGP routing process with a single BGP AS number, so it’s hard to see how you could easily use IBGP and EBGP with the same neighbor (spine switch). Let’s dissect the configuration to see how it’s done:
- The overlay group of BGP neighbors is easy to understand: we’re running IBGP with them (assuming their AS number matches our AS number). All leaf switches use the same AS number configured in the routing-options section. Currently you cannot use different AS numbers for leaf switches with Junos if you want to use automatic route targets.
- Underlay sessions use EBGP – the remote AS number is supposed to be different from local AS number. Somewhat hard to do when you want all switches to be in the same autonomous system (for EVPN reasons).
There are a few tricks you can use to make this work:
- If you don’t run EVPN on the spine switches but use virtual routers as IBGP route reflectors, use the same AS number on all leaf switches, and disable the AS-path checks. If I got it right you’d need to use loops on leaf switches and advertise-peer-as on spine switches.
- Alternatively, the leaf switches could pretend to be in a different AS number. You could use local-as to achieve that… but that would result in weird AS-paths containing both the local AS number (65001) and the global AS number (65000). Needless to say, you’d need loops configured on the remote leaf switch to get the BGP path accepted (because that switch’s own global AS number, 65000, is already in the AS-path).
- Krzysztof used another trick: no-prepend-global-as removes the global AS number (65000) from the advertised AS-path, resulting in what seems to be a perfect EBGP underlay… until you have to troubleshoot it at 2AM on a Sunday morning (see also: troubleshooting models by Russ White).
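The AS-path loop check behind these tricks can be sketched in a few lines of Python. This is a toy model, not Junos code: the function name, the `allowed_occurrences` parameter, and the second leaf's local AS (65002) are assumptions made for illustration, and `allowed_occurrences` does not map one-to-one to the numeric argument of the Junos loops statement.

```python
def accepts(as_path, own_as_numbers, allowed_occurrences=0):
    """Toy AS-path loop check.

    A path is accepted only if none of the receiver's own AS numbers
    appears in it more than `allowed_occurrences` times.
    """
    return all(as_path.count(asn) <= allowed_occurrences for asn in own_as_numbers)


# Paths as seen by a remote leaf after spine 4 (AS 65004) re-advertised them.
path_with_global_as = [65004, 65001, 65000]   # local-as without no-prepend-global-as
path_without_global_as = [65004, 65001]       # local-as with no-prepend-global-as

# Hypothetical remote leaf: same global AS 65000, assumed local AS 65002.
remote_leaf_as = {65000, 65002}

print(accepts(path_with_global_as, remote_leaf_as))                         # False
print(accepts(path_with_global_as, remote_leaf_as, allowed_occurrences=1))  # True
print(accepts(path_without_global_as, remote_leaf_as))                      # True
```

The first path is rejected because the shared global AS number is already in it, the second call shows why relaxing the loop check helps, and the third shows why stripping the global AS number makes the underlay look like plain EBGP.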
Last question: is this capability unique to Junos? Of course not. You have to use local-as <asn> no-prepend replace-as on Nexus OS (and something very similar on Cisco IOS) to get exactly the same behavior.
Is the industry-standard CLI (read: Cisco IOS) equivalent of Junos configuration as confusing as some people claim? Of course not – use BGP neighbor templates. The only difference between Junos and Cisco IOS (or Nexus-OS) configuration is that in Junos you split neighbors into groups and define group parameters next to list of neighbors, whereas in Cisco IOS you define neighbor templates, and then apply them to individual neighbors.
What you should really ask yourself though is: Why the **** are we discussing this pile of **** - why don’t we use a simple design that everyone has a chance of understanding and that works reasonably well at scale we need… like IBGP-over-IGP? Glad you asked – I get called all sorts of names when saying that out loud, or pointing out that IGP is probably good enough for what you need. Maybe it's time to step back and ask "What problem are we trying to solve?"
But I have a more concrete example: let's imagine you've recently built your perfect shiny L3-only IP fabric, and you used EBGP as the only routing protocol because of a number of factors:
- size of the fabric
- expected number of client prefixes
- routing on the host (FRR guys would be happy)
- fashion, coolness, etc.
No problems with that? Could anybody in 2018 say that you are introducing unnecessary complexity and should use OSPF+IBGP?
But now you have some very important project that needs L2 connectivity via your fabric. It needs this NOW, because it's already behind deadline.
What could you do? Argue that this is not right, that your shiny fabric cannot do that?
And now think about the complexity of introducing EVPN in your already running fabric for these two design choices (EBGP+EBGP or IBGP+EBGP).
Please don't try to tell me it's less risky to add a convoluted IBGP-over-EBGP setup than to enable another address family on EBGP. I had a higher opinion of Junos software quality.
@Ivan: They tend to grossly oversimplify things in whitepapers for the sake of simplicity/volume/number of devices/etc.
Of course there is no reason to use spines as the RR - the most crucial point of IBGP+EBGP design is to separate overlay from underlay (to not bother spines with customer routes).
RRs SHOULD (if not MUST) be placed on border leafs or on separate virtual routers.
Spines could be as dumb as possible in this design.
To introduce EVPN in your fabric with this design you just need to configure a new, separate BGP session to the RR, not affecting production traffic, not affecting spines and all other leafs that do not need EVPN.
If you need to add another address family to an existing EBGP session, the devices at least need to tear it down and reestablish it to negotiate the new capabilities.
Let me repeat once again - JunOS supports ALL of the aforementioned options:
- iBGP overlay + IGP underlay
- eBGP on interface addresses (2 AFI/SAFI for each session)
- iBGP overlay + eBGP underlay
- eBGP overlay (between loopbacks) + eBGP underlay
The iBGP overlay + eBGP underlay design is recommended, but NOT required.
They promote this design for a number of reasons:
- this is the most scalable solution
- this design provides clear and logical separation of overlay/underlay (at least it looks so in configuration)
- you can use any device you like for the spine role - it is not participating in EVPN at all
- they can simply do this complex BGP stuff because this is JunOS :)
The no-prepend-global-as knob is not necessary here. Here's why:
"1. If the route is received from an internal BGP (IBGP) peer, the AS path includes the local AS number prepended before the global AS number.
2. The local AS number is used instead of the global AS number if the route is an external route, such as a static route or an interior gateway protocol (IGP) route that is imported into BGP."
https://www.juniper.net/documentation/en_US/junos/topics/concept/bgp-local-as-introduction.html
What routes does a leaf switch need to advertise over the EBGP session? Only local routes - the loopback and directly attached interfaces.
Therefore see item 2 above.
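A minimal Python sketch of the two quoted rules may make the argument easier to follow. It is a simplified model based only on the quoted text, not on Junos internals; the function and parameter names are assumptions, and the `no_prepend_global_as` branch simply drops the global AS number as described in the blog post.

```python
def advertised_as_path(received_from_ibgp, path, local_as, global_as,
                       no_prepend_global_as=False):
    """AS path sent to an EBGP peer in a group configured with local-as.

    Rule 1: routes received from an IBGP peer get the local AS prepended
            before the global AS (unless the global AS is suppressed).
    Rule 2: external routes (static, IGP, direct) carry the local AS
            instead of the global AS.
    """
    if received_from_ibgp and not no_prepend_global_as:
        return [local_as, global_as] + path
    return [local_as] + path


# Leaf from the blog post: global AS 65000, underlay local-as 65001.
# A loopback or directly connected prefix is a locally originated route (rule 2).
print(advertised_as_path(False, [], 65001, 65000))        # [65001]

# A route learned over IBGP would carry both AS numbers (rule 1) ...
print(advertised_as_path(True, [], 65001, 65000))         # [65001, 65000]

# ... unless the global AS number is suppressed.
print(advertised_as_path(True, [], 65001, 65000, True))   # [65001]
```

In this model the knob changes the result only for routes learned over IBGP, which is the point being made here: a leaf that advertises nothing but its own loopback and connected prefixes into the underlay never hits that case.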
So, half of the "complexity" (= strange words that the average enterprise engineer does not understand) of this config is already gone.
As regards the complexity of the internal implementation of all this stuff - why do you think that this is more complex than RT rewrite on every hop for every EVPN route in the EBGP-only auto-RT case? And why should this internal BGP process complexity worry the average network engineer?
Can't JunOS do the other design options? Of course it can.
"Currently you cannot use different AS numbers for leaf switches with Junos if you want to use automatic route targets." - this is not exactly true. If you don't configure any AS number under the routing-options stanza, and use local-as for both BGP groups, auto-RT works, just like magic (without any RT rewriting, it just doesn't include the AS number in the autogenerated RT).
Trick question - what happens under the curtains of BGP process in this case? I think only JunOS developers could say. Does it matter to me - not really, if it works fine.
Every design has its pros and cons. And I can't understand why you keep telling that EBGP-only cons are acceptable, but IBGP cons are not. (Because FRR can't do IBGP design yet? - I hope this is not the primary reason :) )
Should average enterprise engineer bother about all that? Of course not - they simply use OSPF+IBGP, and this is perfect choice.
But if you have to use EBGP in the underlay (for any reason, see my reply above for more details), then you definitely should consider both options and weigh the pros and cons of each one.
"And I can't understand why you keep telling that EBGP-only cons are acceptable, but IBGP cons are not." << Please go and reread what I'm saying. IBGP+IGP is perfectly fine. EBGP-only is fine. IBGP-over-EBGP is stretching things too far (in my opinion) and there's absolutely no technical reason to do it (apart from potential vendor-specific implementation challenges).
"Should average enterprise engineer bother about all that? Of course not - they simply use OSPF+IBGP, and this is perfect choice." << then maybe you should consider that the top 0.1% don't need (and probably don't read) my blog posts.
If you use separate RRs and don't bother spines with customer routes, then you might see the point of all that complexity.
Juniper Professional Services designed and implemented it. Juniper PS designed it as a unique ASN per leaf and unique ASN per spine as well. Working just fine without using the 'no-prepend-global-as' knob.
leaf:
/* route for out of band mgmt out em0 interface */
set routing-options static route 0.0.0.0/0 next-hop 10.225.50.1
set routing-options static route 0.0.0.0/0 no-readvertise
set routing-options router-id 10.228.0.73
set routing-options autonomous-system 65200
set routing-options forwarding-table export PL-LOAD-BALANCE
set protocols bgp log-updown
set protocols bgp graceful-restart
set protocols bgp group UNDERLAY-IPFABRIC type external
set protocols bgp group UNDERLAY-IPFABRIC mtu-discovery
set protocols bgp group UNDERLAY-IPFABRIC import PL-IPFABRIC-IN
set protocols bgp group UNDERLAY-IPFABRIC export PL-IPFABRIC-OUT
set protocols bgp group UNDERLAY-IPFABRIC local-as 65305
set protocols bgp group UNDERLAY-IPFABRIC bfd-liveness-detection minimum-interval 350
set protocols bgp group UNDERLAY-IPFABRIC bfd-liveness-detection multiplier 3
set protocols bgp group UNDERLAY-IPFABRIC bfd-liveness-detection session-mode single-hop
set protocols bgp group UNDERLAY-IPFABRIC multipath multiple-as
set protocols bgp group UNDERLAY-IPFABRIC neighbor 10.228.2.8 peer-as 65201 # <-- spine1
set protocols bgp group UNDERLAY-IPFABRIC neighbor 10.228.3.8 peer-as 65202 # <-- spine2
set protocols bgp group OVERLAY-EVPN type internal
set protocols bgp group OVERLAY-EVPN local-address 10.228.0.73
set protocols bgp group OVERLAY-EVPN family evpn signaling
set protocols bgp group OVERLAY-EVPN local-as 65200
set protocols bgp group OVERLAY-EVPN multipath
set protocols bgp group OVERLAY-EVPN neighbor 10.228.0.65 # <-- spine1 lo0.0 acting as a RR
set protocols bgp group OVERLAY-EVPN neighbor 10.228.0.66 # <-- spine2 lo0.0 acting as a RR
set policy-options community COMM_ESI members target:65200:9999
set switch-options vtep-source-interface lo0.0
set switch-options route-distinguisher 10.228.0.73:1
set switch-options vrf-import PL-EVPN-IN
set switch-options vrf-target target:65200:9999
set switch-options vrf-target auto
set policy-options policy-statement PL-EVPN-IN term COMMON-ESI from community COMM_ESI
set policy-options policy-statement PL-EVPN-IN term COMMON-ESI then accept
set policy-options policy-statement PL-IPFABRIC-IN term LOOPBACKS from route-filter 10.228.0.0/24 prefix-length-range /32-/32
set policy-options policy-statement PL-IPFABRIC-IN term LOOPBACKS then accept
set policy-options policy-statement PL-IPFABRIC-IN term REJECT then reject
set policy-options policy-statement PL-IPFABRIC-OUT term LOOPBACKS from route-filter 10.228.0.0/24 prefix-length-range /32-/32
set policy-options policy-statement PL-IPFABRIC-OUT term LOOPBACKS then accept
set policy-options policy-statement PL-IPFABRIC-OUT term REJECT then reject