BGP in EVPN-Based Data Center Fabrics
EVPN is one of the major reasons we’re seeing BGP used in small and mid-sized data center fabrics. In theory, EVPN is just a BGP address family and shouldn’t impact your BGP design. However, suboptimal implementations might invalidate that assumption.
I've described a few EVPN-related BGP gotchas in BGP in EVPN-Based Data Center Fabrics, a section of the Using BGP in Data Center Leaf-and-Spine Fabrics article.
Alex raised several valid points in his comments to this blog post. While they don’t fundamentally change my view on the subject, they do warrant a more nuanced explanation.
Can you please explain your logic behind this statement?
What's the difference between "underlay IGP + overlay iBGP" and "underlay eBGP + overlay iBGP" cases?
In the latter case you just need to use two different AS numbers on the leaf switch - a unique eBGP AS for each ToR and a single AS for the iBGP overlay.
Or, in other words, you need two independent BGP sessions on the ToR switch - one for the underlay and another for the overlay.
Is it sane? We probably disagree on this aspect.
Is it supported? I would love to see how many vendors officially support this (apart from the one that I've seen using this design), but I won't waste my time investigating it.
Is it easy to understand? Yet again, I would love to see how you explain this to the guy that has to do troubleshooting @ 2AM on Sunday morning.
Will someone go and design a customer network this way? Sure.
Will they be upset when the customer reads my blog post? Probably.
In Junos it is simple and logical - clear separation of underlay and overlay:
alex@vQFX1# show protocols bgp
group underlay {
    type external;
    export direct;
    local-as 65011;
    family inet {
        unicast;
    }
    multipath multiple-as;
    neighbor 192.168.0.0 { ### SPINE1
        peer-as 65001;
    }
    neighbor 192.168.0.4 { ### SPINE2
        peer-as 65002;
    }
}
group overlay {
    type internal;
    local-as 65000;
    local-address 11.11.11.11; ### loopback
    family evpn {
        signaling;
    }
    multipath;
    neighbor 2.2.2.2; ### EVPN RR1
    neighbor 1.1.1.1; ### EVPN RR2
}
I don't think it's more complex or confusing than the no-nexthop-change option in an eBGP-only design.
I hope you'll find some time to look at Juniper design options for EVPN fabrics, they have some pretty good stuff there.
For example, this book https://www.juniper.net/us/en/training/jnbooks/day-one/data-center-technologies/data-center-deployment-evpn-vxlan/ part 3 regarding eBGP+iBGP design.
Apart from that, they managed to implement proper EVPN multihoming, not MLAG-dependent kludges.
Just because someone has a non-zero market share doesn't mean that
(A) I'll actively track everything they do. There are too many more interesting things out there, and I never claimed to be an industry analyst. With many vendors I have friendly SEs who send me an occasional email saying "read this".
(B) I agree with what they're doing just because they are Vendor X. Every vendor gets upset with my views every now and then. Now it looks like it's Juniper's turn ;)
(C) What they're doing makes sense. Remember that we're talking about reasonably-sized data center fabrics... and even if you have a data center fabric use case that needs EVPN at scale where IGP is no longer a viable underlay option, I'd guess it's an outlier due to very peculiar circumstances.
Finally, I was never talking only about configuration complexity (the related IOS or EOS configuration is about as complex as the Junos one) but about the complexity of what's going on behind the scenes. Also, you might have missed the "ignore AS-path check" tweak (or is Junos turning off BGP loop prevention logic by default?)
As for "MLAG-dependent kludges", I'm pointing that out every time I talk about EVPN and MLAG, but just because they did one thing right doesn't mean that everything else they do makes equal sense.
As I wrote above, I think we can agree that we disagree on whether this is sane and move on. At least I will.
I used to think about eBGP and iBGP in this design as two completely independent protocols that don't share routes with each other. So I don't really understand your concern about loop prevention here... Maybe I'm missing something.
But anyway, thank you for the detailed explanation of your point of view. I really appreciate your hard work clarifying the (sadly) so complex world of modern networking.
If you want to run IBGP between ToR switches, they all have to be in the same AS. If the spine is in a different AS, you have the "I'm receiving EBGP prefixes originating from my own AS" problem, which usually requires the "allowas-in" tweak or whatever it's called on a specific platform.
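The AS-path check behind this problem can be sketched in a few lines of Python. This is a simplified model and not any vendor's implementation: the function name `accepts_update`, the `allowas_in` parameter and the AS numbers used for the shared-AS case are illustrative assumptions.

```python
def accepts_update(as_path, local_as, allowas_in=0):
    """Return True if an EBGP update passes a simplified AS-path loop check.

    The update is rejected when local_as appears in as_path more than
    allowas_in times; allowas_in=0 models the usual default behavior.
    """
    return as_path.count(local_as) <= allowas_in


# Hypothetical numbering: every leaf shares AS 65000, the spine is AS 65001.
LEAF_AS, SPINE_AS = 65000, 65001

# Leaf A originates a prefix, the spine prepends its own AS and
# passes the update on to leaf B.
path_at_leaf_b = [SPINE_AS, LEAF_AS]

# Leaf B finds its own AS in the path and drops the update by default.
default_result = accepts_update(path_at_leaf_b, LEAF_AS)

# Allowing one occurrence of the local AS lets the update through.
tweaked_result = accepts_update(path_at_leaf_b, LEAF_AS, allowas_in=1)

print(default_result, tweaked_result)
```

With a unique AS per leaf in the underlay, the receiving leaf never finds its own AS in the path, so the check passes without any tweak.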
And, sorry, again I don't understand your logic... In this design the spine does not receive any EVPN routes at all. All the spine sees is eBGP IPv4 routes from the leaves, like in a simple L3-only fabric.
The leaf switch doesn't need "to be" in one single AS - it uses one AS number for eBGP and another, completely independent AS number for iBGP. Look at the config example above. This is the complete BGP config - the AS number is not configured under the routing-options stanza. So in this example the leaf switch is in AS 65011 for eBGP and in AS 65000 for iBGP simultaneously.
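To make the two roles of the leaf explicit, here is a minimal Python model of the sessions in the config above. The `Session` class and its `kind` property are hypothetical names for illustration; the AS numbers come from the config, and the overlay peer AS of 65000 is an assumption that follows from the group being internal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    group: str     # BGP group the session belongs to
    local_as: int  # AS number the leaf presents on this session
    peer_as: int   # AS number of the neighbor

    @property
    def kind(self):
        # Same AS on both ends means internal BGP, otherwise external BGP.
        return "internal" if self.local_as == self.peer_as else "external"


leaf_sessions = [
    Session("underlay", 65011, 65001),  # SPINE1
    Session("underlay", 65011, 65002),  # SPINE2
    Session("overlay", 65000, 65000),   # EVPN RR1
    Session("overlay", 65000, 65000),   # EVPN RR2
]

# Collect the distinct (group, session type) pairs seen on the leaf.
kinds = {(s.group, s.kind) for s in leaf_sessions}

for s in leaf_sessions:
    print(s.group, s.local_as, s.peer_as, s.kind)
```

The model only captures which AS number the leaf presents on each session. It says nothing about how a particular platform handles a per-group local AS internally.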
In your own words - "complexity should belong to edges".
If you need an RR for iBGP (of course you do) - just use your DC GW routers or, even better, virtual routers like vMX.
Most organizations that deal with networks of this size try to keep layer-2 domains as small as possible or move the problem to the real network edge - the hypervisors.
I have read documentation from a vendor that claims to offer two variants of EVPN:
IGP underlay + iBGP Overlay
eBGP underlay + eBGP Overlay
The latter would seem neat, as a single eBGP session from leaf to spine would support two "address families" (IPv4 for the underlay and EVPN for the overlay). However, as Alex suggests above, it would seem to result in bloated forwarding tables on the spines, as they have to retain (and process) all the EVPN routes.
However, the vendor claims to have a config "switch" that prevents this - does this ring true? If so, it would seem to suggest their eBGP model would be preferable?
Regards,
James
And also think about the possibility of implementing an EVPN overlay in an already running eBGP IP fabric - would you want to add another address family to the already running eBGP sessions?
Do you think PBB-EVPN has a part to play within the DC?
You might consider it incorrect, and I have no problem with that. I might also consider anonymous comments saying "my concoction works perfectly assuming you know what you're doing" highly irrelevant, as that clearly applies to a zillion "just because you could doesn't mean that you should" ideas, and your "assuming you know what you're doing" remark only validates my opinion. Thanks for that ;)
Would love to know more if you could share the details (offline).