TL&DR: It’s 2020, and VXLAN with EVPN is all the rage. Thank you, you can stop reading.
On a more serious note, I got this question from Johannes Spanier after he read my do we need complex data center switches for NSX underlay blog post:
Would you agree that for smaller NSX designs (~100 hypervisors) a much simpler Layer2 based access-distribution design with MLAGs is feasible? One would have two distribution switches and redundant access switches MLAGed together.
I would still prefer VXLAN for a number of reasons:
- No dependency on MLAG (assuming you get the edge design right). MLAG (or any other technology that makes two devices pretend they’re a single device through convoluted state sharing) is a brittle kludge. I’ve seen several data center meltdowns caused by MLAG bugs, and haven’t heard of one caused by an IP forwarding bug for a very long time.
- I wouldn’t run a bridging-only fabric without STP. STP is the only protection you have against stupidities like cable errors or technicians testing cables by plugging TX and RX ports together (although modern connectors make that less likely). Unfortunately, STP is way worse than MLAG.
- I prefer having a “we know what we’re doing” behavior in my backbone instead of “we haven’t heard anything so it must be OK to forward”.
- Call me old-fashioned but I prefer routing over bridging in my transport infrastructure.
- Finally, even though my friends at Avaya (now Extreme) tell me how well SPB works (and I believe them), I still prefer running a protocol that has been around for decades (like OSPF or IS-IS for IPv4/IPv6) hoping someone else already hit all the major bugs.
Alternatively, to put my question into another perspective: for “small” designs, is there any problem leaf-spine VXLAN EVPN solves that STP-free distribution/access cannot?
Robustness, stability, standards-based, multi-vendor (not that I would focus on this one)… pick one (or a few). Also note that you don’t have to run EVPN; VXLAN on top of a simple IP network is good enough.
If your data center design follows the traditional “every VLAN on every leaf switch” approach (lovingly known as “let’s turn our data center into a thick yellow cable”), you’ll be equally well off having a static VXLAN flood list containing every ToR switch in your fabric. Obviously I’d hope you’d automate the generation of that flood list, but with four to six switches (all you need for ~100 hypervisors) you just might get by configuring it manually.
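If you do want to automate the flood list, a minimal sketch could look like the one below. It assumes a hypothetical inventory mapping switch names to VTEP loopback addresses (the names and addresses are made up), and gives each switch a list of every other VTEP in the fabric. The rendered output is modeled on Arista EOS-style `vxlan flood vtep` syntax purely as an illustration; check the exact commands against your platform's documentation before using anything like this.

```python
from ipaddress import ip_address


def build_flood_lists(vteps):
    """Return {switch: [remote VTEP IPs]} for a full-mesh static flood list.

    `vteps` maps a switch name to its VTEP (loopback) IP address.
    Each switch gets every VTEP in the fabric except its own.
    """
    # Fail early on typos in the inventory (raises ValueError on a bad address).
    for ip in vteps.values():
        ip_address(ip)

    return {
        name: sorted(
            (ip for other, ip in vteps.items() if other != name),
            key=ip_address,  # numeric sort, so 10.0.0.2 comes before 10.0.0.10
        )
        for name in vteps
    }


def render(name, remotes):
    """Render one switch's flood list as EOS-style config lines (illustrative only)."""
    return "\n".join(
        [
            f"! {name}",
            "interface Vxlan1",
            "   vxlan flood vtep " + " ".join(remotes),
        ]
    )


if __name__ == "__main__":
    # Hypothetical four-switch fabric; replace with your own inventory.
    fabric = {
        "leaf1": "10.0.0.1",
        "leaf2": "10.0.0.2",
        "leaf3": "10.0.0.3",
        "leaf4": "10.0.0.4",
    }
    for switch, remotes in build_flood_lists(fabric).items():
        print(render(switch, remotes))
```

With every VLAN present on every leaf, a single fabric-wide list per switch is enough; per-VLAN flood lists would only matter once you start pruning VLANs from some switches.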
And now for the ubiquitous list of where you could find way more details. You could go for our data center course (available with Expert Subscription) or these webinars (available with Standard Subscription):