TL&DR: It’s 2020, and VXLAN with EVPN is all the rage. Thank you, you can stop reading.
On a more serious note, I got this question from Johannes Spanier after he read my Do We Need Complex Data Center Switches for NSX Underlay blog post:
Would you agree that for smaller NSX designs (~100 hypervisors) a much simpler Layer2 based access-distribution design with MLAGs is feasible? One would have two distribution switches and redundant access switches MLAGed together.
I would still prefer VXLAN for a number of reasons:
I published a blog post describing how complex the underlay supporting VMware NSX still has to be (because someone keeps pretending a network is just a thick yellow cable), and the tweet announcing it admittedly looked like clickbait.
[Blog] Do We Need Complex Data Center Switches for VMware NSX Underlay
Martin Casado quickly replied NO (probably before reading the whole article), starting a whole barrage of overlay-focused neteng-versus-devs fun.
A little while ago I explained why you can’t use more than 4K VXLAN segments on a ToR switch (at least with most ASICs out there). Does that mean that you’re limited to a total of 4K virtual Ethernet segments?
Of course not.
You could implement overlay virtual networks in software (on hypervisors or container hosts), although even there the enterprise products rarely give you more than a few thousand logical switches (to use NSX terminology)… but that’s a product limitation, not a technology limitation. Large public cloud providers use the same (or similar) technology to run gazillions of tenant segments.
Got this interesting question from one of my readers:
A BGP EVPN message carries both VNI and RT. When importing the route, is it enough to have either the VNI ID or the RT to import it into the respective VRF? When importing routes into a VRF, which is considered first, the RT or the VNI ID?
A bit of terminology first (which you’d be very familiar with if you ever had to study how MPLS/VPN works):
Got an interesting set of questions from a networking engineer who got stuck with the infamous “let’s push the **** down the stack” challenge:
So I am a rather green network engineer trying to solve the typical layer two stretch problem.
I could start the usual “friends don’t let friends stretch layer-2” or “your business doesn’t really need that” windmill fight, but let’s focus on how the vendors are trying to sell him the “perfect” solution:
I’m running two workshops in Zurich in the next 10 days:
- Comparing VMware NSX and Cisco ACI (and how EVPN and VXLAN fit into the big picture) on Thursday, November 28th;
- Explaining how you could use VXLAN with EVPN to build infrastructure for active-active data centers on Tuesday, December 3rd.
I published the slide deck for the NSX versus ACI workshop a few days ago (and you can already download it if you have a paid ipSpace.net subscription) and it’s full of new goodness like ACI vPod, multi-pod ACI, multi-site ACI, ACI-on-AWS, and multi-site NSX-V and NSX-T.
One of my readers sent me a question along these lines…
VXLAN Network Identifier is 24 bits long, giving us 16 million separate segments. However, we have to map VNIs into VLANs on most switches. How can we scale up to 16 million segments when we have run out of VLAN IDs? Can we create a separate VTEP on the same switch?
VXLAN is just an encapsulation format and does not imply any particular switch architecture. What really matters in this particular case is the implementation of the MAC forwarding table in switching ASIC.
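To put the numbers in that question side by side, here is a quick back-of-the-envelope sketch. It assumes nothing beyond the field widths (a 12-bit 802.1Q VLAN ID and a 24-bit VNI); the variable names are made up for illustration, and the sketch says nothing about what any particular ASIC can do.

```python
# Field widths: 802.1Q VLAN ID is 12 bits, VXLAN Network Identifier is 24 bits.
VLAN_ID_BITS = 12
VNI_BITS = 24

vlan_ids = 2 ** VLAN_ID_BITS    # size of the VLAN ID space
vni_values = 2 ** VNI_BITS      # size of the VNI space

# How many full VLAN ID spaces fit into the VNI space.
ratio = vni_values // vlan_ids

print(f"VLAN IDs:   {vlan_ids}")
print(f"VNI values: {vni_values}")
print(f"The VNI space is {ratio} times larger than the VLAN ID space")
```

The gap between the two numbers is the whole point of the question: the encapsulation has room for roughly 16 million segments, while anything that has to be mapped into a single VLAN ID space tops out at about 4K.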
A Network Artist left a lengthy comment on my Brief History of VMware NSX blog post. He raised a number of interesting topics, so I decided to write my replies as a separate blog post.
Using Geneve is an interesting choice, and while the approach has its own pros and cons, I would stick with VXLAN if I had to recommend something to someone, for a few good reasons.
The main reason I see for NSX-T using Geneve instead of VXLAN is the need for additional header fields to carry metadata around, and to implement Network Services Header (NSH) for east-west service insertion.
Remember how Arista promoted VXLAN coupled with deep buffer switches as the perfect DCI solution a few years ago? Someone took Arista’s marketing too literally, ran with the idea and combined VXLAN-based DCI with traditional MLAG+STP data center fabric.
While I love that they wrote a blog post documenting their experience (if only more people would do that), it doesn’t change the fact that the design contains the worst of both worlds.
Here are just a few things that went wrong:
An attendee in my Building Next-Generation Data Center online course was asked to deploy numerous relatively small OpenStack cloud instances and wanted to select the optimum virtual networking technology. Not surprisingly, every $vendor had just the right answer, including Arista:
We’re considering moving from hypervisor-based overlays to ToR-based overlays using Arista’s CVX for approximately 2000 VLANs.
As I explained in Overlay Virtual Networking, Networking in Private and Public Clouds and Designing Private Cloud Infrastructure (plus several presentations) you have three options to implement virtual networking in private clouds:
Got this remark from a reader after he read the VXLAN and Q-in-Q blog post:
Another area where there is a feature gap with EVPN VXLAN is Private VLANs with VXLAN. They’re not supported on either Nexus or Juniper switches.
I have one word on using private VLANs in 2019: Don’t. They are messy and hard to maintain (not to mention it gets really interesting when you’re combining virtual and physical switches).
Antonio Boj sent me this interesting challenge:
Is there any way to avoid, prevent or at least mitigate bridging loops when using VXLAN with EVPN? Spanning-tree is not supported when using VXLAN encapsulation so I was hoping to use EVPN duplicate MAC detection.
MAC move dampening (or anything similar) doesn’t help if you have a forwarding loop. You might be able to use it to identify there’s a loop, but that’s it… and while you’re doing that your network is melting down.
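A minimal sketch of what duplicate MAC detection boils down to might make the limitation more obvious. It assumes the defaults suggested in RFC 7432 (5 moves within 180 seconds); the `MacMoveDetector` class and its method names are hypothetical and do not reflect any vendor implementation.

```python
from collections import defaultdict, deque

MAX_MOVES = 5          # N in RFC 7432 (suggested default)
WINDOW_SECONDS = 180   # M in RFC 7432 (suggested default)


class MacMoveDetector:
    """Hypothetical detector: counts MAC moves between VTEPs in a time window."""

    def __init__(self, max_moves=MAX_MOVES, window=WINDOW_SECONDS):
        self.max_moves = max_moves
        self.window = window
        self.location = {}                # MAC -> VTEP where it was last seen
        self.moves = defaultdict(deque)   # MAC -> timestamps of recent moves

    def learn(self, mac, vtep, now):
        """Record a MAC seen behind a VTEP; return True if it looks duplicated."""
        previous = self.location.get(mac)
        self.location[mac] = vtep
        if previous is None or previous == vtep:
            return False                  # first sighting or no move
        history = self.moves[mac]
        history.append(now)
        # Forget moves that fell out of the detection window.
        while history and now - history[0] > self.window:
            history.popleft()
        # The detector only reports the problem; it does not stop forwarding.
        return len(history) >= self.max_moves


detector = MacMoveDetector()
# A looped frame makes the same MAC flap between two VTEPs once per second.
flags = [detector.learn("00:00:5e:00:53:01", vtep, t)
         for t, vtep in enumerate(["A", "B", "A", "B", "A", "B"])]
print(flags)
```

Note that all the code does is raise a flag after the fifth move. Every frame that caused those moves was already forwarded, which is why this mechanism can tell you something is wrong but cannot protect the network from the loop itself.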
One of my subscribers sent me a question along these lines (heavily abridged):
My customer is running a colocation business, and has to provide L2 connectivity between racks, sometimes even across multiple data centers. They were using Q-in-Q to deliver that in a traditional fabric, and would like to replace that with multi-site EVPN fabric with ~100 ToR switches in each data center. However, Cisco doesn’t support Q-in-Q with multi-site EVPN. Any ideas?
As Lukas Krattiger explained in his part of the Multi-Site Leaf-and-Spine Fabrics section of the Leaf-and-Spine Fabric Architectures webinar, multi-site EVPN (VXLAN-to-VXLAN bridging) is hard. Don’t expect miracles like Q-in-Q over VNI any time soon ;)
Christoph Jaggi asked me a few questions about using VXLAN with EVPN to build data center fabrics and data center interconnects (including active/active data centers). The German version was published on Inside-IT, here’s the English version.
He started with an obvious one:
What is an active-active data center and why would I want to use an active-active data center?
Numerous organizations have multiple data centers for load sharing or disaster recovery purposes. They could use one of their data centers and have the other(s) as warm or cold standby (active/backup setup) or use all data centers at the same time (active/active).
A friend of mine told me about a “VXLAN is insecure, the sky is falling” presentation from RIPE-77 which claims that you can (under certain circumstances) inject packets into VXLAN virtual networks from the Internet.
Welcome back, Captain Obvious. Anyone looking at the VXLAN packet could immediately figure out that there’s no security in VXLAN. I pointed that out several times in my blog posts and presentations, including Cloud Computing Networking (EuroNOG, September 2011) and NSX Architecture webinar (August 2013).
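To see why that’s obvious, it helps to look at what the VXLAN header actually carries. The sketch below builds and parses the 8-byte header as defined in RFC 7348; the function names are made up for illustration, and the outer Ethernet/IP/UDP headers are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag from RFC 7348


def build_vxlan_header(vni):
    """Return the 8-byte VXLAN header for a given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit into 24 bits")
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(header):
    """Return (flags, vni) extracted from the first 8 bytes of the header."""
    word1, word2 = struct.unpack("!II", header[:8])
    return word1 >> 24, word2 >> 8


header = build_vxlan_header(5000)
print(header.hex())
print(parse_vxlan_header(header))
```

That’s the whole header: a flag byte, a VNI, and reserved bits. There is no field for authentication, integrity protection, or anything else a receiving VTEP could use to verify who sent the packet.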