Found this “gem” describing the differences between layer-2 and layer-3 on an unnamed $vendor web site.
Layer 2 is mainly concerned with the local delivery of data frames between network devices on the same network or local area network (LAN).
So far so good…
TL&DR: It’s 2020, and VXLAN with EVPN is all the rage. Thank you, you can stop reading.
On a more serious note, I got this question from Johannes Spanier after he read my do we need complex data center switches for NSX underlay blog post:
Would you agree that for smaller NSX designs (~100 hypervisors) a much simpler Layer2 based access-distribution design with MLAGs is feasible? One would have two distribution switches and redundant access switches MLAGed together.
I would still prefer VXLAN for a number of reasons:
A long while ago I got into a hilarious Tweetfest (note to self: don’t… not that I would ever listen) starting with:
Which feature and which Cisco router for layer2 extension over internet 100Mbps with 1500 Bytes MTU
The knee-jerk reaction was obvious: OMG, not again. The ugly ghost of BRouters (or is it RBridges or WAN Extenders?) has awoken. The best reply in this category was definitely:
I cannot fathom the conversation where this was a legitimate design option. May the odds forever be in your favor.
A dozen “this is a dumpster fire” tweets later the problem was rephrased as:
Got an interesting set of questions from a networking engineer who got stuck with the infamous “let’s push the **** down the stack” challenge:
So I am a rather green network engineer trying to solve the typical layer two stretch problem.
I could start the usual “friends don’t let friends stretch layer-2” or “your business doesn’t really need that” windmill fight, but let’s focus on how the vendors are trying to sell him the “perfect” solution:
One of my readers sent me a question along these lines…
VXLAN Network Identifier is 24 bits long, giving us 16 million separate segments. However, we have to map VNI into VLANs on most switches. How can we scale up to 16 million segments when we have run out of VLAN IDs? Can we create a separate VTEP on the same switch?
VXLAN is just an encapsulation format and does not imply any particular switch architecture. What really matters in this particular case is the implementation of the MAC forwarding table in the switching ASIC.
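To put numbers on the reader's question, here is a minimal sketch that compares the 24-bit VNI space with the 12-bit 802.1Q VLAN ID space and packs a VNI into the 8-byte VXLAN header described in RFC 7348. The constant and function names are mine, chosen for illustration only, and the sketch says nothing about how a particular ASIC maps VNIs into its forwarding tables.

```python
import struct

VNI_BITS = 24      # width of the VXLAN Network Identifier
VLAN_ID_BITS = 12  # width of the 802.1Q VLAN ID

vni_space = 2 ** VNI_BITS            # 16,777,216 possible VNIs
usable_vlans = 2 ** VLAN_ID_BITS - 2  # VLAN IDs 0 and 4095 are reserved


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI (illustrative helper)."""
    if not 0 <= vni < vni_space:
        raise ValueError(f"VNI must fit in {VNI_BITS} bits, got {vni}")
    flags_word = 0x08 << 24  # I flag set, remaining 24 bits reserved
    vni_word = vni << 8      # 24-bit VNI followed by 8 reserved bits
    return struct.pack("!II", flags_word, vni_word)


if __name__ == "__main__":
    print(f"VNIs: {vni_space}, usable VLAN IDs: {usable_vlans}")
    print(vxlan_header(10010).hex())
```

The gap between the two numbers is the whole problem: the encapsulation can name millions of segments, but a switch that has to map every VNI into a locally significant VLAN can use only a few thousand of them at a time.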
Greg Cusanza in #BRK3192 just announced #Azure Extended Network, for stretching Layer 2 subnets into Azure!
As I know a little bit about how networking works within Azure, and I’ve seen something very similar a few times in the past, I was able to figure out what’s really going on behind the scenes in a few seconds… and got reminded of an old Russian joke I found somewhere on Quora:
After solving the BGP configuration challenge (could you imagine configuring BGP in a leaf-and-spine fabric with just a few commands in 2015?), they did the same thing with EVPN configuration, where they decided to implement the simplest possible design (EBGP-only fabric running EBGP EVPN sessions on leaf-to-spine links), resulting in another round of configuration simplicity.
The March 2019 Packet Pushers Virtual Design Clinic had to deal with an interesting question:
Our server team is nervous about full-scale DR testing. So they have asked us to stretch L2 between sites. Is this a good idea?
The design clinic participants were a bit more diplomatic (watch the video) than my TL&DR answer, which would be: **** NO!
Let’s step back and try to understand what’s really going on:
A good friend of mine who prefers to stay A. Nonymous for obvious reasons sent me his “how I lost my data center to a broadcast storm” story. Enjoy!
Small-ish data center with several hundred racks. Row of racks supported by an end-of-row stack. Each stack with 2 x L2 EtherChannels, one EC to each of 2 core switches. The inter-switch link details don’t matter other than to highlight “sprawling L2 domains.”
VLAN pruning was used to limit L2 scope, but a few VLANs went everywhere, including the management VLAN.
Topology changes are a bane of large STP-based networks, and when they become a serious challenge you could probably use a tool that could track down what’s causing them.
I’m sure there’s a network management tool out there that can do just that (please write a comment if you know one); Eder Gernot decided to write his own while working on a hands-on assignment in the Building Network Automation Solutions online course. Like most course attendees he published the code on GitHub and might appreciate pull requests ;)
Wonder what else course attendees created in the past? Here’s a small sample.
One of my readers sent me this email after reading my Loop Avoidance in VXLAN Networks blog post:
Not much has changed really! It’s still a flood/learn bridged network, at least in parts. We count 2019 and talk a lot about “fabrics” but have 1980’s networks still.
The networking fundamentals haven’t changed in the last 40 years. We still use IP (sometimes with larger addresses and augmentations that make it harder to use and more vulnerable), stream-based transport protocol on top of that, leak addresses up and down the protocol stack, and rely on technology that was designed to run on 500 meters of thick yellow cable.
We all know about catastrophic headline-generating failures like AWS East-1 region falling apart or a major provider being down for a day or two. Then there are failures known only to those who care, like losing a major exchange point. However, I’m becoming more and more certain that the known failures are not even the tip of the iceberg - they seem to be the climber at the iceberg summit.
Antonio Boj sent me this interesting challenge:
Is there any way to avoid, prevent or at least mitigate bridging loops when using VXLAN with EVPN? Spanning-tree is not supported when using VXLAN encapsulation so I was hoping to use EVPN duplicate MAC detection.
MAC move dampening (or anything similar) doesn’t help if you have a forwarding loop. You might be able to use it to identify there’s a loop, but that’s it… and while you’re doing that your network is melting down.
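To illustrate why this is detection and not prevention, here is a rough sketch of a MAC move counter. It is not any vendor's implementation; the class name, method names and port labels are hypothetical, and the defaults (5 moves within 180 seconds) mirror the values RFC 7432 suggests for EVPN duplicate MAC detection.

```python
from collections import defaultdict, deque


class MacMoveDetector:
    """Flag a MAC address that moves between ports too often (illustrative)."""

    def __init__(self, max_moves: int = 5, window: float = 180.0):
        self.max_moves = max_moves
        self.window = window
        self._last_port = {}               # MAC -> port where it was last learned
        self._moves = defaultdict(deque)   # MAC -> timestamps of recent moves

    def observe(self, mac: str, port: str, now: float) -> bool:
        """Record a learning event; return True if the MAC looks like it is flapping."""
        previous = self._last_port.get(mac)
        self._last_port[mac] = port
        moves = self._moves[mac]
        if previous is not None and previous != port:
            moves.append(now)
        # Forget moves that fell out of the observation window.
        while moves and now - moves[0] > self.window:
            moves.popleft()
        return len(moves) >= self.max_moves


if __name__ == "__main__":
    detector = MacMoveDetector(max_moves=3, window=10.0)
    events = [("leaf1", 0.0), ("leaf2", 1.0), ("leaf1", 2.0), ("leaf2", 3.0)]
    for port, timestamp in events:
        flapping = detector.observe("00:00:5e:00:53:01", port, timestamp)
        print(port, timestamp, flapping)
```

Notice that the detector raises its flag only after several moves have already happened. In a real forwarding loop, looped packets keep circulating during all that time, so the counter tells you something is wrong without doing anything to stop it.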
Supposedly it was a problem with the management network used by their optical gear, but it looks a lot like a layer-2 network spanning 15 data centers and no control-plane policing on the managed devices… proving yet again that large-scale layer-2 networks are a really bad idea.
Layer 2 Fabrics can't be extended beyond 2 Spine switches. I had a long argument with $vendor guys on this. They don't even count SPB as Layer 2 fabric and so forth.
The root cause of this myth is the lack of understanding of what layer-2, layer-3, bridging and routing mean. You might want to revisit a few of my very old blog posts before moving on: part 1, part 2, what is switching, layer-3 switches and routers.