After (hopefully) agreeing on what routing, bridging, and switching are, let’s focus on the first important topic in this area: how do we get a packet across the network? Yet again, there are three fundamentally different technologies:
- Source node knows the full path (source routing)
- Source node opened a path (virtual circuit) to the destination node and uses that path to send traffic
- The network performs hop-by-hop destination-address-based packet forwarding.
More details in the Getting Packets Across the Network video.
When I still cared about CCIE certification, I was always tripped up by the weird scenario with (A) mismatched ARP and MAC timeouts and (B) default gateway outside of the forwarding path. When done just right you could get persistent unicast flooding, and I’ve met someone who reported average unicast flooding reaching ~1 Gbps in his data center fabric.
One would hope that we wouldn’t experience similar problems in modern leaf-and-spine fabrics, but one of my readers managed to reproduce the problem within a single subnet in FabricPath with anycast gateway on spine switches when someone misconfigured a subnet mask in one of the servers.
If you’re working solely with IP-based networks, you’re probably quick to assume that hop-by-hop destination-only forwarding is the only packet forwarding paradigm that makes sense. Not true, even today’s networks use a variety of forwarding mechanisms, most of them called some variant of routing or switching.
What exactly is the difference between the two, and what is bridging? I’m answering these questions (and a few others like what’s the difference between data-, control- and management planes) in the Bridging, Routing and Switching Terminology video.
While packets should never be reordered in transit in transparent bridging, there’s no such guarantee in IP networks, and IP applications should tolerate out-of-order packets.
One of my regular readers who designs and builds networks supporting VoIP applications disagreed with that citing numerous real-life examples.
Of course he was right, but let’s get the facts straight first:
Let’s agree for a millisecond that you can’t find any other way to migrate your workload into a public cloud than to move the existing VMs one-by-one without renumbering them. Doing a clumsy cloud migration like this will get you the headaches and the cloud bill you deserve, but that’s a different story. Today we’ll talk about being clumsy the right and the wrong way.
There are two ways of solving today’s challenge:
Found this “gem” describing the differences between layer-2 and layer-3 on an unnamed $vendor web site.
Layer 2 is mainly concerned with the local delivery of data frames between network devices on the same network or local area network (LAN).
So far so good…
TL&DR: It’s 2020, and VXLAN with EVPN is all the rage. Thank you, you can stop reading.
On a more serious note, I got this questions from an Johannes Spanier after he read my do we need complex data center switches for NSX underlay blog post:
Would you agree that for smaller NSX designs (~100 hypervisors) a much simpler Layer2 based access-distribution design with MLAGs is feasible? One would have two distribution switches and redundant access switches MLAGed together.
I would still prefer VXLAN for a number of reasons:
A long while ago I got into an hilarious Tweetfest (note to self: don’t… not that I would ever listen) starting with:
Which feature and which Cisco router for layer2 extension over internet 100Mbps with 1500 Bytes MTU
The knee-jerk reaction was obvious: OMG, not again. The ugly ghost of BRouters (or is it RBridges or WAN Extenders?) has awoken. The best reply in this category was definitely:
I cannot fathom the conversation where this was a legitimate design option. May the odds forever be in your favor.
A dozen “this is a dumpster fire” tweets later the problem was rephrased as:
Got an interesting set of questions from a networking engineer who got stuck with the infamous “let’s push the **** down the stack” challenge:
So I am a rather green network engineer trying to solve the typical layer two stretch problem.
I could start the usual “friends don’t let friends stretch layer-2” or “your business doesn’t really need that” windmill fight, but let’s focus on how the vendors are trying to sell him the “perfect” solution:
One of my readers sent me a question along these lines…
VXLAN Network Identifier is 24 bit long, giving 16 us million separate segments. However, we have to map VNI into VLANs on most switches. How can we scale up to 16 million segments when we have run out of VLAN IDs? Can we create a separate VTEP on the same switch?
VXLAN is just an encapsulation format and does not imply any particular switch architecture. What really matters in this particular case is the implementation of the MAC forwarding table in switching ASIC.
Greg Cusanza in #BRK3192 just announced #Azure Extended Network, for stretching Layer 2 subnets into Azure!
As I know a little bit about how networking works within Azure, and I’ve seen something very similar a few times in the past, I was able to figure out what’s really going on behind the scenes in a few seconds… and got reminded of an old Russian joke I found somewhere on Quora:
After solving the BGP configuration challenge (could you imagine configuring BGP in a leaf-and-spine fabric with just a few commands in 2015), they did the same thing with EVPN configuration, where they decided to implement the simplest possible design (EBGP-only fabric running EBGP EVPN sessions on leaf-to-spine links), resulting in another round of configuration simplicity.
The March 2019 Packet Pushers Virtual Design Clinic had to deal with an interesting question:
Our server team is nervous about full-scale DR testing. So they have asked us to stretch L2 between sites. Is this a good idea?
The design clinic participants were a bit more diplomatic (watch the video) than my TL&DR answer which would be: **** NO!
Let’s step back and try to understand what’s really going on:
A good friend of mine who prefers to stay A. Nonymous for obvious reasons sent me his “how I lost my data center to a broadcast storm” story. Enjoy!
Small-ish data center with several hundred racks. Row of racks supported by an end-of-row stack. Each stack with 2 x L2 EtherChannels, one EC to each of 2 core switches. The inter-switch link details don’t matter other than to highlight “sprawling L2 domains."
VLAN pruning was used to limit L2 scope, but a few VLANs went everywhere, including the management VLAN.
Topology changes are a bane of large STP-based networks, and when they become a serious challenge you could probably use a tool that could track down what’s causing them.
I’m sure there’s a network management tool out there that can do just that (please write a comment if you know one); Eder Gernot decided to write his own while working on a hands-on assignment in the Building Network Automation Solutions online course. Like most course attendees he published the code on GitHub and might appreciate pull requests ;)
Wonder what else course attendees created in the past? Here’s a small sample.