One of my readers sent me this email after reading my Loop Avoidance in VXLAN Networks blog post:
Not much has changed really! It’s still a flood/learn bridged network, at least in parts. It’s 2019 and we talk a lot about “fabrics” but still have 1980s networks.
The networking fundamentals haven’t changed in the last 40 years. We still use IP (sometimes with larger addresses and augmentations that make it harder to use and more vulnerable), run a stream-based transport protocol on top of that, leak addresses up and down the protocol stack, and rely on technology that was designed to run on 500 meters of thick yellow cable.
Even worse, we still believe (for the most part) we should do bridging within a subnet and routing across subnets. Until we admit we were wrong and start doing routing on IP addresses (see also: CLNP and the OSI stack), we won’t be making any progress.
Broken **** misusing transparent bridging behavior obviously doesn’t help - the real reason we have to support the brokenness is all the weird stuff that would stop working if we go from bridging to routing.
Sure, classic bridging has partially been replaced with “fabrics” which hopefully are routed these days (heck, we had routed L2 with FabricPath or TRILL many years ago!).
While TRILL and FabricPath were almost routing on layer-2 they could never get away from the fundamentals of bridging: lack of address summarization, data-plane based learning, and flooding - the three characteristics that make bridging inherently non-scalable.
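To make those three characteristics concrete, here is a minimal sketch of a flood-and-learn bridge. It is a toy model, not any vendor's implementation; the `LearningBridge` class, the port numbers, and the shortened MAC strings are all made up for illustration.

```python
class LearningBridge:
    """Toy transparent bridge: learns from traffic, floods what it doesn't know."""

    def __init__(self, ports):
        self.ports = set(ports)
        # One entry per host MAC address: flat addresses cannot be summarized.
        self.mac_table = {}

    def receive(self, in_port, src_mac, dst_mac):
        """Process one frame and return the set of ports it is sent out of."""
        # Data-plane learning: the table is built from observed traffic,
        # not from a control-plane protocol.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress one.
            return self.ports - {in_port}
        if out_port == in_port:
            # Destination lives on the ingress segment: drop the frame.
            return set()
        return {out_port}


bridge = LearningBridge(ports=[1, 2, 3])
flooded = bridge.receive(1, "aa", "bb")  # "bb" is unknown, so the frame is flooded
reply = bridge.receive(2, "bb", "aa")    # "aa" was learned on port 1
known = bridge.receive(1, "aa", "bb")    # "bb" is now known on port 2
```

Every host the bridge ever hears from costs one table entry, and every unknown destination turns into a flood. Connect two of these in a loop and the flooded frame circulates forever, because nothing in the frame (like a TTL) or in the forwarding logic stops it.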
But with a few exceptions like your blog (thank you!) nobody, especially not the vendors, focuses on the big problem: the edge, where things get a bit out of control.
No wonder: dealing with real-life problems is not sexy, often doesn’t sell, and usually turns out to be a huge morass.
As you mentioned, it is not particularly hard to implement reactive fixes: run an active protocol to detect the loop. You mention STP, but vendors should have defined a separate L2 probe; the job is important enough to have a separate protocol. Send a probe, detect a packet with a fabric-owned src-MAC on another fabric-edge port, shut down the particular port(s).
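The probe logic described above could look roughly like the following sketch. Everything in it is hypothetical: the `Fabric` and `EdgePort` classes, the `send_probes` helper, the probe MAC address, and the `looped_links` dictionary that stands in for whatever external cable or device connects two edge ports.

```python
from dataclasses import dataclass, field


@dataclass
class EdgePort:
    name: str
    enabled: bool = True


@dataclass
class Fabric:
    probe_mac: str  # hypothetical fabric-owned source MAC used in probes
    ports: dict = field(default_factory=dict)

    def add_port(self, name):
        self.ports[name] = EdgePort(name)

    def on_frame(self, port_name, src_mac):
        """Handle a frame received on an edge port; return True on loop detection."""
        port = self.ports[port_name]
        # Our own source MAC arriving from outside means there is an external loop.
        if port.enabled and src_mac == self.probe_mac:
            port.enabled = False
            return True
        return False


def send_probes(fabric, looped_links):
    """Send one probe out of every enabled edge port.

    looped_links maps an edge port to the edge port its frames come back on.
    Returns the names of the ports that were shut down.
    """
    shut = []
    for name, port in fabric.ports.items():
        peer = looped_links.get(name)
        if port.enabled and peer is not None:
            if fabric.on_frame(peer, fabric.probe_mac):
                shut.append(peer)
    return shut


fabric = Fabric(probe_mac="02:00:00:00:00:01")
for port_name in ("e1", "e2", "e3"):
    fabric.add_port(port_name)

# Someone connected e1 and e2 to each other through an external device.
shut = send_probes(fabric, {"e1": "e2", "e2": "e1"})
```

In this sketch, shutting down a single port is enough to break the loop, so `e1` stays up after `e2` is disabled. A real implementation would also have to decide how often to probe and when (or whether) to re-enable the port.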
STP does the job just fine unless someone actively messes it up:
- Properly configured fabric edge ports would send BPDUs and shut down on receiving one (root guard).
- If a node passing packets between two interfaces doesn’t recognize BPDUs, loop prevention works because BPDUs make it to the other end and trigger root guard.
- If a node acts like an 802.1 bridge, STP stops the loop.
Unfortunately, some people believe it’s better to turn off STP (because everyone tells them STP is a mess), and at least some vendors are more interested in pretending there’s no problem than in fixing it by shutting down the offending node/port.
Could we “solve” the problem by adding another STP-like protocol (like VMware beacon frames)? I’m positive there are vendors out there doing something along these lines with proprietary protocols (would appreciate pointers in the comments).
However, I have a nasty feeling that trying to create a standard protocol to solve that challenge would quickly bring us into a “turtles all the way down” scenario: who would define what MAC address should be used for that protocol, and what happens when you connect two fabrics together?
The only real solution I see is to:
- Admit that data-link layer connects adjacent nodes;
- Stop pretending that we can use a technology that was designed to connect nodes to a shared cable when implementing multi-site networking;
- Use routing regardless of whether it uses IP prefixes, IP addresses or even MAC addresses to build robust and stable networks.