Commentary: We’re stuck with 40-year-old technology

One of my readers sent me this email after reading my Loop Avoidance in VXLAN Networks blog post:

Not much has changed, really! It’s still a flood-and-learn bridged network, at least in parts. It’s 2019 and we talk a lot about “fabrics”, but we still have 1980s networks.

The networking fundamentals haven’t changed in the last 40 years. We still use IP (sometimes with larger addresses and augmentations that make it harder to use and more vulnerable), run a stream-based transport protocol on top of it, leak addresses up and down the protocol stack, and rely on technology that was designed to run on 500 meters of thick yellow cable.

Even worse, we still believe (for the most part) that we should do bridging within a subnet and routing across subnets. Until we admit we were wrong and start doing routing on IP addresses (see also: CLNP and the OSI stack), we won’t be making any progress.

Broken **** misusing transparent bridging behavior obviously doesn’t help - the real reason we have to support the brokenness is all the weird stuff that would stop working if we went from bridging to routing.

Sure, classic bridging has partially been replaced with “fabrics”, which are hopefully routed these days (heck, we had routed L2 with FabricPath or TRILL many years ago!).

While TRILL and FabricPath were almost routing at layer 2, they could never get away from the fundamentals of bridging: lack of address summarization, data-plane-based learning, and flooding - the three characteristics that make bridging inherently non-scalable.
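To make that concrete, here is a minimal Python sketch of flood-and-learn forwarding (a toy model with hypothetical class names, not any vendor’s implementation): the forwarding table grows by one entry per endpoint because MAC addresses cannot be summarized, and any unknown destination triggers a flood.

```python
# Toy flood-and-learn bridge: data-plane learning plus unknown-unicast flooding.
# Illustrative sketch only; Bridge and Frame are made-up names.

class Frame:
    def __init__(self, src_mac, dst_mac, payload):
        self.src_mac, self.dst_mac, self.payload = src_mac, dst_mac, payload

class Bridge:
    def __init__(self, ports):
        self.ports = ports          # e.g. ["p1", "p2", "p3"]
        self.mac_table = {}         # MAC -> port, learned from the data plane

    def receive(self, in_port, frame):
        # Data-plane learning: remember where the source MAC was seen.
        # One table entry per endpoint; MAC addresses cannot be summarized.
        self.mac_table[frame.src_mac] = in_port

        out_port = self.mac_table.get(frame.dst_mac)
        if out_port is None:
            # Unknown unicast: flood on every port except the incoming one.
            return [p for p in self.ports if p != in_port]
        return [out_port]

br = Bridge(["p1", "p2", "p3"])
print(br.receive("p1", Frame("aa:aa", "bb:bb", "hello")))   # flooded: ['p2', 'p3']
print(br.receive("p2", Frame("bb:bb", "aa:aa", "reply")))   # known:   ['p1']
print(br.mac_table)                                         # one entry per MAC seen
```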

But with a few exceptions like your blog (thank you!) nobody, especially not the vendors, focuses on the big problem: the edge, where things get a bit out of control.

No wonder: dealing with real-life problems is not sexy, often doesn’t sell, and usually turns out to be a huge morass.

As you mentioned, it is not particularly hard to implement reactive fixes: run an active protocol to detect the loop. You mention STP, but vendors should have defined a separate L2 probe; the job is important enough to deserve its own protocol. Send a probe, detect a packet with a fabric-owned source MAC on another fabric-edge port, and shut down the offending port(s).
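A rough sketch of that probe-and-shutdown idea, written with Scapy, might look like the following. The interface names, the fabric-owned MAC address, and the EtherType are assumptions made purely for illustration; there is no standard protocol behind this, and a real fabric would use its platform API to err-disable a port rather than a Linux command.

```python
# Sketch of probe-based loop detection at the fabric edge (requires Scapy and
# root privileges). All names below are assumptions, not a real protocol.
import subprocess
import time
from scapy.all import Ether, Raw, sendp, AsyncSniffer

FABRIC_MAC = "02:00:de:ad:be:ef"       # locally administered, fabric-owned MAC (assumed)
PROBE_TYPE = 0x88B5                    # IEEE 802 local experimental EtherType, used as a placeholder
EDGE_PORTS = ["eth1", "eth2", "eth3"]  # fabric edge interfaces (assumed names)

def send_probe(port):
    # Probe frame sourced from the fabric-owned MAC address.
    frame = Ether(src=FABRIC_MAC, dst="ff:ff:ff:ff:ff:ff", type=PROBE_TYPE) / Raw(b"loop-probe")
    sendp(frame, iface=port, verbose=False)

def probe_edge_port(src_port, other_ports, wait=2.0):
    # Listen on every other edge port for frames carrying the fabric-owned
    # source MAC; seeing one means something outside the fabric loops it back.
    sniffers = {p: AsyncSniffer(iface=p,
                                lfilter=lambda f: Ether in f and f[Ether].src == FABRIC_MAC)
                for p in other_ports}
    for s in sniffers.values():
        s.start()
    send_probe(src_port)
    time.sleep(wait)
    looped = []
    for port, sniffer in sniffers.items():
        sniffer.stop()
        if sniffer.results:
            looped.append(port)
    return looped

def shut_down(port):
    # Err-disable the offending port; here just a Linux "ip link" call.
    subprocess.run(["ip", "link", "set", port, "down"], check=False)

for src in EDGE_PORTS:
    for port in probe_edge_port(src, [p for p in EDGE_PORTS if p != src]):
        shut_down(port)
```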

STP does the job just fine unless someone actively messes it up:

  • Properly configured fabric edge ports would send BPDUs and shut down on receiving one (BPDU guard and/or root guard, depending on whether you’re trying to detect a forwarding loop or an external switch)
  • If a node passing packets between two interfaces doesn’t recognize BPDUs, loop prevention works because BPDUs make it to the other end and trigger root guard.
  • If a node acts like an 802.1 bridge, STP stops the loop (see the sketch after this list).
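Here is a toy Python model of that behavior, using the hypothetical class name FabricEdgePort. It only illustrates the decision logic (edge ports keep emitting BPDUs, and BPDU guard err-disables any edge port that receives one); it is not a real STP implementation.

```python
# Minimal model of the first two bullets above; illustrative only.

FORWARDING, ERR_DISABLED = "forwarding", "err-disabled"

class FabricEdgePort:
    def __init__(self, name):
        self.name = name
        self.state = FORWARDING

    def emit_bpdu(self):
        # A properly configured edge port keeps sending BPDUs.
        return {"type": "bpdu", "sender": self.name}

    def receive(self, frame):
        # BPDU guard: an edge port should never see a BPDU. If one arrives,
        # something outside the fabric is passing our frames back in.
        if frame["type"] == "bpdu" and self.state == FORWARDING:
            self.state = ERR_DISABLED
            print(f"{self.name}: received BPDU from {frame['sender']}, going {self.state}")

# The "node that doesn't recognize BPDUs" case: it blindly shuffles frames
# between its two uplinks, so edge-1's BPDU lands on edge-2 and kills the loop.
edge1, edge2 = FabricEdgePort("edge-1"), FabricEdgePort("edge-2")
edge2.receive(edge1.emit_bpdu())
# (If the node were a real 802.1 bridge it would consume the BPDUs, run STP
# itself, and block one of its own ports instead.)
```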

Unfortunately, some people believe it’s better to turn off STP (because everyone tells them STP is a mess), and at least some vendors are more interested in pretending there’s no problem than in fixing it by shutting down the offending node/port.

Could we “solve” the problem by adding another STP-like protocol (like VMware beacon frames)? I’m positive there are vendors out there doing something along these lines with proprietary protocols (would appreciate pointers in the comments).

However, I have a nasty feeling that trying to create a standard protocol to solve that challenge would quickly bring us into a “turtles all the way down” scenario - who would define which MAC address should be used for that protocol, and what happens when you connect two fabrics together?

The only real solution I see is to:

  • Admit that the data-link layer connects adjacent nodes;
  • Stop pretending that we can use a technology that was designed to connect nodes to a shared cable when implementing multi-site networking;
  • Use routing, regardless of whether it works on IP prefixes, IP addresses, or even MAC addresses, to build robust and stable networks (see the sketch below).
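As a counterpart to the flood-and-learn sketch above, here is a minimal, purely illustrative model of a routed edge node: forwarding entries are installed only by a control plane, prefixes let many endpoints share one entry, and unknown destinations are dropped instead of flooded. The class name, the next-hop names, and the MAC-route example are assumptions, loosely inspired by EVPN-style control planes.

```python
# Illustrative routed edge: forwarding state comes from the control plane,
# never from data-plane learning, and unknowns are dropped, not flooded.
import ipaddress

class RoutedEdge:
    def __init__(self):
        self.ip_fib = {}    # ip_network -> next hop
        self.mac_fib = {}   # MAC string -> next hop (exact match)

    def install_prefix(self, prefix, next_hop):
        # Control plane installs reachability; a prefix covers many endpoints.
        self.ip_fib[ipaddress.ip_network(prefix)] = next_hop

    def install_mac(self, mac, next_hop):
        # Control plane can just as well advertise MAC reachability.
        self.mac_fib[mac] = next_hop

    def forward_ip(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.ip_fib if addr in net]
        if not matches:
            return "drop"                                   # never flood unknowns
        best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
        return self.ip_fib[best]

    def forward_mac(self, dst_mac):
        return self.mac_fib.get(dst_mac, "drop")

edge = RoutedEdge()
edge.install_prefix("10.1.0.0/16", "spine-1")       # summary: many hosts, one entry
edge.install_prefix("10.1.2.3/32", "leaf-7")        # or a host route for one endpoint
edge.install_mac("02:00:aa:bb:cc:dd", "vtep-2")     # or a MAC route from the control plane
print(edge.forward_ip("10.1.2.3"))    # leaf-7  (most specific wins)
print(edge.forward_ip("10.1.9.9"))    # spine-1 (covered by the summary)
print(edge.forward_ip("192.0.2.1"))   # drop    (no flooding of unknowns)
print(edge.forward_mac("02:00:aa:bb:cc:dd"))   # vtep-2
```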
I will eventually address most of the challenges described in this blog post in the How Networking Really Works webinar. I hope to have the first part ready in June 2019.

11 comments:

  1. Routing or label switching?
    Replies
    1. At the ingress node it’s always routing, even if the ingress node is a web proxy server (the approach used by large content providers). Whether you make a hop-by-hop routing decision or insert path information (a label) into the packet is another story.
  2. I think you meant the BPDU Guard feature. Root Guard is a different use case.
  3. Well, FabricPath did have a form of summarization (conversational learning in the core switches).
    Had Cisco gone a little further with the FabricPath packet construct and the use of the end node ID field, they could have been on to something.

    But hey, isn't this a testament to the hardiness of these 40-year-old protocols? They work no matter how they are twisted.
    Replies
    1. Summarization = abstracting address information, making multiple endpoints reachable through a single forwarding entry

      Conversational learning = cache-based forwarding - installing a forwarding entry from the control plane into the data plane only when needed.

      Unfortunately, every single cache-based forwarding scheme I’ve seen eventually falls apart when faced with cache thrashing or cache overflow. Conversational learning is no exception, as people who tried to use SVIs on F2 linecards quickly figured out.

      As for “they work no matter how they are twisted”, I’m positive anyone experiencing a meltdown after a bridging loop would tend to disagree.
    2. Apologies for the lack of context around “they work no matter how they are twisted.” I wasn't referring to meltdowns from twisted use; I meant that these protocols have lasted so long, still in use, still working as intended and as predictably as designed, still in our toolbox, because we sadly haven't yet developed the "be-all protocol" over any medium that fixes everything we encountered while using our old tools to manage the applications above them. But then again, I recall some great posts by you and the gang here about fixing the applications too.
  4. Regarding manufacturer-specific loop detection mechanisms, Nortel/Avaya/Extreme had a protocol called Simple Loop Prevention Protocol (SLPP) that was moderately tunable. I'm pretty far removed from that world now and have no idea if it's still in production or not, but it's at least available on older Nortel and Avaya ERS switches. Michael F. McNamara wrote a blog post about it here:

    https://blog.michaelfmcnamara.com/2007/12/simple-loop-prevention-protocol-slpp/
  5. There are also other loop detection/prevention mechanisms available depending on what topology you have, such as Cisco's Resilient Ethernet Protocol (REP), HP's Rapid Ring Protection Protocol (RRPP), and other vendors' similar offerings built for ring topologies.

    The overly simplified explanation of REP (and I believe RRPP functions similarly) is that you specify a link to keep in a blocking state, and all the other switches send each other keepalives on their ring ports. When a break is detected, an emergency message is sent around the ring to bring the previously blocked link into a forwarding state.
    Replies
    1. REP, wow, I remember that. I recall a Cisco data sheet on it back in the mid-00s; curious whether anyone used it. Thanks for bringing that back up.
  6. The whole networking industry got it very wrong with VXLAN; it was one of the industry's biggest blunders.
    The DC folks' VXLAN project is a good example of short-sighted goals and the desire to reinvent the wheel (SP folks had VPLS around for years when VXLAN came to be).
    SP folks then came up with EVPN as a replacement for VPLS, and DC folks shoehorned it on top of VXLAN.
    Then the micro-segmentation buzzword came along, and DC folks quickly realized that there's no field in the VXLAN header to indicate a common access group, nor any ability to stack VXLAN headers on top of each other (though some tried with custom VXLAN spin-offs), so they came up with a brilliant idea: let's maintain access lists! Like it's the 90s again. As an SP guy I'm just shaking my head, thinking: did these guys ever hear of L2 VPNs, which have been around since the inception of MPLS? Obviously, not telling people about the MAC addresses they should not be talking to is better than telling everyone and then maintaining ACLs; in the SP sector we learned that in the 90s.
    Oh, and then there's the traffic-engineering requirement to route mice flows around elephant flows in the DC, not to mention the ability to seamlessly steer traffic flows right from the VMs, through the DC borders, and across the MPLS core, which is impossible with VXLAN islands in the form of DCs hanging off an MPLS core.
    With regard to the MAC aggregation point, PBB-EVPN…
    So there you have it: all the challenges DC folks are trying to solve with half-baked solutions were already solved years ago in the SP sector.