Is STP Really Evil?

Maxim Gelin sent me an interesting question:

Can you please explain to me, why is STP supposed to be evil? What's wrong with STP?

STP’s fundamental problem is that it’s a fail-close, not a fail-open protocol.

Ethernet bridges (later renamed to layer-2 switches) were designed to be transparent plug-and-pray devices that you could drop anywhere into the network and hope they would work. Unlike most modern routing protocols, they could not rely on a control-plane protocol running between adjacent nodes; the absence of control-plane messages on a port was simply taken to mean that there was no adjacent bridge.

That’s all nice and dandy until a bridge loses its mind and stops sending BPDUs (control-plane activity) while still forwarding traffic (data-plane activity). Adjacent bridges think they have hosts plugged into the affected ports (this is the fail-close part) and start forwarding traffic through those ports, resulting in a nice forwarding loop (been there, seen that).
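
The failure mode can be sketched with a toy model in Python. It assumes three bridges in a triangle, with A as the root and C blocking its port toward B, and it reduces STP to a single rule (a port that hears no BPDUs assumes a host is attached and forwards). Timers, port roles and the rest of the 802.1D state machine are deliberately left out; the bridge names and helper functions are hypothetical.

```python
# Toy model, not the IEEE 802.1D state machine: only the "silence means host"
# rule is modelled; max-age timers and port roles are ignored.

def port_forwards(hears_bpdus: bool, stp_blocks: bool) -> bool:
    """Forwarding decision of a classic STP port."""
    if not hears_bpdus:
        return True  # fail-close: no BPDUs, so assume a host and forward
    return not stp_blocks


def has_cycle(nodes, links) -> bool:
    """Union-find cycle check over the links that forward in both directions."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in links:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return True
        parent[root_u] = root_v
    return False


def forwarding_links(b_is_hung: bool):
    """Links of the A-B-C triangle that carry traffic.

    A is the root bridge; C normally blocks its port toward B. A hung B stops
    sending BPDUs, but its hardware keeps forwarding on all of its ports.
    """
    c_port_to_b = port_forwards(hears_bpdus=not b_is_hung, stp_blocks=True)
    port_states = {
        ("A", "B"): (True, True),
        ("A", "C"): (True, True),
        ("B", "C"): (True, c_port_to_b),
    }
    return [link for link, (end1, end2) in port_states.items() if end1 and end2]


NODES = ["A", "B", "C"]
print(has_cycle(NODES, forwarding_links(b_is_hung=False)))  # False: loop-free
print(has_cycle(NODES, forwarding_links(b_is_hung=True)))   # True: forwarding loop
```

Nothing in C's decision is wrong from its own point of view: it followed the rules, and the silence of a hung neighbour looks exactly like a host.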

A bridge with a hung control plane does not forward BPDUs between its ports (which would let the adjacent bridges detect the loop and block it), because its hardware forwarding entry for the STP multicast address still punts BPDUs to the hung CPU.
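
A minimal Python sketch of that forwarding decision, assuming a bridge whose hardware tables stay frozen while its CPU is hung: data frames are still flooded, but frames sent to the STP bridge group address (01:80:C2:00:00:00) are handed to the CPU and never leave the box. The function name and the flat port list are illustrative assumptions, not any vendor's data-plane API.

```python
STP_GROUP_MAC = "01:80:c2:00:00:00"  # IEEE 802.1D bridge group address
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def egress_ports_when_hung(dst_mac: str, in_port: int, ports: list) -> list:
    """Ports a frame leaves on while the bridge's control plane is hung.

    Simplification: every non-BPDU frame is treated as unknown/broadcast
    and flooded by the still-working data plane.
    """
    if dst_mac.lower() == STP_GROUP_MAC:
        return []  # punted to the hung CPU, which never acts on it
    return [p for p in ports if p != in_port]  # flooded in hardware


print(egress_ports_when_hung(BROADCAST_MAC, 1, [1, 2, 3]))        # [2, 3]
print(egress_ports_when_hung("01:80:C2:00:00:00", 1, [1, 2, 3]))  # []
```

The broadcast keeps circulating, while the BPDUs that would reveal the loop to the neighbours are swallowed, so the adjacent bridges keep treating the hung bridge's ports as host ports.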

Fail-open or fail-close?

This section was inserted on August 1st 2014 to (hopefully) reduce the terminology confusion.

As Chris Marget mentioned in his comment, "fail-open" and "fail-close" are clunky terms bound to be misunderstood (as evidenced by numerous other comments).

Being an old-timer, I always see computer networks as part of the generic landscape of electrical circuits and switching: for me, "fail-close" = "pass current or traffic on failure" and "fail-open" = "stop passing current or traffic on failure".

Other people think about computer networks in valve or door analogies. For them "fail close" means "the door or valve is closed on failure – there’s no traffic" and "fail open" obviously means "the door or valve is opened on failure, and the traffic passes".

In the context of this blog post "fail close" means "a failed/confused bridge continues to forward the traffic, and the bridged network will send the traffic across such bridge." You might have a different opinion on what "open" or "close" means, and it’s as valid as any… but quoting Cisco’s documentation won’t make your point any more valid (it just proves that the writer of that document agrees with your view of what opens or closes on failure). I would however appreciate a pointer to a more authoritative source (although I doubt it exists).

Back to bridging and STP

The solution to the confused bridge traffic forwarding problem is quite simple: Cisco IOS has bridge assurance – you configure a port to expect an adjacent bridge, and the port doesn’t forward traffic if it doesn’t receive BPDUs from the other end.
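
The bridge assurance rule can be sketched in Python as follows. The three-hello timeout and the `ba-inconsistent` state name are assumptions modelled loosely on Cisco's terminology (only the 2-second hello time is the 802.1D default); check your platform's documentation for the actual timers and behavior.

```python
from dataclasses import dataclass
from typing import Optional

HELLO_S = 2.0               # 802.1D default hello time
BA_TIMEOUT_S = 3 * HELLO_S  # assumption: three missed hellos


@dataclass
class Port:
    name: str
    expects_bridge: bool                 # port configured for bridge assurance
    last_bpdu_s: Optional[float] = None  # time the last BPDU arrived

    def state(self, now_s: float, stp_state: str = "forwarding") -> str:
        """Effective port state after applying the bridge assurance rule."""
        if not self.expects_bridge:
            return stp_state  # classic behaviour: silence is not an error
        if self.last_bpdu_s is None or now_s - self.last_bpdu_s > BA_TIMEOUT_S:
            return "ba-inconsistent"  # blocked until BPDUs arrive again
        return stp_state


uplink = Port("uplink", expects_bridge=True, last_bpdu_s=0.0)
host = Port("host", expects_bridge=False)
print(uplink.state(now_s=1.0))   # forwarding
print(uplink.state(now_s=10.0))  # ba-inconsistent
print(host.state(now_s=10.0))    # forwarding
```

The only change compared to classic STP is what silence means: on a port that expects a bridge, missing BPDUs block the port instead of opening it to traffic.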

The fail-close nature of STP isn't its only drawback. The original STP had numerous other challenges, from slow convergence to lack of VLAN awareness. Unfortunately, the IEEE decided to keep heaping kludges on top of STP until the whole thing nearly toppled over – it’s like trying to build the global Internet by tinkering with RIP ad nauseam instead of designing BGP.

The generic solution to this particular problem (and a few others, including hosts turning into bridges) seems to be extremely simple: allow a switch port to be a host-facing port (implicitly configuring BPDU guard and a few other things) or a fabric port (implicitly configuring bridge assurance and VLAN trunking). Why hasn’t any vendor implemented such a simple concept? I can’t figure it out – your comments are most welcome!
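
The proposed two port types could be modelled like this. The role names, feature sets and reaction strings are hypothetical; the post names BPDU guard, bridge assurance and VLAN trunking, while `edge` stands in (as an assumption) for the unspecified "few other things".

```python
from enum import Enum


class PortRole(Enum):
    HOST = "host"      # host-facing port
    FABRIC = "fabric"  # switch-to-switch port


# Hypothetical mapping of each role to the features it would enable implicitly.
IMPLIED_FEATURES = {
    PortRole.HOST: frozenset({"edge", "bpdu-guard"}),
    PortRole.FABRIC: frozenset({"bridge-assurance", "vlan-trunk"}),
}


def port_reaction(role: PortRole, bpdu_received: bool) -> str:
    """What the port does depending on whether BPDUs are arriving."""
    features = IMPLIED_FEATURES[role]
    if "bpdu-guard" in features and bpdu_received:
        return "err-disabled"  # a "host" started behaving like a bridge
    if "bridge-assurance" in features and not bpdu_received:
        return "blocked"       # the adjacent bridge went silent
    return "forwarding"        # still subject to the normal STP decision


for role in PortRole:
    for bpdu in (True, False):
        print(role.value, bpdu, port_reaction(role, bpdu))
```

With a single role setting, either surprise (an unexpected BPDU on a host port or unexpected silence on a fabric port) takes the port out of the forwarding path instead of creating a loop.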

26 comments:

  1. Technology has its legacies. Technology has a history. Technology makes history. It’s important to understand history to understand the kludges, as you call them. It all began with the Yellow cable. The original coaxial Yellow cable was the first medium for IEEE 802.3, Ethernet. It’s a shared medium: all stations attached to it can communicate with each other. With stations directly attached to a single common medium, no extra device (no layer-2 bridge or switch) is needed for communication. Initially it was expected that each LAN segment would have its own router to interconnect it with other LAN segments. Radia Perlman worked at DEC, the company that, together with Intel and Xerox, defined the 10 Mb/s DIX standard. In http://www.networkworld.com/article/2202492/lan-wan/living-legends--radia-perlman--layer-3-wizard.html Radia Perlman explains: “‘Routing between links was Layer 3's job. When I tried to argue that we may need to forward packets from one Ethernet link to another, the reply was 'Our customers would never want to do that'. Their perception was Layer 3 was just unnecessary bytes on the wire.’ But it didn't take long for that shortsightedness to become obvious. Customers did, after all, want to talk from one Ethernet to another.”
    So Radia Perlman invented STP under several constraints. For a station it should not matter whether it communicates with a station on the same LAN segment or, via a bridge, with a station on another LAN segment. Since Ethernet has no TTL value, however, frames could loop forever on a ring. Thus, STP needs to maintain the illusion of a single shared medium. This is why STP builds a tree.
    Later on, with the introduction of twisted pair as the wired medium, Ethernet’s physical bus topology evolved into a physical star topology, requiring a central hub that would create the illusion of a shared medium: a hub forwards every incoming frame to all ports except the port it was received on. With the introduction of learning bridges (switches), Ethernet could have transformed into something even more powerful. However, with a huge installed base, compatibility becomes a key issue and revolution turns into evolution.
    We all know that STP and large L2 domains have their (severe) limitations. However, without understanding the past it’s very easy to complain about how things have developed.
    One of the disadvantages of STP that you didn’t mention is the waste of capacity. Because of the tree structure, redundant paths cannot be used, so frames might travel much farther than the physical topology would require. Shortest Path Bridging (SPB, 802.1aq) solves all of (R/M)STP’s problems, boosting Ethernet into another dimension. At its core SPB runs IS-IS, another protocol invented by Radia Perlman…
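
    The capacity argument can be illustrated with a heavily simplified STP computation in Python: the lowest bridge ID becomes the root, every link costs 1, and each bridge keeps the upstream neighbour with the lowest ID among those one hop closer to the root. Real STP uses priority+MAC bridge IDs, speed-based path costs and port-ID tie-breakers, so treat this as a sketch only; the function name and the four-bridge ring are illustrative.

```python
from collections import deque


def spanning_tree(bridges, links):
    """Return (root, active_links, blocked_links) for a connected topology."""
    root = min(bridges)  # lowest bridge ID wins the root election
    neighbours = {b: set() for b in bridges}
    for u, v in links:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Breadth-first search gives the hop count to the root (all costs = 1).
    dist = {root: 0}
    queue = deque([root])
    while queue:
        b = queue.popleft()
        for n in neighbours[b]:
            if n not in dist:
                dist[n] = dist[b] + 1
                queue.append(n)

    # Each non-root bridge keeps one root port: toward the lowest-ID neighbour
    # that is one hop closer to the root.
    active = set()
    for b in bridges:
        if b != root:
            upstream = min(n for n in neighbours[b] if dist[n] == dist[b] - 1)
            active.add(frozenset((b, upstream)))

    blocked = {frozenset(link) for link in links} - active
    return root, active, blocked


# Four bridges in a ring: one link must be blocked to break the loop.
ring = [(1, 2), (2, 3), (3, 4), (4, 1)]
root, active, blocked = spanning_tree([1, 2, 3, 4], ring)
print(root, sorted(sorted(link) for link in blocked))  # 1 [[3, 4]]
```

    Bridges 3 and 4 share a physical link, yet frames between them travel 3 → 2 → 1 → 4 (three hops instead of one) while the 3-4 link carries no traffic at all.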

    Replies
    1. You must be new to my blog ;) Welcome!

      See also:

      http://blog.ipspace.net/2010/07/bridges-kludge-that-shouldnt-exist.html
      http://blog.ipspace.net/2010/07/bridging-and-routing-is-there.html
      http://blog.ipspace.net/2010/07/bridging-and-routing-part-ii.html

      ... and a few others.

    2. Ivan, keep up the work. Reading the comments to your posts always makes me happy.

  2. Ivan

    you write, STP’s fundamental problem is that it’s a fail-close, not a fail-open protocol.

    Should this read, STP’s fundamental problem is that it’s a fail-open, not a fail-close protocol?

    Replies
    1. Fail-close as in an electrical circuit. A closed circuit is a complete one, and electrons pass. An open circuit is incomplete, so electrons don't pass. Fail-open would mean that no traffic is forwarded until something closes the circuit.

    2. Thanks, I now get it!

  3. My impression is that SPB is intended to be a replacement for STP (along with other things as well). Too bad hardly anyone seems interested in implementing it, or anything else new outside of the data center...

  4. Ivan,

    "Fail-open nature of STP isn't its only drawback..." should instead read "Fail-closed nature of STP isn't its only drawback...". It leads to confusion.

  5. Let's imagine for a second that we create a network containing Cisco devices only.

    Bridge Assurance works nicely in a pure STP topology (802.1w and 802.1s only; 802.1D does not support BA) since it has blocked ports by definition and these ports prevent L2 loops. That's fine for a campus environment (if your hardware/software supports it as well :) )

    But in a pure DC environment where you deployed Nexus 5k/2k with the vPC feature, Cisco definitely does not recommend enabling Bridge Assurance anywhere other than on the vPC peer link.

    :)

    Nicolas

  6. Bridging with STP sure beats bridging *without* STP! :)

    The biggest problem with STP in my opinion is that it's a dangerous combination of "just works", "difficult to diagnose" and "somewhat fragile", making it easy to screw up in ways that are hard to understand while it's broken.

    There have been some wonderfully documented cases of folks who should know better pushing STP beyond its breaking point. The STP process didn't stop responding, but it couldn't do its job anymore.

    No mention of loopguard? It's a bit like bridge assurance (in one direction, anyway), and quite a bit more commonly available.

    Fail open/closed is clunky language that begets confusion. It's worse in the case of bump-in-the-wire devices with bypass relays, but the confusion has already appeared here in the comments.

    I tend to view the close/open question from the electrical (switch) perspective, but a quick google of "ethernet tap fail open" demonstrates that folks marketing and using network devices sometimes mean exactly the opposite.

    Replies
    1. "Fail open/closed is clunky language that begets confusion" - OK, got the point (= marketing destroyed another perfectly good term ;). What would you call STP then? Forwarding-when-confused? ;))

    2. I've always thought of STP as controlling a flood (broadcast), and a "flood gate" controls that. So a fail-open means the gate is open, allowing floods....

    3. Ivan wrote: "In the context of this blog post "fail close" means "a failed/confused bridge continues to forward the traffic, and the bridged network will send the traffic across such bridge." You might have a different opinion on what "open" or "close" means, and it’s as valid as any… but quoting Cisco’s documentation won’t make your point any more valid (it just proves that the writer of that document agrees with your view of what opens or closes on failure). I would however appreciate a pointer to a more authoritative source (although I doubt it exists)."

      If we are network engineers or networking trainees, we must remember that the basic activity in a digital device is 0 and 5 volts; because of that, Ivan's view of open and closed is correct, as in an elementary electronic circuit.

  7. "STP’s fundamental problem is that it’s a fail-close, not a fail-open protocol"

    I think you mean:

    STP’s fundamental problem is that it’s a fail-open, not a fail-closed protocol.

    Replies
    1. Did you read the other comments?

    2. The comments section of your posts are the best place to get free smiles in the morning.

  8. I have used this quote from a Cisco doc in my attempts to get people to understand this concept. From a “Cisco Validated Design” document entitled “Data Center Design – IP Network Infrastructure” (pp. 35-36, October 8, 2009):

    “Transparent bridging is the result of a long technological evolution that was guided by the desire to keep the property of the thick coaxial cable that was the base for the original Ethernet networks. Transparent means that the stations using the service are not aware that the traffic they are sending is bridged; they are not participating in the bridging effort. The technology is similarly transparent to the user, and a high end Ethernet switch running STP is still supposed to be plug-and-play, just like a coaxial cable or a repeater were. As a result, unlike routers, bridges have to discover whether their ports are connected to peer bridges or plain hosts. In particular, in the absence of control message reception on a port, a bridge will assume that it is connected to a host and will provide connectivity. Therefore, the most significant differences between routing and bridging with STP (spanning tree protocol) are as follows:
    • A routing protocol identifies where to send packets.
    • STP identifies where not to send frames.
    The obvious consequence is that if a router fails to receive the appropriate updates, the parts of the network that were relying on this router for communication will not be able to reach each other. This failure tends to be local, as the communication within those distant network parts is not affected. If a bridge misses control information, it will instead open a loop. As it has been observed, this will most likely impact the whole bridging domain.”

    I summarize this using language that aligns with network security discussions, in which "fail closed" is a safe failure condition vs "fail open":

    "To summarize the two major points being made in the quote from Cisco:
    1. a routed interface will “fail closed” with no impact on any other routed interfaces, while a set of transparently bridged ports defined as a single virtual LAN will “fail open” with an impact on all network components of the VLAN
    2. the scope of a routed network failure is limited to the ports connected to the routed interface, while the scope of a bridged VLAN failure can impact all networking equipment in the entire bridging domain (entire data center, or multiple data centers in the case of stretched VLANs)."

    Replies
    1. The CVD is right on the money. Routing = fail closed. STP = fail open.

      If you want STP to fail close you need bridge assurance.

      It's a shame Ivan's post gets mixed up.

  9. I agree with Ivan on almost everything in this post except the fail-open/close part, which I see differently, but that is okay.
    As for implicitly enabling BPDU guard or BA: they are opposites. BPDU guard says "if I see a BPDU here, I will not allow it and will take action", while BA expects bidirectional hello messages. In my opinion, BA could be enabled implicitly on switches where the uplink logic applies. In the data center it may not be that easy, since servers and switches may change places: if BPDU guard is enabled implicitly for a server port and the same port is later connected to a switch, the port's STP expectation would have to change to BA automatically. Is this possible? Maybe yes. Once the switch port sees the Ethernet source MAC address, the switch could act based on the vendor-assigned part of that address. You might object that the connected device could be either a switch or a server from the same vendor, since many vendors sell switches, servers, firewalls and so on for the data center; in that case, vendors could assign MAC addresses hierarchically per product type, such as 00-00-01 for their switches, 00-00-02 for their servers, and so on.

    Replies
    1. You might have to configure a switch port to be an edge or a fabric port - too much plug-and-pray is never a good thing (although you could use LLDP to figure out which ports are fabric ports).

      OTOH, if you don't know whether another switch or a server is connected to a port of a DC switch, you might have bigger problems than STP on your hands ;)
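
      A minimal Python sketch of the LLDP idea, assuming the switch has the raw System Capabilities TLV (type 7) received on a port. The bit values (Bridge 0x0004, Router 0x0010, Station Only 0x0080) follow IEEE 802.1AB; the `suggested_role` policy is a hypothetical illustration, and a server running a virtual switch may well advertise the Bridge capability too, so the result is a hint rather than a decision.

```python
import struct

LLDP_TLV_SYSTEM_CAPABILITIES = 7
CAP_BRIDGE = 0x0004
CAP_ROUTER = 0x0010
CAP_STATION_ONLY = 0x0080


def parse_system_capabilities(tlv: bytes):
    """Return (system_caps, enabled_caps) from a System Capabilities TLV."""
    (header,) = struct.unpack("!H", tlv[:2])
    tlv_type, length = header >> 9, header & 0x01FF  # 7-bit type, 9-bit length
    if tlv_type != LLDP_TLV_SYSTEM_CAPABILITIES or length != 4:
        raise ValueError("not a System Capabilities TLV")
    return struct.unpack("!HH", tlv[2:6])


def suggested_role(tlv: bytes) -> str:
    """Hypothetical policy: a neighbour with Bridge enabled suggests a fabric port."""
    _, enabled = parse_system_capabilities(tlv)
    return "fabric" if enabled & CAP_BRIDGE else "host"


HEADER = (LLDP_TLV_SYSTEM_CAPABILITIES << 9) | 4
switch_tlv = struct.pack("!HHH", HEADER, CAP_BRIDGE | CAP_ROUTER, CAP_BRIDGE)
server_tlv = struct.pack("!HHH", HEADER, CAP_STATION_ONLY, CAP_STATION_ONLY)
print(suggested_role(switch_tlv))  # fabric
print(suggested_role(server_tlv))  # host
```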

  10. "That’s all nice and dandy until a bridge loses its mind, and stops sending BPDUs (control plane activity) while still forwarding traffic (data plane activity). Adjacent bridges think they have hosts plugged into the affected ports (this is the fail close part), and start forwarding traffic through those ports, resulting in a nice forwarding loop (been there, seen that)."

    If a bridge loses its mind??

    You are in essence saying you need a configuration variable (such as "bridge assurance") to make a switch stop forwarding traffic in case it, or an adjacent switch, has a buggy control plane implementation!

    A better rule might be, just buy well-tested switches.

    A correctly implemented bridge/switch must guarantee that it runs spanning tree at the highest priority and treats sending and receiving BPDUs on the network with the highest priority.

    Sorry, but once you implement a control plane protocol in a buggy manner, all bets are off. It's your fault for buying a switch which does not function as specified.

    Replies
    1. So you believe someone solved the halting problem and buy only switches that have proven bug-free code? Come on...

      The switch that lost its mind (in my case) came from one of the large vendors, and the loss of control plane was caused by a slow memory leak.

  11. Thank God Cisco came up with FabricPath :)

  12. Hello Ivan,

    You wrote "because the forwarding entry for the STP multicast address still punts packets to the CPU".

    Can you elucidate a bit please?
    Unless I am mistaken, a BPDU punted to the CPU is still processed (and any PDU will be punted to the CPU irrespective of the platform).

    What's the correlation you are trying to make here?

    By the way, your blog is like a new Network Engineer born out of silos.

    Sincerely,
    Saeed Ansari

    Replies
    1. The point is that you cannot expect the BPDUs to be flooded across a hung bridge because the HW punts them to the hung CPU, so adjacent bridges cannot discover the loop.

