STP loops strike again

Vasilis sent me an interesting campfire story. It started with a common mistake:

An external partner of my company used an Ethernet cable and connected two switchport interfaces of one of our access switches.

Being a conscientious networking engineer, he had the usual safeguards in place ...

I have to mention that these access ports are configured with Root and BPDU guard as well as Portfast.

... but he nonetheless experienced a total LAN meltdown:

This had a severe effect (unavailability) on our LAN. L2 loops caused our L2 switches, Core Switches and Routers to reach 100% CPU!

Root cause analysis turned up a surprising fact:

I thought that BPDU and Root guard were enough to protect the access ports of our LAN but found out it was not true.

Fortunately, the Cisco documentation explaining the problem is quite explicit:

It should be noted that short-living loops are not prevented by Root or BPDU Guards, if two portfast-enabled ports are connected directly or through the hub

The problem? Looped ports won’t shut down until a BPDU packet is sent through one of them, and a single broadcast (for example, ARP packet) sent in that interval can cause a network meltdown.
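
To see why a single frame is enough, here is a minimal Python sketch of one switch with two of its ports cabled together. It is a toy model, not a description of any particular switch: the function name `simulate_loop` and its parameters are made up for illustration, and each pass of the outer loop stands for one trip of the frame through the cable.

```python
def simulate_loop(num_ports, loop_a, loop_b, ingress, rounds):
    """Flood one broadcast through a switch whose ports loop_a and loop_b
    are cabled together (hypothetical toy model).

    Returns (copies delivered per host port, copies still circulating).
    """
    peer = {loop_a: loop_b, loop_b: loop_a}
    delivered = {p: 0 for p in range(num_ports) if p not in peer}
    arriving = [ingress]  # ports on which a copy is entering the switch
    for _ in range(rounds):
        next_arriving = []
        for in_port in arriving:
            for out_port in range(num_ports):
                if out_port == in_port:
                    continue  # a bridge never floods back out the receiving port
                if out_port in peer:
                    # the copy leaves on one looped port and re-enters on the other
                    next_arriving.append(peer[out_port])
                else:
                    delivered[out_port] += 1
        arriving = next_arriving
    return delivered, len(arriving)


# Four ports, ports 2 and 3 looped, one broadcast entering on port 0.
delivered, circulating = simulate_loop(4, 2, 3, ingress=0, rounds=5)
```

In this model two copies keep circulating (one in each direction) and every pass hands another two copies to each host port. Ethernet frames carry no TTL, so nothing in the data plane ever removes them; a real switch repeats this at wire speed until a port is shut down.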

The solution? Vasilis found a solution similar to those proposed in comments to my Preventing Bridging Loops Without STP post: use switchport port-security and limit the number of MAC addresses accepted on the switch port.
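
The reasoning behind the MAC limit can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Cisco's implementation: the `SecurePort` class and its `max_macs` parameter are hypothetical names, and the only rule modeled is "shut the port down when a new source MAC would exceed the limit". A looped port starts receiving frames sourced by every host in the VLAN, so it runs past a small limit almost immediately.

```python
class SecurePort:
    """Toy model of an access port with a cap on learned source MACs."""

    def __init__(self, max_macs):
        self.max_macs = max_macs
        self.learned = set()
        self.err_disabled = False

    def receive(self, src_mac):
        """Return True if the frame is accepted, False if it is dropped."""
        if self.err_disabled:
            return False  # port is shut down, nothing gets through
        if src_mac in self.learned:
            return True
        if len(self.learned) >= self.max_macs:
            self.err_disabled = True  # violation: take the port out of service
            return False
        self.learned.add(src_mac)
        return True


# Frames re-entering through a looped port carry other hosts' source MACs.
port = SecurePort(max_macs=2)
looped_frames = ["aa:aa", "bb:bb", "aa:aa", "cc:cc", "aa:aa"]
results = [port.receive(mac) for mac in looped_frames]
```

Once the third distinct source address shows up, the modeled port disables itself and the loop is broken. The sketch also shows the weakness mentioned below: the limit has to be at least as large as the number of legitimate MAC addresses behind the port.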

Unfortunately, this solution works primarily in campus environments; you cannot use it in virtualized data centers with moving VMs as you can never predict how many VMs (and MAC addresses) will reside within a physical server.

Better ideas? Please share them in the comments!

Short summary: Bridging (also known as layer-2 switching) sucks (and some implementations suck more than others). We need to move to L3-based solutions.

35 comments:

  1. Hello! I'm just wondering: could the storm-control feature (configured with carefully measured thresholds) combined with BPDU guard help here? (With storm control you can limit the number of broadcast, multicast and unicast frames received on a port.)

    http://www.cisco.com/en/US/docs/switches/lan/catalyst2950/software/release/12.1_22ea/SCG/swtrafc.html#wp1229854

  2. I believe "unicast" refers to "unknown unicast" in a storm control sense.

  3. Anonymous coward, 24 April, 2012 09:35

    Bridging doesn't suck, it just needs to be implemented properly. Storm control is always a good idea. MAC filtering / limiting in campus networks is very effective. For DCs TRILL is a very good option, particularly for those that have already architected with the requirement for some form of layer 2 within their DC.

  4. Yes! You're right!

  5. Storm-control is best practise.
    We should ask Vasilis how large his VLANs are (how many hosts/ports), because it surprises me that the storm was so brutal (too many broadcasts? multicasts?)!

  6. Ivan Pepelnjak, 24 April, 2012 10:37

    I guess it depends a lot on the switch type you're using, its CPU power, and its control-plane protection. Once you get a single broadcast in a forwarding loop (an ARP request would be a perfect fit as it has to hit the switch's CPU as well), it can swamp the CPU, which then stops processing and sending BPDUs, resulting in a neat meltdown.

  7. Youssef El Fathi, 24 April, 2012 12:23

    You can also use Dynamic ARP Inspection and rate-limit incoming ARP packets.

  8. While STP loops can be bad news, they should never crash a switch; the OS should be able to deal with high CPU utilization and schedule tasks fairly. As part of acceptance testing I always encourage tests that focus on how a switch deals with very high CPU utilization: run the switches in the test network at 90%+ CPU utilization and verify control plane integrity (e.g. no neighbors are dropped, STP is stable, MLAG/vPC peers remain active, etc.). This is a challenge with the converged control plane many vendors are pushing: while it gives you fewer switches to manage, it can also take out more switch ports and drive higher CPU utilization on the switch...

  9. Yeah - it would be nice if all of the devices deployed in the top of our racks employed storm-control...
    https://supportforums.cisco.com/message/3422955

  10. Dmitri Kalintsev, 25 April, 2012 03:31

    - Rate limit BUM (well, maybe separately BU and M)
    - Explicitly shut down ports that are not supposed to be connected to anything

    There's another interesting case when Rx and Tx on the same optical port are plugged together. There's a command that detects looped ports and shuts them down (don't ask me what it is though).

    Replies
    1. Dmitri, sounds like you are referring to UDLD, unidirectional link detection. This will err-disable a port when unidirectional traffic is detected when it is not expected, usually on fiber interfaces.

  11. L3 switching throughout the campus, starting from the wiring closet switch. Done.

    Replies
    1. LOL!! If only campus admins were willing to do that, it would solve so many problems........

  12. "I have to mention that these access ports are configured with Root and BPDU guard as well as Portfast."

    Wrong. Well, it depends I guess if he configured portfast on port or globally. If configured on port then the port will always be in portfast mode but if configured globally then the port would revert out of its portfast mode if a BPDU is received. Another common mistake is to use both bpdu filter and root/bpdu guard on a port. The filter will take precedence and the guards will never be used.

    Brad's solution is of course the best but maybe not always an option.

    Storm control and port-security should also be configured if possible.

    The ARP situation you mention is very interesting, I've seen this before, one single packet can wreak havoc.

    Another interesting situation can occur when a customer creates a loop. Say that you have customers connecting to an L2 network via some triple-play box and they accidentally create a loop locally. Even if your backbone is protected, from the backbone's perspective this is not really a loop. It's just a lot of traffic coming from the customer, which could be ARP or other traffic, and this is very difficult to protect against unless you own the CPE or at least configure it.

  13. In practice I have seen that if you configure loop guard and UDLD on the uplink ports of the switches, the core/distribution switch will shut down the uplink port of the access switch where the loop was created, limiting the impact of the loop on the rest of the LAN.
    This probably happens because the affected switch is at 100% CPU and cannot send BPDUs or UDLD packets.

  14. Brandon Mangold, 25 April, 2012 21:43

    Ivan, one small thought and I know you are certainly aware of this but "STP loop" is a misnomer. Spanning-Tree doesn't cause loops, it prevents them. The issue was actually caused by a bridging loop that spanning-tree didn't detect in time because of incompatible STP enhancements.

    STP is a great protocol ... for the 1990s.

  15. Ivan Pepelnjak, 25 April, 2012 22:06

    You're absolutely right. Me and my sloppy writing ... have to get better.

  16. I've similarly seen this when an IP phone bridges two VLANs if it's plugged into the switch twice (by a user of course ;) ) and the switchports aren't in the same VLAN. Any aux voice VLAN BPDUs won't be understood by the native data VLAN (untagged).

    As for BPDU filter, it always helps to remember the different behavior if enabled globally vs. on the switchport directly. If you enable BPDU filter globally and portfast on a switchport, it will prevent that switchport from sending BPDUs. However, if that port receives a BPDU, the port loses its portfast state, disables BPDU filter and becomes a normal STP port. If you enable BPDU filter on a switchport directly, then that disables STP on that port, i.e. it won't send BPDUs and it will ignore inbound BPDUs.

  17. That stops the issue from propagating any further than the affected switch - all good. But you'd still have an issue on that switch, if two ports are looped, right?

  18. I first read this as 'iPhone' and almost laughed :)

  19. The affected switch is exposed only to broadcasts from the users on that switch, not the entire network, so the severity is vastly minimized. Worst case scenario one switch melts. Not the whole network.

  20. Ivan, broadcast, multicast and unknown-unicast rate limiting / storm control is the best solution. Having a loop detection protocol like Extreme's ELRP is a great tool, but I don't know about any (non-STP) based equivalent in IOS. Cheers!

    P.S. Also, it's better if you have MRTG/RRD graphs showing the non-unicast pps per port.

  21. I definitely agree. It usually starts and ends with a poor design.

  22. no mdix auto

    What are the chances that a cross-over cable gets used in this or any scenario?

  23. Regarding limiting the number of MAC addresses accepted on the switch port, what is a decent default for this value?

    We can't really set that to 1, as anyone experimenting with virtual machines will run into issues. Is setting the mac address limit to something like 20 still helpful? I would hope that setting it to 20 would be high enough to prevent causing issues with any legitimate activity, but still low enough to prevent loops.

  24. Ivan Pepelnjak, 01 May, 2012 18:21

    Are people supposed to be experimenting with virtual machines in a production campus environment (where the original question came from)? If the answer is "Yes", should you support their activities if they can't configure NAT on VMware Workstation (or equivalent)?

    One MAC address is a bit tough, but more than a few doesn't make sense. Also, you can age secure MAC addresses if you wish.

  25. Was it STP, improper/inadequate configuration or bad implementation/software that caused the loop? If it was bad implementation/software, then I don't see how any alternative can be safer unless STP developers are bottom of the pole. If improper/inadequate configuration, not sure if IP or SDN will safeguard a network from an uninformed operator. If indeed it's STP to blame, then it would be interesting to know what flaws remain with a robust STP implementation aside from ones that we cannot prove don't exist with the alternatives given the same business requirements (for example: in a vm-server only network there is no data plane learning at the vm (host) edge regardless of whether sdn, ip or stp).

  26. Ivan Pepelnjak, 01 May, 2012 21:38

    In this particular case, I would say it's improper implementation.

    The switch should send several BPDUs on a portfast port after it transitions to enabled and delay forwarding for a second or two (to check for potential loops).

    If someone thinks using BPDUs is not a good idea, an alternate protocol (LLDP would be the best bet) could be used to detect loops; either way, the port should not go into forwarding state until the basic are-we-looping checks have been performed.

  27. People do a lot of things in a production campus environment when that campus environment happens to be a college campus :-)

  28. On our campus, it is one and only one :-) (there are some exceptions, but they are tracked). And indeed, nobody needs to mess with VMs on his desktop (think: BackTrack VM :-). Go to the lab if you want to do that. What if multiple VMs are eating up all your DHCP addresses?

  29. This is so true. I once experienced a complete blackout in a Layer 3 routed network (with redundancy, though). It originated from a loop in a connected L2 network. The loop made the OSPF connection flap, updating about 3000 routes. This core switch then sent OSPF LSA updates to all its distribution switches (some 14...), while these all sent the updates to the second core switch. Result: it couldn't cope and crashed while the first one was flapping... nice, a complete L3 meltdown...

  30. Just tried the same: the switch correctly blocks one of the two portfast-enabled interfaces.
    Maybe something else was not mentioned, e.g. bpdufilter?

  31. Isn't it possible to use control plane policing to prevent 100% CPU utilization?

  32. Bring back DECnet. Anyone watched Radia Perlman's GoogleTalk video? She talks about how she lost her battle to have layer 3 networking to short-sighted management that thought no one would ever talk to other networks. Radia says in her own words that she thought spanning tree was a stopgap and never thought it would still be around. Take a look at what she is doing with TRILL.

    Replies
    1. Welcome! You must be a pretty new reader - do search for TRILL and Radia on my blog ;)



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.