STP and Expert Beginners

Maxim and I continued our STP discussion and eventually agreed that while STP might not be the best protocol out there (remember: it had to run on a Z80 CPU), it’s the only standardized thing that prevents nasty forwarding loops. That prompted Maxim to ask another seemingly simple question:

What's so wrong with STP that there are STP haters out there turning it off wherever they see it?

Welcome to the wonderful world of Expert Beginners.

Imagine you’re facing a problem where VMs get cut off from the network after a server-to-switch link is re-established, or where it takes workstations “forever” to connect to the network. You might even figure out that the switch port sits in something called the listening state (followed by learning) for half a minute. Googling around, you find that the listening state has something to do with something called STP, and you have no clue why you’d need STP in your network. Next step: googling for configuration commands that will turn it off.
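Where does the half minute come from? A minimal sketch, assuming classic IEEE 802.1D behavior with the default 15-second forward delay and a port that becomes designated as soon as its link comes up (blocking, max age, RSTP, and vendor tweaks are not modeled): the port spends one forward delay listening and another learning before it forwards, unless it is configured as an edge port (Cisco calls that portfast). The function names are illustrative only.

```python
# Simplified model of classic IEEE 802.1D port-state timing.
# Assumptions: default timers, a designated port whose link just came up,
# no blocking/max-age phase; RSTP and vendor optimizations are not modeled.

FORWARD_DELAY = 15  # seconds; the 802.1D default forward delay


def port_timeline(portfast=False, forward_delay=FORWARD_DELAY):
    """Return (state, start_second) pairs for a port whose link just came up."""
    if portfast:
        # An edge port (Cisco "portfast") starts forwarding immediately.
        return [("forwarding", 0)]
    # A regular port listens, then learns MAC addresses, each for one forward delay.
    return [
        ("listening", 0),
        ("learning", forward_delay),
        ("forwarding", 2 * forward_delay),
    ]


def seconds_until_forwarding(portfast=False, forward_delay=FORWARD_DELAY):
    """How long traffic from the attached workstation or VM is dropped."""
    return dict(port_timeline(portfast, forward_delay))["forwarding"]


if __name__ == "__main__":
    for state, start in port_timeline():
        print(f"t={start:>2}s  {state}")
    print("without portfast:", seconds_until_forwarding(), "seconds")
    print("with portfast:", seconds_until_forwarding(portfast=True), "seconds")
```

Two forward delays of 15 seconds each are the “forever” a workstation or a re-pinned VM experiences after the link comes up.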

Please note that the VM-related behavior is the result of a broken VMware vSwitch design decision. SMB engineers shouldn’t have been forced to deal with stupidities like this one.

Alternatively, you might have configured portfast and BPDU guard in the past (based on yet another Google search result), and then got hit by a Windows VM entering bridging mode, which caused BPDU guard to shut down the whole server port. You want to stop all this nonsense for good, and the only way to do that is to turn off STP.
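Why does a single bridging VM hurt so much? Here is a hypothetical model (class and attribute names are illustrative, not a vendor API), assuming BPDU guard shuts the port down on the first received BPDU, as Cisco switches do when they put the port into the err-disabled state. Because all VMs share the server uplink, every one of them loses connectivity together.

```python
# Hypothetical model of a server-facing access port with portfast and BPDU guard.
# Names are illustrative only; this is not any switch vendor's API.


class EdgePort:
    def __init__(self, name, bpdu_guard=True, vms=0):
        self.name = name
        self.bpdu_guard = bpdu_guard
        self.edge = True            # portfast: treated as a host-facing edge port
        self.state = "forwarding"   # edge ports skip listening/learning
        self.vms = vms              # VMs reachable through this server uplink

    def receive_bpdu(self):
        """Handle a BPDU arriving on the port (e.g. from a VM running a software bridge)."""
        if self.bpdu_guard:
            # BPDU guard: shut the port down (Cisco reports it as err-disabled).
            self.state = "err-disabled"
        else:
            # Without BPDU guard the port loses edge status and regular STP
            # takes over; that processing is not modeled here.
            self.edge = False

    def reachable_vms(self):
        """All VMs behind the port are reachable only while the port forwards."""
        return self.vms if self.state == "forwarding" else 0


if __name__ == "__main__":
    uplink = EdgePort("server-1 uplink", vms=20)
    print(uplink.state, uplink.reachable_vms())
    uplink.receive_bpdu()  # one Windows VM starts bridging and sends BPDUs
    print(uplink.state, uplink.reachable_vms())
```

The switch is doing exactly what it was told to do; the real problem is that the vSwitch passes the VM's BPDUs to the physical network in the first place.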

Do I have to mention that the lack of BPDU handling is yet another vSwitch problem, and that VMware still doesn’t get it?

It would be exceedingly easy to blame the expert beginners making these mistakes, but in reality it’s sad to discover that many pointy-haired bosses think their engineers need no training, and even sadder to realize that many IT practitioners think “fake it till you make it” is not a bad idea.

Comments:

  1. A few points

    1) Infrastructure personnel want every port hot and active. I don't buy 200 interfaces to only be able to use 100 of them.

    2) The admins you highlighted in your linked post should be fired. VPN between 2 VMs on the same network - stupid on display.

    3) I have watched switches with BPDU guard allow VMware to become the root bridge. That was interesting.

    4) No STP, no issue...
  2. G.8032 is yet another standardized L2 loop prevention protocol. In fact, a lot of people prefer it over xSTP.
    Replies
    1. Do any major vendors implement this?
    2. Yes. Calix, Cisco, Juniper, and (I'm sure) many other vendors support ERPS (a.k.a. ITU-T G.8032). Another similar protocol is Cisco's REP. ERPS and REP can provide very fast convergence (~50 ms) after a topology change in addition to loop prevention. Unlike spanning tree, you can't just turn these protocols on and let them do their magic. The engineer needs to do some pre-planning to determine which network paths will be REP segments or ERPS domains; you know, the old trick called using the human brain. Upon hearing this last point, most folks toss the ERPS/REP whitepaper in the wastebasket.
    3. ERPS and EAPS-MRRP-like solutions are good, but they only work with actual ring topologies.
  3. If this is anything like Extreme Networks EAPS, it isn't loop prevention as much as a very nice L2 redundancy protocol.

    You (the network admin) provision primary and failover paths through the network, and it works nicely. Some nimrod bridges the network elsewhere and you're still toast.

    EAPS was very nice when I used it; reading the Cisco config guide leads me to think G.8032 is basically the same thing.
  4. Some folks mix up STP and failure. In the old days, if an STP-related code bug "caused" a loop, or STP did not work properly during reconvergence and a loop resulted, then yes, you could call it an STP failure.

    However, if your network is not planned or designed to use STP properly, a hardware outage triggers STP reconvergence, and STP works (it blocks ports to create a loop-free topology) but the resulting topology causes other issues with your servers, that is not an STP issue. STP did its job.
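The ring-protection approach discussed in the comments above can be sketched as a toy model, assuming a single ring, at most one failed link, and no R-APS messages or timers (all simplifications; the function names are hypothetical). In the G.8032 idle state, the RPL owner keeps the ring protection link (RPL) blocked; when another ring link fails, that link is blocked and the RPL is unblocked. The last check shows the commenters' caveat: a link someone adds outside the ring is invisible to the protocol and creates a loop.

```python
# Toy single-ring model of ITU-T G.8032 (ERPS) ring protection.
# Assumptions: one ring, at most one failed link, no R-APS messages or timers.


def ring_links(n):
    """Links of an n-node ring: (0, 1), (1, 2), ..., (n - 1, 0)."""
    return [(i, (i + 1) % n) for i in range(n)]


def active_links(n, rpl, failed=None):
    """Ring links forwarding traffic.

    Idle state: the RPL owner keeps the ring protection link (rpl) blocked.
    Protection state: the failed link is down and the RPL is unblocked.
    Links must be given in ring order, e.g. (2, 3) or (n - 1, 0).
    """
    blocked = rpl if failed is None else failed
    return [link for link in ring_links(n) if link != blocked]


def is_loop_free_and_connected(n, links):
    """True if links form a tree: exactly n - 1 links that reach every node."""
    if len(links) != n - 1:
        return False
    neighbors = {i: set() for i in range(n)}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for nb in neighbors[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n


if __name__ == "__main__":
    n, rpl = 6, (5, 0)
    print("idle:", is_loop_free_and_connected(n, active_links(n, rpl)))
    print("link 2-3 failed:", is_loop_free_and_connected(n, active_links(n, rpl, (2, 3))))
    # Someone bridges nodes 0 and 3 outside the ring; ERPS doesn't know about it.
    print("rogue bridge:", is_loop_free_and_connected(n, active_links(n, rpl) + [(0, 3)]))
```

The pre-planning mentioned in the comments corresponds to choosing the ring membership and the RPL up front; nothing in this model helps with topologies that aren't rings.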