A while ago I described what it takes to integrate a TRILL backbone with legacy equipment running Spanning Tree Protocol (STP). Unfortunately, Brocade decided to use a non-standard approach to BPDU handling when implementing their TRILL-like VCS fabric. VDX switches running in fabric mode can either drop incoming BPDU frames or transport them transparently across the fabric to other edge ports. Although VDX switches support STP, RSTP and MSTP (as well as RootGuard and BPDUGuard) when configured as standalone switches, STP processing is disabled when you configure fabric mode; the VCS fabric looks like a huge shared LAN segment to the end hosts and core switches.
2013-03-31: Network OS 4.0 and above support Distributed Spanning Tree (DiST); for more details read this blog post.
This approach to handling STP might make perfect sense from an architectural perspective, even more so for pure VMware shops (vSwitch does not run STP and performs split-horizon bridging by design). Unfortunately, everyone else cannot ignore STP, as it’s way too easy to configure bridging between redundant NICs in servers running any other operating system. Robust data center networks thus use BPDU guard on edge ports to block any server port acting as a bridge. At the very minimum, you should use root guard on server ports to prevent STP topology changes triggered by a misconfigured STP process running on a server. Most vendors seem to agree that BPDU guard and root guard are crucial to stable L2 network operation; Brocade claims they are key features [...] to protect network spanning tree operation.
Never configure a PortFast port without BPDU guard or you’ll soon discover that it takes a single click in Hyper-V to melt down your network.
VCS fabric as core transport
If you’re deploying Brocade’s VCS fabric as a Data Center network core or as a transport layer between top-of-rack switches (or between switches embedded in blade enclosures) you’re relatively safe: the access switches (ToR or embedded) should kick out the rogue servers/virtual machines; running STP across the VCS fabric is just an extra disaster-preventing precaution.
VCS fabric in access layer
You probably should not connect servers directly to VCS fabric due to the way it handles BPDUs. VCS fabric’s current implementation gives you only two dismal options:
Ignore BPDUs on the edge ports, risking the stability of the whole data center (see the above warning).
Transport BPDUs across VCS fabric to the core switches. Unfortunately, the core switches are not the right place to implement STP protection. You could decide to configure BPDU guard or root guard on the core switches, in which case you risk cutting off the whole VCS fabric (and all servers connected to it) if a single server starts sending BPDUs. Or you could do nothing, exposing the core switches to the whims of STP configurations of individual servers, allowing any rogue server to bring down the whole network with repetitive bogus topology changes.
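To make the trade-off concrete, here is a toy Python model of the blast radius in each case. It is only a sketch under simplifying assumptions: the Port class, the server names and the port names are hypothetical, and the model reduces BPDU guard to "shut the port when a BPDU arrives" without simulating STP or any vendor's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    """Hypothetical switch port; 'servers' lists the hosts reachable through it."""
    name: str
    servers: List[str]
    bpdu_guard: bool = False
    enabled: bool = True

    def receive_bpdu(self) -> bool:
        """Return True if the BPDU reaches the switch's STP process.

        With BPDU guard enabled the port is shut down instead.
        """
        if self.bpdu_guard:
            self.enabled = False
            return False
        return True


def servers_cut_off(ports: List[Port]) -> List[str]:
    """List every server sitting behind a port that has been shut down."""
    return sorted(s for p in ports if not p.enabled for s in p.servers)


servers = ["srv1", "srv2", "srv3", "srv4"]

# Case A: a classic access switch with BPDU guard on every server port.
# The rogue server (srv2) sends a BPDU and only its own port goes down.
edge_ports = [Port(f"edge-{s}", [s], bpdu_guard=True) for s in servers]
edge_ports[1].receive_bpdu()
lost_at_edge = servers_cut_off(edge_ports)

# Case B: the fabric carries the BPDU transparently to a core switch port
# protected by BPDU guard. Every server behind the fabric is cut off.
core_port = Port("core-to-fabric", servers, bpdu_guard=True)
core_port.receive_bpdu()
lost_at_core = servers_cut_off([core_port])

# Case C: no protection on the core port. Nothing is shut down, but the
# rogue BPDU reaches the core switch's STP process.
open_core_port = Port("core-to-fabric", servers)
core_exposed = open_core_port.receive_bpdu()

print("guard at the edge, servers lost:", lost_at_edge)
print("guard at the core, servers lost:", lost_at_core)
print("no guard, core STP exposed:", core_exposed)
```

The same protection mechanism costs one server when applied at the edge and every fabric-attached server when applied at the core, which is why the core switches are the wrong place for it.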
While the shortest-path bridging standards (802.1aq/SPB and TRILL) seem convoluted and overloaded with features, most of those features made it into the standard for a good reason. Cutting corners when implementing standards is always a long-term problem.
You’ll learn more about modern data center architectures in my Data Center 3.0 for Networking Engineers webinar (buy a recording). The details of VMware networking (including the vSwitch behavior) are described in VMware Networking Deep Dive webinar (register here). Both webinars are also part of the yearly subscription package.
Update 2011-06-12: After a lengthy e-mail exchange with Jon Hudson, Global Solutions Architect @ Brocade Networks, I slightly reworded two sentences to ensure nobody would assume I "imply a willful choice" or worse. While editing the post, I also included information on STP support by VDX switches running in non-fabric mode and the fact that multiple vendors (including Brocade) support "a Cisco solution to a problem" (BPDU Guard and Root Guard). In fact, both features are supported by VDX switches running in non-fabric mode.
In my personal opinion (I hope I'm still entitled to one), I would wait for Brocade to implement Appointed Forwarder in their TRILL code and enable the full set of STP features (already present in the code) in VCS fabric mode before deploying VCS fabric in my network.