QFabric Part 4 – Spanning Tree Protocol

2021-01-03: Even though QFabric was an interesting architecture (and reverse-engineering it was a fun intellectual exercise), it withered a few years ago. Looks like Juniper tried to bite off too much.

The initial release of QFabric Junos runs STP only within the network node (see the QFabric Control Plane post for more details), which raises an obvious question: “what happens if a server multihomed to a server node starts bridging between its ports and sending BPDUs?” Some fabric solutions try to ignore STP (the diplomats would say “they are transparent to STP”), but fortunately Juniper decided to do the right thing.

This is the answer I got from Juniper after asking them about STP handling in the server nodes (the documentation is a bit vague):

When STP BPDUs hit a server node in QFabric, the BPDUs are trapped to the server node’s CPU. The CPU drops the BPDUs (does not forward them anywhere else), but the system then takes corrective action: it recognizes this as a misconfiguration (on the server side) and shuts down the corresponding port in the server node. The software is intelligent in that the port comes out of this automatic error-disable mode after a period of time in which no BPDUs have been received on the port (presumably after things have been rectified in the offending server).

QFabric server nodes thus implement automatic BPDU guard, but Juniper went a step further than Cisco: the port is re-enabled if the connected device stops sending BPDUs. Cool.
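To make the behavior described above concrete, here is a minimal Python sketch of a port with BPDU guard and automatic recovery. It is not Juniper's implementation: the class name, the `recovery_interval` parameter and its default value are hypothetical, and the sketch assumes the CPU still sees BPDUs arriving on an error-disabled port (the quoted answer doesn't say how QFabric detects that BPDUs have stopped).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BpduGuardPort:
    """Hypothetical model of a server-node port with BPDU guard and auto-recovery."""

    name: str
    # Assumed quiet period (seconds without BPDUs) before the port is re-enabled;
    # the actual QFabric timer value is not documented here.
    recovery_interval: float = 300.0
    err_disabled: bool = False
    last_bpdu_time: Optional[float] = None

    def on_bpdu(self, now: float) -> None:
        # The BPDU is trapped to the CPU and dropped (never forwarded).
        # Receiving one is treated as a server-side misconfiguration,
        # so the port is error-disabled and the quiet-period timer restarts.
        self.last_bpdu_time = now
        self.err_disabled = True

    def on_timer(self, now: float) -> None:
        # Periodic check: re-enable the port only after a full
        # recovery_interval has passed without any BPDU.
        if self.err_disabled and self.last_bpdu_time is not None:
            if now - self.last_bpdu_time >= self.recovery_interval:
                self.err_disabled = False

    def forwards_traffic(self) -> bool:
        return not self.err_disabled


if __name__ == "__main__":
    port = BpduGuardPort("port-1", recovery_interval=60)
    port.on_bpdu(now=0)     # misbehaving server sends a BPDU: port disabled
    port.on_bpdu(now=50)    # server keeps bridging: quiet period restarts
    port.on_timer(now=100)  # only 50 s without BPDUs: still disabled
    port.on_timer(now=110)  # 60 s without BPDUs: port re-enabled
    print(port.forwards_traffic())
```

Every BPDU restarts the quiet period, so a server that keeps bridging keeps its port disabled; the port comes back only after the server has been fixed.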

However, it gets even better:

Not only does QFabric protect against server-side misconfigurations, it has built-in protection against cabling errors. So, for instance, if two ports of a server node were connected back-to-back, the system would detect that and disable both ports.

Extremely easy to do with LLDP (disable a port if you receive LLDP messages coming from yourself), but not commonly done. Good job.
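The LLDP check mentioned above fits in a few lines. Every LLDPDU carries mandatory Chassis ID and Port ID TLVs (IEEE 802.1AB), so a node that receives its own chassis ID knows the frame came from itself. The Python sketch below is only an illustration, not QFabric code: `LldpInfo` and `ports_to_disable` are hypothetical names, and it assumes the received Port ID maps directly to a local port name.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class LldpInfo:
    """Identification TLVs carried in every LLDPDU (Chassis ID, Port ID)."""

    chassis_id: str
    port_id: str


def ports_to_disable(local_chassis_id: str, rx_port: str,
                     received: LldpInfo) -> FrozenSet[str]:
    """Return the local ports to shut down when an LLDPDU arrives on rx_port.

    If the sender's chassis ID equals our own, this node sent the frame,
    so rx_port and the sending port (received.port_id, assumed to be a
    local port name) are cabled back-to-back: disable both of them.
    """
    if received.chassis_id != local_chassis_id:
        return frozenset()  # neighbor is a different device: nothing to do
    return frozenset({rx_port, received.port_id})


if __name__ == "__main__":
    me = "00:11:22:33:44:55"
    # port-2 receives an LLDPDU that this node sent out of port-1
    print(sorted(ports_to_disable(me, "port-2", LldpInfo(me, "port-1"))))
```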

By now you should be wondering why Juniper decided to implement BPDU guard from the start while Brocade VCS fabric is still struggling with it. The fundamental difference is in their MLAG implementations (more about MLAG): you can configure a Link Aggregation Group (LAG) only within a single QFabric server (or network) node, whereas Brocade can terminate LAG members on any device in the VCS fabric.

BPDUs are received on only one link of the LAG, and with VCS the other LAG members might be terminated on different switches. The VCS fabric would thus need a mechanism for coordinated shutdown of all LAG member ports, which it obviously still lacks.
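The difference between shutting down a single port and shutting down a whole LAG can be illustrated with a small Python sketch. It models only the decision (which ports must go down), not the fabric-wide signaling that would carry that decision to other switches, which is the hard part in a VCS-style fabric. The `ports_to_shut` function, the LAG table, and the switch and port names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (fabric switch, port) pair; names used below are hypothetical
Member = Tuple[str, str]


def ports_to_shut(lags: Dict[str, List[Member]], bpdu_member: Member) -> Set[Member]:
    """Return every port that must be error-disabled after a BPDU arrives on bpdu_member.

    The BPDU shows up on only one LAG member. Shutting down just that port
    would leave the remaining members (possibly on other switches) forwarding,
    so a coordinated shutdown disables every member of the LAG.
    """
    for members in lags.values():
        if bpdu_member in members:
            return set(members)
    return {bpdu_member}  # stand-alone port: shut it down alone


if __name__ == "__main__":
    # One LAG whose members are terminated on two different fabric switches
    lags = {"lag-1": [("switch-1", "port-1"), ("switch-2", "port-1")]}
    print(sorted(ports_to_shut(lags, ("switch-1", "port-1"))))
```

With a QFabric-style LAG all members live in one node, so this lookup stays local; once members can be spread across the fabric, the returned set spans several switches and each of them has to be told to act.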

4 comments:

  1. As far as I know, Brocade implemented the "BPDU drop" feature in NOS v2.1
  2. I see our little discussion last night inspired you :-).
  3. Cisco switches can also automatically recover err-disabled ports by BPDU guard after a defined timer. The feature is not enabled by default though...
  4. Why would a device send BPDUs if its port has been disabled? Is QFabric not showing link down to the offending port? That wouldn't be cool.