If you ask any Data Center networking engineer about their worst pains, I’m positive Spanning Tree Protocol (STP) will be very high on the shortlist. In a well-designed fully redundant hierarchical network where every device connects to at least two devices higher in the hierarchy, you lose half the uplink bandwidth to the whims of STP loop prevention, which blocks one of every two redundant uplinks.
Of course you can try to dance around the problem:
- Push routing as far south (down is no longer popular with Data Center vendors) as you can ... and get a violent kickback from the server admins when they realize they cannot move the VMs at will anymore;
- Play with per-VLAN costs in PVST+ or MSTP, ensuring the need for constant supervision and magnificent job security;
- Deploy hot-off-the-press technologies like TRILL or FabricPath.
... or you could decide to use a more humble approach and deploy multi-chassis link aggregation.
Link Aggregation Basics
Link aggregation is an ancient technology that allows you to bond multiple parallel links into a single virtual link (from the STP perspective). With parallel links being replaced by a single link, STP detects no loops and all the physical links can be fully utilized.
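To make the difference concrete, here is a minimal Python sketch of what STP sees with and without aggregation. It is a toy model, not an STP implementation: the loop-free tree is built with a simple union-find (no root election, no port costs), and the function names, switch names and link names are all made up for illustration.

```python
def forwarding_links(links):
    """Return the links a loop-free tree keeps; every other link is blocked.

    `links` is a list of (link_id, node_a, node_b) tuples. Union-find is a
    simplified stand-in for the tree STP would compute.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    kept = []
    for link_id, a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # link does not close a loop, keep it
            parent[root_a] = root_b
            kept.append(link_id)
    return kept


def usable_physical_links(links, use_lag):
    """Physical links that carry traffic, with or without link aggregation."""
    if not use_lag:
        return forwarding_links(links)
    # Bundle parallel links between the same two switches into one logical link.
    bundles = {}
    for link_id, a, b in links:
        bundles.setdefault(tuple(sorted((a, b))), []).append(link_id)
    logical = [(key, key[0], key[1]) for key in bundles]
    # The tree is computed on logical links; each kept bundle uses all members.
    return [phys for key in forwarding_links(logical) for phys in bundles[key]]


# Four parallel links between two hypothetical switches.
parallel = [(f"e{i}", "sw1", "sw2") for i in range(1, 5)]
print(usable_physical_links(parallel, use_lag=False))  # ['e1']
print(usable_physical_links(parallel, use_lag=True))   # ['e1', 'e2', 'e3', 'e4']
```

Without aggregation the tree keeps one of the four links; with aggregation the tree sees a single logical link and all four members forward traffic.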
For whatever reason, vendors like to use every term but link aggregation. You’ll hear about port channel, EtherChannel, link bonding or multi-link trunking.
Multi-Chassis Link Aggregation
Imagine you could pretend two physical boxes use a single control plane and coordinated switching fabrics ... then the links terminated on two physical boxes actually terminate within the same control plane and you could aggregate them. Welcome to the wonderful world of Multi-Chassis Link Aggregation (MLAG).
MLAG nicely solves the STP problem: no bandwidth is wasted and close-to-full redundancy is retained (make sure you always read the small print to understand what happens if the switch hosting the control plane fails).
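The same idea can be sketched in a few lines of Python. Again, this is a toy model under loud assumptions: a union-find tree stands in for STP, the MLAG pair is modeled simply by mapping both physical switches to one logical name, and every name (`agg1`, `agg2`, `access`, `peer`, `up1`, `up2`, the function names) is hypothetical. It says nothing about how any vendor actually synchronizes the two boxes.

```python
def spanning_tree(links):
    """Keep the links that do not close a loop (union-find stand-in for STP)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    kept = []
    for link_id, a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            kept.append(link_id)
    return kept


def forwarding_with_mlag(links, mlag_members):
    """Forwarding physical links when some switches act as one logical switch.

    `mlag_members` maps a physical switch name to its logical switch name.
    """
    bundles = {}
    for link_id, a, b in links:
        a, b = mlag_members.get(a, a), mlag_members.get(b, b)
        if a == b:
            continue  # link between MLAG peers is internal to the logical switch
        bundles.setdefault(tuple(sorted((a, b))), []).append(link_id)
    logical = [(key, key[0], key[1]) for key in bundles]
    return [phys for key in spanning_tree(logical) for phys in bundles[key]]


# One access switch dual-homed to two aggregation switches.
links = [
    ("peer", "agg1", "agg2"),
    ("up1", "access", "agg1"),
    ("up2", "access", "agg2"),
]
print(spanning_tree(links))  # ['peer', 'up1']
print(forwarding_with_mlag(links, {"agg1": "agg", "agg2": "agg"}))  # ['up1', 'up2']
```

With plain STP one of the two uplinks is blocked; once the two aggregation switches present themselves as a single logical switch, both uplinks land in the same bundle and both forward.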
Standardization? No, thanks
MLAG is obviously a highly desirable tool in your design/deployment toolbox ... but no vendor (including those that promote their standards-based open approach) has taken the pains to start the standardization effort. Proprietary technology lock-in is obviously still a lucrative approach.
The architectural approaches used by individual vendors differ widely: some completely separate the control plane from the switching matrix (high-end solution from Juniper), some put one of the control planes into a half-comatose state (Cisco with VSS), some use cooperative control planes (Cisco with vPC), and some use a stacking solution, preferably called distributed or intelligent (Cisco, HP and Juniper).
You’ll get a high-level overview of all virtualization, LAN reference architectures, multi-chassis link aggregation, port extenders and large-scale bridging (including TRILL and FabricPath) in my Data Center 3.0 for Networking Engineers webinar (buy a recording or yearly subscription).