Multi-chassis link aggregation (MLAG) basics
If you ask any Data Center networking engineer about his worst pains, I’m positive Spanning Tree Protocol (STP) will be very high on the shortlist. In a well-designed fully redundant hierarchical network where every device connects to at least two devices higher in the hierarchy, you lose half the bandwidth to STP loop prevention whims.
Of course you can try to dance around the problem:
- Push routing as far south (down is no longer popular with Data Center vendors) as you can ... and get a violent kickback from the server admins when they realize they cannot move the VMs at will anymore;
- Play with per-VLAN costs in PVST+ or MSTP, ensuring the need for constant supervision and magnificent job security;
- Deploy hot-off-the-press technologies like TRILL or FabricPath.
... or you could decide to use a more humble approach and deploy multi-chassis link aggregation.
Link Aggregation Basics
Link aggregation is an ancient technology that allows you to bond multiple parallel links into a single virtual link (from the STP perspective). With parallel links being replaced by a single link, STP detects no loops and all the physical links can be fully utilized.
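The effect can be sketched in a few lines of Python. This is a toy model, not real STP: it assumes every link that would close a loop is simply blocked and ignores bridge priorities and port costs. The function names and the two-switch topology are hypothetical.

```python
from collections import defaultdict

def forwarding_links(links):
    """Return the links a toy spanning tree would keep forwarding.

    links: list of (switch_a, switch_b) tuples; parallel links appear
    as repeated tuples. Any link that would close a loop is blocked.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    kept = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The link joins two separate trees, so it stays forwarding.
            parent[root_a] = root_b
            kept.append((a, b))
        # Otherwise the link would close a loop and is blocked.
    return kept

def aggregate(links):
    """Bundle parallel links into one logical link per switch pair."""
    bundles = defaultdict(int)
    for a, b in links:
        bundles[tuple(sorted((a, b)))] += 1
    return dict(bundles)

# Four parallel links between two switches (hypothetical topology).
physical = [("S1", "S2")] * 4
unbundled = forwarding_links(physical)      # only one link forwards
bundles = aggregate(physical)               # one logical link, four members
bundled = forwarding_links(list(bundles))   # the logical link forwards
print(len(unbundled), bundles, bundled)
```

Without aggregation the toy tree keeps one of the four links; with aggregation it sees a single logical link and all four member links behind it stay usable.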
For whatever reason, vendors like to use every term but link aggregation. You’ll hear about port channel, EtherChannel, link bonding or multi-link trunking.
Multi-Chassis Link Aggregation
Imagine you could pretend two physical boxes use a single control plane and coordinated switching fabrics ... then the links terminated on two physical boxes actually terminate within the same control plane and you could aggregate them. Welcome to the wonderful world of Multi-Chassis Link Aggregation (MLAG).
MLAG nicely solves the STP problem: no bandwidth is wasted and close-to-full redundancy is retained (make sure you always read the small print to understand what happens if the switch hosting the control plane fails).
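A small Python sketch shows why this works. It uses a toy loop-blocking model rather than real STP, and the switch names (A1, A2, E1, E2), the function names and the topology are all hypothetical.

```python
def blocked_links(links):
    """Toy STP: return the links that would close a loop (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    blocked = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            blocked.append((a, b))
        else:
            parent[root_a] = root_b
    return blocked

def apply_mlag(links, members, logical_name):
    """Present the member switches as one logical switch and bundle
    the links that become parallel as a result."""
    bundles = {}
    for a, b in links:
        a = logical_name if a in members else a
        b = logical_name if b in members else b
        if a == b:
            continue  # the link between the members becomes internal
        key = tuple(sorted((a, b)))
        bundles[key] = bundles.get(key, 0) + 1
    return bundles

# Two aggregation switches, two access switches, each access switch
# connected to both aggregation switches.
links = [("A1", "A2"),
         ("E1", "A1"), ("E1", "A2"),
         ("E2", "A1"), ("E2", "A2")]

before = blocked_links(links)                 # one uplink per access switch
bundles = apply_mlag(links, {"A1", "A2"}, "A")
after = blocked_links(list(bundles))          # nothing left to block
print(before, bundles, after)
```

In this model half of the four uplinks are blocked before MLAG; afterwards each access switch has one two-member bundle and nothing is blocked.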
Standardization? No, thanks
MLAG is obviously a highly desirable tool in your design/deployment toolbox ... but no vendor (including those that promote their standards-based open approach) has taken the pains to start the standardization effort. Proprietary technology lock-in is obviously still a lucrative approach.
The architectural approaches used by individual vendors differ widely: some completely separate the control plane from the switching matrix (high-end solution from Juniper), some put one of the control planes into a half-comatose state (Cisco with VSS), some use cooperative control planes (Cisco with vPC), and some use a stacking (preferably called distributed or intelligent) solution (Cisco, HP and Juniper).
More information
You’ll get a high-level overview of all virtualization, LAN reference architectures, multi-chassis link aggregation, port extenders and large-scale bridging (including TRILL and FabricPath) in my Data Center 3.0 for Networking Engineers webinar (buy a recording or yearly subscription).
The concept is easy to sell to management and it works very well, until something goes wrong; then all hell breaks loose. As you rightly said, "understand what happens if the switch hosting the control plane fails", or even a downstream switch for that matter. A number of years ago I rolled it out on a large campus and, a few catastrophic failures later, removed it all.
Maybe I am a curmudgeon. I'd like to think that it's just healthy paranoia. :)
I feel that someone needs to go to bat for Nortel / Avaya, since they were doing SMLT way before Cisco's VSS came out to play.
http://www.trcnetworks.com/nortel/Data/Swiches/8600/pdfs/Split_multi_Link_Trunking.pdf
...and a highlight:
"Spanning Tree Protocol is disabled on the SMLT ports"
Not relying on STP for redundancy is one thing. Switching it off is a whole other thing. Deploy SMLT and hope that nobody ever loops two edge switches? No thanks.
Nortel has an extra twist: R(outed)SMLT, which is kind of like vPC + HSRP, except there's no virtual router IP. Give your routers x.x.x.1 and x.x.x.2. Configure .1 as the gateway on end systems. If .1 fails, .2 assumes the dead router's address.
If there's a power outage, and only .2 boots back up? You're done. (though there's a write-status-to-nvram fix for this)
Come to think of it, it's a lot like vPC in that regard!
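A rough Python model of that failure case follows. It rests on a loud assumption that is not taken from Nortel documentation: each router learns its peer's address only at runtime from the live peer, so the knowledge is lost on reboot unless it is saved. The Router class, its methods and the persist_peer flag are hypothetical stand-ins, with persist_peer playing the role of the write-status-to-nvram fix.

```python
class Router:
    def __init__(self, own_ip, persist_peer=False):
        self.own_ip = own_ip
        self.persist_peer = persist_peer
        self.saved_peer_ip = None  # stands in for NVRAM
        self.peer_ip = None        # runtime state, lost on power-off
        self.up = False

    def boot(self):
        self.up = True
        # Assumption: only a saved copy survives a reboot.
        self.peer_ip = self.saved_peer_ip if self.persist_peer else None

    def power_off(self):
        self.up = False
        self.peer_ip = None

    def learn_peer(self, peer):
        self.peer_ip = peer.own_ip
        if self.persist_peer:
            self.saved_peer_ip = peer.own_ip

    def answers_for(self, peer_alive):
        """Addresses this router currently forwards for."""
        if not self.up:
            return set()
        addresses = {self.own_ip}
        if self.peer_ip and not peer_alive:
            addresses.add(self.peer_ip)  # take over the dead peer's address
        return addresses

def gateway_survives(persist_peer):
    """Return (takeover works, gateway works after outage with only .2 back)."""
    r1 = Router("x.x.x.1", persist_peer)
    r2 = Router("x.x.x.2", persist_peer)
    r1.boot(); r2.boot()
    r1.learn_peer(r2); r2.learn_peer(r1)

    r1.power_off()                                  # .1 fails
    takeover = "x.x.x.1" in r2.answers_for(peer_alive=False)

    r2.power_off(); r2.boot()                       # outage, only .2 returns
    after_outage = "x.x.x.1" in r2.answers_for(peer_alive=False)
    return takeover, after_outage

print(gateway_survives(persist_peer=False))
print(gateway_survives(persist_peer=True))
```

Under these assumptions the takeover works in both cases, but the gateway address comes back after a full outage only when the peer information was saved.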
Basic deployment scenarios:
http://www116.nortel.com/docs/bvdoc/ene_tech_pubs/SMLT_and_RSMLT_Deployment_Guide_V1.1.pdf
Campus design guide (outlines link aggregation and loop detection deployment):
http://www142.nortelnetworks.com/mdfs_app/enterprise/TCGs/pdf/NN48500-575_2.0_Large_Campus_TSG.pdf
Configuration Guide for SMLT (includes some of the better technical information):
http://www142.nortelnetworks.com/mdfs_app/enterprise/ers8600/5.1/pdf/NN46205-518_02.01_Configuration-Link-Aggregation.pdf
Configuration Guide for RSMLT (both chassis share layer 3 information like OSPF/BGP state):
http://www142.nortelnetworks.com/mdfs_app/enterprise/ers8600/5.1/pdf/NN46205-523_02.02_Configuration_IP_Routing.pdf
http://h3c.com/portal/Technical_Support___Documents/Technical_Documents/
for all equipment, then move to each switch model if needed. IRF seems to be supported on many models: 12k, 9500E, 7500E, 58xx. I could not see whether IRF between different models is possible.
Fairly thin on
http://h3c.com/portal/Products___Solutions/Technology/IRF/
One of the config guides can be found at
http://h3c.com/portal/download.do?id=1038276
There seems to be no restriction on STP; in fact, it seems this supports even MPLS and many other features. I haven't had a chance to lay my hands on any of these products; the above is based only on reading the documents :-)
There was a prolonged marketing-guy catfight at Network World between Cisco's VSS and Nortel's SMLT/RSMLT.
Thanks!
STP loop prevention turns off (sets them to "blocking") half of the links in a dual-tree design displayed in the first diagram (blocked links are grayed out in the diagram). STP itself uses very little bandwidth.
http://blog.ioshints.info/2010/09/vmotion-elephant-in-data-center-room.html
Cisco VSS is the way of the future as it will do away with STP. ;)
BTW, VSS is just stacking-on-steroids; I prefer vPC.
1) Simple Loop Protection Protocol - SMLT switches send probes down their SMLT links to the closet switch. If you see the SLPP hello packet return on another interface, you know that you have a loop condition.
2) Control Plane limiting for Broadcast / Multicast traffic can be configured on a per-port basis. I configure it on my SMLT links to shut down the interface and/or VLAN that is sending excess broadcast or multicast traffic.
In both solutions, this is configured on both SMLT switches with the trigger thresholds set to different levels (5 SLPP hello probes vs 50) so that only one side of the SMLT should be disabled during a downstream loop.
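The staggered-threshold idea can be illustrated with a short Python simulation. It is a sketch under stated assumptions, not the SLPP implementation: it assumes that while the loop exists both switches see their probes return at the same rate, and that shutting a port on either switch breaks the loop. The function name and switch names are hypothetical.

```python
def simulate_loop(thresholds, max_probes=1000):
    """Return the switches that disable their SMLT port during a loop.

    thresholds: dict mapping switch name to the number of returned
    probes it tolerates before shutting the port.
    """
    counters = {name: 0 for name in thresholds}
    for _ in range(max_probes):
        # While the loop exists, every switch sees one more probe return.
        for name in counters:
            counters[name] += 1
        tripped = [n for n in counters if counters[n] >= thresholds[n]]
        if tripped:
            # These switches shut their ports, which breaks the loop,
            # so the remaining counters stop increasing.
            return tripped
    return []

staggered = simulate_loop({"core-A": 5, "core-B": 50})
identical = simulate_loop({"core-A": 5, "core-B": 5})
print(staggered, identical)
```

With thresholds of 5 and 50 only the first switch trips; with identical thresholds both sides shut down at once and the downstream switch is cut off completely.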
Chassis (12500, 9500, 7500): Currently up to 2 devices can be clustered. Rumor is that will be increased to 4 in the future.
5820: Up to 9 devices can be clustered
5800, 5500, 5120: Up to 8 devices can be clustered
Certain mixed devices can be clustered using IRF, specifically 5820s and 5800s.
IRF clustering is fully stateful and supports basically all the regular switch feature sets. With regard to STP, an environment that uses IRF on all the Core or Aggregation devices can remove STP and use LACP to provide path redundancy instead.
I believe that the MSR routers support IRF as well, but I haven't configured it myself.