It’s clear that major hypervisor vendors consider MAC-over-IP to be the endgame for virtual networking; they’re still squabbling about the best technology and proper positioning of bits in various headers, but the big picture is crystal-clear. Once they get there (solving “a few” not-so-trivial problems on the way), and persuade everyone to use virtual appliances, the network will have to provide seamless IP transport, nothing more.
At that moment, large-scale bridging will finally become history (until the big layer pendulum swings again) and one has to wonder whether there’s any data center future for TRILL, SPB, FabricPath and other vendor-specific derivatives.
How large is Large-Scale anyway?
Most customers want to have large-scale bridging solutions to support VM mobility. The current state of the hypervisor market (vSphere 5, June 2012) is as follows:
- 32 hypervisor hosts in a high-availability cluster (where the VMs move automatically based on load changes);
- 350 hosts in a virtual distributed switch (vDS); you cannot move a running VM between two virtual distributed switches, and having hundreds of vSphere hosts with the classic vSwitch is a management nightmare.
The maximum reasonable size of a large-scale bridging solution is thus around 700 10GE ports (I don’t think there are many applications able to saturate more than two 10GE server uplinks).
Assuming a Clos fabric with two spine nodes and 3:1 oversubscription at the leaf nodes, you need ~120 non-blocking line-rate 10GE ports on each spine switch (or four spine switches with 60 10GE ports or 16 40GE ports each) ... the only requirement is that STP should not block any links, which can be easily solved with multi-chassis link aggregation. There are at least five major data center switch vendors with products matching these requirements.
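The back-of-the-envelope arithmetic behind those port counts is simple enough to sketch (the function name is mine, not anyone's product spec; the text rounds the raw results up to convenient vendor port counts like 120, 60 or 16):

```python
import math

def spine_ports_needed(server_ports, oversubscription, spine_count,
                       uplink_gbps=10, server_gbps=10):
    """Minimum uplink ports per spine switch for a leaf-and-spine fabric.

    Total server-facing bandwidth divided by the oversubscription ratio
    gives the bandwidth the leaf layer must push toward the spines;
    split that across the spine switches and their uplink port speed.
    """
    uplink_bw = server_ports * server_gbps / oversubscription  # Gbps toward spines
    return math.ceil(uplink_bw / spine_count / uplink_gbps)

# 700 x 10GE server ports, 3:1 oversubscription:
print(spine_ports_needed(700, 3, 2))                  # ~117 10GE ports per spine (round to ~120)
print(spine_ports_needed(700, 3, 4))                  # ~59 10GE ports with four spines (round to 60)
print(spine_ports_needed(700, 3, 4, uplink_gbps=40))  # ~15 40GE ports with four spines (round to 16)
```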
There are scenarios where you really need larger bridging domains (example: link customer’s physical servers with their VMs for thousands of customers); if your data center network is one of them, you should talk to Nicira or wait for Cisco's VXLAN-to-VLAN gateway. On the other hand, implementing large-scale bridging to support stretched HA clusters doesn’t make much sense.
Is there any need for TRILL?
As you’ve seen in the previous section, you can build layer-2 fabrics that satisfy reasonable real-life requirements with leaf-and-spine architecture using multi-chassis link aggregation (MLAG) between leaf and spine switches. Is TRILL thus useless?
Actually, it’s not. Every single MLAG solution is a brittle kludge that can potentially result in a split-brain scenario, and every MLAG-related software bug could have catastrophic impact (VSS or VPC bugs anyone?). It’s much simpler (and more reliable) to turn the layer-2 network into a somewhat-routed network transporting MAC frames and rely on time-proven features of routing protocols (IS-IS in this particular case) to work around failures. From the technology perspective, TRILL does make sense until we get rid of VLANs and move to MAC-over-IP.
TRILL is so awesome some vendors want to charge extra for the privilege to use it
Compared to MLAG, TRILL reduces the network complexity and makes the layer-2 fabric more robust ... but I’m not sure we should be paying extra for it (I’m looking at you, Cisco). After all, if a particular vendor’s solution results in a more stable network, I just might buy more gear from the same vendor, and reduce their support costs.
Charging a separate license cost for TRILL (actually FabricPath) just might persuade me to stick with MLAG+STP (after all, it does work most of the time), potentially making me unhappy (when it stops working). I might also start considering alternate vendors, because every single vendor out there supports MLAG+STP designs.
Server-facing MLAG – the elephant in the room
If your data center runs exclusively on VMware, you don’t need server-facing link aggregation; after all, the static LAG configuration on the switch combined with IP-based hashing on vSphere is a kludge that can easily break.
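For the record, that kludge looks something like this in Cisco IOS-style syntax (a sketch, not a recommendation; interface names are made up). Note the `mode on` keyword, because the classic vSwitch doesn’t speak LACP, and the switch-side hash must match the vSwitch “Route based on IP hash” setting:

```
! Global: hash on source/destination IP to match the vSwitch IP-hash policy
port-channel load-balance src-dst-ip
!
interface range GigabitEthernet1/0/1 - 2
 channel-group 10 mode on      ! static LAG, no LACP negotiation
!
interface Port-channel10
 switchport mode trunk
```

If either side’s configuration drifts (someone changes the vSwitch teaming policy, or adds a port to only one side), traffic gets silently blackholed or duplicated, which is exactly why it’s a kludge.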
Unfortunately, Brocade is the only vendor that integrated its TRILL-like VCS Fabric with MLAG. You can use MLAG with Cisco’s FabricPath, but the two are totally separate – MLAG requires VPC. We haven’t really gained much if we get rid of MLAG on inter-switch links by replacing STP with TRILL, but remain stuck with VPC to support MLAG on server-to-switch links.
Assuming you’re not yet ready to go down the VXLAN/NVGRE path, TRILL is definitely a technology worth considering, just to get rid of STP and MLAG combo. However, be careful:
- Unless you’re running VMware vSphere or another similarly network-impaired operating system that’s never heard of LACP, you’ll probably need multi-chassis LAG for redundancy reasons ... and thus Brocade VCS Fabric is the only option if you don’t want to remain stuck with VPC.
- Even when running TRILL in the backbone, I would still run the full range of STP features on the server-facing ports to prevent potential bridging loops created by clueless physical server or VM administrators (some people actually think it makes sense to bridge between two virtual NICs in a regular VM). Cisco is the only vendor offering TRILL-like fabric and STP in the same product at the same time (while VDX switches from Brocade work just fine with STP, they start pretending it doesn’t exist the moment you configure VCS fabric).
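The server-facing STP protection mentioned above would look roughly like this in NX-OS syntax (interface name is hypothetical): the port forwards immediately as an edge port, but gets shut down the moment a BPDU arrives, which is the typical symptom of an accidental bridging loop behind it.

```
interface Ethernet1/10
  description server-facing port
  switchport mode trunk
  spanning-tree port type edge trunk   ! skip listening/learning, forward immediately
  spanning-tree bpduguard enable       ! err-disable the port if a BPDU is received
```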
Hmm, seems like there are no good choices after all. Maybe you should still wait a little bit before jumping head-on into the murky TRILL waters.
You know I have to mention a few webinars at the end of every blog post. Here’s the list of the most relevant ones (you get all of them with the yearly subscription).