Most recently launched data center switches use the Trident 2 chipset, and yet we know almost nothing about its capabilities and limitations. It might not work at linerate, it might have L3 lookup challenges when faced with L2 tunnels, there might be other unpleasant surprises… but we don’t know what they are, because you cannot get Broadcom’s documentation unless you work for a vendor who signed an NDA.
Interestingly, the best source of Trident 2 technical information I found so far happens to be the Cisco Live Nexus 9000 Series Switch Architecture presentation (BRKARC-2222). Here are a few tidbits I got from that presentation and Broadcom’s so-called datasheet.
I often get three questions about TRILL: Are the TRILL standards finalized? Has anyone implemented it? Is it useful?
Short answers: Yes, No, Maybe (although I remain unconvinced).
It’s clear that major hypervisor vendors consider MAC-over-IP to be the endgame for virtual networking; they’re still squabbling about the best technology and proper positioning of bits in various headers, but the big picture is crystal-clear. Once they get there (solving “a few” not-so-trivial problems on the way), and persuade everyone to use virtual appliances, the network will have to provide seamless IP transport, nothing more.
At that moment, large-scale bridging will finally become a history (until the big layer pendulum swings again) and one has to wonder whether there’s any data center future for TRILL, SPB, FabricPath and other vendor-specific derivatives.
Yesterday I described how the limited flow setup rates offered by most commercially-available switches force the developers of production-grade OpenFlow controllers to drop the microflow ideas and focus on state abstraction (people living in a dreamland usually go in a totally opposite direction). Before going into OpenFlow-specific details, let’s review the existing forwarding state abstraction technologies.
The last week has been “interesting” – I created the draft slide deck for the Data Center Fabric Architectures webinar on November 16th (register here) and sent the relevant slides to all vendors mentioned in the presentation to give them a chance to fix my errors – every vendor got at least the scorecard describing my understanding of their solution.
If you’re not working for a data center fabric vendor (in which case please read the other today’s post), you’ll probably enjoy the excellent analogy Ethan Banks made after reading my TRILL-over-WAN post:
Think of a network topology like a road map. There's boulevards, major junction points, highways, dead ends, etc. Now imagine what that map looks like after it's been nuked from orbit: flat. Sure, we blew up the world, but you can go in a straight line anywhere you want.
... and don’t forget to be nice to the people asking for inter-DC VM mobility ;)
Remember how I foretold when TRILL first appeared that someone would be “brave” enough to reinvent WAN bridging and brouters that we so loved to hate in the early 90’s? The new wave of the WAN bridging craze has started: RFC 6361 defines TRILL over PPP (because bridging-over-PPP is just not good enough). Just because you can doesn’t mean you should.
It seems that most networking vendors consider the Flat Earth architectures the new bonanza ... everyone is running to join the gold rush, from Cisco’s FabricPath and Brocade’s VCS to HP’s IRF and Juniper’s upcoming QFabric. As always, the standardization bodies are following the industry with a large buffet of standards to choose from: TRILL, 802.1ag (SPB), 802.1Qbg (EVB) and 802.1bh (Port extenders).
A tweet from J Michel Metz has alerted me to a “Why TRILL won't work for data center network architecture” article by Anjan Venkatramani, Juniper’s VP of Product Management. Most of the long article could be condensed in two short sentences my readers are very familiar about: Bridging does not scale and TRILL does not solve the traffic trombone issues (hidden implication: QFabric will solve all your problems)... but the author couldn’t resist throwing “FCoE over TRILL” bone into the mix.
A while ago I described what it takes to integrate TRILL backbone with the legacy equipment running Spanning Tree Protocol (STP). Unfortunately, Brocade decided to use a non-standard approach to BPDU handling when implementing their TRILL-like VCS fabric. VDX switches running in fabric mode can either drop incoming BPDU frames or transport them transparently across the fabric to other edge ports. Although VDX switches support STP, RSTP and MSTP (as well as RootGuard and BPDUGuard) when configured as standalone switches, the STP processing is disabled when you configure fabric mode; VCS fabric looks like a huge shared LAN segment to the end hosts and core switches.
2013-03-31: Network OS 4.0 and above supports Distributed Spanning Tree (DiST), for more details read this blog post.
Every Data Center fabric technology has to integrate seamlessly with legacy equipment running the venerable Spanning Tree Protocol (STP) or one of its facelifted incarnations (for example, RSTP or MST). The alternative, called rip-and-replace when talking about other vendors’ boxes or synchronized upgrade when promoting your wares (no, I haven’t heard it yet, but I’m sure it’s coming), is simply indigestible to most data center architects.
TRILL and Cisco’s proprietary Fabric Path take a very definitive stance: the new fabric is the backbone of the network routing TRILL-encapsulated layer-2 frames across bridged segments (TRILL) or contiguous backbone (Fabric Path). Both architectures segment the original STP domain into small chunks at the edges of the network as shown in the following figure:
The emerging Ethernet bridging technologies have been hyped to an extent where the lines between them completely blurred, resulting in statements like “we need DCB and TRILL for FCoE”. Actually, none of that is true, but let’s focus on DCB and TRILL first.
Someone had a “borderless data center mobility” dream a few years ago and managed to infect a few other people, resulting in a networking industry pandemic that is usually exhibited by the following “facts”:
- Unhindered Virtual Machine mobility across the globe is the absolute prerequisite for any business agility. Wrong. There are other field-proven solutions and although inter-site VM mobility has been demonstrated, it’s still a half-baked idea and has many caveats.
- You can only reach that Holy Grail by extending your layer-2 domains across vast distances. Totally wrong. It would be easier to fix L3 routing and signaling protocols than to invent completely new technologies trying to fix L2 problems. Users of Microsoft NLB are might disagree ... in which case I wish them luck in scaling their architecture.
- Large-scale bridging is absolutely mandatory if you want to build cloud solutions with tens of thousands of servers. Not sure about that. Google is there, Facebook, Twitter and Amazon are (at least) close, large web hosting providers have been around for years ... and yet they somehow managed to survive with existing technologies and good network designs.
Just today XKCD published a very relevant comic, so I can skip my usually sarcastic comments and focus on the plethora of emerging large-scale bridging standards and implementations. Let’s walk through them:
A comment by Brad Hedlund has sent me studying the differences between TRILL and 802.1aq and one of the first articles I’ve stumbled upon was a nice overview which claimed that the protocols are very similar (as they both use IS-IS to select shortest path across the network). After studying whatever sparse information there is on 802.1aq (you might want to read Greg Ferro’s fascination with IEEE paywall) and the obligatory headache, I’ve figured out that the two proposals have completely different forwarding paradigms. To claim they’re similar is the same as saying DECnet phase V and MPLS Traffic Engineering are similar because they both use IS-IS.
Based on the readers’ comments on my “Bridging and Routing: is there a difference?” post (thanks you!), here are a few more differences between bridging and routing:
Cost. Layer-2 switches are almost always cheaper than layer-3 (usually combined layer-2/3) switches. There are numerous reasons for the cost difference, including:
- Mass-market low-end switches are usually simple bridges. Low-cost high-speed bridging silicon is thus readily available.
- MAC address lookup is simpler than IP table lookup and easier to implement in silicon. You need simple CAM (Content Addressable Memory) to perform MAC address lookup and TCAM (Ternary CAM) with additional output logic to perform longest-IP-prefix matching.
- Layer-3 switches are expected to perform IP packet filtering. Implementing access lists in hardware (usually with even larger TCAM) is expensive.