The Data Center Fabric architectures
Have you noticed how quickly fabric got as meaningless as switching and cloud? Everyone is selling you data center fabric and no two vendors have something remotely similar in mind. You know it’s always more fun to look beyond white papers and marketectures and figure out what’s really going on behind the scenes (warning: you might be as disappointed as Dorothy was). I was able to identify three major architectures (at least two of them claiming to be omnipotent fabrics).
Business as usual
Each networking device (let’s confuse everyone and call them switches) works independently and remains a separate management and configuration entity. This approach has been used for decades in building the global Internet and thus has proven scalability. It also has well-known drawbacks (large number of managed devices) and usually requires thorough design to scale well.
As the long-distance bridging fever spreads across data centers, the business-as-usual approach has to replace STP with a more scalable protocol. TRILL and SPB (802.1aq) are the standard candidates; Cisco’s FabricPath is a proprietary alternative.
As long as access-layer switches are not TRILL/SPB-enabled, we need multi-chassis link aggregation (MLAG) to optimize bandwidth utilization. Those of you who worked with multichassis multilink PPP (MMP) in the past would probably agree with me that MLAG is also business as usual.
Examples: Cisco’s Nexus 7000 with FabricPath and VPC, Brocade’s VCS
The Borg
In the Borg architecture (lovingly known as stacking on steroids) numerous switches decide to form a collective and elect the central brain (or outsource the brainy functions to an external device) that controls the whole hive. The cluster of devices appears as a single control- and management-plane entity to the outside world. It’s managed as a single device, has a single configuration and one set of routing adjacencies with the outside world.
Examples: stackable switches, Juniper’s virtual chassis, HP’s IRF, Cisco’s VSS
Like the original Borg, the switch cluster architectures cannot cope well with splits from the central brains. Cisco’s VSS reloads the primary switch when it detects a split brain scenario; HP’s IRF and Juniper’s virtual chassis disable the switches that lose cluster quorum.
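To make the quorum idea a bit more concrete, here is a minimal Python sketch of a strict-majority rule. It is a generic illustration, not the algorithm used by IRF, virtual chassis or VSS; the function names and the switch names are made up for the example, and real products add their own tie-breakers and detection mechanisms.

```python
from typing import Iterable, List, Set


def has_quorum(reachable_members: int, cluster_size: int) -> bool:
    """Assumed rule: a partition stays active only with a strict majority."""
    return reachable_members > cluster_size // 2


def surviving_partitions(
    partitions: Iterable[Set[str]], cluster_size: int
) -> List[Set[str]]:
    """Return the partitions that keep forwarding after a cluster split."""
    return [p for p in partitions if has_quorum(len(p), cluster_size)]


# Hypothetical five-switch cluster that splits into a 3-member and a 2-member part.
five_way = surviving_partitions([{"sw1", "sw2", "sw3"}, {"sw4", "sw5"}], 5)

# Hypothetical two-switch cluster that splits down the middle.
two_way = surviving_partitions([{"sw1"}, {"sw2"}], 2)
```

In the five-switch case only the three-member partition survives. In the two-switch case neither half has a strict majority, so a pure majority rule would disable both, which hints at why two-member clusters need some additional way of deciding who stays up.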
While vendors like to talk about all-encompassing fabrics, the current implementations usually limit the number of high-end devices in the cluster to two (Cisco’s VSS, Juniper’s EX8200+XRE200 and HP’s IRF), reducing the Borg architecture to a Siamese twin one.
Furthermore, most implementations of the Borg architecture still limit the switch clusters to devices of the same type. As you cannot combine access- and core-layer switches into the same fabric, you still need MLAG between the access and the core layer.
At the moment, all Borg-like implementations are proprietary.
The Big Brother
Also known as controller-based fabric, this architecture uses dumb(er) switches that perform packet forwarding based on instructions downloaded from the central controller(s). The instructions might be control-plane driven (L3 routing tables downloaded into the switches) or data-plane driven (5-tuples downloaded into the switches to enable per-flow forwarding).
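The difference between the two kinds of instructions can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OpenFlow or any vendor's API: the class and method names are invented, and the choice to let an exact 5-tuple entry override the routing table is mine.

```python
import ipaddress
from typing import Dict, NamedTuple, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


class DumbSwitch:
    """Hypothetical switch that only forwards on state pushed by a controller."""

    def __init__(self) -> None:
        self.routes: Dict[Network, str] = {}   # control-plane driven state
        self.flows: Dict[FiveTuple, str] = {}  # data-plane driven state

    def install_route(self, prefix: str, out_port: str) -> None:
        self.routes[ipaddress.ip_network(prefix)] = out_port

    def install_flow(self, flow: FiveTuple, out_port: str) -> None:
        self.flows[flow] = out_port

    def forward(self, pkt: FiveTuple) -> Optional[str]:
        # Assumption: an exact per-flow entry wins over the routing table.
        if pkt in self.flows:
            return self.flows[pkt]
        # Otherwise fall back to longest-prefix match on the destination.
        dst = ipaddress.ip_address(pkt.dst_ip)
        matches = [net for net in self.routes if dst in net]
        if not matches:
            return None  # no instruction downloaded for this packet
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


switch = DumbSwitch()
switch.install_route("10.0.0.0/8", "uplink1")
switch.install_route("10.1.0.0/16", "uplink2")

pkt = FiveTuple("192.0.2.10", "10.1.2.3", "tcp", 40000, 443)
routed_port = switch.forward(pkt)      # longest prefix 10.1.0.0/16 applies

switch.install_flow(pkt, "uplink3")
per_flow_port = switch.forward(pkt)    # exact 5-tuple entry now applies
```

The routing table needs one entry per prefix, while the per-flow table needs one entry per conversation, so the amount of state the controller has to download grows much faster in the data-plane driven case.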
The controller-based approach is ideal for protocol and architecture prototyping (which is the primary use case for OpenFlow) and for architectures with hub-and-spoke traffic flow (wireless controllers), but has yet to prove that it scales in large any-to-any networks.
Is there an architecture that you cannot easily categorize as one of the above? Is there a standard in development for Borg architecture? Have you seen a scalable Big Brother architecture? Please write a comment!
Looking from the outside, based on what's publicly available, it seems that QFabric falls into the "Borg" category.
It sounds like a good architecture, but due to the lack of in-depth "how this all works" documents I'm still skeptical.
It's not clear that QF falls directly into either of your categories. It will depend on exactly how the QF/D interacts with the QF/N's and QF/I's. I posted a list of questions I've got on the topic for Juniper here: http://irq.tumblr.com/post/3612289570/mulling-qfabric
I like your list of questions Aneel.
I asked Juniper some of them; here is what they shared (in random order):
1. Forklift upgrades or new installs make the most sense
2. The protocols running between the elements are OSPF-based, but proprietary
3. No oversubscription in a normal setup (I would imagine there are failure scenarios where it still occurs)
4. Can operate on DSCP/CoS values of the packets ingressing its ports.
5. QF/I are big fabric boxes; not many smart things happen there
In the VSS case the active sup populates the (TCAM) tables on the local DFC cards, and also on the remote DFC cards - one of them is the remote PFC card on the remote sup.
This is what they say here:
With VCS technology, all configuration and destination information is automatically distributed to each member switch in the fabric. For example, when a server connects to the fabric for the first time, all switches in the fabric learn about that server. In this way, fabric switches can be added or removed and physical or virtual servers can be relocated—without the fabric requiring manual reconfiguration
And, unlike switch stacking technologies, the Ethernet Fabric is masterless. This means that no single switch stores configuration information or controls fabric operations
What they do on the management plane (shared configuration) is another mystery that needs to be explored ;)
After going through the paper it appears to me that only Cisco vPC, Brocade VCS on VDX 6700 and Brocade/Foundry MCT run independent switches with some _proprietary_ protocol between them to provide MLAG functionality. Correct?
The only documented MLAG protocol that I'm aware of is Juniper's (future) BGP MPLS-based MAC VPN: http://tools.ietf.org/html/draft-raggarwa-mac-vpn-01
This is a long-established feature in Fibre Channel, described as Distributed Services.
Basically, switches in an FC network share information (who's logged in, what their capabilities are, and more).
Nothing really big - Brocade, being an FC company, obviously knows how to use it to their advantage.