Data Center Fabric Architectures

Have you noticed how quickly fabric got as meaningless as switching and cloud? Everyone is selling you data center fabric and no two vendors have something remotely similar in mind. You know it’s always more fun to look beyond white papers and marketectures and figure out what’s really going on behind the scenes (warning: you might be as disappointed as Dorothy was). I was able to identify three major architectures (at least two of them claiming to be omnipotent fabrics).

Business as usual

Each networking device (let’s confuse everyone and call them switches) works independently and remains a separate management and configuration entity. This approach has been used for decades in building the global Internet and thus has proven scalability. It also has well-known drawbacks (a large number of managed devices) and usually requires thorough design to scale well.

As the long-distance bridging fever spreads across data centers, the business-as-usual approach has to replace STP with a more scalable protocol. TRILL and SPB (802.1aq) are the standards-based candidates; Cisco’s FabricPath is a proprietary alternative.

As long as access-layer switches are not TRILL/SPB-enabled, we need multi-chassis link aggregation (MLAG) to optimize bandwidth utilization. Those of you who worked with multichassis multilink PPP (MMP) in the past will probably agree with me that MLAG is also business as usual.
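
To make the idea concrete, here’s a toy Python sketch (my own illustration, not any vendor’s hash algorithm) of what MLAG buys you: the server hashes each flow onto one member link of a LAG bundle, and the two links happen to terminate on two different chassis that pretend to be a single LACP partner.

```python
# Hypothetical sketch: per-flow hashing across an MLAG bundle whose member
# links land on two different physical switches. The hash function and the
# member-selection logic are illustrative only.

import hashlib

# Two member links, each terminating on a different chassis.
MLAG_MEMBERS = [
    {"port": "eth1", "chassis": "core-1"},
    {"port": "eth2", "chassis": "core-2"},
]

def pick_member(src_mac: str, dst_mac: str) -> dict:
    """Deterministic per-flow hash, like a MAC-based LAG hash."""
    digest = hashlib.md5(f"{src_mac}{dst_mac}".encode()).digest()
    return MLAG_MEMBERS[digest[0] % len(MLAG_MEMBERS)]

# All frames of one flow take the same link (no reordering), while different
# flows spread across both chassis - the point of MLAG versus STP blocking
# one of the uplinks.
flow_a = pick_member("00:aa", "00:bb")
flow_b = pick_member("00:cc", "00:dd")
```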

Examples: Cisco’s Nexus 7000 with FabricPath and vPC, Brocade’s VCS

The Borg

In the Borg architecture (lovingly known as stacking on steroids) numerous switches decide to form a collective and elect the central brain (or outsource the brainy functions to an external device) that controls the whole hive. The cluster of devices appears as a single control- and management-plane entity to the outside world. It’s managed as a single device, has a single configuration and one set of routing adjacencies with the outside world.
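
Here’s a toy sketch of the Borg idea (my own illustration, not any vendor’s actual election algorithm): the members elect one brain, and the winner then owns the single control plane and the single configuration for the whole hive.

```python
# Illustrative only: member switches elect a central brain, then the cluster
# presents one management identity to the outside world. Priorities, MACs and
# the tie-breaking convention are made up for the example.

switches = [
    {"name": "member-1", "priority": 100, "mac": "00:01"},
    {"name": "member-2", "priority": 200, "mac": "00:02"},
    {"name": "member-3", "priority": 100, "mac": "00:03"},
]

def elect_brain(members):
    """Highest priority wins; lowest MAC breaks ties (a common convention)."""
    return sorted(members, key=lambda m: (-m["priority"], m["mac"]))[0]

brain = elect_brain(switches)

# From the outside the collective answers as one device: one configuration,
# one management address, one set of routing adjacencies - all owned by the brain.
cluster = {"management_ip": "one shared address", "config_owner": brain["name"]}
```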

Examples: stackable switches, Juniper’s virtual chassis, HP’s IRF, Cisco’s VSS

Like the original Borg, the switch-cluster architectures cannot cope well with a split from the central brain. Cisco’s VSS reloads the primary switch when it detects a split-brain scenario; HP’s IRF and Juniper’s virtual chassis disable the switches that lose cluster quorum.
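
The two split-brain policies can be sketched in a few lines of Python (function names and decision logic are mine; the real VSS, IRF and virtual chassis implementations differ in detail):

```python
# Hedged sketch of the two split-brain recovery policies described above.

def vss_style(detected_split: bool, is_primary: bool) -> str:
    """Cisco VSS style: on a detected dual-active condition, reload the primary."""
    if detected_split and is_primary:
        return "reload"
    return "forward"

def quorum_style(members_seen: int, cluster_size: int) -> str:
    """IRF / virtual-chassis style: members that lose quorum disable themselves."""
    # Quorum = strict majority. Note that in a two-member cluster a split
    # leaves both halves without quorum, which is why two-node clusters need
    # an extra tie-breaking mechanism.
    if members_seen * 2 <= cluster_size:
        return "disable"
    return "forward"
```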

While vendors like to talk about all-encompassing fabrics, the current implementations usually limit the number of high-end devices in the cluster to two (Cisco’s VSS, Juniper’s EX8200+XRE200 and HP’s IRF), reducing the Borg architecture to a Siamese twin one.

Furthermore, most implementations of the Borg architecture still limit the switch clusters to devices of the same type. As you cannot combine access- and core-layer switches into the same fabric, you still need MLAG between the access and the core layer.

At the moment, all Borg-like implementations are proprietary.

The Big Brother

Also known as controller-based fabric, this architecture uses dumb(er) switches that perform packet forwarding based on instructions downloaded from the central controller(s). The instructions might be control-plane driven (L3 routing tables downloaded into the switches) or data-plane driven (5-tuples downloaded into the switches to enable per-flow forwarding).
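
A minimal sketch of the data-plane-driven variant (names and the one-line policy are made up; real controllers are obviously more involved): the switch forwards on exact 5-tuple matches and punts unknown flows to the controller, which installs a forwarding entry.

```python
# Illustrative "Big Brother" forwarding: dumb switch, central controller.

FlowKey = tuple  # (src_ip, dst_ip, proto, src_port, dst_port)

class Controller:
    """All policy lives centrally; here it just forwards everything upstream."""
    def decide(self, key: FlowKey) -> str:
        return "uplink"

class DumbSwitch:
    def __init__(self, controller: Controller):
        self.controller = controller
        self.flow_table: dict = {}

    def forward(self, key: FlowKey) -> str:
        if key not in self.flow_table:            # table miss -> punt to controller
            self.flow_table[key] = self.controller.decide(key)
        return self.flow_table[key]               # later packets hit the local table
```

Note that only the first packet of each flow touches the controller; everything else is switched from the locally cached entry, which is exactly where the scaling question (flow-setup rate, table size) comes from.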

The controller-based approach is ideal for protocol and architecture prototyping (the primary use case for OpenFlow) and for architectures with hub-and-spoke traffic flow (wireless controllers), but it has yet to prove it can scale in large any-to-any networks.

Anything else?

Is there an architecture that you cannot easily categorize as one of the above? Is there a standard in development for Borg architecture? Have you seen a scalable Big Brother architecture? Please write a comment!

19 comments:

  1. Just want to let you know that Juniper just released their new ultimate DC architecture called QFabric (http://www.juniper.net/us/en/dm/datacenter/), which looks something like Cisco's CRS or Juniper's TX Matrix, but for switches. Too bad that, because it's Juniper, there are not many docs about it available yet. Oh, and you can find a pretty decent discussion about it at j-nsp@pucknet

  2. edit: http://www.juniper.net/us/en/dm/datacenter/ (the previous link doesn't work because of the trailing bracket)

  3. Ivan Pepelnjak (03 March 2011, 09:28)

    Thanks for the link. I've been tracking the Juniper Stratus project for a few months; unfortunately not much has been published beyond a few white papers (I would guess some of the details are still sketchy).

    Looking from the outside based on what's publicly available it seems that QFabric falls into the "Borg" category.

  4. Obviously the lack of in-depth tech docs on QFabric is a problem for me. But from what I've seen, I do agree with the "Borg" assessment, though it is still closer to the Siamese twins. You still need to connect to one or two "fabric switches", and then I believe there is also a controller module which handles all of the management. I'm also guessing that you can only have two of these in a QFabric "domain"?

    It sounds like a good architecture, but due to the lack of in-depth "how this all works" documents I'm still skeptical.

  5. Ivan, I've found this paper to be the most useful to understanding how QF is designed: http://www.juniper.net/us/en/local/pdf/whitepapers/2000380-en.pdf

    It's not clear that QF falls directly into either of your categories. It will depend on exactly how the QF/D interacts with the QF/N's and QF/I's. I posted a list of questions I've got on the topic for juniper here: http://irq.tumblr.com/post/3612289570/mulling-qfabric

  6. Anton Yurchenko (03 March 2011, 23:49)

    From my conversations with Juniper, QF/D was described as serving a purely control plane role. Was also compared to a route-reflector.

    I like your list of questions Aneel.

    I asked Juniper some of them; here is what they shared (in random order):

    1. A forklift upgrade or a new install makes the most sense
    2. The protocols running between the elements are OSPF-based, but proprietary
    3. No oversubscription in a normal setup (I would imagine there are failure scenarios where it is still the case)
    4. Can operate on DSCP/CoS values of the packets ingressing its ports
    5. QF/I are big fabric boxes; not many smart things happen there

  7. Pavel Skovajsa (04 March 2011, 15:25)

    So, maybe I am looking at this wrong, but isn't the Borg model just a trivial example of the Big Brother model?

    In the VSS case the active sup populates the (TCAM) tables on the local DFC cards, and also on the remote DFC cards - one of them is the PFC on the remote sup.

  8. Ivan Pepelnjak (04 March 2011, 17:13)

    You're right, I need to change the definition a bit. Control-plane Big Brother = Borg. Have to limit the "Big Brother" model to the data plane (which makes sense anyway, as the controller has to inspect and approve every flow).

  9. Igor Skobkarev (08 March 2011, 18:19)

    Well, don't you think that Brocade VCS represents a third kind of Ethernet fabric?

    This is what they say here:
    http://www.brocade.com/downloads/documents/white_papers/Introducing_Brocade_VCS_WP.pdf
    ...
    With VCS technology, all configuration and destination information is automatically distributed to each member switch in the fabric. For example, when a server connects to the fabric for the first time, all switches in the fabric learn about that server. In this way, fabric switches can be added or removed and physical or virtual servers can be relocated—without the fabric requiring manual reconfiguration
    ....
    And, unlike switch stacking technologies, the Ethernet Fabric is masterless. This means that no single switch stores configuration information or controls fabric operations

  10. Ivan Pepelnjak (08 March 2011, 19:02)

    According to Brocade's documents, they use TRILL-based forwarding and FSPF (which will be replaced by IS-IS) as the routing protocol, so from the control- and data-plane perspective VCS is definitely "business as usual".

    What they do on the management plane (shared configuration) is another mystery that needs to be explored ;)

  11. Igor Skobkarev (08 March 2011, 20:31)

    My biggest concern when it comes to data center networks is a single point of failure. The VSS/IRF cluster is a single device from the control/management perspective, and under certain conditions, e.g. when traffic becomes CPU-switched (I have had this with VSS), a full cluster meltdown follows immediately. Funny enough, it is almost impossible to test this in the lab, even under load.

    After going through the paper, it appears to me that only Cisco vPC, Brocade VSS on VDX 6700 and Brocade/Foundry MCT run independent switches with some _proprietary_ protocol between them to provide the MLAG functionality. Correct?

  12. Igor Skobkarev (08 March 2011, 20:32)

    TYPO- ...Brocade VCS not VSS :(

  13. Ivan Pepelnjak (09 March 2011, 10:38)

    You're right. Brocade MLAG functionality has to be proprietary as TRILL does not address it.

    The only documented MLAG protocol that I'm aware of is Juniper's (future) BGP MPLS-based MAC VPN: http://tools.ietf.org/html/draft-raggarwa-mac-vpn-01

  14. If I understand your architecture models correctly, the Cisco Nexus 5000 with the Nexus 2000 fabric extender would be a good example of the Big Brother architecture, right?

  15. Ivan Pepelnjak (16 March 2011, 09:50)

    Not exactly. In a controller-based LAN (e.g. OpenFlow or Cisco's ancient MLS), the controller inspects only the initial packets in the flow. In NX5K+FEX, all the traffic goes through the NX5K, so it's more like an octopus ;)

  16. Can't wait for more QFabric info! Really appears to be a game changer, as it does things that nobody else does. Cisco is really going to have to work hard to catch up with their offerings!

  17. Drunken_and_Grumpy_Pole (19 April 2011, 23:41)

    Oh, you Ethernet people... :)
    This is a long-established feature in Fibre Channel, described as Distributed Services.
    Basically, switches in an FC network share information (who's logged in, what their capabilities are, and more).
    Nothing really big - obviously Brocade, being an FC company, knows how to use it to their advantage.

  18. But the span of an FC network and the number of FC switches (an all-time maximum of 256, far fewer in practice) are typically much smaller than in the Ethernet world (even before adding virtual switches). Thus an FC-based solution is not really proven for such environments.

  19. Ivan Pepelnjak (04 June 2011, 07:30)

    BTW, in the meantime the "management plane mystery" has been solved - they don't do anything special yet, every box is configured and managed independently.


You don't have to log in to post a comment, but please do provide your real name/URL. Anonymous comments might get deleted.

Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.