Should We Use Redundant Supervisors?

I had a nice chat with Doug Gourlay from Arista during Interop Las Vegas, and he made an interesting remark along the lines of “in leaf-and-spine fabrics it doesn’t make sense to use redundant supervisors in switches – they cause more problems than they solve.”

As always, in the end it all depends on your environment and use case, but he definitely has a point; good engineering always works better than a heap of kludges.

Going Back 20 Years

In the early 1990s we didn’t have redundant supervisors (or CPU boards, or route processors, or whatever they were called). Each box had a single CPU, and if you wanted to build a resilient network you did a proper network design – for each function you needed in the network (example: core or aggregation layer) you used two (or more) boxes connected with a reasonably fast-converging routing protocol. Problem solved.

We started building the Internet using those principles, and smart people still do that. Surprisingly, the Internet worked (and continues to work) without intra-box redundancy mechanisms.

Carrier Grade Marketing

Are you old enough to remember the original Internet (and dot-com) bubble? In those days every telecom thought they could find gold nuggets lying in plain sight on the magical plains of planet Internet. Unfortunately, they forgot to leave their old mentality at home when joining the gold rush.

Voice switches (the gear telecom engineers were used to dealing with in those days) had all sorts of redundancy. After all, you cannot connect a dumb phone to two voice switches, so you’d better have a switch that never crashes. The Internet is slightly different – good IP-based architectures have always relied on a smart edge and a simple core (virtualization vendors needed more than 10 years to figure that out, but that’s beside the point).

Regardless of proven facts and working best practices, telecoms wanted to have box-level feature parity between what they knew and what they planned to buy, and networking vendors delivered what the customer wanted – more and more complex boxes with built-in hardware redundancy and all sorts of failover mechanisms, including SSO, NSF and ISSU.

With all the great redundancy features being implemented to improve vendors’ chances in the carrier market, it was time to reap the benefits of that investment. Next stop: enterprise networks.

Is It All Just Hype?

It depends. You can always implement redundant or resilient solutions. Resilient is usually better than redundant, but there are cases where boxes with redundant internal architecture come in handy due to the tradeoffs you might have to make to implement a resilient design.

Example: campus networks. In campus networks you cannot afford to lose a whole building, but it might be OK to lose half a floor.

A resilient design would use two core switches and an access switch (or more) per floor. Ideally they’d run a fast-converging routing protocol.

In reality, you’re often asked to implement bridging across a whole building, and as there are no standard layer-2 fabric solutions, you have to use spanning tree (and lose half the uplink bandwidth in the process) or MLAG (which increases the complexity of your design). Also, managing tons of small switches manually (because the network management software almost never does everything its vendor has promised) becomes a royal pain.
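
Here’s a quick back-of-the-envelope sketch (Python, with made-up uplink counts and speeds – they’re illustrative assumptions, not numbers from any real design) of why spanning tree hurts: with two uplinks per access switch, STP blocks one of them, while MLAG or a routed access layer keeps both forwarding.

    # Usable uplink bandwidth of a dual-homed access switch.
    # Uplink count and speed are illustrative assumptions.
    UPLINKS = 2             # one uplink to each core switch
    UPLINK_SPEED_GBPS = 10  # per-uplink speed

    def usable_uplink_bw(design: str) -> int:
        """Usable uplink bandwidth (Gbps) for a given access-layer design."""
        if design == "stp":
            # Spanning tree blocks the redundant uplink; only one forwards.
            return UPLINK_SPEED_GBPS
        if design in ("mlag", "routed"):
            # MLAG or a routed access layer keeps all uplinks forwarding.
            return UPLINKS * UPLINK_SPEED_GBPS
        raise ValueError(f"unknown design: {design}")

    for design in ("stp", "mlag", "routed"):
        print(f"{design:>6}: {usable_uplink_bw(design)} Gbps usable uplink bandwidth")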

A core switch with redundant architecture definitely seems like a better option, but do keep in mind that you’ve just traded visible complexity that you understand (and are thus able to troubleshoot) for hidden complexity.

Data Center Environments

Data center networks are always considered mission critical, and it makes perfect sense to buy an insurance policy in the form of a redundant hardware architecture, right? Well, no.

Unless you’re Google or Facebook and can afford to lose 50 servers on a ToR switch reload, you probably have dual-homed servers connected to two ToR switches, right? Losing one of those switches hurts (you lose half the bandwidth), but not too much. No wonder no ToR switches have redundant supervisors (Juniper’s QFX 5100 is an interesting semi-exception: it can run two copies of Junos as virtual machines on the same CPU).

Losing one of two core switches is a major disaster – half the core switching bandwidth is lost. How do you cope with that? You buy switches with two supervisor boards and hope the internal hardware and software redundancy works as expected.

I’ve seen data center network designs with a single core switch (“we don’t need two because we bought a fully redundant box”). Don’t ever do that – you’ll end up with a redundantly engineered single point of failure.

Now imagine you replace two humongous core switches with a spine layer of 4 or 8 fixed or modular switches. All of a sudden, losing a spine switch doesn’t hurt that much. Welcome to the wonderful world of proper network design ;)
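
A minimal sketch (Python, assuming equal-capacity switches and traffic spread evenly across them with ECMP) quantifies the difference:

    # Fraction of core/spine capacity lost when a single switch fails,
    # assuming equal-capacity switches and even ECMP traffic distribution.
    def capacity_lost(num_switches: int) -> float:
        return 1 / num_switches

    for n in (2, 4, 8):
        print(f"{n} switches: a single failure costs {capacity_lost(n):.1%} of the capacity")

With two core switches a single failure costs you half the capacity; with a spine of 4 or 8 switches it drops to 25% or 12.5%.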

More Details

Want to know how to design a modern data center fabric? Watch these webinars:


7 comments:

  1. This comment has been removed by the author.
  2. How many times a year would customers actually use HA with Redundant Supervisors (RS)? Hardly once or twice a year? Assuming good engineering, why would one build a complex solution that is hardly used and is expensive to develop/test/maintain/support? Add GR to the complexity. In reality, HA using RS is one of the weak links for most vendors.

    If one were to use a BRCM T2 at the ToR with dual-homed servers, should you care about bandwidth?
  3. Good post and I agree with a lot of it.

    Re campus designs, where do you see things like VSS/MEC in what you say? It's a pair of core switches and you can get the full bandwidth from your access-layer uplinks, but only one of the switches actually runs a control plane although they both forward.

    Sort of halfway between a pair of core switches and a single switch?

    ** VSS quad sups, though, really don't seem to offer any significant advantages and introduce more complexity.

    In terms of managing multiple switches, again using Cisco as an example, the 6880ia FEX solution would help alleviate some of those concerns.

    Where I do slightly disagree is when you say ideally you would use L3 from the access layer. Before things like VSS etc. you were totally reliant on STP and the argument to use L3 was quite compelling. But now I am not so sure.

    The problem is L2 is just more flexible in terms of what you may need further down the line. Your example of needing to span a VLAN across multiple access-layer switches is a great one because it always seems to crop up. Yes, you could span a VLAN over an L3 network, but that adds additional complexity, and if you are not careful your network starts to look like a bit of a kludge.

    Extending VRFs back to an L3 access layer is also more complex than simply applying the VRFs to the SVIs on the core/distro switches.

    I always feel when designing L3 from the access layer that I am going to get a requirement later on that would be trivial with an L2 access layer but becomes more complex with L3, even though it is still possible.

    And isn't that what design is about, i.e. you design not just for now but also for what might be needed in the future :-)
    Replies
    1. VSS is even more complex than redundant supervisors within a single box.

      As for L2 versus L3 – it all depends on your environment. In data centers I recommend going with overlay solutions (VXLAN etc.) and small L3 subnets in the transport network (usually two ToR switches); campus is obviously a different story.
  4. Ivan

    Re. the DC: does that depend on the size?

    I am just getting up to speed on VXLAN/FabricPath/leaf and spine etc.

    Where I am struggling is seeing in particular how VXLAN and FabricPath fit together, or even if they do, i.e. VXLAN extends L2 across L3, while FabricPath seems to be all about L2 but being able to use all links and take the shortest path.

    And then tying in leaf and spine, i.e. with an L3 leaf and spine I can see the argument for VXLAN. With an L2 leaf and spine, would it not make sense to simply use FabricPath and extend the VLANs back to the racks, i.e. no need for VXLAN?

    Or does that just not scale very well?

    I understand VXLAN can only use the paths made available to it, so I could see in a large DC there may be an argument for combining VXLAN with FabricPath, but I'm just speculating there.

    And with VXLAN being an overlay, do you not lose visibility and to some extent the intelligence of your network infrastructure?

    I'm not expecting answers to all of the above :-), I was just wondering if you cover all those points in the webinars and, if so, which ones would be good to get.

    I really want to understand it as much from a design perspective as an implementation one.
  5. Where is the answer?