Fast Failover: The Challenge

Sometimes you’re asked to design a network that will reroute around a failure in milliseconds. Is that feasible? Maybe. Is it simple? Absolutely not.

In this series of blog posts we’ll start with the basics, explore the technologies that you can use to reach that goal, and discover one or two unexpected rabbit holes.

Fast failover is just one of the topics we’ll discuss in the Advanced Routing Protocol Features part of the How Networks Really Work webinar.

The Basics

Adapting network forwarding behavior following a link or node failure includes at least these steps:

  1. Detecting the failure;
  2. Adjusting forwarding behavior near the failure point (when possible);
  3. Disseminating changed topology information;
  4. Recomputing the network topology graph (link-state protocols) or adjusting routing tables (distance-vector protocols);
  5. Adjusting forwarding tables based on new routing table information.

The moment you start relying on distributed computation (steps 3-4) it’s nigh impossible to reroute around a failure in a few milliseconds; the only way to reach that goal is to have localized redundancy that can be used without consulting anyone else.

Please note that you need localized redundancy per destination (Forwarding Equivalence Class in MPLS terms), be it IP prefix, IP address, or MAC address. It’s also perfectly possible to design fast failover for some (critical) destinations but not for others.

There are at least three options to get local redundancy (the general idea is illustrated with a short sketch after the list):

  • Redundant equal-cost links connected to the same downstream node or to a set of downstream nodes.
  • A link to a feasible successor - an adjacent node that is guaranteed not to use the local node for traffic forwarding toward the destination.
  • A tunnel to a distant feasible successor - when all adjacent nodes use the local node to reach a destination, you could rely on a tunnel leading to a far-enough node that is guaranteed to use a different path toward the destination.
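To make the “use the backup without consulting anyone else” idea more tangible, here’s a minimal Python sketch (all names, including the FailoverFib class, are invented for illustration - real forwarding tables live in hardware) of a forwarding table that keeps a precomputed backup next hop per destination and switches to it locally the moment the primary next hop is reported down:

```python
# Illustrative sketch only: a forwarding table with a precomputed backup
# next hop per destination. All names are made up for this example.
from dataclasses import dataclass

@dataclass
class FibEntry:
    primary: str        # primary next hop (interface or neighbor ID)
    backup: str | None  # precomputed backup, usable without reconvergence

class FailoverFib:
    def __init__(self) -> None:
        self.entries: dict[str, FibEntry] = {}  # destination prefix -> entry
        self.failed: set[str] = set()           # next hops reported down

    def add(self, prefix: str, primary: str, backup: str | None = None) -> None:
        self.entries[prefix] = FibEntry(primary, backup)

    def link_down(self, next_hop: str) -> None:
        # Local repair: mark the next hop as failed; subsequent lookups fall
        # back to the precomputed backup with no control-plane involvement.
        self.failed.add(next_hop)

    def lookup(self, prefix: str) -> str | None:
        entry = self.entries.get(prefix)
        if entry is None:
            return None
        if entry.primary not in self.failed:
            return entry.primary
        if entry.backup and entry.backup not in self.failed:
            return entry.backup
        return None  # no local protection left; wait for reconvergence

fib = FailoverFib()
fib.add("192.0.2.0/24", primary="uplink-1", backup="uplink-2")
fib.link_down("uplink-1")
assert fib.lookup("192.0.2.0/24") == "uplink-2"
```

The point of the sketch is the lookup path: because the backup is installed in advance, the switchover cost is the failure detection time, not the time the routing protocol needs to reconverge.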

We’ll cover the details of all three options and the potential implementation mechanisms in a series of blog posts, but before embarking on this journey it makes sense to ask a few simple questions:

  • Do you need fast failover?
  • How fast is good enough?
  • Can you make it work?
  • Is the added complexity worth the effort?

Do You Need Fast Failover?

Unless you’re carrying critical traffic where a temporary disruption might cause loss of life or billions of dollars in damages, the answer is most probably “not really”.

However, if you’re implementing a control system for a nuclear power plant, a video network to support long-distance heart surgeries, or a voice network for 112 (Europe)/911 (US) service please stop reading right now and get expert help.

How Fast Is Good Enough?

Years ago, when I was still young and enthusiastic, I considered 50 msec failover the holy grail everyone was trying to reach… until I attended a presentation by Ian Farrer, who pointed out an oft-overlooked fact: maybe you should read your Service Level Agreement first and design your network to support what you promised instead of chasing the grail. Maybe you could meet those commitments with a decently implemented routing protocol.
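To put that advice into numbers, here’s a back-of-the-envelope sketch in Python (the 99.99% availability target and the 20 failures per year are made-up illustrative values, not anything from a real SLA):

```python
# Back-of-the-envelope SLA math with illustrative (made-up) numbers.
availability = 0.9999        # assumed SLA target: 99.99% availability
failures_per_year = 20       # assumed number of link/node failures per year

minutes_per_year = 365.25 * 24 * 60
downtime_budget = (1 - availability) * minutes_per_year    # ~52.6 minutes/year
per_failure_budget = downtime_budget / failures_per_year   # ~2.6 minutes/failure

print(f"Annual downtime budget: {downtime_budget:.1f} minutes")
print(f"Per-failure budget:     {per_failure_budget * 60:.0f} seconds")
```

With those assumptions every failure could take more than two minutes to repair without violating the SLA - orders of magnitude more than a reasonably tuned routing protocol needs, let alone 50 msec failover.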

Can You Make It Work?

Remember the first step in the Basics section: detect the failure? The failover process can’t start until someone realizes there’s been a failure.

Most network designs promising extremely fast failover times rely on some external magic to provide instantaneous failure detection. The most commonly used magic in modern networks is loss of light: the optical cable will be cleanly cut in a microsecond, and the transceiver will report the failure in less than a millisecond.

Really? Sometimes you’re lucky and things do work that way. Sometimes you get gray failures. Sometimes an intermediate box that you have no control over (repeater, media converter) doesn’t propagate the loss-of-light condition.

“No problem,” a seasoned networking engineer says, “we’ll detect the failure with BFD.” Of course he’s right… but how fast could BFD detect a failure? It depends on how low you can tweak the BFD timers before the network becomes unstable, and that depends on the specifics of the BFD implementation.
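For a rough feel of the numbers: BFD detection time is roughly the negotiated transmit interval multiplied by the detect multiplier. The sketch below is illustrative only - the intervals you can actually configure without destabilizing the session are platform-specific:

```python
# BFD detection time ~= negotiated TX interval * detect multiplier.
# The interval/multiplier values below are illustrative, not recommendations.
def bfd_detection_time_ms(tx_interval_ms: float, detect_multiplier: int) -> float:
    return tx_interval_ms * detect_multiplier

for interval, mult in [(300, 3), (50, 3), (10, 3)]:
    detect = bfd_detection_time_ms(interval, mult)
    print(f"{interval:>4} ms x {mult} -> {detect:.0f} ms detection time")
```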

Regardless of what the actual value is, pondering a failover solution faster than BFD failure detection time is a colossal waste of time.

Is the Added Complexity Worth the Effort?

In most cases, the answer is NO, in particular if the failures are rare.


3 comments:

  1. I love this write-up. Good work. As an aspiring young network engineer... your work keeps my mind imagining... I truly love the effort.

  2. Yes, you always learn from Ivan, for he always covers the topics of the day very well with a great and sometimes humorous perspective on things, plus his training is outstanding.

    This is a great topic.

    I don't know if this was discussed before, but maybe some consideration of the types of failures should be covered initially, to align with the discussion of when fast failover should occur? Maybe a subcategory after item #1: for example, an L1 failure with a low/med/high category for when fast failover is used, plus upward/downward signaling. Outline the same for L2-7 failure types. Maybe include a simple table of the common types of failures seen these days, aligned with the fast failover approach expected to be used (or when it is used).

    When designing, we need to know clearly when to "go all in" on fast failover and when not to, plus when to return to the recovered state (the failure is corrected and the original path is in use - no stickiness). Think of dampening mechanisms or false negatives tripping fast failover. This will tie nicely to the rest of the basics outline listed.

  3. Do we really need ms failover? As usual, it depends... :-)

    If your flows are big compared to your available bandwidth, then probably yes, but only for some critical traffic.

    If you have plenty of bandwidth, then it is better to use simulcast. This is the way radio voice is now handled in ED-137, and this is how radar sensor data is replicated four times using the ARTAS scheme. In the LAN you could use PRP or something similar.

    But even with simulcast, at the de-duplication point you have a lot to decide. How can I really detect that a flow is bad? Can I do per-packet de-duplication reliably, or do I have to choose the best stream? In the latter case, how can I have some stability?

    With packet de-duplication, the problem is different delays. Either you have to have similar delays on the alternate paths, or you have to introduce delay compensation. Either way, you will have additional delay. That is the price for packet-level de-duplication.

    As usual, you have to make a trade-off...
