Routing Protocols and SD-WAN: Apples and Furbies

Ethan Banks recently wrote a nice blog post detailing the benefits and drawbacks of traditional routing protocols and comparing them with their SD-WAN counterparts.

While I agree with everything he wrote, the comparison between the two isn’t exactly fair – it’s a bit like trying to cut the cheese with a chainsaw and complaining about the resulting waste.

Traditional Routing Protocols 101

The traditional routing protocols were designed to solve the endpoint reachability problem in a hop-by-hop, destination-only forwarding environment of unknown topology.

The problem space was limited to destination-only forwarding for a good reason: that’s the only forwarding mechanism that scales (assuming decent multi-level summarization).

A typical routing protocol has to solve these challenges:

  • Create local topology information;
  • Discover adjacent routers (preferably automatically);
  • Exchange reachability and topology information with the adjacent routers;
  • Recreate the network topology based on the available topology information;
  • For every known destination, select a set of paths to reach that destination.

The “routing protocols choose only the best path based on cost” mantra is wrong. Some routing protocols can select suboptimal paths (EIGRP, BGP, potentially also OSPF with LFA) or take network conditions into account (CSPF in MPLS-TE).

However, as long as we stick to the hop-by-hop forwarding behavior, we’re severely limited in the set of paths a routing protocol can select – a routing protocol can never select a path that could result in a forwarding loop (the proof is left as an exercise for the interested reader).
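To make the loop-free constraint a bit more tangible, here's a minimal Python sketch. It uses the basic loop-free alternate inequality from the IP fast reroute work (a neighbor N is safe for destination D if N's best distance to D is smaller than N's distance back to us plus our best distance to D). The topology, the node names and the helper functions are made up for illustration; no real routing protocol implementation looks like this.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra: cost of the best path from source to every reachable node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry, a better path was already found
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist

def loop_free_next_hops(graph, router, destination):
    """Neighbors that will not send traffic for `destination` back through `router`."""
    dist = {node: shortest_distances(graph, node) for node in graph}
    return sorted(
        n for n in graph[router]
        if dist[n][destination] < dist[n][router] + dist[router][destination]
    )

def make_topology(n2_to_d_cost):
    """Hypothetical four-node network: S reaches D through N1 or N2."""
    return {
        "S": {"N1": 1, "N2": 1},
        "N1": {"S": 1, "D": 1},
        "N2": {"S": 1, "D": n2_to_d_cost},
        "D": {"N1": 1, "N2": n2_to_d_cost},
    }

# N2's direct link to D is expensive, so N2 would send the traffic back
# through S: hop-by-hop forwarding via N2 would loop.
expensive = loop_free_next_hops(make_topology(5), "S", "D")

# With a cheaper N2-D link the path via N2 is still suboptimal (cost 3
# versus 2), but it is loop-free and could be used.
cheap = loop_free_next_hops(make_topology(2), "S", "D")
```

In the first topology only N1 qualifies; in the second one both neighbors do, even though the path through N2 is not the best one.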

Finally, all routing protocols assume shared fate between routing protocol updates and the forwarding path. The moment this assumption is broken, we might experience interesting challenges (just ask anyone who had to configure OSPF on partially-meshed Frame Relay in the CCIE lab without using P2MP interfaces).

The situation is particularly bad in IXP environments using BGP route servers, and while people keep proposing solutions to that problem, none of them is anywhere close to perfect.

Routing in SD-WAN environments

Routing in an SD-WAN environment is almost trivial:

  • There is no need for auto-discovery (SD-WAN nodes register themselves with the central controller);
  • Network topology is trivial: N sets of flat overlay networks;
  • Network topology never changes (because it’s an overlay network), the only thing that could change is the reachability of the next hop;
  • Reachability information is collected by the central controller and flooded to the end nodes.

SD-WAN routing protocols

Using traditional routing protocols in such an environment is overkill; the only routing protocol that is simple enough for the task at hand is IBGP (and there is at least one SD-WAN vendor that uses BGP behind the scenes). Yes, you could also use RIPv2, but let’s not go there.

The results of a routing protocol in an SD-WAN (or DMVPN) environment are very simple: a set of destinations, and a set of potential underlay (transport) next hops for each destination. What happens next is no longer the job of a routing protocol.
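A rough sketch of how little is needed to produce that result, assuming a toy controller that accepts node registrations and hands every node the same flat table. The class, the method names, the prefixes and the transport labels are all hypothetical; this is not how any particular SD-WAN product works.

```python
from collections import defaultdict

class Controller:
    """Toy central controller: nodes register, the controller builds the table."""

    def __init__(self):
        # node name -> (list of site prefixes, list of transport addresses)
        self.registrations = {}

    def register(self, node, prefixes, transports):
        self.registrations[node] = (list(prefixes), list(transports))

    def table_for(self, node):
        """Destinations and candidate transport next hops as seen from `node`."""
        table = defaultdict(list)
        for remote, (prefixes, transports) in self.registrations.items():
            if remote == node:
                continue  # a node does not need routes to its own prefixes
            for prefix in prefixes:
                table[prefix].extend(transports)
        return dict(table)

controller = Controller()
controller.register("Site-A", ["10.1.0.0/16"], ["TA1", "TA2"])
controller.register("Site-B", ["10.2.0.0/16"], ["TB1", "TB2"])

site_a_table = controller.table_for("Site-A")
```

Site-A ends up with a single destination (Site-B's prefix) and two transport next hops for it. Which one to use for a given packet is a separate question.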

Don’t Conflate Load Balancing with Routing

After the SD-WAN controller collects reachability information and distributes it to SD-WAN nodes, each SD-WAN node knows:

  • Which destinations are available;
  • Which transport next hops can be used to reach each destination.

Does that look like an ECMP routing/forwarding table? Sure it does ;) Also note that it doesn’t matter whether you implement your SD-WAN (or hybrid WAN) network with secret sauce made by a startup or with multiple DMVPN tunnels; the results are the same.

Load balancing in SD-WAN environment

The above description is obviously slightly oversimplified. For example, smart SD-WAN solutions would allow you to configure transport zones, ensuring (for example) that the SD-WAN node on Site-A knows it cannot reach TB1 using the TA2 source IP address.

The true difference between SD-WAN solutions and traditional forwarding implementations lies in the algorithm used to distribute packets across alternate paths. Most networking devices use some variant of (un)equal cost multipathing, whereas SD-WAN devices typically:

  • Continuously measure end-to-end path quality between the local node and transport next hops (the trivial algorithm being next-hop reachability tracking – if there’s no response from the next hop, it’s obviously down and should not be used);
  • Classify applications in groups (we used to call them Differentiated Services Code Points) and send the application traffic on one of the available paths based on path quality and application requirements. Please note that you could have done that with multi-topology routing or DiffServ-aware MPLS-TE for years; it just took way too much effort to deploy;
  • Some SD-WAN solutions might perform available bandwidth monitoring (based on increased end-to-end delay) and adjust the packet sending rate, similar to what TCP optimization solutions are doing. For more details, listen to Episode 25 of Software Gone Wild.

However, these operations have nothing to do with routing protocols – they are (like Cisco’s OER or PfR) local decisions made on the device based on current characteristics of transport paths.
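Here's a minimal sketch of such a local decision, assuming the device already has per-path measurements and a per-class set of requirements. The data structures, the thresholds, the tie-breaking rule and the fallback behavior are my assumptions for the sake of the example, not a description of PfR, OER or any SD-WAN product.

```python
from dataclasses import dataclass

@dataclass
class PathQuality:
    next_hop: str       # transport next hop, e.g. "TB1"
    reachable: bool     # result of next-hop reachability tracking
    latency_ms: float   # measured end-to-end latency
    loss_pct: float     # measured end-to-end packet loss

@dataclass
class AppClass:
    name: str
    max_latency_ms: float
    max_loss_pct: float

def select_path(paths, app):
    """Pick the lowest-latency reachable path that meets the class requirements.

    Falls back to the best reachable path when none qualifies, and returns
    None when every next hop is down.
    """
    alive = [p for p in paths if p.reachable]
    if not alive:
        return None
    good = [
        p for p in alive
        if p.latency_ms <= app.max_latency_ms and p.loss_pct <= app.max_loss_pct
    ]
    candidates = good or alive
    return min(candidates, key=lambda p: (p.latency_ms, p.loss_pct)).next_hop

paths = [
    PathQuality("TB1", True, 20.0, 2.0),   # fast but lossy
    PathQuality("TB2", True, 45.0, 0.1),   # slower but clean
    PathQuality("TB3", False, 5.0, 0.0),   # no response, must not be used
]
voice = AppClass("voice", max_latency_ms=150.0, max_loss_pct=0.5)
bulk = AppClass("bulk", max_latency_ms=500.0, max_loss_pct=5.0)

voice_path = select_path(paths, voice)
bulk_path = select_path(paths, bulk)
```

Voice lands on the clean path, bulk traffic on the fast one, and the routing table feeding this function never changed.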

In short, don’t blame the routing protocols if you don’t want to configure PfR/OER or if you think they suck.

Could We Do This With Routing Protocols?

We could, but that doesn’t mean we should. There have been numerous attempts to include QoS-awareness in traditional routing protocols, from the load parameter in IGRP to QoS-based metrics in OSPF and Cisco’s Multi-Topology Routing.

While I’ve seen some service providers using QoS-based MPLS TE (DiffServ-Aware TE), I haven’t seen anyone using QoS-based routing protocols (if you have, please write a comment). Either the implementations really suck, or there’s something fundamentally wrong with the whole picture… and I suspect it’s the latter.

You see, shifting traffic across alternate paths works very well in the SD-WAN world, because the amount of shifted traffic represents a minuscule part of the overall traffic in the ISP network, whereas shifting traffic based on routing protocol decisions results in significant traffic shifts, which can result in interesting feedback loops and oscillations.
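A deliberately crude toy model shows the difference. Assume two parallel links and a decision process that moves all the traffic it controls to whichever link looked less loaded in the previous measurement interval. The function, its parameters and the numbers are invented for illustration and don't model any real protocol.

```python
def simulate(shiftable_share, rounds=6):
    """Return the load on link 1 per round.

    `shiftable_share` is the fraction of total traffic that follows the
    load-based decision; the rest is split evenly and never moves.
    """
    background = (1.0 - shiftable_share) / 2   # fixed load on each link
    on_link1 = True                            # where the shiftable traffic is
    history = []
    for _ in range(rounds):
        load1 = background + (shiftable_share if on_link1 else 0.0)
        load2 = background + (0.0 if on_link1 else shiftable_share)
        history.append(round(load1, 3))
        # The decision is based on loads measured before the move,
        # which is exactly what creates the feedback loop.
        if load1 > load2:
            on_link1 = False
        elif load2 > load1:
            on_link1 = True
    return history

# A routing protocol moves everything it routes toward the destination.
routing_protocol = simulate(shiftable_share=1.0)

# A single SD-WAN customer is a tiny slice of the traffic on an ISP link.
sd_wan_customer = simulate(shiftable_share=0.02)

def swing(history):
    """Difference between the highest and the lowest observed link load."""
    return max(history) - min(history)
```

In both cases the traffic flips back and forth, but when everything moves the link load swings between zero and one hundred percent, while the small slice changes the link load by two percentage points, which nobody else in the network is likely to notice.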

For more details, listen to Episode 34 of Software Gone Wild, in which we discussed network monitoring in the SDN era.


5 comments:

  1. Assuming you use MPLS-WAN, and ISP's MPLS network is not controlled by your controller, how can you get away with not running routing protocols with your service provider?
    Replies
    1. Think about how you'd do it across Internet where the only thing you know is the default route and your IP address. You can always convert an MPLS/VPN WAN into something that looks like Internet.
  2. I have read your post with interest; let me comment as you requested regarding whether QoS routing protocols exist. ADARA Networks has developed a QoS-based routing protocol (DLSP - Dynamic Link-State Routing Protocol) for its SD-WAN platform. ADARA markets this platform through Hewlett Packard as well as others. I worked with J.J. Garcia-Luna-Aceves who developed DUAL, the basis of Cisco’s EIGRP, and as the inventor of DLSP, I can state that a QoS routing protocol does exist and is scalable without the issues you cite. DLSP computes the link metrics (latency, bandwidth, available bandwidth, loss) dynamically and periodically, and is a true multi-hop/multipath routing protocol, being able to compute multiple paths of unequal cost to the same destination. It is tightly integrated with the forwarding engine, which maps flows to paths according to 1) the path costs, 2) the utilization of the paths, and 3) the QoS requirements of the application. Unlike EIGRP, all the possible paths towards the destination network can be used simultaneously with no routing loops. DLSP does not require any configuration. No traffic engineering configuration is required, the user only needs to specify the QoS requirements (bandwidth, latency, priority) of the application. (An application is identified using one of the following parameters or through a combination of parameters: port number, IP address, network address, DSCP value, or DPI rule). DLSP (and the multipath flow-based forwarding engine) can be used both on overlay networks and physical underlay networks. DLSP is extensible, meaning the metrics which it can use can be modified, unlike traditional legacy routing protocols.
  3. Marcelo, is DLSP an open standard? Can you please give a link with more information or to a whitepaper?
    Replies
    1. Dan, DLSP is emerging as a candidate for global standard. You can get a White Paper on DLSP in http://www.adaranetworks.com/news-white-paper.php.