Software-Defined WAN: Well-Orchestrated Duct Tape?

One of the Software Defined Evangelists has declared 2015 as the Year of SD-WAN, and my podcast feeds are full of startups explaining how wonderful their product is compared to the mess made by legacy routers, so one has to wonder: is SD-WAN really something fundamentally new, or is it just another old idea in new clothes?

Read This First

Don’t misinterpret this blog post. I am not against SD-WAN; in fact, I love some of the ideas I’ve seen so far and the clean and unified architecture of some of the products.

I am, however, disgusted by all the hype cloaked as technical discussions and think the networking engineers (as opposed to marketers or managers) should approach SD-WAN like any other technology and try to understand how it really works and what the real challenges and solutions are.

What Is SD-WAN?

With no definition from a respectable body, let’s fall back to the description on the Open Networking User Group web site. Judging by their diagrams, SD-WAN is the thing that allows you to use the public Internet in parallel with a private WAN to reduce costs.

Wait, what? We’ve been doing that for ages, and most of our customers weaned themselves off MPLS/VPN years ago, using solutions like IPsec, DMVPN, or even MPLS/VPN-over-GRE-over-IPsec.

The marketing gurus working for SD-WAN vendors will quickly tell you that what they do is fundamentally different: the thing we’ve been doing in the past is hybrid WAN, and the new thing is software defined, uses a central controller, and therefore doesn’t have to use a complex plethora of protocols like IKE, IPsec, GRE, NHRP, NBAR, IP SLA, PBR or routing protocols like BGP or OSPF. All that is replaced by some secret sauce proprietary to each startup (yeah, that’s a comforting thought right there).

SD-WAN Behind the Scenes

In its simplest incarnation, SD-WAN (as promoted by a large crowd of startups) allows you to use two WAN transport networks for optimal end-to-end transport.

Typical SD-WAN architecture

Let’s see what needs to be done to make this work.

As you cannot advertise the non-public address ranges used on your sites to the transport networks (at least not to the public Internet), every SD-WAN solution builds an overlay network. Whether they use GRE, VXLAN, IPsec tunnel mode or any other encapsulation technology is irrelevant.

Some customers want direct connectivity between every pair of sites, requiring a full mesh of tunnels or a multi-access tunnel technology like DMVPN. The details don’t really matter, and in most cases the tunnels get provisioned automatically (not dissimilar to what Open vSwitch or VXLAN implementations do).
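To put a number on the full-mesh requirement, here is a minimal Python sketch that enumerates the tunnels a set of sites would need. The site names and the `full_mesh_tunnels` helper are made up for illustration; no SD-WAN product is claimed to provision tunnels this way.

```python
from itertools import combinations


def full_mesh_tunnels(sites):
    """Return every unordered pair of sites; each pair needs one tunnel."""
    return list(combinations(sorted(sites), 2))


# Hypothetical site names, used only for illustration.
sites = ["hq", "branch-1", "branch-2", "branch-3"]
tunnels = full_mesh_tunnels(sites)
for a, b in tunnels:
    print(f"tunnel {a} <-> {b}")

# A full mesh of n sites needs n * (n - 1) / 2 tunnels.
print(len(tunnels))
```

The quadratic growth (100 sites already need 4950 tunnels) is the reason the tunnels have to be provisioned automatically or replaced with a multi-access tunnel technology.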

You wouldn’t want to transport your internal traffic across the Internet unencrypted, would you? Every SD-WAN solution has to solve the traffic encryption problem (hint: there’s a standard way of doing it, called IPsec) and the key distribution problem (aka IKE in the multi-vendor world).

Before you can start using the SD-WAN overlay network, the SD-WAN network needs to learn the topology of your network. Let’s ignore for the moment the challenges of integrating SD-WAN edge devices with the traditional on-site L2/L3 devices and focus on what’s going on within the SD-WAN cloud.

When an SD-WAN edge node powers up, it has to connect to the controller and register its outside (WAN) IP addresses with the controller. We used NHRP to do that in standards-based networks.
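The registration step boils down to a mapping from a site to its outside addresses. The sketch below shows that idea only; the `Controller` class, its method names and the addresses (taken from documentation ranges) are assumptions, not the API of any real controller or of NHRP.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical controller keeping an NHRP-like registration table."""

    # site name -> {transport name -> outside (WAN) IP address}
    registry: dict = field(default_factory=dict)

    def register(self, site, transport, wan_ip):
        """Record (or refresh) the outside address reported by an edge node."""
        self.registry.setdefault(site, {})[transport] = wan_ip

    def resolve(self, site):
        """Return a copy of the WAN addresses known for a site."""
        return dict(self.registry.get(site, {}))


controller = Controller()
controller.register("branch-1", "internet", "198.51.100.10")
controller.register("branch-1", "mpls", "192.0.2.10")
print(controller.resolve("branch-1"))
```

Re-registering the same site and transport simply overwrites the old address, which is roughly what you want when an Internet uplink gets a new address from the provider.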

Next, the controller needs to learn the local prefixes available on every site. Whether you use a routing protocol, REST API or a proprietary vendor-specific protocol to exchange those prefixes doesn’t really matter… unless you happen to care about multi-vendor interoperability, which you won’t see in the SD-WAN world for a long time.

It’s so entertaining listening to people who once touted the benefits of multi-vendor networks suddenly promoting the benefits of undocumented proprietary solutions “because they are so much better than routing protocols.”

After discovering the prefixes available on each SD-WAN site, the controller decides which prefixes to use, and sends the best prefixes together with transport network next hops to SD-WAN edge nodes. If this doesn’t sound like a description of a BGP route reflector, I don’t know what does (apart from the minor detail that almost all SD-WAN vendors use proprietary mechanisms – but I guess you already got that point).
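A rough sketch of that route-reflector-like behavior follows. It assumes each site advertises prefixes with a single numeric `cost` (a stand-in for whatever selection criteria a vendor really uses) and that every advertising site has already registered its WAN addresses; the function name and data shapes are hypothetical.

```python
def build_edge_tables(advertisements, wan_addresses):
    """Pick the best origin per prefix and build a table for every edge node.

    advertisements: iterable of (site, prefix, cost) tuples
    wan_addresses:  {site: {transport: wan_ip}} as learned at registration
    Returns {edge_site: {prefix: [next-hop WAN IPs of the best origin]}}.
    """
    best = {}  # prefix -> (origin site, cost)
    for site, prefix, cost in advertisements:
        # Lowest cost wins; on a tie the first advertisement is kept.
        if prefix not in best or cost < best[prefix][1]:
            best[prefix] = (site, cost)

    tables = {}
    for edge in wan_addresses:
        tables[edge] = {
            prefix: sorted(wan_addresses[origin].values())
            for prefix, (origin, _cost) in best.items()
            if origin != edge  # a site needs no tunnel to its own prefixes
        }
    return tables


ads = [("hq", "10.0.0.0/16", 10), ("dc", "10.0.0.0/16", 5), ("branch", "10.1.0.0/24", 10)]
wan = {
    "hq": {"internet": "198.51.100.1"},
    "dc": {"internet": "198.51.100.2", "mpls": "192.0.2.2"},
    "branch": {"internet": "198.51.100.3"},
}
for edge, table in build_edge_tables(ads, wan).items():
    print(edge, table)
```

Each edge node ends up with prefixes and one next hop per transport network, which is the information it needs for the uplink selection step.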

In the ideal case, every site reaches every other site over more than one uplink, so it has to select the best uplink to use, either based on reachability or more complex measurements – the job of legacy tools like BFD or IP SLA.
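Uplink selection can be as simple as the sketch below, which assumes some probing mechanism (the BFD or IP SLA equivalent) has already produced reachability, latency and loss figures. The `UplinkStats` fields and the 2% loss threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class UplinkStats:
    """Hypothetical per-uplink measurements toward one remote site."""

    name: str
    reachable: bool
    latency_ms: float
    loss_pct: float


def select_uplink(uplinks, max_loss_pct=2.0):
    """Prefer the lowest-latency uplink whose loss is acceptable.

    If every reachable uplink exceeds the loss threshold, fall back to
    the least lossy one. Return None when nothing is reachable.
    """
    alive = [u for u in uplinks if u.reachable]
    if not alive:
        return None
    healthy = [u for u in alive if u.loss_pct <= max_loss_pct]
    if healthy:
        return min(healthy, key=lambda u: u.latency_ms)
    return min(alive, key=lambda u: u.loss_pct)


links = [
    UplinkStats("mpls", True, 40.0, 0.1),
    UplinkStats("internet", True, 25.0, 0.5),
    UplinkStats("lte", False, 10.0, 0.0),
]
print(select_uplink(links).name)
```

Real products presumably add hysteresis so traffic doesn't flap between uplinks on every measurement; that part is left out here.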

Finally, once the quality of the links is known, the user traffic has to be sorted into application classes (aka NBAR) and forwarded to the same destination SD-WAN node across one of the uplinks based on pre-programmed policy (does Policy Based Routing sound familiar?).
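The classify-then-forward step might look like the following sketch. Classifying on protocol and destination port is a deliberate oversimplification (NBAR-like engines inspect far more than that), and the class names, port mappings and policy tables are invented for the example.

```python
# Hypothetical (protocol, destination port) -> application class mapping.
APP_CLASSES = {
    ("udp", 5060): "voice",
    ("tcp", 443): "web",
    ("tcp", 445): "bulk",
}

# Hypothetical policy: ordered uplink preference per application class.
POLICY = {
    "voice": ["mpls", "internet"],
    "web": ["internet", "mpls"],
    "bulk": ["internet"],
}
DEFAULT_ORDER = ["internet", "mpls"]


def classify(protocol, dst_port):
    """Map a flow to an application class, or 'default' if unknown."""
    return APP_CLASSES.get((protocol, dst_port), "default")


def pick_uplink(protocol, dst_port, usable_uplinks):
    """Return the first uplink allowed by policy that is currently usable."""
    app_class = classify(protocol, dst_port)
    for uplink in POLICY.get(app_class, DEFAULT_ORDER):
        if uplink in usable_uplinks:
            return uplink
    return None  # policy leaves no usable uplink for this class


print(pick_uplink("udp", 5060, {"mpls", "internet"}))
print(pick_uplink("udp", 5060, {"internet"}))
print(pick_uplink("tcp", 445, {"mpls"}))
```

In this toy policy, voice falls back to the Internet when MPLS is down, while bulk traffic is never allowed onto MPLS and gets dropped instead.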

Some SD-WAN solutions go way beyond simple PBR and use smart congestion measurement, packet retransmission, or even forward error correction to make the best use of available bandwidth while retaining acceptable end-to-end quality. These technologies are nothing new; we’ve seen them in WAN optimization devices for years (and you might remember people who loved to rant about how broken WAN optimization is).
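For a feel of what forward error correction means here, the sketch below uses the simplest possible scheme: one XOR parity packet per group of equal-length packets, which lets the receiver rebuild any single lost packet without a retransmission. This is a textbook illustration, not the scheme any particular vendor is known to use.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Return the XOR of all packets in a group (all of the same length)."""
    parity = bytes(len(packets[0]))  # starts as all zero bytes
    for packet in packets:
        parity = xor_bytes(parity, packet)
    return parity


def recover(received, parity):
    """Rebuild the single missing packet (marked as None) in a group."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one lost packet")
    rebuilt = parity
    for packet in received:
        if packet is not None:
            rebuilt = xor_bytes(rebuilt, packet)
    return rebuilt


group = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(group)
# Pretend the second packet was lost in transit.
print(recover([b"aaaa", None, b"cccc"], parity))
```

The price is one extra packet per group on the wire, which is the usual trade-off: spend bandwidth to avoid retransmission delay on a lossy uplink.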

Summary

Every SD-WAN solution has to reinvent all the wheels we use in hybrid WAN networks – tunneling, encryption, key exchange, registration of edge nodes, exchange of reachability information, next-hop reachability and end-to-end link quality measurements, application recognition and packet forwarding based on policies, so please don’t tell me how revolutionary these solutions are (RFC 1925 sections 2.11 and 2.5 quickly come to mind).

There is, however, a fundamental difference between SD-WAN and the hodgepodge of traditional protocols that were force-fit into a hybrid WAN architecture: the architects of SD-WAN products were not burdened with legacy implementations, or forced to reuse a code base that was meant to solve a totally different problem, or protocols that were suboptimal for the job (why is anyone using OSPF in DMVPN networks when it’s clear that BGP scales much better?). The individual features that they use to reinvent the wheels are also tightly integrated, because they were designed from day one to be used together.

The architecture of most SD-WAN products is thus much cleaner and easier to configure than traditional hybrid networks. However, do keep in mind that most of them use proprietary protocols, resulting in a perfect lock-in.

About the Title

Read this tweet in case you haven’t figured out the joke in the blog post title. Here’s its gist in case Twitter disappears:

Everything can be solved with a right combination of NAT, GRE and PBR ;) Duct tapes of networking.

27 comments:

  1. I see a lot of reinventing the wheel these days with all the SDN hype going on. In many cases they are even reinventing broken wheels. There are protocols that have 20-30 years of exposure to the real world (BGP), so saying that you can design something better than that takes a pretty big ego.

    I'm not saying there isn't room for innovation but there is no gain in inventing something new if there is already a suitable tool out there.

    What's funny is that people are even repeating the same mistakes of the past. I'm actually writing an article on the subject. I think you may find it interesting.
  2. Reading this I was wondering if you were aware of Internet2's AL2S service (http://www.internet2.edu/products-services/advanced-networking/layer-2-services/#service-features). It essentially offers a GUI portal as well as API/OpenFlow programmability for extending arbitrary VLANs between participating member sites using the Internet2 backbone. In this sense it's a full SDN platform you can use for programming WAN connectivity. This includes being able to program the path between I2 backbone nodes that the VLAN will take, as well as a backup path to swing to should a primary path segment fail. Interestingly this isn't just used by members for various research and education purposes, but is in fact used by I2 itself. I2 is its own customer with this product, and the routed I2 backbone - the IP services most people associate with I2 - runs on top of AL2S itself.

    It's an interesting example of SDN being used for WAN provisioning used in full production for a rather large amount of traffic and with quite a few customers, albeit in a manner put together by I2 and its NOC and software developers themselves rather than buying a product from a vendor.
  3. "When an SD-WAN edge node powers up, it has to connect to the controller and register its outside (WAN) IP addresses with the controller." - Looks like LISP can do this. The outside (WAN) IP address is your RLOC, the controller is the MS, and the prefix is the EID.
  4. Ivan,

    Generally, I agree that the SD-WAN vendors are simply rehashing older technology in generally proprietary ways. Where I begin to disagree is the idea that the new model offers no advantages over what we have today.

    The primary differentiation for me is the concept of "local perspective" vs "controller perspective". As engineers we tend to shy away from protocols like PBR because of the local perspective problem. As we implement fancy new ways to forward our traffic as determined by policy/logic (rather than destination IP) we are required to implement those changes across a myriad of devices to avoid issues like asynchronous forwarding. Since every device makes the decisions locally, this policy tends to be unique per device and as such ends up being a pain to manage in the long run. Additionally, changes to the policy usually require a reconfiguration of each device (and again, uniquely since they aren't all the same). Moving to a centralized "controller perspective" version of configuration removes a lot of the hassle of implementing these technologies and by doing so is reducing cost and administrative complexity. That is a good thing and opens up the use of these technologies to organizations who would benefit from them but may not have the advanced level of expertise on staff to manage them in their current state.

    Similarly, the "localized perspective" issue raises its head again when we talk about state monitoring and correction. You are correct that many engineers are not thrilled with the performance of previous generations of WAN optimization, but they suffered from the same localized perspective issues. Traffic quality was determined wholly by the receiving end of the link, and because of that it was unreliable to depend on. The "controller perspective" of these new SD-WAN options is able to make decisions with data from both the transmit and receive ends of the link. It's not a guarantee, but it should dramatically improve detection of line degradation, and consequently the choice of corrective measures to resolve it.

    My last point is this... I don't think SD-WAN is meant to appeal to high end engineers. SD-WAN is going to be wildly successful because the people who will buy into the concept (and ultimately pay to have it installed) couldn't describe the features of any of the technology/acronyms you used in your post. It makes the complex easy, which has been one of the promises of SDN from the beginning. None of the tech is new, but it is becoming far more accessible because of this new way of implementing it.
    Replies
    1. So you are saying:
      1) distributed configuration of many devices is a pain. it would be nice to have centralized configuration.
      2) the average networking guy doesn't know all these fancy protocols. it would be nice if they could have a generic way to configure abstract stuff, without the need to understand the underlying protocols.

      I agree. Of course.
      But I don't agree that you need new technology under the hood. You can do it with existing technology and protocols. You just need a new user-interface. Or paradigm. Or approach. Or whatever you want to call it.

      Or is this too simplistic ?
    2. Hi Jordan,

      Thank you for the thoughtful comment. I think you might be over-optimistic in regard to the whole "controller-based" hype.

      As Gryz pointed out, there's absolutely no reason you couldn't have done the same thing with existing devices (we just didn't for zillions of reasons). Also, most of the SD-WAN solutions use exactly the same mechanisms as existing hybrid WAN solutions, you just don't see the complexity.

      Finally, speaking of complexity, you wrote "SD-WAN makes the complex easy, which has been one of the promises of SDN from the beginning". I sincerely hope you don't believe that. Do a deep dive into any SDN technology or product, and you'll find plenty of untested complexity with as-of-yet-unknown side effects. What many SDx solutions give you is the _appearance_ of being easy, and I had hoped we'd learned how reliable those appearances are based on our experience with single-pane-of-whatever systems.
    3. Gryz,

      I agree with everything you stated. I'm not a fan of the proprietary methods being implemented but am a fan of a centrally controlled WAN architecture. It should be simply a management shift, not necessarily a technology shift.

      Ivan,

      I knew opening my mouth on your blog was risky :) You may be correct. I may be over-optimistic about the controller hype. I'm not naive enough to believe SD-WAN providers are going to deliver on *everything* they have promised. I do believe they have the ability to deliver enhancements on the simplification of management and centralized visibility mentioned in my previous comment. We don't at all disagree on the fact that they should be using established/standardized tech where possible, but I do believe they are working to deliver on a valuable concept if done correctly. Like all things in this world, I could be wrong, and it all depends on practical implementation (something we have yet to see at any decent scale).
    4. Jordan,

      "I knew opening my mouth on your blog was risky :)" - it isn't. No networking engineer was ever harmed doing that (I can't guarantee the same for marketers and trolls ;)

      "I do believe they are working to deliver on a valuable concept if done correctly" - we're in total agreement. I just hate all the hype floating around and still think we should focus on viability of underlying technologies. Understanding them is the first step toward understanding their limitations.
  5. Who are you and what have you done with complexity-hating Ivan?! ;)
    Replies
    1. Unfortunately you need all the wheels I mentioned if you want to have a reasonably-well-functioning hybrid WAN (oops. Software-Defined WAN) solution.

      I agree it's complex, but unfortunately we cannot do any better with the technologies we use today (TCP/IP).
    2. Then why not abstract all that complexity into a secret sauce in a shiny new SD-WAN box? After all, isn't that one of the tenets of SDN?
    3. Sure... until you have to figure out why it stopped working ;) You can reformat your Windows laptop when it breaks down; I'm not sure many people can do the same with their WAN network.
    4. True, but I'd rather call the vendor and have them figure it out than me having to troubleshoot 8 different protocols at 3 am on a Saturday night.
    5. ... And this is the point where we'll have to agree to disagree
    6. ... because the vendor service desk will, of course, fix this in no time.
  6. I can't help but think that you are talking about Greg Ferro when you mention this

    - "It’s so entertaining listening to people who once touted the benefits of multi-vendor networks suddenly promoting the benefits of undocumented proprietary solutions “because they are so much better than routing protocols.”"

    =)
  7. 'One of the Software Defined Evangelists has declared 2015 as the Year of SD-WAN'.

    I take it there are no prizes for guessing who? :-(
  8. I'd personally use the SD-WAN stuff to get rid of old school engineers in my company that are hogging all the legacy GRE/IKE/whathaveyou gear and refusing to share info. I think that's a great use case. =)
  9. I'm happy that someone with Ivan's standing wrote such a declaration of war on sd-FUD-wan. Apart from the technical analysis, I can still hardly imagine the business case behind this. Any thoughts?
  10. I think application segmentation will play a big role driving the business case for SD-WAN. The ability to have the application choose the topology brings the application requirement closer to the network. Applications adapting to network metrics (whatever they are) and changing their topology based on certain metrics is pretty cool.
    Replies
    1. When you see a single large-scale production deployment of this concept (preferably in enterprise network, not @ Google, Amazon or Facebook), please let me know.
    2. I'm sure there are many POCs out there nearly "ready" for production deployment...
    3. Have you been involved in one? Have you seen a technical description of one that would be anywhere near the nirvana you're aiming at?
  11. I love the title of your post… There’s something to be said for a system that is designed to work together from the beginning, as opposed to trying to duct tape different technologies together after the fact and calling that a system.

    To me, what is exciting about SD-WAN is that visibility is a key use case. For once, network management isn’t an afterthought.
  12. Hey Ivan, nice post as usual. You are the best. However, I have a question about this point:

    "The architecture of most SD-WAN products is thus much cleaner and easier to configure than traditional hybrid networks. However, do keep in mind that most of them use proprietary protocols, resulting in a perfect lock-in."

    In this case, should I consider that by using tools like DMVPN today I'm not locked in? Just curious: is it usual to have interoperability in this type of network with multiple nodes in remote sites? I mean, to scale I have to use something like DMVPN, for instance, so for me we already live in a lock-in world. Do you agree?
    Replies
    1. Hi Cristiano,

      I agree with you that DMVPN is sort-of lock-in because there's a single vendor that decided to use this specific mix of standard protocols to implement mGRE overlay on top of IPsec.

      However, every single protocol used in the DMVPN mix is a standard protocol, and apart from NHRP extensions needed to handle NAT traversal reliably, I don't think Cisco modified any of them, so in theory someone else could replicate what they did (modulo any patents - have to write a blog post on that topic, maybe someone knows the answer).

      In any case, as you're dealing with a mixed bag of standard protocols, you have at least a fighting chance of figuring out what's going on, for example by firing up Wireshark and troubleshooting your problem (or at least identifying what the problem is). You have zero chance to do that with undocumented proprietary protocols.

      Do you agree?
  13. O sancta simplicitas! (lat). Great article, thank you.