Software-Defined WAN: Well-Orchestrated Duct Tape?
One of the Software Defined Evangelists has declared 2015 as the Year of SD-WAN, and my podcast feeds are full of startups explaining how wonderful their product is compared to the mess made by legacy routers, so one has to wonder: is SD-WAN really something fundamentally new, or is it just another old idea in new clothes?
Read This First
Don’t misinterpret this blog post. I am not against SD-WAN; in fact, I love some of the ideas I’ve seen so far and the clean and unified architecture of some of the products.
I am, however, disgusted by all the hype cloaked as technical discussions and think the networking engineers (as opposed to marketers or managers) should approach SD-WAN like any other technology and try to understand how it really works and what the real challenges and solutions are.
What Is SD-WAN?
With no definition from a respectable body, let’s fall back to the description on the Open Networking User Group web site. Looking at their diagrams, it looks like SD-WAN is the thing that allows you to use the public Internet in parallel with a private WAN to reduce costs.
Wait, what? We’ve been doing that for ages, and most of our customers weaned themselves off MPLS/VPN years ago, using solutions like IPsec, DMVPN, or even MPLS/VPN-over-GRE-over-IPsec.
The marketing gurus working for SD-WAN vendors will quickly tell you that what they do is fundamentally different: the thing we’ve been doing in the past is hybrid WAN and the new thing is software defined, uses a central controller, and therefore doesn’t have to use a complex plethora of protocols like IKE, IPsec, GRE, NHRP, NBAR, IP SLA, PBR or routing protocols like BGP or OSPF. All that is replaced by some secret sauce proprietary to each startup (yeah, that’s a comforting thought right there).
SD-WAN Behind the Scenes
In its simplest incarnation, SD-WAN (as promoted by a large crowd of startups) allows you to use two WAN transport networks for optimal end-to-end transport.
Let’s see what needs to be done to make this work.
As you cannot advertise the non-public address ranges used on your sites to the transport networks (at least not to the public Internet), every SD-WAN solution builds an overlay network. Whether they use GRE, VXLAN, IPsec tunnel mode or any other encapsulation technology is irrelevant.
Some customers want direct connectivity between every pair of sites, requiring a full mesh of tunnels or a multi-access tunnel technology like DMVPN. The details don’t really matter, and in most cases the tunnels get provisioned automatically (not dissimilar to what Open vSwitch or VXLAN implementations do).
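To put a number on the full-mesh requirement, here is a minimal sketch. The function name is hypothetical, and it assumes one point-to-point tunnel per site pair on each transport network:

```python
def full_mesh_tunnels(sites: int, transports: int = 1) -> int:
    """Count point-to-point tunnels needed for a full mesh.

    Assumption: every pair of sites gets one tunnel per transport network,
    so each transport contributes sites * (sites - 1) / 2 tunnels.
    """
    if sites < 0 or transports < 0:
        raise ValueError("sites and transports must be non-negative")
    return transports * sites * (sites - 1) // 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "sites over 2 transports:", full_mesh_tunnels(n, 2), "tunnels")
```

The quadratic growth is the reason nobody wants to provision these tunnels by hand, whatever the encapsulation.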
You wouldn’t want to transport your internal traffic across the Internet unencrypted, would you? Every SD-WAN solution has to solve the traffic encryption problem (hint: there’s a standard way of doing it, called IPsec) and the key distribution problem (aka IKE in the multi-vendor world).
Before you can start using the SD-WAN overlay network, the SD-WAN network needs to learn the topology of your network. Let’s ignore for the moment the challenges of integrating SD-WAN edge devices with the traditional on-site L2/L3 devices and focus on what’s going on within the SD-WAN cloud.
When an SD-WAN edge node powers up, it has to connect to the controller and register its outside (WAN) IP addresses with the controller. We used NHRP to do that in standards-based networks.
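The registration step boils down to a mapping table kept by the controller. The sketch below is only an illustration of that idea: the class and method names are made up, and it is neither NHRP nor any vendor's protocol.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class EdgeRegistry:
    """Toy controller-side table: edge node -> transport -> WAN address."""
    nodes: dict = field(default_factory=dict)

    def register(self, node: str, transport: str, wan_ip: str) -> None:
        # ip_address() raises ValueError on malformed input
        addr = ipaddress.ip_address(wan_ip)
        self.nodes.setdefault(node, {})[transport] = addr

    def resolve(self, node: str, transport: str):
        """Return the registered WAN address, or None if unknown."""
        return self.nodes.get(node, {}).get(transport)


if __name__ == "__main__":
    registry = EdgeRegistry()
    registry.register("branch1", "internet", "198.51.100.10")
    registry.register("branch1", "mpls", "10.255.0.10")
    print(registry.resolve("branch1", "internet"))
```

Conceptually this is the same node-to-underlay-address lookup an NHRP next-hop server provides in a DMVPN network.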
Next, the controller needs to learn the local prefixes available on every site. Whether you use a routing protocol, REST API or a proprietary vendor-specific protocol to exchange those prefixes doesn’t really matter… unless you happen to care about multi-vendor interoperability, which you won’t see in the SD-WAN world for a long time.
After discovering the prefixes available on each SD-WAN site, the controller decides which prefixes to use, and sends the best prefixes together with transport network next hops to SD-WAN edge nodes. If this doesn’t sound like a description of a BGP route reflector, I don’t know what does (apart from the minor detail that almost all SD-WAN vendors use proprietary mechanisms – but I guess you already got that point).
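A toy version of that controller logic might look like the sketch below. Everything in it is an assumption made for illustration: the class name, the single numeric preference used to pick the best site, and the tie-break on site name. Real products (and real BGP) use far richer selection rules.

```python
import ipaddress
from collections import defaultdict


class ToyController:
    """Collects prefixes from sites and hands out best paths with next hops."""

    def __init__(self):
        self.wan_addrs = {}               # site -> {transport: WAN IP}
        self.adverts = defaultdict(list)  # prefix -> [(preference, site)]

    def register(self, site, wan_addrs):
        self.wan_addrs[site] = dict(wan_addrs)

    def advertise(self, site, prefix, preference=100):
        self.adverts[ipaddress.ip_network(prefix)].append((preference, site))

    def routes_for(self, receiver):
        """Best site per prefix plus its transport next hops."""
        routes = {}
        for prefix, offers in self.adverts.items():
            # Highest preference wins; ties go to the lowest site name.
            _, best = min(offers, key=lambda o: (-o[0], o[1]))
            if best == receiver:
                continue  # a site does not need a route to its own prefix
            routes[str(prefix)] = dict(self.wan_addrs.get(best, {}))
        return routes


if __name__ == "__main__":
    ctl = ToyController()
    ctl.register("hq", {"mpls": "10.255.0.1", "internet": "203.0.113.1"})
    ctl.register("branch", {"internet": "198.51.100.7"})
    ctl.advertise("hq", "10.1.0.0/16")
    ctl.advertise("branch", "10.2.0.0/24")
    print(ctl.routes_for("branch"))
```

Collect, select, redistribute with next hops: structurally that is what a route reflector does, minus the standard protocol.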
In the ideal case, every site reaches every other site over more than one uplink, so it has to select the best uplink to use, either based on reachability or more complex measurements – the job of legacy tools like BFD or IP SLA.
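As a rough sketch of uplink selection driven by measurements, assume each uplink is probed for reachability, loss, latency and jitter. The field names, the 2% loss threshold and the latency-plus-jitter score are all made-up illustrations, not how BFD, IP SLA or any SD-WAN product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UplinkStats:
    name: str
    reachable: bool
    loss_pct: float
    latency_ms: float
    jitter_ms: float


def pick_uplink(stats, max_loss_pct=2.0):
    """Return the name of the preferred uplink, or None if none is reachable."""
    alive = [s for s in stats if s.reachable]
    if not alive:
        return None
    healthy = [s for s in alive if s.loss_pct <= max_loss_pct]
    if healthy:
        # Among acceptable links, prefer the lowest latency + jitter.
        return min(healthy, key=lambda s: s.latency_ms + s.jitter_ms).name
    # Every link is lossy: fall back to the least lossy one.
    return min(alive, key=lambda s: s.loss_pct).name


if __name__ == "__main__":
    mpls = UplinkStats("mpls", True, 0.0, 40.0, 2.0)
    inet = UplinkStats("internet", True, 0.5, 25.0, 5.0)
    print(pick_uplink([mpls, inet]))
```

The hard part in practice is collecting trustworthy measurements, not the comparison itself.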
Finally, once the quality of the links is known, the user traffic has to be sorted into application classes (aka NBAR) and forwarded to the same destination SD-WAN node across one of the uplinks based on pre-programmed policy (does Policy Based Routing sound familiar?).
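The classify-then-forward step can be sketched as a lookup table. Port-based matching is a crude stand-in for real application recognition, and the class names, port ranges and uplink preferences below are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    protocol: str  # "tcp" or "udp"
    dst_port: int


# (class name, protocol, destination ports): hypothetical rules
CLASS_RULES = [
    ("voice", "udp", range(16384, 32768)),
    ("web", "tcp", (80, 443)),
]

# Ordered uplink preference per application class: hypothetical policy
POLICY = {
    "voice": ["mpls", "internet"],
    "web": ["internet", "mpls"],
    "default": ["internet", "mpls"],
}


def classify(flow):
    for name, protocol, ports in CLASS_RULES:
        if flow.protocol == protocol and flow.dst_port in ports:
            return name
    return "default"


def select_uplink(flow, usable_uplinks):
    """First uplink in the class policy that is currently usable, else None."""
    for uplink in POLICY[classify(flow)]:
        if uplink in usable_uplinks:
            return uplink
    return None


if __name__ == "__main__":
    call = Flow("udp", 16500)
    print(classify(call), select_uplink(call, {"mpls", "internet"}))
```

Swap the table for a route-map and you have policy-based routing; the difference is where the table lives and who pushes it to the devices.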
Some SD-WAN solutions go way beyond simple PBR and use smart congestion measurement, packet retransmission, or even forward error correction to make the best use of available bandwidth while retaining acceptable end-to-end quality. These technologies are nothing new; we’ve seen them in WAN optimization devices for years (and you might remember people who loved to rant how broken WAN optimization is).
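For readers who have never looked at forward error correction, here is about the simplest possible scheme: one XOR parity packet per group, which can rebuild a single lost packet. It is a textbook illustration that assumes equal-length packets and known packet positions, not a description of what any vendor ships.

```python
def xor_parity(packets):
    """Return the byte-wise XOR of equal-length packets."""
    if not packets:
        raise ValueError("need at least one packet")
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("this sketch handles equal-length packets only")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single lost packet (marked as None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one lost packet")
    present = [p for p in received if p is not None]
    # XOR of the parity with all surviving packets leaves the lost one.
    return xor_parity(present + [parity])


if __name__ == "__main__":
    group = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_parity(group)
    print(recover_missing([b"abcd", None, b"ijkl"], parity))
```

The price is one extra packet per group on the wire, which is the usual bandwidth-for-quality trade-off.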
Summary
Every SD-WAN solution has to reinvent all the wheels we use in hybrid WAN networks – tunneling, encryption, key exchange, registration of edge nodes, exchange of reachability information, next-hop reachability and end-to-end link quality measurements, application recognition and packet forwarding based on policies, so please don’t tell me how revolutionary these solutions are (RFC 1925 sections 2.11 and 2.5 quickly come to mind).
There is, however, a fundamental difference between a hodgepodge of traditional protocols that were force-fit into a hybrid WAN architecture and SD-WAN – the architects of SD-WAN products were not burdened with legacy implementations, or forced to reuse a code base that was meant to solve a totally different problem or protocols that were suboptimal for the job (why is anyone using OSPF in DMVPN networks when it’s clear that BGP scales much better?). The individual features that they use to reinvent the wheels are also tightly integrated, because they were designed from day one to be used together.
The architecture of most SD-WAN products is thus much cleaner and easier to configure than traditional hybrid networks. However, do keep in mind that most of them use proprietary protocols, resulting in a perfect lock-in.
More Details
- Software-Defined WAN (SD-WAN) Overview webinar (free) describes the basics of SD-WAN and typical SD-WAN components and architectures.
- Cisco SD-WAN Foundations and Design Aspects webinar (free) describes Cisco’s SD-WAN solution (formerly known as Viptela)
- You’ll find my grumpy take on SD-WAN in the SDN Use Cases webinar (subscriber-only).
About the Title
Read this tweet in case you haven’t figured out the joke in the blog post title. Here’s its gist in case Twitter disappears:
Everything can be solved with a right combination of NAT, GRE and PBR ;) Duct tapes of networking.
I'm not saying there isn't room for innovation but there is no gain in inventing something new if there is already a suitable tool out there.
What's funny is that people are even repeating the same mistakes of the past. I'm actually writing an article on the subject. I think you may find it interesting.
It's an interesting example of SDN being used for WAN provisioning used in full production for a rather large amount of traffic and with quite a few customers, albeit in a manner put together by I2 and its NOC and software developers themselves rather than buying a product from a vendor.
Generally, I agree that the SD-WAN vendors are simply rehashing older technology in generally proprietary ways. Where I begin to disagree is that the new model doesn't offer some advantages we don't currently have.
The primary differentiation for me is the concept of "local perspective" vs "controller perspective". As engineers we tend to shy away from technologies like PBR because of the local perspective problem. As we implement fancy new ways to forward our traffic as determined by policy/logic (rather than destination IP) we are required to implement those changes across a myriad of devices to avoid issues like asymmetric forwarding. Since every device makes the decisions locally, this policy tends to be unique per device and as such ends up being a pain to manage in the long run. Additionally, changes to the policy usually require a reconfiguration of each device (and again, uniquely since they aren't all the same). Moving to a centralized "controller perspective" version of configuration removes a lot of the hassle of implementing these technologies and by doing so reduces cost and administrative complexity. That is a good thing and opens up the use of these technologies to organizations who would benefit from them but may not have the advanced level of expertise on staff to manage them in their current state.
Similarly, the "localized perspective" issue rears its head again when we talk about state monitoring and correction. You are correct that many engineers are not thrilled with the performance of previous generations of WAN optimization but they suffered from the same localized perspective issues. Traffic quality was determined wholly by the receiving end of the link and because of that it was unreliable to depend on. The "controller perspective" of these new SD-WAN options is able to make decisions with data from both the transmit and receive end of the link. It's not a guarantee but it should dramatically improve detection of line degradation, and consequently the choice of corrective measures to resolve it.
My last point is this... I don't think SD-WAN is meant to appeal to high end engineers. SD-WAN is going to be wildly successful because the people who will buy into the concept (and ultimately pay to have it installed) couldn't describe the features of any of the technology/acronyms you used in your post. It makes the complex easy, which has been one of the promises of SDN from the beginning. None of the tech is new, but it is becoming far more accessible because of this new way of implementing it.
1) distributed configuration of many devices is a pain. it would be nice to have centralized configuration.
2) the average networking guy doesn't know all these fancy protocols. it would be nice if they could have a generic way to configure abstract stuff, without the need to understand the underlying protocols.
I agree. Of course.
But I don't agree that you need new technology under the hood. You can do it with existing technology and protocols. You just need a new user-interface. Or paradigm. Or approach. Or whatever you want to call it.
Or is this too simplistic ?
Thank you for the thoughtful comment. I think you might be over-optimistic in regard to the whole "controller-based" hype.
As Gryz pointed out, there's absolutely no reason you couldn't have done the same thing with existing devices (we just didn't for zillions of reasons). Also, most of the SD-WAN solutions use exactly the same mechanisms as existing hybrid WAN solutions, you just don't see the complexity.
Finally, speaking of complexity, you wrote "SD-WAN makes the complex easy, which has been one of the promises of SDN from the beginning". I sincerely hope you don't believe that. Do a deep dive into any SDN technology or product, and you'll find plenty of untested complexity with as-of-yet-unknown side effects. What many SDx solutions give you is the _appearance_ of being easy, and I had hoped we'd learned how reliable those appearances are based on our experience with single-pane-of-whatever systems.
I agree with everything you stated. I'm not a fan of the proprietary methods being implemented but am a fan of a centrally controlled WAN architecture. It should be simply a management shift, not necessarily a technology shift.
Ivan,
I knew opening my mouth on your blog was risky :) You may be correct. I may be over-optimistic to the controller hype. I'm not naive enough to believe SD-WAN providers are going to deliver on *everything* they have promised. I do believe they have the ability to deliver enhancements on the simplification of management and centralized visibility mentioned in my previous comment. We don't at all disagree on the fact that they should be using established/standardized tech where possible but I do believe they are working to deliver on a valuable concept if done correctly. Like all things in this world, I could be wrong and it all depends on practical implementation (something we have yet to see in any decent scale).
"I knew opening my mouth on your blog was risky :)" - it isn't. No networking engineer was ever harmed doing that (I can't guarantee the same for marketers and trolls ;)
"I do believe they are working to deliver on a valuable concept if done correctly" - we're in total agreement. I just hate all the hype floating around and still think we should focus on viability of underlying technologies. Understanding them is the first step toward understanding their limitations.
I agree it's complex, but unfortunately we cannot do any better with the technologies we use today (TCP/IP).
"It’s so entertaining listening to people who once touted the benefits of multi-vendor networks suddenly promoting the benefits of undocumented proprietary solutions “because they are so much better than routing protocols.”"
=)
I take it there are no prizes for guessing who? :-(
and changing its topology based on certain metrics is pretty cool
To me, what is exciting about SD-WAN is that visibility is a key use case. For once, network management isn’t an afterthought.
"The architecture of most SD-WAN products is thus much cleaner and easier to configure than traditional hybrid networks. However, do keep in mind that most of them use proprietary protocols, resulting in a perfect lock-in."
In this case, should I consider that I'm not locked in today when using tools like DMVPN? Just curious: is it usual to have interoperability in this type of network with multiple nodes in remote sites? I mean, to scale I have to use something like DMVPN, for instance, so for me we already live in a lock-in world. Do you agree?
I agree with you that DMVPN is sort-of lock-in because there's a single vendor that decided to use this specific mix of standard protocols to implement mGRE overlay on top of IPsec.
However, every single protocol used in the DMVPN mix is a standard protocol, and apart from NHRP extensions needed to handle NAT traversal reliably, I don't think Cisco modified any of them, so in theory someone else could replicate what they did (modulo any patents - have to write a blog post on that topic, maybe someone knows the answer).
In any case, as you're dealing with a mixed bag of standard protocols, you have at least a fighting chance of figuring out what's going on, for example by firing up Wireshark and troubleshooting your problem (or at least identifying what the problem is). You have zero chance to do that with undocumented proprietary protocols.
Do you agree?