Reliability of SD-WAN and Hybrid WAN Solutions
My Business Case for SD-WAN blog post received numerous comments pointing out the potential pitfalls of hybrid WAN, including reduced security, unreliable Internet services and denial-of-service attacks.
While all those comments are perfectly valid, I still think hybrid WAN (whether implemented with traditional technologies or SD-WAN products) makes perfect sense.
However, as with any new technology, you have to understand the fundamentals of SD-WAN (or hybrid WAN) solutions, and use them correctly.
Fortunately, we’ve been using solutions similar to SD-WAN for at least a decade, so we’ve already learned a few useful lessons.
Internet Uplinks Are Unreliable
We all know a zillion things can go wrong with Internet uplinks (and eventually they will):
- If a link that costs you $100 a month is down, you have zero leverage with your ISP. It will be fixed… eventually;
- If you’re experiencing packet drops on that same uplink, sometimes the only thing you can do is change the ISP;
- If someone decides to blast you with a DDoS attack, you’re toast… unless you have a high-end router sitting at a large Internet exchange, or you’re paying for DoS scrubbing service (which you should consider doing for your hub site).
On the other hand, it’s amazing how well the Internet usually works, so it would be a shame not to use it. Also, most traffic transported across an enterprise WAN is not really mission-critical, and it’s a waste of money to transport it across high-quality infrastructure.
Long story short: don’t ever count on reliability or availability of your Internet uplinks (particularly at remote sites).
Redundancy is King
The usual way of dealing with unreliable components is to use redundancy. Apply the same thinking to your hybrid WAN design.
Use a combination of MPLS/VPN and Internet VPN, or Internet VPN with 3G backup. Use multiple access methods, so the cable-seeking backhoe doesn’t bring down all uplinks.
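To put a rough number on what redundancy buys you, here is a minimal sketch that computes the availability of a site with several uplinks. It assumes the links fail independently, which is exactly what the backhoe remark warns you not to take for granted: two links sharing a duct are one link for this purpose. The availability figures are made up for illustration, not measurements.

```python
from math import prod

HOURS_PER_YEAR = 8760


def combined_availability(link_availabilities):
    """Probability that at least one link is up.

    Assumes link failures are independent (no shared duct, last mile
    or upstream provider). Correlated failures make the real number worse.
    """
    return 1.0 - prod(1.0 - a for a in link_availabilities)


def downtime_hours_per_year(availability):
    """Expected hours per year during which the site has no working uplink."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical availabilities, chosen only to illustrate the arithmetic.
    mpls_vpn = 0.999
    internet_vpn = 0.99

    both = combined_availability([mpls_vpn, internet_vpn])
    print(f"Internet VPN alone: {downtime_hours_per_year(internet_vpn):.1f} h/year down")
    print(f"MPLS/VPN + Internet VPN: {downtime_hours_per_year(both):.2f} h/year down")
```

With these assumed figures, the unavailability of the pair is the product of the individual unavailabilities (0.001 × 0.01), so even a mediocre second link helps a lot, as long as it really is independent of the first one.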
Keep Calm and Be Prepared
I guess we all agree the Internet uplinks will eventually fail. At that moment it’s important to:
- Have a working backup solution that has been properly tested. The last thing you need when your high-capacity links fail is a routing loop or a traffic blackhole;
- Have enough bandwidth available on the backup path to carry mission-critical traffic, together with a mechanism that will block non-critical traffic (otherwise the non-critical traffic would hose the backup links).
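The second point is easy to check on paper before the failure happens. The following sketch splits traffic classes into "carry on backup" and "block on backup" and tells you whether the mission-critical load fits. The class names, the bandwidth figures and the `TrafficClass` type are hypothetical; real numbers would come from your own traffic measurements.

```python
from dataclasses import dataclass


@dataclass
class TrafficClass:
    name: str
    peak_mbps: float
    mission_critical: bool


def plan_failover(classes, backup_capacity_mbps):
    """Split traffic for the failover case.

    Returns (admitted, blocked, fits):
      admitted - mission-critical classes that go over the backup path
      blocked  - everything else, to be dropped while on backup
      fits     - True if the admitted peak load fits the backup capacity
    """
    admitted = [c for c in classes if c.mission_critical]
    blocked = [c for c in classes if not c.mission_critical]
    critical_load = sum(c.peak_mbps for c in admitted)
    return admitted, blocked, critical_load <= backup_capacity_mbps


if __name__ == "__main__":
    # Hypothetical traffic mix for a remote site.
    site_traffic = [
        TrafficClass("erp", 4.0, True),
        TrafficClass("voice", 2.0, True),
        TrafficClass("software-updates", 30.0, False),
        TrafficClass("web-browsing", 20.0, False),
    ]
    admitted, blocked, fits = plan_failover(site_traffic, backup_capacity_mbps=10.0)
    print("carry on backup:", [c.name for c in admitted])
    print("block on backup:", [c.name for c in blocked])
    print("critical traffic fits:", fits)
```

If `fits` comes back false, you either need a bigger backup link or a stricter definition of mission-critical. The actual blocking would be done with whatever QoS or filtering mechanism your edge devices support; this sketch only covers the sizing arithmetic.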
There are numerous tricks you can use to be prepared. Some organizations send mission-critical traffic over MPLS/VPN WAN all the time to ensure the MPLS/VPN links have enough bandwidth to carry that traffic when the Internet uplinks fail; others monitor the state of backup links (which should be a standard procedure anyway).
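Monitoring the backup links can be as simple as probing them regularly and raising an alarm after a few consecutive failures. The sketch below covers only the decision logic and assumes the probe results (`True` for success, `False` for failure) are collected by some other tool; the function name, the link names and the threshold of three are illustrative choices, not a recommendation.

```python
def backup_link_alarms(probe_history, max_consecutive_failures=3):
    """Return the links whose most recent probes all failed.

    probe_history maps a link name to a list of probe results in
    chronological order (True = probe succeeded, False = probe failed).
    A link is reported only when its last max_consecutive_failures
    probes failed, so a single lost probe does not trigger an alarm.
    """
    alarms = []
    for link, results in probe_history.items():
        recent = results[-max_consecutive_failures:]
        if len(recent) == max_consecutive_failures and not any(recent):
            alarms.append(link)
    return alarms


if __name__ == "__main__":
    # Hypothetical probe results for two backup links.
    history = {
        "branch-17-mpls": [True, False, False, False],
        "branch-17-3g": [True, True, False, True],
    }
    print("backup links needing attention:", backup_link_alarms(history))
```

The point is to find out that the backup path is dead while the primary link is still working, not at the moment you need to fail over.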
Internet VPNs can't be ignored, and will prove to be at least part of the right solution for some Enterprises. The cost difference is irresistible.
But that also means it will attract a lot of folks looking to put "I saved $$$" on their resume and then move on before the disaster strikes.
It would be really great to see a realistic business case analysis taking into account things like the additional cost of securing and monitoring Internet connections at hundreds or thousands of sites vs. maybe less than 10. Or the cost of the Plan B for weathering enterprise-wide or near enterprise-wide Internet outages measured in days, where you're running on your backup solution (assuming it really was tested and works).
http://www.gartner.com/technology/reprints.do?id=1-2JRZ2US&ct=150722&st=sb