Worth Reading: SD-WAN Scalability Challenges

In January 2020 Doug Heckaman documented his experience with VeloCloud SD-WAN. He tried to be positive, but for whatever reason this particular bit caught my interest:

Edge Gateways have a limited number of tunnels they can support […]

WTF? Wasn’t x86-based software packet forwarding supposed to bring infinite resources and nirvana? How badly written must your solution be to have a limited number of IPsec tunnels on a decent x86 CPU?

But if you don’t limit what traffic is allowed between branches, you could run into situations where some process […] will trigger the creation of dynamic VPNs between ALL the branches. The Edges will run out of memory or CPU and start to drop traffic, OSPF will drop routes intermittently as the OSPF process competes for resources […]

We’ve seen similar problems in DMVPN… a decade ago. One would have hoped that the industry would learn from past mistakes and shortcomings, separate the data and control planes, protect control-plane resources… but evidently those who do not learn history are doomed to repeat it, yet again using their customers as scalability testers.
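To put rough numbers on the "dynamic VPNs between ALL the branches" scenario, here is a minimal back-of-the-envelope sketch. It assumes a plain full mesh with one tunnel per pair of branches. The function names and the per-edge tunnel limit are hypothetical and used only for illustration; they are not VeloCloud figures, and real products may build more than one tunnel per branch pair.

```python
def full_mesh_tunnels(branches: int) -> tuple[int, int]:
    """Return (tunnels_per_edge, total_tunnels) for a full mesh.

    Assumes exactly one tunnel per pair of branches (an assumption,
    not a statement about any particular SD-WAN product).
    """
    if branches < 1:
        raise ValueError("need at least one branch")
    per_edge = branches - 1                 # every edge peers with all others
    total = branches * (branches - 1) // 2  # unordered branch pairs
    return per_edge, total


def exceeds_limit(branches: int, per_edge_limit: int) -> bool:
    """True if a full mesh needs more tunnels per edge than the limit.

    per_edge_limit is a hypothetical platform limit supplied by the caller.
    """
    per_edge, _ = full_mesh_tunnels(branches)
    return per_edge > per_edge_limit


if __name__ == "__main__":
    for n in (10, 100, 500):
        per_edge, total = full_mesh_tunnels(n)
        print(f"{n} branches: {per_edge} tunnels per edge, {total} in total")
```

The per-edge count grows linearly with the number of branches while the total grows quadratically, which is why a single chatty process touching every branch can push small edge devices past whatever their tunnel limit happens to be.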

2 comments:

  1. The AI/ML will sort these issues out for sure...;)
  2. My guess: in DMVPN, the later iterations eliminated large portions of state by leveraging phase 2 and phase 3 DMVPN (shortcut and spoke-to-spoke route exchange), making DMVPN nearly stateless (except for NHRP and IKE). This drastically increased performance and scalability.

    Now in SDWAN we’re putting all that state and more back into the mix so we can divert traffic per class or flow based on heuristics gathered about packets that arrive in that class or flow. I intend to do a writeup when we do our PoC of the major players at the end of the year.