Why Would You Need an Overlay Network?

I got this question from an ipSpace.net subscriber:

My VP is not a fan of overlays and is determined to move away from our legacy implementation of OTV, VXLAN, and EVPN[1]. We own and manage our optical network across all sites; however, it’s hard for me to picture a network design without overlays. He keeps asking why we need overlays when we own the optical network.

There are several reasons (apart from RFC 1925 Rule 6a) why you might want to add another layer of abstraction (that’s what overlay networks are in a nutshell) to your network.

You want to keep the forwarding tables on intermediate (transit) devices small and stable and don’t want to insert endpoint addresses or prefixes into them. That’s why we use VXLAN or Provider Backbone Bridging (PBB) to build layer-2 fabrics. Service providers use MPLS-based BGP-free core for the same reason.
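
To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. It is a toy model, not a sizing tool: the function name and the leaf and host counts are made up for illustration, and it assumes one tunnel endpoint per leaf switch.

```python
def core_table_entries(leaves: int, hosts_per_leaf: int, overlay: bool) -> int:
    """Number of forwarding entries a core device holds in this toy model."""
    if overlay:
        # With an overlay, the core only needs to reach the tunnel
        # endpoints (assumed here: one per leaf switch).
        return leaves
    # Without an overlay, every endpoint address is visible in the core.
    return leaves * hosts_per_leaf


# Hypothetical fabric: 64 leaf switches with 2000 endpoints each.
flat = core_table_entries(leaves=64, hosts_per_leaf=2000, overlay=False)
tunneled = core_table_entries(leaves=64, hosts_per_leaf=2000, overlay=True)
print(flat, tunneled)  # 128000 64
```

The second number also changes only when you add or remove a leaf switch, not when an endpoint appears, disappears, or moves, which is where the "stable" part comes from.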

Path separation. Most network devices (including bridges and routers) perform hop-by-hop destination-only forwarding. To separate two traffic streams, you must implement parallel forwarding domains on every device in the path or hide the user payload inside a transport envelope (tunneling).

You might need path separation for security reasons (security domains or multi-tenant networking) or to deal with overlapping address spaces (the original use case for VRFs).
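
A minimal Python sketch of the "parallel forwarding domains" idea, assuming a simplified router that keeps one longest-prefix-match table per VRF. The class name, tenant names, and next hops are hypothetical; real devices obviously do this in hardware and with far more state.

```python
import ipaddress


class VrfRouter:
    """Toy router with one independent forwarding table per VRF."""

    def __init__(self):
        self.tables = {}  # VRF name -> list of (network, next_hop)

    def add_route(self, vrf, prefix, next_hop):
        network = ipaddress.ip_network(prefix)
        self.tables.setdefault(vrf, []).append((network, next_hop))

    def lookup(self, vrf, destination):
        """Longest-prefix match restricted to a single VRF table."""
        address = ipaddress.ip_address(destination)
        matches = [
            (network, next_hop)
            for network, next_hop in self.tables.get(vrf, [])
            if address in network
        ]
        if not matches:
            return None
        return max(matches, key=lambda match: match[0].prefixlen)[1]


router = VrfRouter()
# Both tenants use 10.0.0.0/24; the tables never see each other.
router.add_route("tenant-a", "10.0.0.0/24", "leaf-1")
router.add_route("tenant-b", "10.0.0.0/24", "leaf-7")
router.add_route("tenant-b", "10.0.0.128/25", "leaf-9")

print(router.lookup("tenant-a", "10.0.0.200"))  # leaf-1
print(router.lookup("tenant-b", "10.0.0.200"))  # leaf-9
```

The catch is in the first sentence of the class docstring: without tunnels, every device in the path needs those per-tenant tables. With an overlay, only the edge devices do.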

Legacy protocols. Tunneling is the easiest way to provide connectivity services for a protocol or technology you don’t want to see in the core of your network – be it IPv6, voice, SCSI, bridging, IPX, or AppleTalk.

I mentioned tunneling several times. We could debate for hours what tunneling is and whether MPLS is tunneling. However, there’s no doubt that transporting layer-2 frames (VXLAN) or layer-3 packets (GRE) inside another layer-3 (or higher) envelope is tunneling, and a network built out of tunnels is usually called an overlay network.
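
To make the "envelope" idea concrete, here is a small Python sketch that wraps a layer-2 frame in the 8-byte VXLAN header defined in RFC 7348 (flags, reserved bits, 24-bit VNI) and unwraps it again. It is an illustration only: the function names are mine, and the outer UDP, IP, and Ethernet headers that a real VTEP would add are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348
VXLAN_HEADER_LEN = 8


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prepend a VXLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(payload: bytes):
    """Return (vni, inner_frame) from a VXLAN-encapsulated payload."""
    if len(payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload shorter than a VXLAN header")
    word1, word2 = struct.unpack("!II", payload[:VXLAN_HEADER_LEN])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8, payload[VXLAN_HEADER_LEN:]


# Dummy 14-byte inner frame (just an Ethernet header worth of zeros).
inner = bytes(14)
packet = vxlan_encap(5000, inner)
print(packet[:8].hex())     # 0800000000138800
print(vxlan_decap(packet))  # VNI 5000 plus the original frame
```

The transit devices forward on the outer headers only; the VNI and everything behind it is opaque payload to them.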

Finally, you could use VLANs or xWDM wavelengths (lambdas) instead of tunnels. Dedicating a lambda to every tenant (or legacy protocol) might be a bit expensive, and VLANs turn your whole network into a single failure domain. If you want to build a stable transport network, you usually use layer-3 technologies; overlays often happen to be the best tool for the job.

The Tradeoffs

As Russ White loves to say, if you haven’t found the tradeoffs, you haven’t looked hard enough. Here are some of the tradeoffs identified by Deepak Arora[2]:

  • Impact of overlay networks on visibility, reporting and performance management
  • Additional control plane that would result in additional abstraction layers and interaction surfaces and hence cascading effect in many situations
  • Impact on troubleshooting: how many solutions do we see in the market that can correlate underlay and overlay problems?
  • When it comes to sizing equipment in terms of control plane or data plane, overlays add a new level of complexity an architect needs to deal with, and in most cases vendors won’t be able to offer much help beyond asking you to take their word for it
  • I see a lot of VXLAN and EVPN preachers, but let’s agree that mapping VLAN to VXLAN on a 1:1 basis tells me you don’t know your stuff and believe too much in vendor marketing
  • EBGP underlay with IBGP overlay…man we can do better
  • Stitching two EVPN DCs with MPLS and SR: most of the implementations that I have seen were too complex and too fragile, and thus resulted in a complex “policy.”

More Information

If you’re interested in this topic, you might want to watch these webinars (all of them part of Standard ipSpace.net subscription):

Haven’t found what you’re looking for? Send me an interesting question for the ipSpace.net design clinic.

Revision History

Added Tradeoffs section based on comments Deepak Arora made on LinkedIn.

  1. Look how far we got – people are calling VXLAN/EVPN “legacy implementation” 😁

  2. Copied from his LinkedIn comment because I hate good content going to waste.


  1. I don't know Russ well, but he might also simply say an overlay adds more complexity. (love this book!! https://www.amazon.com/Navigating-Network-Complexity-Next-generation-virtualization/dp/0133989356).

    "Impact on troubleshooting: how many solutions do we see in the market that can correlate underlay and overlay problems?"

    A new breed of Network AIOps platforms will do a better job of correlation up and down layers in the network as well as across different data types. Disclaimer, I work for a Network AIOps company working on that: https://augtera.com/augtera-networks-platform-for-network-aiops/

  2. My colleagues at Magyar Telekom have solved this issue by a hybrid approach. Create a simple network for non-critical entertainment or best effort type of traffic. Pure old Internet style, no expensive feature licenses. This is where you have the huge bandwidth need and uncontrollable growth. Then build a parallel network on the same infrastructure with overlays, multi-service, and all other intelligence. This has less bandwidth needs and growth. At the end we had a dumb and an intelligent network on the same physical infrastructure where you could change the mix easily by DWDM lambda allocations. It was so efficient financially, that the DT idea of Terastream had no chance in Hungary... :-)
