Complexity Belongs to the Network Edge

Whenever I write about vCloud Director Networking Infrastructure (vCDNI), be it a rant or a more technical post, I get comments along the lines of “What are the network guys going to do once the infrastructure has been provisioned? With vCDNI there is no need to keep network admins full time.”

Once we have a scalable solution that will be able to stand on its own in a large data center, most smart network admins will be more than happy to get away from provisioning VLANs and focus on other problems. After all, most companies have other networking problems beyond data center switching.

More than a decade after this blog post was written, we’re still provisioning VLANs to support VMware ESXi hosts even though we could have used overlay networks with VMware NSX for years. It turns out that everyone loves to push the problems down the stack.

As for disappearing work, we’ve seen the demise of DECnet, IPX, SNA, DLSw and multi-protocol networks (which are coming back with IPv6) without our jobs getting any simpler, so I’m not worried about the jobless network admin. I am worried, however, about the stability of the networks we are building, and that’s the only reason I’m ranting about the emerging flat-earth architectures.

Middle Ages view of flat earth. Not much has changed in the meantime (via Wikimedia Commons)

In 2002, the IETF published an interesting RFC, Some Internet Architectural Guidelines and Philosophy (RFC 3439), that should be mandatory reading for anyone claiming to be an architect of solutions that involve networking (you know who you are). In the End-to-End Argument and Simplicity section the RFC clearly states: “In short, the complexity of the Internet belongs at the edges, and the IP layer of the Internet should remain as simple as possible.”

We should use the same approach when dealing with virtualized networking: the complexity belongs to the edges (hypervisor switches) with the intervening network providing the minimum set of required services. I don’t care if the networking infrastructure uses layer-2 (MAC) addresses or layer-3 (IP) addresses as long as it scales. Bridging does not scale as it emulates a logical thick coax cable. Either get rid of most bridging properties (like packet flooding) and implement proper MAC-address-based routing without flooding, or use IP as the transport. I truly don’t care.
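To make the difference concrete, here is a toy Python sketch (not any vendor's implementation). It contrasts a learning bridge, which floods frames for unknown destinations, with a hypervisor switch that looks up the destination in a mapping table and hands the frame to an IP transport. The class names, the `vm_location` table and the drop-on-miss behavior are assumptions made purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LearningBridge:
    """Toy transparent bridge: learns source MACs, floods unknown destinations."""
    ports: list[str]
    mac_table: dict[str, str] = field(default_factory=dict)  # MAC -> port

    def forward(self, src_mac: str, dst_mac: str, in_port: str) -> list[str]:
        # Learn (or refresh) the port behind which the sender lives.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        # Known destination: one port, or nothing if it sits behind the ingress port.
        return [] if out_port == in_port else [out_port]


@dataclass
class HypervisorEdge:
    """Toy hypervisor switch (hypothetical): maps a VM MAC to its host's IP address."""
    local_ip: str
    vm_location: dict[str, str] = field(default_factory=dict)  # VM MAC -> host IP

    def forward(self, dst_mac: str) -> list[str]:
        host_ip = self.vm_location.get(dst_mac)
        if host_ip is None:
            # Assumption: an unknown VM is dropped (or resolved out of band),
            # so the transport network never sees flooded traffic.
            return []
        # Exactly one IP destination; the core only routes on host addresses.
        return [host_ip]


bridge = LearningBridge(ports=["p1", "p2", "p3", "p4"])
flooded = bridge.forward("aa", "bb", "p1")   # "bb" not learned yet: flooded
learned = bridge.forward("bb", "aa", "p2")   # "aa" was learned on p1

edge = HypervisorEdge(local_ip="10.0.0.1", vm_location={"bb": "10.0.0.2"})
tunneled = edge.forward("bb")                # single host IP, no flooding
```

In this model all per-VM state lives in the edge tables, while the transport network only needs to reach the hypervisor addresses, whether it forwards on MAC or IP addresses.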

Reading RFC 3439 a bit further, the next paragraphs explain Non-linearity and Network Complexity. To quote the RFC: “In particular, the largest networks exhibit, both in theory and in practice, architecture, design, and engineering non-linearities which are not exhibited at smaller scale.” Allow me to paraphrase this for some vendors out there: “just because it works in your lab does not mean it will work at Amazon or Google scale.”

The current state of affairs is just the opposite of what a reasonable architecture would be: VMware has a barebones layer-2 switch (although it does have a few interesting features) with another non-scalable layer (vCDNI) on top of (or below) it. The networking vendors are inventing all sorts of kludges of increasing complexity to cope with that, from VN-Link/port extenders and EVB/VEPA to large-scale L2 solutions like TRILL, FabricPath, VCS Fabric or 802.1aq, and L2 data center interconnects based on VPLS, OTV or BGP MAC VPN.

In the meantime, all the proprietary fabric solutions have died, and networking vendors are pushing another overly-complex architecture: EVPN control plane with VXLAN encapsulation. VMware still makes their lives easy by insisting on having the same MAC/IP addresses on all server uplinks.

I don’t expect the situation to change on its own. VMware knows server virtualization is just a stepping stone and is already investing in PaaS solutions; the networking vendors are more than happy to sell you all the extra proprietary features you need just because VMware never implemented a more scalable solution, increasing their revenues and lock-in. It almost feels like the more “network is in my way” complaints we hear, the happier everyone is: virtualization vendors because the blame is landing somewhere else, the networking industry because these complaints give them a door opener to sell their next-generation magic (this time using a term borrowed from the textile industry).

Imagine for a second that VMware or Citrix actually implemented a virtualized networking solution using IP transport between hypervisor hosts. The need for fancy new boxes supporting TRILL or 802.1aq would be gone; all you would need in your data center would be simple high-speed L2/L3 switches. Clearly not a rosy scenario for the flat-fabric-promoting networking vendors, is it?

VMware did just that with VMware NSX, and set the license prices so high that it’s rarely used. Even worse, VMware NSX requires all server uplinks to be in the same VLAN, so it cannot run over a simple layer-3-only data center network.

Is there anything you can do? Probably not much, but at least you can try. Sit down with the virtualization engineers, discuss the challenges and figure out the best way to solve problems both teams are facing. Engage the application teams. If you can persuade them to start writing scale-out applications that can use proper load balancing, most of the issues bothering you will disappear on their own: there will be no need for large stretched VLANs and no need for L2 data center interconnects. After all, if you have a scale-out application behind a load balancer, nobody cares if you have to shut down a VM and start it in a new IP subnet.

Revision History

Added a few notes on virtual network evolution between 2011 and 2022.


  1. Hi Mate,
    You must come visit us in our little backwater and you can see a large scale L2 network using
    Yours in Cow dung
  2. Ivan,

    There is a problem with complexity residing at the edge: complexity is inevitably expensive (in all aspects) and the edge is the largest part of any structure; which kills the economics of the solution.

    Otherwise, there's no real barrier here - nothing stops somebody from writing a vSwitch replacement with the necessary smarts.
  3. One of your best posts yet. Every new Network Engineer should read it. I interview people all the time that think whatever the new protocol of the month is will solve all their problems. In the end, if they want to get big, they will need to get rid of those problems and worry about getting big.
  4. Dmitri,
    The edge is where you have the most free CPU cycles, the most free memory, free buffer space, the greatest horizontal scale. The edge is where you have the most power to do cool things. The more you make the network do, the more it has to do. It can only do so much.
  5. Dmitri,

    Looking back, there were at least three large-scale technologies relying on simple-edge/complex-core architecture: X.25, SNA and voice circuit switching. Two of them are dead and the third one is not doing so well.

    As Peter pointed out, it's cheaper to do a bit of complexity in every edge device (because there are so many of them) than doing a lot of complexity in high-speed core devices. Case in point: TCP versus X.25.

    As for rewriting vSwitch, that's exactly what Amazon did for AWS to make it scale.
  6. Believe it or not, it just might happen. More details to follow.
  7. Peter,

    When we are talking about a virtualisation environment, I would have thought that the whole idea was to have as few free CPU cycles, and as little free memory and buffer space, as possible.

    I understand where you're coming from when you're saying that the edge is the place to do cool things, I'm just not too sure about this particular case.
  8. Isn't it the same with IP? If you look at, say, a triple-play environment, the complexity there is actually in the *service* edge, which is quite removed from the actual network edge (CE), which is quite dumb.

    I'm not averse to the idea, in fact far from it (I pointed out my work on smart NTU for Carrier Ethernet services before). I'm just dubious about the idea of turning the whole thing on its head, where there are essentially more devices with more state and more things to go wrong.
  9. Ivan doesn't pull any punches with his rant on where network complexity belongs...
  10. In most cases you run out of memory/IO way before you overload the CPU. Also, vSwitch already does a number of cool things they have to do (in/out rate limiting, for example).
  11. Interesting & valid counter-example. Thank you!
  12. Hi Ivan,
    You said: "Imagine for a second ... a virtualized networking solution using IP transport between hypervisor hosts"
    You are spot on. This is exactly where things are headed.
  13. RFC 3439 is a classic - look who co-authored it :)
  14. "As for rewriting vSwitch, that's exactly what Amazon did for AWS to make it scale."

    1. You know something more? Willing to share?