VLANs are the wrong abstraction for virtual networking

Are you old enough to remember the days when operating systems had no file system? Fortunately I never had to deal with storing files on one of those (I was using punch cards), but miraculously you can still find the JCL DLBL/EXTENT documentation online.

On the other hand, you probably remember the days when a SCSI LUN actually referred to a physical disk connected to a computer, not an extensible virtual entity created through a point-and-click exercise on a storage array.

You might wonder what this ancient history has to do with virtual networking. Don't worry, we're getting there in a second ;)

I still remember these monsters: 7.25 MB on six platters (source: Wikipedia)

When VMware created its first server virtualization software, it had a readily available storage abstraction (the file system) and a CPU abstraction (including MS-DOS support under Windows, with ideas going all the way back to the VM operating system on IBM mainframes).

Creating virtual storage and CPU environments was thus a no-brainer, as all the hard problems had already been solved. Most server virtualization solutions use the file system recursively (virtual disk = file on a file system) and abstract the CPU by trapping and emulating privileged instructions (things got way easier once modern CPUs started supporting virtualization in hardware). There was no readily available networking abstraction, so they chose the simplest possible option: VLANs (after all, it's simple to insert a 12-bit tag into a packet and pretend it's no longer your problem).
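A quick way to see just how thin that abstraction is: the whole 802.1Q mechanism is a 4-byte tag wedged into the Ethernet header right after the MAC addresses. A minimal sketch in Python (the `add_vlan_tag` helper and the dummy frame are mine, purely for illustration):

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag

def add_vlan_tag(frame: bytes, vlan_id: int, pcp: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the two 6-byte MAC addresses.

    vlan_id is the 12-bit VLAN ID (0-4095) -- hence the ~4K VLAN limit.
    pcp is the 3-bit priority code point.
    """
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | vlan_id               # PCP + DEI(0) + VLAN ID
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]      # 12 = dst MAC + src MAC

# Dummy untagged frame: two zeroed MACs, EtherType 0x0800 (IPv4), payload
untagged = bytes(12) + struct.pack("!H", 0x0800) + b"payload"
tagged = add_vlan_tag(untagged, vlan_id=100)
```

Everything interesting about the network still has to be provisioned elsewhere; the tag merely marks the packet as someone else's problem.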

The “only” problem with using VLANs is that they aren't the right abstraction. Instead of being like files on a file system, VLANs are more like LUNs on storage arrays – someone has to provision them. You can probably imagine how successful server virtualization would have been if you had to ask a storage administrator for a new LUN every time you needed a virtual disk for a new VM.

So every time I see how the “Software-Defined Data Center [...] provides unprecedented automation, flexibility, and efficiency to transform the way you deliver IT,” I can't help but read “it took us more than a decade to figure out the right abstraction.” Virtual networking is nothing but another application riding on top of IP (storage and voice people got there years earlier).
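The “application riding on top of IP” point is easy to make concrete: a VXLAN-style overlay wraps the VM's Ethernet frame in a few header bytes and ships it between hypervisors as ordinary UDP payload. A rough sketch under those assumptions (the helper name and sample values are mine; real implementations obviously live in the hypervisor data path, not in Python):

```python
import struct

VXLAN_PORT = 4789  # IANA-assigned UDP port for VXLAN

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header to an inner Ethernet frame.

    The result is plain UDP payload; the physical network only ever
    sees IP/UDP between hypervisor endpoints. vni is the 24-bit
    virtual network identifier (16M segments instead of 4K VLANs).
    """
    if not 0 <= vni <= 0xFFFFFF:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08 << 24                            # I bit set: VNI is valid
    header = struct.pack("!II", flags, vni << 8)  # VNI in the upper 24 bits
    return header + inner_frame

# A dummy 14-byte inner Ethernet header standing in for the VM's frame
overlay_payload = vxlan_encap(bytes(14), vni=5000)
```

Because the virtual network is just payload, creating one requires no changes on the physical switches – which is exactly the file-versus-LUN distinction the article is after.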

More information

If you're attending Interop Las Vegas, drop by my Overlay Virtual Networking Explained session (and use the DISPEAKER marketing code to get a 25% discount on registration fees), or register for the Network Infrastructure for Cloud Computing workshop. If not, don't worry – there will be an overlay networking webinar in the September/October timeframe.


  1. VLANs tend to be a container for a subnet, which is used to help scale routing (prefixes, keeping the IP control plane less busy, etc.). VLANs give you network containers (LUNs), while a bump-in-the-wire firewall like vGW would give you application-level/intended access (files), and maybe other useful actions. In cloud environments, load balancing is trending towards falling into the application developer's domain (where it rightfully belongs) using software like haproxy. Generally, the creation of the network container/subnet is a separate workflow from the creation of role-based policy. I suspect this model won't change much even with hypervisor-based overlays, even down to a low-level separation of forwarding vectors from ACLs. When you say VLANs are the wrong abstraction, I read it as you saying subnets are the wrong abstraction; but maybe that is what you are saying?
    1. Whether subnets (and inter-subnet firewalls) are the right abstraction or not is a different story (but thanks for another topic). While subnets help to scale routing at large scales, that problem might be less relevant within virtual networks, where every "Intranet" (app stack) has only a few hundred nodes.

      802.1Q VLANs are the wrong abstraction because they tightly couple virtual constructs (virtual networks = files) with physical reality (VLAN = LUN).
  2. Ivan, is a VLAN not the right abstraction, or have we (yes, that collective we) simply not created the translation, mapping, and provisioning layers that come with files on a file system or VMs on a CPU?

    Outside of the numbering limitation (4K VLANs is not enough), if we had created tools and protocols to hide all this, perhaps a VLAN would have been just fine. Kind of like what is happening now in overlays :-)
    1. Marten, we can't get away from the sad reality – we have failed as miserably as OS/360 with its DLBL/EXTENT statements did when faced with early Unix, VAX/VMS, or even MS-DOS.

      No amount of translation, mapping, and provisioning will change the basic facts: just as SANs and storage arrays have no business being involved in file creation and directory lookups, networking devices and transport fabrics have no business being tightly coupled to inter-hypervisor communication.
  3. @fullmesh - "In cloud environments, load-balancing is trending towards falling into the application developers domain (where it rightfully belongs) using software like haproxy." That depends on what you want to load balance: network traffic, or applications like web traffic. Also, VLANs aren't necessarily there to scale routing, but to abstract a subnet across switch hardware.
    Ivan - an IP phone call rides in a VLAN that gets priority routing because the time budget is so small. Are you saying VMs should speak via an API? The problems show up because you can have a failover VM in another data center with the same IP, or because the server can move to another part of the network à la vMotion. The network needs to catch up, or the problem needs to get fixed, whether it's DNS resolution, the ARP cache, or whatever the issue is when the VM moves. Maybe the applications just need to do federation so it will matter less where they are. Maybe just by making the app less IP-bound.
    1. The only way to move forward is to make applications less IP-bound and more fault-tolerant. Everything else is a kludge.

      As for "priority routing" - do you get that when writing critical file content to disk? Do you care? Why not? ... and how is that relevant to virtual networks within a cloud-scale data center?
    2. To answer your question: yes. I can set QoS on disk access within the virtual environment. Some storage arrays support this QoS as well.

      The issue many virtual environments struggle with is that network solutions refuse to integrate. I would love to create a "virtual switch" and publish it to the physical network. Need a new segment (VLAN), FW policy, route, etc. to support a new workload? Why not provision it as part of the VM deployment?

      VLANs may or may not be the right solution, but they're what we have at this time. IPv6 will change some of this, but segmentation is required by legacy security controls.

      If we look at networks as a highway/road system, the controls lie at the edge (home/driveway) with monitors in the flow. Conversely, today's networks assume the edge to be dumb, with intelligence centralized.
  4. If "virtual networks" are nothing more than applications (presumably running in their own VM), then I assume you can create multiple virtual networks by running multiple instances of this application on the same physical machine (rather than configuring VLANs on a single instance of Open vSwitch, for example)?