Why Is Network Virtualization So Hard?
For the last few years we’ve been hearing how networking is the last bastion of rigidity in the wonderful unicorn-flavored virtual world. Let’s see why networks are so much harder to virtualize than compute or storage capacity (side note: it didn’t help that virtualization vendors had no clue about networking, but things are changing).
When you virtualize compute capacity, you’re virtualizing RAM (a well-known problem for at least 40 years), CPU (same thing) and I/O ports (slightly trickier, but doable at least since Intel rolled out the 80286 processor). All of these are isolated resources limited to a single physical server. There’s zero interaction or tight coupling with other physical servers and there’s no shared state, so it’s a perfect scale-out architecture: the only limiting factor is the management/orchestration system (vCenter, System Center …).
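To make the point concrete, here’s a minimal, purely illustrative Python sketch (not any real orchestrator’s API) of VM placement: every placement decision uses only the target host’s local state, and the only shared piece is the orchestrator loop itself.

```python
# Toy illustration: each host's resources are purely local state, so VM
# placement needs no coordination between hosts; the only global component
# is the orchestration system (the place_vm loop below).
class Host:
    def __init__(self, cores, ram_gb):
        self.free_cores = cores
        self.free_ram = ram_gb

    def try_place(self, vcpu, ram_gb):
        # Purely local decision: no other host's state is consulted.
        if self.free_cores >= vcpu and self.free_ram >= ram_gb:
            self.free_cores -= vcpu
            self.free_ram -= ram_gb
            return True
        return False

def place_vm(hosts, vcpu, ram_gb):
    # The orchestrator (think vCenter or System Center) is the only shared piece.
    return next((host for host in hosts if host.try_place(vcpu, ram_gb)), None)
```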
So-called storage virtualization is (in most cases) fake to begin with: hypervisor vendors are not virtualizing storage, they’re usually using a shared file system on LUNs someone else has already created for them (architectures with local disk storage use some variant of a global file system with automatic replication). I have no problem with that approach, but when someone boasts how easy it is to create a file on a file system compared to creating a VLAN (the networking equivalent of a LUN), I get mightily upset. (Side note: why do we have to use VLANs? Because the hypervisor vendors had no better idea.)
There’s limited interaction between hypervisors using the same file system as long as they only read and write file contents. The moment a hypervisor has to change directory information (VMware) or update the logical volume table (Linux), the node making the change has to lock the shared resource. Due to SCSI limitations, the hypervisor making the change usually locks the whole shared storage, which works really well: just ask anyone using large VMFS volumes accessed by tens of vSphere hosts. Apart from the locking issues and the throughput (SAN bandwidth and disk throughput) shared between hypervisor hosts and storage devices, there’s still zero interaction between individual VMs or hypervisor hosts; scaling storage is as easy (or as hard) as scaling files on a shared file system.
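If you want to see why coarse locking hurts, here’s a deliberately simplified sketch (nothing like real VMFS internals, just the general idea): file I/O needs no coordination between hosts, but every metadata change serializes on a single datastore-wide lock.

```python
import threading

# Simplified model of a shared datastore: steady-state disk I/O is
# uncoordinated, but any metadata change grabs one datastore-wide lock,
# mimicking a coarse SCSI reservation on the LUN.
class SharedDatastore:
    def __init__(self):
        self.metadata_lock = threading.Lock()   # one lock for the whole LUN
        self.files = {}                         # virtual disk name -> contents

    def write_blocks(self, name, data):
        # Normal VM disk I/O: every hypervisor host can do this concurrently.
        self.files[name] = data

    def create_file(self, name):
        # Metadata update (new virtual disk, snapshot, ...): serializes all
        # hosts sharing this datastore, not just the one making the change.
        with self.metadata_lock:
            self.files.setdefault(name, b"")
```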
In the virtual networking case, there used to be extremely tight coupling between virtual switches and physical switches, and there will always be tight coupling between all the hypervisors running VMs belonging to the same subnet (after all, that’s what networking is all about), be it a layer-2 subnet (VLAN/VXLAN/…) or a layer-3 routing domain (Hyper-V).
Because of that tight coupling, virtual networking is inherently harder to scale than virtual compute or storage. Of course, the hypervisor vendors took the easiest possible route: they used simplistic VLAN-based layer-2 switches in the hypervisors and pushed all the complexity to the network edge/core, while at the same time complaining how rigid the network is compared to their software switches. Of course it’s easy to scale out totally dumb edge layer-2 switches with no control plane (and zero coupling with anything but the first physical switch) when someone else does all the hard work.
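Here’s roughly what such a dumb vSwitch amounts to; a toy Python sketch (not any vendor’s actual code) with nothing but per-VLAN source-MAC learning and flood-on-unknown, leaving loop prevention, MAC table growth and flooding scope to the physical network.

```python
from collections import defaultdict

# Toy model of a "dumb" VLAN-based vSwitch: no control plane, just per-VLAN
# source-MAC learning and flooding of unknown destinations.
class DumbVSwitch:
    UPLINK = "uplink"                       # port towards the first physical switch

    def __init__(self):
        self.mac_table = defaultdict(dict)  # vlan -> {mac: local port}
        self.tx_log = []                    # (port, frame) pairs, for illustration

    def receive(self, frame, in_port):
        vlan, src, dst = frame["vlan"], frame["src"], frame["dst"]
        self.mac_table[vlan][src] = in_port          # data-plane learning only
        out_port = self.mac_table[vlan].get(dst)
        if out_port is None:
            # Unknown unicast or broadcast: flood to all known local ports in
            # the VLAN plus the uplink; the physical network copes with the rest.
            for port in set(self.mac_table[vlan].values()) | {self.UPLINK}:
                if port != in_port:
                    self.tx_log.append((port, frame))
        else:
            self.tx_log.append((out_port, frame))
```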
Once the virtual switches tried to do the real stuff (starting with the Cisco Nexus 1000V), things got incredibly complex (no surprise there). For example, Cisco’s Nexus 1000V handles only up to 128 hypervisor hosts (because the VSM runs the control-plane protocols). VMware NSX does way better because it decoupled the physical transport (IP) from the virtual networks: the controllers are used solely to push forwarding entries into the hypervisors when VMs are started or moved around.
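The decoupled model is conceptually simple; here’s a hypothetical sketch (illustrative names and structures, not NSX’s actual API) of a controller pushing MAC-to-tunnel-endpoint entries to the hypervisors in the same segment whenever a VM starts or moves, while the physical network just forwards IP between tunnel endpoints.

```python
# Sketch of the decoupled model: a central controller tracks which hypervisor
# hosts which VM and pushes (VM MAC -> tunnel endpoint IP) entries to the
# hypervisors in the affected segment. The physical network holds no per-VM state.
class OverlayController:
    def __init__(self):
        self.fwd_tables = {}   # hv_id -> {vm_mac: vtep_ip of the hosting hypervisor}
        self.segments = {}     # segment_id -> set of hv_ids hosting VMs in that segment

    def register_hypervisor(self, hv_id):
        self.fwd_tables[hv_id] = {}

    def vm_started(self, segment_id, vm_mac, hv_id, vtep_ip):
        members = self.segments.setdefault(segment_id, set())
        members.add(hv_id)
        # Push the new entry only to hypervisors that have VMs in this segment,
        # so the pushed state grows with the segment, not with the data center.
        for member_hv in members:
            self.fwd_tables[member_hv][vm_mac] = vtep_ip

    def vm_moved(self, segment_id, vm_mac, new_hv_id, new_vtep_ip):
        # Same push, new endpoint: the physical transport network is untouched.
        self.vm_started(segment_id, vm_mac, new_hv_id, new_vtep_ip)
```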
Summary: Every time someone tells you how network virtualization will get as easy as compute or storage virtualization, be wary. They probably don’t know what they’re talking about.
(Warning: RANT coming)
I also think we impose some old-world 'networking' because we are comfortable with it. There are some things we should probably just let go of in our networking architectures, which would make things much better for us in the network virtualization space. Some of the complexity is a self-inflicted wound we accepted because of the technology of years past (or because more gear gets sold that way).
Networks are really about deploying application workloads, not there just to be networks for networking's sake. We all like our networking jobs, but when we forget that networking is a means to an end, we concentrate too much on the means and not the ends.
You stated that "there always will be tight coupling between all the hypervisors running VMs belonging to the same subnet". For me that raises the question of how far up the stack paravirtualization should go. If we have forced the guest OS to understand it's running on a hypervisor for storage and network performance, why don't we force it to understand the network a bit further and assist with the network virtualization process too? If you want to move an application instance, let the guest OS help. Assuming that it must be decoupled from network intelligence is an old-world idea.
There are many guest OSes that already demand an HVM model so they can have access to specific new compute functionality; why wouldn't we expect networking to follow the same model? It breaks the OS virtualization model, but so what? So do a ton of things these days. Guest agents are doing more and more.
The 'subnet' coupling is an artificial boundary we've imposed because we still live with the sins of legacy networking support in the guest OSes, right? I'm sure we will have that 'sin' around for many, many years (heck, OSes have IPv6 support and we all see how well the networking community has embraced that). However, new kernel networking functionality and application stacks can go a long way toward understanding the underlying network and interacting with it in a way that could remove the need for much of the complexity in the network virtualization market.
To me, network virtualization needs to understand much more than the 'coupling of subnets' to support advanced application workloads efficiently. For some of what it will take to make the network workload-aware, you will have to ask the guest OSes, and even the applications running on them, for help.
There are lovely things in the networking world now that hold application state and can provide application-level control, and we'll use them to get over the hump. We should, however, start pushing the guest OSes to be smarter, not just the hypervisor and soft-switching crowd. Let's start rants about advancing the Linux network stack to support network virtualization functionality for smarter application stacks. We can start getting the world behind the effort and provide models to follow. If we are trying to move the complexity to the edge for control and stability, then move it all the way to the edge. The near-edge approach is a great stepping stone. Our hypervisors and soft switches are great middleware, but they don't comprise the whole application flow, and we should view them as such.
Why stop SDN at the network virtualization hurdle?
I think people are trying to make network virtualization too hard and push it all into a PC or VMware. Why not virtualize multiple routers in a single Cisco router? Why not let Cisco Nexus switches virtualize switching, and even integrate virtual switches into VMware?
Cisco's "FlexPod" design is now starting to become a buzzword and the sales teams are jumping on board with Cisco Nexus, but it's still underutilized IMHO. What it can do today probably won't be fully realized for another 5+ years. =[ Virtualization is the future and this is just the beginning!
The answer is always easy when you know it!
If you read this article as a Nexus 1000V versus NSX comparison, you totally missed the point. I just used them as examples illustrating why total decoupling scales better than tight(er) vSwitch-to-pSwitch coupling. I also used the latest publicly available documentation and scalability results for both platforms (NVP 3.2 in the case of NSX).
I'm positive Cisco is working on Nexus 1000V improvements, and once they ship, I'll be more than happy to write about them, like I wrote about unicast VXLAN and improved scalability (64 ==> 128 hosts). In the meantime, the numbers speak for themselves.
Finally, if you happen to be working for one of the vendors mentioned in this blog post, it would be fair to disclose that.
Best,
Ivan
I think a good related question is why this is new. The vSwitch has been around for over a decade, so why are we finally doing something with it now? Part of that may have to do with newer technologies becoming available, but it is mostly because, IMHO, VMware didn't want to piss off Cisco, and Cisco had to figure out a way to do it without disrupting its margin machine while preserving the value-added features it wanted its hardware to provide. Thus, a decade after virtualization became popular, we are just now figuring out the network problems that have been present in virtualized environments from the very beginning.
Not that they didn't try: SONA and/or modern switch APIs were probably a very fitting thing for networks circa 2005, but the effort was plagued, IMHO, by Cisco's desire to make everything proprietary and the resulting inability to build a meaningful ecosystem. That, and a completely network-centric view of technology that tried to position the network as the answer to all problems. The more recent attempts have been aimed at locking VMs down to specific hardware, which Cisco loves but VMware has been against. Glad that progress is finally being made today.
Thanks for the comment. I think you're underestimating the difference between server and network virtualization - it's (fundamentally) way easier to implement a system of many isolated components than a system of tightly coupled components. The complexity of one versus the other has nothing to do with how new they are.
Also, don't blame it on Cisco:
* VMware wasn't the only virtualization vendor using a VLAN-based virtual networking approach: nobody had a clue. The only ones who had the guts to invent something radically new (and scalable) were the engineers designing Amazon's VPC.
* The networking industry is full of MacGyvers who want to solve everything within the network. It started (at least) with the invention of the transparent bridge, which "solved" the problem of two broken protocols (LAT and MOP), and continues to this day with all sorts of kludges (LISP and MIP come to mind) that try to bypass the brokenness of the TCP stack.
Did Cisco encourage this behavior? Sure. But so did every other vendor in the networking industry. When was the last time you saw a vendor tell their customer "this is not how it's done"? It was probably IBM sometime in the '80s.