Complexity Belongs to the Network Edge
Whenever I write about vCloud Director Networking Infrastructure (vCDNI), be it a rant or a more technical post, I get comments along the lines of “What are the network guys going to do once the infrastructure has been provisioned? With vCDNI there is no need to keep network admins full time.”
Once we have a scalable solution that will be able to stand on its own in a large data center, most smart network admins will be more than happy to get away from provisioning VLANs and focus on other problems. After all, most companies have other networking problems beyond data center switching.
As for disappearing work, we’ve seen the demise of DECnet, IPX, SNA, DLSw and multi-protocol networks (which are coming back with IPv6) without our jobs getting any simpler, so I’m not worried about the jobless network admin. I am worried, however, about the stability of the networks we are building, and that’s the only reason I’m ranting about the emerging flat-earth architectures.
In 2002, the IETF published an interesting RFC, Some Internet Architectural Guidelines and Philosophy (RFC 3439), which should be mandatory reading for anyone claiming to be an architect of solutions that involve networking (you know who you are). In the End-to-End Argument and Simplicity section, the RFC clearly states: “In short, the complexity of the Internet belongs at the edges, and the IP layer of the Internet should remain as simple as possible.”
We should use the same approach when dealing with virtualized networking: the complexity belongs to the edges (hypervisor switches) with the intervening network providing the minimum set of required services. I don’t care if the networking infrastructure uses layer-2 (MAC) addresses or layer-3 (IP) addresses as long as it scales. Bridging does not scale as it emulates a logical thick coax cable. Either get rid of most bridging properties (like packet flooding) and implement proper MAC-address-based routing without flooding, or use IP as the transport. I truly don’t care.
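To make the idea more tangible, here's a minimal sketch, not any vendor's actual implementation, of what a "complex edge, simple core" hypervisor switch could look like: it keeps a per-tenant table mapping VM MAC addresses to the IP addresses of the hypervisors hosting them (populated by a control plane rather than by flooding) and encapsulates Ethernet frames in IP/UDP. The class name, header format and UDP port are made up for illustration.

```python
import socket
import struct

ENCAP_PORT = 8472  # arbitrary UDP port chosen for this sketch

class EdgeSwitch:
    """Toy hypervisor edge switch: all the bridging intelligence
    (MAC-to-location mapping) lives here, while the physical network
    only has to route IP packets between hypervisor hosts."""

    def __init__(self, tenant_id: int, mac_to_host: dict):
        self.tenant_id = tenant_id
        # MAC address (bytes) -> IP address of the hypervisor hosting that MAC,
        # learned from a control plane instead of data-plane flooding
        self.mac_to_host = mac_to_host
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_frame(self, frame: bytes) -> None:
        dst_mac = frame[0:6]  # destination MAC of the Ethernet frame
        remote_host = self.mac_to_host.get(dst_mac)
        if remote_host is None:
            # Unknown destination: drop (or query the control plane) instead of
            # flooding the frame across every switch in the data center.
            return
        # Prepend a small header carrying the tenant ID, then ship the frame
        # over plain IP/UDP to the destination hypervisor.
        header = struct.pack("!I", self.tenant_id)
        self.sock.sendto(header + frame, (remote_host, ENCAP_PORT))
```

The encapsulation details are irrelevant; the point is that the physical switches never see tenant MAC addresses or unknown-unicast floods, they just forward IP packets between hypervisors.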
Reading RFC 3439 a bit further, the next section explains non-linearity and network complexity. To quote the RFC: “In particular, the largest networks exhibit, both in theory and in practice, architecture, design, and engineering non-linearities which are not exhibited at smaller scale.” Allow me to paraphrase this for some vendors out there: “just because it works in your lab does not mean it will work at Amazon or Google scale.”
The current state of affairs is just the opposite of what a reasonable architecture would be: VMware has a barebones layer-2 switch (although it does have a few interesting features) with another non-scalable layer (vCDNI) on top of (or below) it. The networking vendors are inventing all sorts of kludges of increasing complexity to cope with that, from VN-Link/port extenders and EVB/VEPA to large-scale L2 solutions like TRILL, FabricPath, VCS Fabric or 802.1aq, and L2 data center interconnects based on VPLS, OTV or BGP MAC VPN.
I don’t expect the situation to change on its own. VMware knows server virtualization is just a stepping stone and is already investing in PaaS solutions; the networking vendors are more than happy to sell you all the extra proprietary features you need just because VMware never implemented a more scalable solution, increasing their revenues and lock-in. It almost feels like the more “network is in my way” complaints we hear, the happier everyone is: virtualization vendors because the blame is landing somewhere else, the networking industry because these complaints give them a door opener to sell their next-generation magic (this time using a term borrowed from the textile industry).
Imagine for a second that VMware or Citrix actually implemented a virtualized networking solution using IP transport between hypervisor hosts. The need for fancy new boxes supporting TRILL or 802.1aq would be gone; all you would need in your data center would be simple high-speed L2/L3 switches. Clearly not a rosy scenario for the flat-fabric-promoting networking vendors, is it?
Is there anything you can do? Probably not much, but at least you can try. Sit down with the virtualization engineers, discuss the challenges and figure out the best way to solve problems both teams are facing. Engage the application teams. If you can persuade them to start writing scale-out applications that can use proper load balancing, most of the issues bothering you will disappear on their own: there will be no need for large stretched VLANs and no need for L2 data center interconnects. After all, if you have a scale-out application behind a load balancer, nobody cares if you have to shut down a VM and start it in a new IP subnet.
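To illustrate that last point, here's a trivial sketch (hypothetical names, not a real load-balancer API) of a backend pool whose membership is based on registration rather than L2 adjacency; a VM restarted in a different IP subnet simply re-registers, and clients keep talking to the load balancer's address as if nothing happened.

```python
class BackendPool:
    """Toy load-balancer backend pool: members are identified by IP address
    and port, so they can live in any subnet reachable over L3."""

    def __init__(self) -> None:
        self.members: set[tuple[str, int]] = set()

    def register(self, ip: str, port: int) -> None:
        self.members.add((ip, port))

    def deregister(self, ip: str, port: int) -> None:
        self.members.discard((ip, port))


pool = BackendPool()
pool.register("10.1.1.10", 8080)   # VM in subnet 10.1.1.0/24
pool.register("10.1.1.11", 8080)

# One VM is shut down and restarted in a completely different subnet;
# no stretched VLAN or L2 data center interconnect is needed for this to work.
pool.deregister("10.1.1.11", 8080)
pool.register("10.2.5.23", 8080)   # same application instance, new subnet
```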
Revision History
- 2022-11-12: Added a few notes on virtual network evolution between 2011 and 2022.
You must come visit us in our little backwater and see a large-scale L2 network built with Ciena Carrier Ethernet: http://media.ciena.com/documents/Ciena_Carrier_Ethernet_Service_Delivery_Portfolio_A4_PB.pdf
Yours in Cow dung
There is a problem with complexity residing at the edge: complexity is inevitably expensive (in all aspects), and the edge is the largest part of any structure, which kills the economics of the solution.
Otherwise, there's no real barrier here - nothing stops somebody from writing a vSwitch replacement with the necessary smarts.
The edge is where you have the most free CPU cycles, the most free memory, free buffer space, and the greatest horizontal scale. The edge is where you have the most power to do cool things. The more you make the network do, the more it has to do, and it can only do so much.
Looking back, there were at least three large-scale technologies relying on a simple-edge/complex-core architecture: X.25, SNA and voice circuit switching. Two of them are dead and the third one is not doing so well.
As Peter pointed out, it's cheaper to do a bit of complexity in every edge device (because there are so many of them) than doing a lot of complexity in high-speed core devices. Case in point: TCP versus X.25.
As for rewriting vSwitch, that's exactly what Amazon did for AWS to make it scale.
When we are talking about a virtualisation environment, I would have thought that the whole idea was to have as few free CPU cycles, and as little free memory and buffer space, as possible.
I understand where you're coming from when you say that the edge is the place to do cool things; I'm just not too sure about this particular case.
I'm not averse to the idea, in fact far from it (I pointed out my work on smart NTUs for Carrier Ethernet services before). I'm just dubious about the idea of turning the whole thing on its head, where there are essentially more devices with more state and more things to go wrong.
You said: "Imagine for a second ... a virtualized networking solution using IP transport between hypervisor hosts"
You are spot on. This is exactly where things are headed.
Really?