Cloud Orchestration System Is an Ideal Controller Use Case
A while ago I explained why OpenFlow might be the wrong tool for some jobs, and why a centralized control plane might not make sense, and quickly got misquoted as saying “controllers don’t scale”. Nothing could be further from the truth: properly architected controller-based architectures can reach enormous scale – Amazon VPC is the best possible example.
Totally unrelated note to bloggers: please don’t use marketing whitepapers disguised as technical documents as counterarguments in technology-focused discussions.
Cloud Orchestration System as Overlay Virtual Networking Controller
The orchestration system in an IP-aware IaaS cloud architecture has all the information we need to set up forwarding entries in an overlay virtual networking implementation:
- Hypervisor-to-VTEP (transport IP address) mapping
- VM-to-hypervisor or container-to-host mapping
- MAC-to-VM or MAC-to-container mapping
- IP-to-VM or IP-to-container, and consequently IP-to-MAC (ARP) mapping
- Subnets and other connectivity needs of individual tenants
- Security requirements of individual VMs and tenants
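To make the list above more concrete, here’s a minimal sketch (all names and structures are hypothetical, not taken from any vendor’s API) of how an orchestration system could combine these mappings into per-VM forwarding entries for an overlay network:

```python
# Hypothetical data model -- illustrative only, not any orchestration
# system's actual API or schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class ForwardingEntry:
    tenant: str
    vm_mac: str   # MAC-to-VM mapping
    vm_ip: str    # IP-to-VM mapping (also enough to answer ARP locally)
    vtep_ip: str  # transport IP of the hypervisor (VTEP) hosting the VM

def build_forwarding_table(vm_to_host, host_to_vtep, vm_mac, vm_ip, vm_tenant):
    """Join the orchestration system's mappings into per-VM entries."""
    return [
        ForwardingEntry(
            tenant=vm_tenant[vm],
            vm_mac=vm_mac[vm],
            vm_ip=vm_ip[vm],
            vtep_ip=host_to_vtep[host],  # hypervisor-to-VTEP mapping
        )
        for vm, host in vm_to_host.items()
    ]

# Example: two VMs of tenant "blue" on two hypervisors
entries = build_forwarding_table(
    vm_to_host={"vm1": "hv1", "vm2": "hv2"},
    host_to_vtep={"hv1": "10.0.0.1", "hv2": "10.0.0.2"},
    vm_mac={"vm1": "02:00:00:00:00:01", "vm2": "02:00:00:00:00:02"},
    vm_ip={"vm1": "172.16.1.10", "vm2": "172.16.1.11"},
    vm_tenant={"vm1": "blue", "vm2": "blue"},
)
```

The point of the sketch: every piece of information needed to program the overlay data plane is a simple join over data the orchestration system already has.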
Dynamic/floating IP addresses and VM mobility might introduce some hiccups into this rosy picture, but let’s ignore them for a moment.
Some cloud orchestration systems push this information straight into the hypervisors (example: Hyper-V System Center Virtual Machine Manager). More scalable architectures replace a single instance of the orchestration system with a scale-out controller cluster relying on a back-end database (probably what Amazon VPC and Azure are using). For an extra boost in scalability, replace the transactional back-end database with an eventually consistent distributed database, which is usually good enough in large-scale IaaS clouds (don't tell me I just reinvented MPLS/VPN - I'm well aware of that analogy ;).
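Why is eventual consistency usually good enough here? A stale VM-to-VTEP mapping means transient traffic loss until the replicas converge, much like routing-protocol convergence. A toy sketch of last-writer-wins replication (purely illustrative, not any product’s implementation) makes the behavior visible:

```python
# Hypothetical sketch: last-writer-wins replication of VM-to-VTEP mappings
# between controller replicas. A stale entry is tolerable -- traffic toward
# a moved VM is dropped only until the replica converges.
import itertools

_clock = itertools.count()  # monotonic logical clock shared for the demo

class MappingReplica:
    def __init__(self):
        self.store = {}  # vm -> (logical_timestamp, vtep_ip)

    def local_write(self, vm, vtep_ip):
        self.store[vm] = (next(_clock), vtep_ip)

    def merge(self, other):
        """Anti-entropy: pull updates from another replica; newer write wins."""
        for vm, (ts, vtep) in other.store.items():
            if vm not in self.store or self.store[vm][0] < ts:
                self.store[vm] = (ts, vtep)

    def lookup(self, vm):
        entry = self.store.get(vm)
        return entry[1] if entry else None

# Replica A learns a mapping; replica B is temporarily stale.
a = MappingReplica()
a.local_write("vm1", "10.0.0.1")
b = MappingReplica()
stale = b.lookup("vm1")   # None until anti-entropy runs
b.merge(a)
converged = b.lookup("vm1")
```

During the window where `stale` is `None`, a hypervisor asking replica B would simply fail the lookup and retry; no global transaction is needed to keep forwarding state usable.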
Other implementations use more convoluted approaches, from layered controllers (example: NSX controller for OpenStack) to centralized control planes (example: Cisco Nexus 1000V). Layered controllers add complexity, but still perform remarkably well as long as they stay on the management plane. The moment a controller starts dealing with the real-time aspects of control- or data plane, its scalability plummets.
A Few Data Points
What I wrote above should be common sense to anyone who spent time researching or implementing large-scale networking architectures. Do we see the same trend in real-life implementations? Here are some data points from well-known commercial products.
Products that stay out of the control- and data plane:
- A cluster of three NSX controllers can manage up to 3000 hosts, a cluster of five controllers up to 5000 hosts (supported numbers given in NSX release notes are lower, but you get the idea);
- A single System Center Virtual Machine Manager (using the Hyper-V PowerShell API) can manage up to 400 hosts (this is not a hard number);
- VMware virtual distributed switch (vDS) can span 1000 hosts with vSphere 5.5 (350 in vSphere 5.1);
Products with centralized control plane:
- Cisco Nexus 1000V VSM can control 128 hosts in recent release (64 hosts in older releases);
- Last I heard, the ProgrammableFlow controller controls up to 200 switches.
I never got the maximum number of Hyper-V virtual switches the PF6800 controller supports; the online brochure has zero technical details, and the documentation is still not public.
Comparing the vDS and Nexus 1000V maximums is particularly entertaining. You could believe that:
- VMware understands networking better than Cisco does;
- VMware programmers write better networking code than Cisco’s programmers;
- VMware cares more about scalability of virtual networking than Cisco
… or you could accept the fact that there are some fundamental architectural differences between the two products that affect scalability. Do I need to say more?
More details
Check out my cloud computing webinars – you can buy them individually or in a bundle, or get access to all of them with the yearly subscription. I’m also available for short online consulting sessions.
In a networking company, it's sometimes hard to accept a limit before any networking features are even configured. :)
Also, I'm not sure I get the point of your post. If NSX, ACI, and Hyper-V are suboptimal, what would be a good SDN/cloud architecture today? Should we treat the SDN part independently of the cloud orchestration?
What "suboptimal" is depends on your environment. Most data centers (apart from a few outliers) don't have more than 10K VMs, which can easily fit into 200 properly sized physical hosts, so almost any product I mentioned can handle them.
If you need more, then you probably know how to get there.
Ah, the point... there are some controllers that scale to gigantic proportions in real-life production environments, and then there are some things that can never scale due to the architecture they use.
SDN's purpose is to provide non-networking engineers with the ability to create/manage/orchestrate their own network needs. VMware and Microsoft are much better at driving this paradigm because they are NOT network focused. They are service focused and want to enable their customers to manage the network through a wizard like any other resource.
So, when Cisco (et al.) create a programmable single dashboard from which one can define an end-to-end network, then they will have an SDN technology. If we look at VMware, their SDN solution allows someone vaguely familiar with networking to define segments, subnets, routes, firewalls, and load balancing with a guided wizard: how many segments? Connected or isolated? How many hosts? At the end, it presents a basic graphical representation of where your servers are and the network behind them. If you like it, click Go and the configuration is deployed.
This is SDN: treating the network as a consumable resource molded to the requirements of its consumers on the fly. All the other "stuff" usually comes from network hardware vendors trying to remain relevant. Whether SDN is accomplished in software or hardware is really unimportant. However, as virtualization continues to reach deeper into the enterprise stack, software will be the more likely path.
Everything works well in PowerPoint - vCDNI looked fantastic, as did a few other already-dead products.
What I did enjoy reading this week was Brocade's technology that reads VMware vDS environments and configures the physical network accordingly. That was pretty slick.