A while ago I explained why OpenFlow might be the wrong tool for some jobs, and why a centralized control plane might not make sense, and quickly got misquoted as saying “controllers don’t scale”. Nothing could be further from the truth: properly architected controller-based architectures can reach enormous scale, and Amazon VPC is the best possible example.
Totally unrelated note to bloggers: please don’t use marketing whitepapers disguised as technical documents as counterarguments in technology-focused discussions.
Cloud Orchestration System as Overlay Virtual Networking Controller
The orchestration system in an IP-aware IaaS cloud architecture has all the information we need to set up forwarding entries in an overlay virtual networking implementation:
- Hypervisor-to-VTEP (transport IP address) mapping
- VM-to-hypervisor or container-to-host mapping
- MAC-to-VM or MAC-to-container mapping
- IP-to-VM or IP-to-container, and consequently IP-to-MAC (ARP) mapping
- Subnets and other connectivity needs of individual tenants
- Security requirements of individual VMs and tenants
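Conceptually, turning these mappings into forwarding entries is just a series of table joins. Here is a minimal Python sketch of that idea; all host names, VM names, and addresses are made up for illustration:

```python
# Mappings the orchestration system already has (hypothetical data).

# Hypervisor-to-VTEP (transport IP address) mapping
vtep_ip = {"hv1": "10.0.0.1", "hv2": "10.0.0.2"}

# VM-to-hypervisor mapping
vm_host = {"vm-a": "hv1", "vm-b": "hv2"}

# MAC-to-VM and IP-to-VM mappings
mac_vm = {"02:00:00:00:00:0a": "vm-a", "02:00:00:00:00:0b": "vm-b"}
ip_vm = {"192.168.1.10": "vm-a", "192.168.1.11": "vm-b"}

def forwarding_entry(mac):
    """Derive a MAC-to-VTEP forwarding entry by joining the tables."""
    vm = mac_vm[mac]
    host = vm_host[vm]
    return (mac, vtep_ip[host])

def arp_entry(ip):
    """Derive an IP-to-MAC (ARP) entry the same way."""
    vm = ip_vm[ip]
    mac = next(m for m, v in mac_vm.items() if v == vm)
    return (ip, mac)

print(forwarding_entry("02:00:00:00:00:0b"))  # ('02:00:00:00:00:0b', '10.0.0.2')
print(arp_entry("192.168.1.10"))              # ('192.168.1.10', '02:00:00:00:00:0a')
```

No extra protocol machinery is needed to build these entries; the orchestration system can compute them from data it maintains anyway.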
Dynamic/floating IP addresses and VM mobility might introduce some hiccups into this rosy picture, but let’s ignore them for a moment.
Some cloud orchestration systems push this information straight into the hypervisors (example: Hyper-V System Center Virtual Machine Manager). More scalable architectures replace a single instance of the orchestration system with a scale-out controller cluster relying on a back-end database (probably what Amazon VPC and Azure are using). For an extra boost in scalability, replace the transactional back-end database with an eventually consistent distributed database, which is usually good enough in large-scale UDP clouds (don't tell me I just reinvented MPLS/VPN - I'm well aware of that analogy ;).
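To see why eventual consistency is usually good enough, consider a hypervisor-side agent that caches mapping lookups: a stale entry causes at worst a transient misdelivery that gets corrected on the next refresh, not a connectivity meltdown. A hedged sketch (the lookup API and TTL value are hypothetical, not any vendor's actual interface):

```python
import time

class MappingCache:
    """Hypervisor-side cache in front of an eventually consistent
    mapping service. Entries are refreshed lazily; briefly stale
    answers are tolerated, much like route churn in MPLS/VPN."""

    def __init__(self, lookup_fn, ttl=30.0):
        self.lookup = lookup_fn      # queries the back-end database
        self.ttl = ttl               # how long a cached entry is trusted
        self.cache = {}              # mac -> (vtep_ip, fetched_at)

    def vtep_for(self, mac):
        entry = self.cache.get(mac)
        now = time.monotonic()
        if entry and now - entry[1] < self.ttl:
            return entry[0]          # possibly stale, but fast
        vtep = self.lookup(mac)      # back end may itself be slightly stale
        self.cache[mac] = (vtep, now)
        return vtep

# Usage with a stand-in back end:
backend = {"02:00:00:00:00:0a": "10.0.0.1"}
cache = MappingCache(lambda mac: backend.get(mac))
print(cache.vtep_for("02:00:00:00:00:0a"))  # 10.0.0.1
```

The design choice worth noticing: correctness does not depend on every hypervisor seeing every update instantly, which is precisely what lets the back end scale out.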
Other implementations use more convoluted approaches, from layered controllers (example: NSX controller for OpenStack) to centralized control planes (example: Cisco Nexus 1000V). Layered controllers add complexity, but still perform remarkably well as long as they stay on the management plane. The moment a controller starts dealing with the real-time aspects of control- or data plane, its scalability plummets.
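Back-of-the-envelope arithmetic shows why: management-plane work scales with provisioning events, while real-time control-plane work scales with traffic. Assuming a purely hypothetical workload of 100 hosts, 50 VMs per host, and 10 new flows per VM per second, a controller that must touch every first packet of a flow handles tens of thousands of events per second:

```python
# Hypothetical workload - the numbers are illustrative, not measured.
hosts = 100
vms_per_host = 50
new_flows_per_vm_per_sec = 10

# Management-plane controller: load proportional to provisioning churn
vm_churn_per_sec = 1           # say, one VM created or moved per second
mgmt_events_per_sec = vm_churn_per_sec

# Reactive control-plane controller: every new flow punts to it
ctrl_events_per_sec = hosts * vms_per_host * new_flows_per_vm_per_sec

print(mgmt_events_per_sec)     # 1
print(ctrl_events_per_sec)     # 50000
```

Four to five orders of magnitude separate the two workloads, which is why staying out of the real-time path matters more than raw code quality.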
A Few Data Points
What I wrote above should be common sense to anyone who has spent time researching or implementing large-scale networking architectures. Do we see the same trend in real-life implementations? Here are some data points from well-known commercial products.
Products that stay out of the control- and data plane:
- A cluster of three NSX controllers can manage up to 3000 hosts, a cluster of five controllers up to 5000 hosts (supported numbers given in NSX release notes are lower, but you get the idea);
- A single System Center Virtual Machine Manager instance (using the Hyper-V PowerShell API) can manage up to 400 hosts (this is not a hard number);
- VMware virtual distributed switch (vDS) can span 1000 hosts with vSphere 5.5 (350 in vSphere 5.1);
Products with centralized control plane:
- Cisco Nexus 1000V VSM can control 128 hosts in recent releases (64 hosts in older releases);
- Last I heard ProgrammableFlow controller controls up to 200 switches.
I never got the maximum number of Hyper-V virtual switches the PF6800 controller supports; the online brochure has zero technical details, and the documentation is still not public.
Comparing vDS and Nexus 1000V maximums is particularly entertaining. You could believe that:
- VMware understands networking better than Cisco does;
- VMware programmers write better networking code than Cisco’s programmers;
- VMware cares more about the scalability of virtual networking than Cisco does
… or you could accept the fact that there are some fundamental architectural differences between the two products that affect scalability. Do I need to say more?
Check out my cloud computing webinars – you can buy them individually or in a bundle, or get access to all of them with the yearly subscription. I’m also available for short online consulting sessions.