Cloud Orchestration System Is an Ideal Controller Use Case

A while ago I explained why OpenFlow might be the wrong tool for some jobs, and why a centralized control plane might not make sense, and quickly got misquoted as saying “controllers don’t scale”. Nothing could be further from the truth: properly architected controller-based systems can reach enormous scale – Amazon VPC is the best possible example.

Totally unrelated note to bloggers: please don’t use marketing whitepapers disguised as technical documents as counterarguments in technology-focused discussions.

Cloud Orchestration System as Overlay Virtual Networking Controller

The orchestration system in an IP-aware IaaS cloud architecture has all the information we need to set up forwarding entries in an overlay virtual networking implementation (a quick sketch follows the list):

  • Hypervisor-to-VTEP (transport IP address) mapping
  • VM-to-hypervisor or container-to-host mapping
  • MAC-to-VM or MAC-to-container mapping
  • IP-to-VM or IP-to-container, and consequently IP-to-MAC (ARP) mapping
  • Subnets and other connectivity needs of individual tenants
  • Security requirements of individual VMs and tenants
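
To illustrate how little is missing between that inventory and actual forwarding state, here is a minimal sketch (all names and addresses are invented, and real systems obviously track far more attributes): given VM-to-hypervisor, MAC/IP-to-VM and hypervisor-to-VTEP mappings, each hypervisor’s overlay L2 FIB and ARP-suppression table is a simple derivation.

```python
# Minimal sketch: deriving per-hypervisor overlay forwarding state from the
# data an orchestration system already has. All names/addresses are invented.
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    mac: str
    ip: str
    hypervisor: str   # VM-to-hypervisor mapping
    vni: int          # tenant segment the VM belongs to

# Hypervisor-to-VTEP (transport IP address) mapping
vtep_of = {"hv1": "10.0.0.1", "hv2": "10.0.0.2"}

vms = [
    VM("web-1", "02:00:00:00:00:01", "192.168.1.10", "hv1", 5001),
    VM("db-1",  "02:00:00:00:00:02", "192.168.1.20", "hv2", 5001),
]

def tables_for(hypervisor: str):
    """Return the L2 FIB (MAC -> remote VTEP) and ARP-suppression
    (IP -> MAC) entries this hypervisor needs for remote VMs."""
    fib, arp = {}, {}
    for vm in vms:
        if vm.hypervisor == hypervisor:
            continue                          # local VMs need no remote entry
        fib[(vm.vni, vm.mac)] = vtep_of[vm.hypervisor]
        arp[(vm.vni, vm.ip)] = vm.mac
    return fib, arp

print(tables_for("hv1"))
```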

Dynamic/floating IP addresses and VM mobility might introduce some hiccups into this rosy picture, but let’s ignore them for a moment.

Some cloud orchestration systems push this information straight into the hypervisors (example: Hyper-V with System Center Virtual Machine Manager). More scalable architectures replace a single instance of the orchestration system with a scale-out controller cluster relying on a back-end database (probably what Amazon VPC and Azure are using). For an extra boost in scalability, replace the transactional back-end database with an eventually consistent distributed database, which is usually good enough in large-scale IaaS clouds (and don't tell me I just reinvented MPLS/VPN – I'm well aware of that analogy ;).
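
Here is a minimal sketch of the scale-out variant, assuming a made-up key/value interface (MappingStore, reconcile and all data below are invented): the orchestration system or controller cluster publishes mappings into a back-end store, and a per-hypervisor agent periodically reconciles its local forwarding table against it. With an eventually consistent store an agent may briefly act on a stale entry; the next reconciliation run fixes it.

```python
# Sketch of a scale-out controller design: writers publish mappings into a
# shared store, hypervisor agents reconcile against it. All names invented.

class MappingStore:
    """Toy stand-in for a (possibly eventually consistent) distributed KV store."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def snapshot(self):
        # A real eventually consistent store might return slightly stale data
        # on some replicas; the periodic reconciliation below absorbs that.
        return dict(self._data)

store = MappingStore()

# Orchestration system / controller cluster: publish a MAC-to-VTEP mapping
store.put(("vni-5001", "02:00:00:00:00:02"), "10.0.0.2")

def reconcile(local_fib):
    """Hypervisor agent: make the local FIB match the published mappings."""
    desired = store.snapshot()
    for key, vtep in desired.items():
        local_fib[key] = vtep            # add or update entries
    for key in list(local_fib):
        if key not in desired:
            del local_fib[key]           # remove withdrawn entries
    return local_fib

print(reconcile({}))    # {('vni-5001', '02:00:00:00:00:02'): '10.0.0.2'}
```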

Other implementations use more convoluted approaches, from layered controllers (example: NSX controller for OpenStack) to centralized control planes (example: Cisco Nexus 1000V). Layered controllers add complexity, but still perform remarkably well as long as they stay in the management plane. The moment a controller starts dealing with the real-time aspects of the control or data plane, its scalability plummets.
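
A back-of-the-envelope sketch of that last point (every number below is an invented assumption): a controller that stays in the management plane does work proportional to configuration churn, while a controller sitting in the real-time path has to answer every ARP request or flow miss, so its workload grows with the traffic.

```python
# Back-of-the-envelope comparison; all numbers are invented assumptions.
vms = 100_000
vm_changes_per_hour = 5_000        # creates/deletes/migrations (config churn)
misses_per_vm_per_second = 2       # ARP requests / flow misses a reactive
                                   # controller would have to answer

mgmt_plane_ops_per_second = vm_changes_per_hour / 3600
reactive_ops_per_second = vms * misses_per_vm_per_second

print(f"management-plane controller: ~{mgmt_plane_ops_per_second:.1f} ops/s")
print(f"real-time (reactive) controller: ~{reactive_ops_per_second:,} ops/s")
```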

A Few Data Points

What I wrote above should be common sense to anyone who has spent time researching or implementing large-scale networking architectures. Do we see the same trend in real-life implementations? Here are some data points from well-known commercial products.

Products that stay out of the control- and data plane:

Products with centralized control plane:

I never managed to get the maximum number of Hyper-V virtual switches the PF6800 controller supports; the online brochure has zero technical details, and the documentation is still not public.

Comparing vDS and Nexus 1000V maximums is particularly entertaining. You could believe that:

  • VMware understands networking better than Cisco does;
  • VMware programmers write better networking code than Cisco’s programmers;
  • VMware cares more about scalability of virtual networking than Cisco

… or you could accept the fact that there are some fundamental architectural differences between the two products that affect scalability. Do I need to say more?

More details

Check out my cloud computing webinars – you can buy them individually or in a bundle, or get access to all of them with the yearly subscription. I’m also available for short online consulting sessions.

6 comments:

  1. The testing methodology is very different: the N1K scale limits are published with all the features provisioned, while the vDS limits are measured with the fewest features in use.

    In a networking company, it is sometimes hard to accept a limit measured without any networking features configured. :)
  2. Did you mean the 1000V used in ACI compared to NSX, or without ACI?

    Also, I'm not sure I get the point of your post. If NSX, ACI, and Hyper-V are all suboptimal, what would be a good SDN/cloud architecture today? Should we treat the SDN part independently of the cloud orchestration?


    Replies
    1. I meant the Nexus 1000V in its currently shipping and documented form, based on its most-recent release notes. I usually don't talk about futures.

      What "suboptimal" is depends on your environment. Most data centers (apart from a few outliers) don't have more than 10K VMs, which can easily fit into 200 properly sized physical hosts, so almost any product I mentioned can handle them.

      If you need more, then you probably know how to get there.

      Ah, the point... there are some controllers that scale to gigantic proportions in real-life production environments, and then there are some things that can never scale due to the architecture they use.
  3. I believe this highlights an issue with network engineers discussing SDN. The second word in SDN is Defined. When the conversation devolves into data-plane and flows, the train has already derailed.

    SDN's purpose is to give non-networking people the ability to create/manage/orchestrate their own network needs. VMware and Microsoft are much better at driving this paradigm as they are NOT network focused. They are service focused and want to enable their customers to manage the network through a wizard, like any other resource.

    So, when Cisco (et al.) create a programmable single dashboard from which one can define an end-to-end network, then they will have an SDN technology. If we look at VMware, their SDN solution allows someone vaguely familiar with networking to define segments, subnets, routes, firewalls, and load balancing with a guided wizard - how many segments? connected or isolated? how many hosts? At the end, it presents a basic graphical representation of where your servers are and the network behind them. If you like it, click go and the configuration is deployed.

    This is SDN: treating the network as a consumable resource molded to the requirements of its consumers on the fly. All the other "stuff" usually comes from network hardware vendors trying to remain relevant. Whether SDN is accomplished in software or hardware is really unimportant. However, as virtualization continues to reach deeper into the enterprise stack, software will be more likely.
    Replies
    1. While I perfectly agree with most of what you said (including the end goal), it is important that at least a few of us keep the laws of physics and real-life constraints in mind. You simply cannot count on the vendor marketing whitepapers to tell the whole story, and you cannot rely on every vendor architecting and implementing their solution correctly.

      Everything works well in PowerPoint - vCDNI looked fantastic, as did a few other already-dead products.
    2. Thank you, and I agree that all of these technologies (physical and virtual) must be validated. I have run a number of vCDNI implementations without issue, but they were not add-ons - the network was designed with the intent of vCDNI riding on top.

      What I did enjoy reading this week was Brocade's technology to read VMware vDS environments and configure the physical network accordingly. That was pretty slick.