Nexus 1000V LACP offload and the dangers of in-band control

A while ago someone sent me the following comment as part of a lengthy discussion focusing on Nexus 1000V: “My SE tells me that the latest 1000V release has rewritten the LACP code so that it operates entirely within the VEM. VSM will be out of the picture for LACP negotiations. I guess there have been problems.”

If you’re not familiar with the Nexus 1000V architecture, read this post first. If you’re not convinced you should be running LACP between the ESX hosts and the physical switches, read this one (and this one). Ready? Let’s go.

Now imagine you’ve just installed the Nexus 1000V software on a bunch of ESX hosts and decided to enable LACP to support proper link aggregation between the hosts and a redundant pair of switches running multi-chassis link aggregation. All the soft switches (VEMs) are controlled by a VSM through the control VLAN, all the configuration is centralized on the VSM (where you can use show running to look at it without getting carpal tunnel syndrome from navigating the GUI); life is good.

Next, an ESX host reloads. The VEM is loaded into the VMware kernel during the startup process, and tries to contact the VSM ... but it can’t. The network interfaces are not operational: the physical switch is waiting for the LACP negotiation to finish, and the VEM won’t even start the LACP negotiation until it’s configured to do so by the VSM.
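The chicken-and-egg situation is easier to see when you write the dependencies down. Here's a minimal Python sketch of the boot sequence described above; the step names and the `find_cycle` helper are my own illustration (they don't correspond to anything in the Nexus 1000V software), and the model ignores everything but the four steps that matter.

```python
# Hypothetical dependency model of the reload scenario (names are illustrative).
# Each step can complete only after every step in its list has completed.
DEPENDS_ON = {
    "vem_receives_lacp_config": ["vem_reaches_vsm"],
    "lacp_negotiation_done": ["vem_receives_lacp_config"],
    "uplinks_forwarding": ["lacp_negotiation_done"],
    "vem_reaches_vsm": ["uplinks_forwarding"],  # in-band control VLAN
}


def find_cycle(graph):
    """Return one dependency cycle as a list of steps, or None if there is none."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # We came back to a step we are still resolving: that's the cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# In-band control: every step waits for another one, so nothing can start.
in_band_cycle = find_cycle(DEPENDS_ON)

# Out-of-band control network: reaching the VSM no longer needs the uplinks.
out_of_band = dict(DEPENDS_ON, vem_reaches_vsm=[])
out_of_band_cycle = find_cycle(out_of_band)
```

With the in-band control VLAN the helper finds a loop through all four steps; remove the dependency of VSM reachability on the port channel and the loop disappears.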

There are two solutions to this problem:

Build an out-of-band control network. Install additional NICs in the ESX host and use them for control traffic (communication with vCenter and VSM). Obviously you’d need two additional NICs for redundancy. Not a big deal if your hardware supports virtual NICs (like Cisco UCS); in most other cases installing two extra NICs (and consuming two more switch ports per server) just to get the management traffic going doesn’t sound attractive if you’re on the buying side.

Make the switching element self-sufficient. This is exactly what Cisco did with LACP offload. Once you configure ESX host NICs in a port channel, VSM stores the LACP configuration in the local VEM settings. As part of the LACP offload functionality, the LACP code was ported to the VEM, allowing the VEM to complete LACP negotiation with the upstream physical switch without VSM involvement (without LACP offload, all LACP packets are forwarded to the VSM through the packet VLAN and processed by the VSM).
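To illustrate why the second option breaks the deadlock, here's a toy Python model of a VEM coming up after a reload. It's a sketch under heavy assumptions, not Cisco's implementation: the `Vem` class, its fields and the `boot` function are hypothetical, the VSM is assumed to be reachable only in-band through the port channel, and the upstream switch is assumed to answer LACP immediately.

```python
from dataclasses import dataclass


@dataclass
class Vem:
    """Hypothetical, heavily simplified state of a VEM after a host reload."""
    lacp_offload: bool                # LACP code runs locally in the VEM
    cached_lacp_config: bool = False  # config the VSM stored before the reload
    uplink_up: bool = False
    vsm_reachable: bool = False


def boot(vem, max_rounds=5):
    """Run the startup loop; return True if the VEM ends up talking to the VSM."""
    for _ in range(max_rounds):
        # LACP can be negotiated locally only with offload and a stored config;
        # otherwise the VEM needs a live VSM to process the LACP packets.
        can_negotiate = (
            (vem.lacp_offload and vem.cached_lacp_config) or vem.vsm_reachable
        )
        if can_negotiate:
            vem.uplink_up = True
        # Assumption: control traffic is in-band, so the VSM is reachable
        # only through a working port channel.
        if vem.uplink_up:
            vem.vsm_reachable = True
        if vem.vsm_reachable:
            return True
    return False


without_offload = boot(Vem(lacp_offload=False))
with_offload = boot(Vem(lacp_offload=True, cached_lacp_config=True))
offload_no_config = boot(Vem(lacp_offload=True, cached_lacp_config=False))
```

In this model only the VEM with LACP offload and a locally stored configuration gets its uplinks up; the other two keep waiting for a VSM they can't reach. The third case is a reminder that local intelligence helps only if the configuration also survives the reload.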

Is this a Cisco-specific problem?

Absolutely not. LACP offload is just a simple manifestation of a fundamental problem well-known to operators of ATM or SONET/SDH networks: it’s hard to implement a distributed switching architecture with a central controller (like the Nexus 1000V VSM or an OpenFlow controller) without having a completely independent out-of-band control network or at least some local intelligence in the switching elements – yet another “trivial” detail that’s usually glossed over in OpenFlow discussions.

Even more information

I’ll talk about networking requirements for cloud computing at the upcoming EuroNOG conference.

You’ll find a big-picture perspective as well as in-depth discussions of various data center and network virtualization technologies (including Nexus 1000V, 802.1Qbg, 802.1Qbh, VN-Tag, adapter FEX and VM-FEX) in my webinars: Data Center 3.0 for Networking Engineers (recording) and VMware Networking Deep Dive (recording or live session). Both webinars are also available as part of the yearly subscription and Data Center Trilogy.

1 comment:

  1. Ok and with the version 1.4a you have also the SSU update which is pretty good with the DRS


Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.