VM-FEX – not as convoluted as it looks

Reading Cisco’s marketing materials, VM-FEX (the feature probably known as VN-Link before someone went on a FEX-branding spree) seems like a fantastic idea: VMs running in an ESX host are connected directly to virtual NICs that the Palo adapter presents as physical NICs, and from there through point-to-point virtual links to the upstream switch, where you can deploy all sorts of features the virtual switch embedded in the ESX host still cannot offer. As you might imagine, the reality behind the scenes is more complex.

The first picture shows the mental model of the VM-FEX architecture I would get after reading high-level whitepapers. According to this mental model, some hand-waving magic would automatically provision the virtual NIC and the upstream switch every time a new VM is started in, or vMotioned into, an ESX host.

The second picture shows the reality: the control- and management-plane flows that have to take place for VM-FEX to work.

Before you can start deploying VM-FEX, the virtual Ethernet (vEthernet) adapters used by VM-FEX have to be pre-provisioned on the Palo adapter by the UCS Manager; SR-IOV could be used to create them on the fly, but it’s not supported by vSphere.

You might have to reload the physical server before the changes take effect. However, due to the way UCS Manager allocates PCI resources, the previously-created vEthernet/HBA adapters won’t change, so your server will continue to work after the reload.
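
The stable-allocation behavior can be illustrated with a toy model (the function and the slot numbers are my assumptions for illustration, not the actual UCS Manager algorithm): adapters that already exist keep their PCI addresses across a reload, and new vEthernet adapters are simply appended to free slots.

```python
# Toy model of stable PCI resource allocation: existing adapters keep their
# slots; new vEthernet adapters get the next free ones. Illustrative only --
# this is not the real UCS Manager allocation algorithm.

def allocate_pci(existing, new_adapters):
    """existing: dict of adapter name -> PCI slot; returns the new allocation."""
    allocation = dict(existing)          # previously-created adapters are untouched
    used = set(existing.values())
    slot = 0
    for adapter in new_adapters:
        while slot in used:              # find the next free slot
            slot += 1
        allocation[adapter] = slot
        used.add(slot)
    return allocation

before = {"eth0": 0, "eth1": 1, "fc0": 2}
after = allocate_pci(before, ["veth0", "veth1"])
assert all(after[name] == slot for name, slot in before.items())  # OS sees the same NICs
```

Because the old adapters keep their addresses, the operating system continues to see the same NICs after the reload, which is exactly why the server keeps working.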

When a new VM NIC has to be activated (due to VM startup or vMotion event), the following events take place:

  • vCenter changes the port state in a vDS port group;
  • ESX signals the port change to the vDS kernel module (Virtual Ethernet Module – VEM); VEM thus learns the port-group name and the port number of the newly-enabled virtual port;
  • VEM selects a free vEthernet adapter and establishes a link between the VM’s virtual NIC and that vEthernet adapter;
  • VEM propagates the change to the UCS Manager;
  • UCS Manager configures the virtual port corresponding to the newly-activated vEthernet adapter on the upstream switch (Nexus 6100).
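
The activation sequence above can be sketched in a few lines of Python; the class and method names (VEM, UCSManager, port_enabled) are purely illustrative and don’t correspond to any real vSphere or UCS API:

```python
# Illustrative model of the VM-FEX port-activation flow; all names are made
# up for this sketch and do not correspond to real vSphere/UCS APIs.

class UCSManager:
    """Configures the upstream switch (Nexus 6100) when told about a new port."""
    def provision_veth_port(self, veth_id, port_group):
        print(f"UCSM: applying '{port_group}' policies to vEth{veth_id} on Nexus 6100")

class VEM:
    """Kernel module acting as a patch panel between VM vNICs and vEthernet adapters."""
    def __init__(self, ucsm, free_veths):
        self.ucsm = ucsm
        self.free_veths = list(free_veths)  # pre-provisioned by UCS Manager
        self.patch_panel = {}               # VM virtual NIC -> vEthernet adapter

    def port_enabled(self, vm_nic, port_group):
        # vCenter changed the port state: pick a free vEthernet adapter ...
        veth = self.free_veths.pop(0)
        self.patch_panel[vm_nic] = veth
        # ... and propagate the change so UCSM can program the upstream switch
        self.ucsm.provision_veth_port(veth, port_group)
        return veth

vem = VEM(UCSManager(), free_veths=[1, 2, 3])
vem.port_enabled("web-vm:eth0", "Production")
```

Note that the pool of vEthernet adapters is fixed at boot time (see the pre-provisioning requirement above); the VEM can only wire VMs to adapters that already exist.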

Even though both Nexus 1000V and VM-FEX use the VEM kernel module, you don’t need Nexus 1000V to implement VM-FEX. Actually, you have to choose one or the other today; you cannot run both in the same host (which would make no sense anyway).

There are a few reasons for the complex architecture used by VM-FEX:

You cannot tie a VM to a physical NIC. The vEthernet NICs created on the Palo adapter appear as regular physical NICs to the operating system (ESX), but you cannot tie a VM directly to a physical NIC: although VMDirectPath allows a VM to use physical hardware, it also disables vMotion for that VM. Therefore, even though the VMs use physical NICs, we still need a kernel module (VEM) that acts like a patch panel and shuffles the data between the VM virtual NIC drivers and the physical NICs.

vSphere 5 supports vMotion in combination with VMDirectPath for vEthernet adapters.

ESX cannot create new vEthernet NICs. Although you could create hardware on demand with the SR-IOV technology, neither ESX nor the Palo adapter supports SR-IOV at the moment. The only way to create a new vEthernet adapter on the Palo adapter is thus from the outside (through the UCS Manager).

The vSphere/ESX host cannot signal to the upstream switch what it needs. What we would need to implement vEthernet integration properly is EVB’s VSI Discovery Protocol (VDP). VDP is not implemented in vSphere, so VM-FEX needs a vDS replacement (VEM) that provides both data-plane pass-through functionality and control-plane communication with the upstream devices.
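
For illustration, a VDP associate request carries roughly the following information (the field names loosely follow 802.1Qbg; this is a data-model sketch, not a protocol implementation, and the sample values are made up):

```python
# Sketch of the information a VDP (802.1Qbg) associate request carries;
# field names loosely follow the standard and are illustrative only.
from dataclasses import dataclass

@dataclass
class VDPAssociate:
    vsi_manager_id: int   # database holding the VSI type (port profile) definitions
    vsi_type_id: int      # which port profile to apply to this virtual port
    vsi_id: str           # unique identifier of the VM's virtual interface
    mac_address: str      # filter info: MAC address of the VM's vNIC
    vlan_id: int          # filter info: VLAN the vNIC belongs to

# With VDP, the hypervisor itself could send this to the adjacent switch,
# instead of relying on an out-of-band channel like VEM-to-UCSM communication.
req = VDPAssociate(vsi_manager_id=1, vsi_type_id=42, vsi_id="vm-web-01-eth0",
                   mac_address="00:50:56:aa:bb:cc", vlan_id=10)
```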

In the initial implementation of VM-FEX, VEM communicates with the UCS Manager. According to Shrijeet Mukherjee from Cisco, the communication will take place directly between VEM and upstream Nexus 6100 switch in the 2.x UCS software release.

Actually, Cisco had to implement functionality equivalent to both 802.1Qbh/802.1BR standard (VN-Tag – support for virtual link tagging) and parts of 802.1Qbg (VDP) to get VM-FEX up and running.

Update 2011-09-23: Shrijeet Mukherjee (Director of Engineering, Virtual Interface Card @ Cisco) kindly helped me understand the technical details of the VM-FEX architecture. I updated the post based on that information.

More information

You’ll find in-depth description of Adapter FEX, VM-FEX, Nexus 1000V and EVB/VEPA in my VMware Networking Deep Dive (recording or live session) webinar. Data center architectures and virtual networking are also described in Data Center 3.0 for Networking Engineers (recording). Both webinars are available as part of the yearly subscription.

8 comments:

  1. What about NIV or whatever it is ultimately going to be called? Seems this is a very simple idea that should have been implemented a long time ago. Trouble is that it probably renders Nexus 1000V and vDS and all of this other VN-Link garbage useless. VMware's networking has always seemed unnecessarily convoluted. This new VN-Link umbrella just seems like a nightmare and a money-pit.

  2. Ivan Pepelnjak (21 August, 2011 21:15)

    What exactly do you have in mind when mentioning NIV? If it's something similar to what Scott Lowe described a while ago (http://blog.scottlowe.org/2010/03/16/understanding-network-interface-virtualization/), that idea is more-or-less how VM-FEX works.

  3. Juan Tarrio Brocade (20 September, 2011 15:12)

    So in this case what happens to the hypervisor vSwitch? Does it exist? Is it used? Does it have to be a Nexus 1000V?

  4. The passthrough VEM is a loadable ESX kernel module and does not require Nexus 1000V.

    The way I understand how VM-FEX works, VEM bypasses the vSwitch forwarding mechanisms, but not the control/management plane.

    The vSwitch (actually vDS) still exists (VEM is hidden inside it from the vCenter perspective), but the packets follow a different path (using logical NICs) than they would otherwise.

  5. Juan Tarrio Brocade (23 September, 2011 13:19)

    Thanks Ivan, that's a lot clearer now. So without direct mapping of the vEth interface to the VM, the server CPU is still being hit with every single network I/O in and out of the server (and between VMs)?

  6. That's absolutely true, although they can do true passthrough with vSphere 5 (see http://www.cisco.com/en/US/prod/collateral/modules/ps10277/ps10331/white_paper_c11-618838.html). Will blog about that in a few days.

  7. Juan Tarrio Brocade (23 September, 2011 15:54)

    Thank you!

  8. I'm not sure what abilities existed at the end of 2011, but now they are more advanced:
    1) You can easily build VM-FEX on UCS C-series
    2) You can do it without UCSM, which requires FIs
    3) You can easily do vMotion of a VM with DirectPath I/O with VM-FEX



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.