Hypervisors use promiscuous NIC mode – does it matter?

Chris Marget sent me the following interesting observation:

One of the things we learned back at the beginning of Ethernet is no longer true: hardware filtering of incoming Ethernet frames by the NICs in Ethernet hosts is gone. VMware runs its NICs in promiscuous mode. The fact that this Networking 101 level detail is no longer true kind of blows my mind.

So what exactly is going on and does it matter?

Ethernet was designed as a shared medium (remember the thick coax cables with vampire transceivers?), and even though switched Ethernet (aka bridging) gave us more bandwidth, it still emulates the coax cable: frames sent to broadcast, multicast and unknown unicast addresses are flooded to all hosts in the same broadcast domain (the behavior that makes brokenware like Microsoft NLB work). Ethernet hosts and bridges do not talk to each other (remember: bridges emulate a single cable, and a cable cannot talk to the stations attached to it), so the switches have no way of knowing whether those frames are important to the end hosts (IGMP snooping is one of those kludges that is supposed to make a broken design a bit less dreadful).
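The flooding behavior described above can be illustrated with a minimal sketch of a transparent learning bridge. This is a toy model, not any vendor's implementation: the class name `LearningBridge`, the port numbers and the MAC addresses are all made up for illustration, and VLANs, aging timers and STP are ignored.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_group_address(mac: str) -> bool:
    """Broadcast and multicast MACs have the lowest bit of the first octet set."""
    return int(mac.split(":")[0], 16) & 1 == 1


class LearningBridge:
    """Toy transparent bridge (hypothetical model: no VLANs, no aging, no STP)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # learned source MAC -> port it was seen on

    def forward(self, in_port, src, dst):
        """Return the set of ports the frame is sent out of."""
        # Learn where the sender lives (group addresses are never valid sources).
        if not is_group_address(src):
            self.mac_table[src] = in_port
        # Broadcast, multicast and unknown unicast are flooded to every other port.
        if is_group_address(dst) or dst not in self.mac_table:
            return self.ports - {in_port}
        # Known unicast goes to a single port, or nowhere if it is the ingress port.
        out_port = self.mac_table[dst]
        return set() if out_port == in_port else {out_port}
```

In this model the bridge never asks the attached hosts what they want to receive; anything it cannot pin to a single learned port reaches every host in the broadcast domain.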

In the shared-medium environment of early Ethernet it was very important that frames not meant for an individual end-station did not burden its CPU. Frame filtering based on destination MAC address was thus always implemented in hardware (the same is true for almost all multi-access L2 technologies). An Ethernet NIC was always able to listen to one (or a few) unicast MAC addresses and a few multicast MAC addresses (the limits are vendor-dependent); everyone would obviously have to process broadcast frames.
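As a rough illustration of that receive-side decision, here is a minimal sketch of a destination MAC filter. It models the filter as exact-match sets, which is an assumption made for clarity; real NICs differ in how many addresses they can hold and how they match them. The class name `NicFilter` and all addresses are hypothetical.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class NicFilter:
    """Toy model of a NIC receive filter (exact-match sets are an assumption)."""

    def __init__(self, unicast, multicast=()):
        self.unicast = {mac.lower() for mac in unicast}
        self.multicast = {mac.lower() for mac in multicast}
        self.promiscuous = False

    def accepts(self, dst: str) -> bool:
        """True if a frame with this destination MAC is passed up to the host CPU."""
        dst = dst.lower()
        if self.promiscuous:
            return True  # filtering disabled: every frame reaches the CPU
        if dst == BROADCAST:
            return True  # every station has to process broadcasts
        return dst in self.unicast or dst in self.multicast
```

With `promiscuous` left at `False`, a flooded frame addressed to some other station is dropped by the filter and never costs the host a CPU cycle.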

A typical end-host has a single MAC address. Some hosts running DECnet Phase IV would have two (DECnet Phase IV had nothing like ARP or ND and changed the MAC address to match its L3 address), as would clustered hosts sharing a single IP address (for example, Windows servers running Microsoft NLB or even Cisco routers running HSRP or GLBP).

Server virtualization has completely changed the Ethernet NIC requirements – with tens of VMs running within the same physical host, its Ethernet NIC has to be able to receive frames for tens (or sometimes even more than a hundred) unicast MAC addresses. Most Ethernet NICs are not able to do that (if you know more, please write a comment; it would be interesting to learn whether VN-Link/VM-FEX improves things) and thus most hypervisor operating systems put the Ethernet NICs in promiscuous mode. Every single frame sent to the NIC (those sent to the VMs running in the host as well as all floods) has to be processed by the host CPU, regardless of whether it’s relevant or not.
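The trade-off can be sketched as follows. This is a hypothetical model, not how any particular hypervisor or driver behaves: the function names, the `unicast_slots` limit and the MAC addresses are invented for illustration. The idea is simply that once the VMs need more unicast addresses than the NIC can filter, the fallback is promiscuous mode, and every flooded frame then lands on the host CPU.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def program_nic(vm_macs, unicast_slots):
    """Return (exact_match_set, promiscuous) for a NIC with a limited filter table.

    `unicast_slots` is a made-up capacity; real limits are vendor-dependent.
    """
    macs = {mac.lower() for mac in vm_macs}
    if len(macs) <= unicast_slots:
        return macs, False  # hardware can filter for every VM
    return set(), True      # too many addresses: give up filtering


def frames_reaching_cpu(dst_macs, exact_match, promiscuous):
    """Count the frames that the NIC hands to the host CPU."""
    count = 0
    for dst in dst_macs:
        dst = dst.lower()
        if promiscuous or dst == BROADCAST or dst in exact_match:
            count += 1
    return count


# 40 VMs with made-up MAC addresses, and 1000 flooded unknown-unicast frames
# addressed to a station that does not live on this host.
vm_macs = [f"00:50:56:00:00:{i:02x}" for i in range(40)]
flood = ["00:11:22:33:44:55"] * 1000

small = program_nic(vm_macs, unicast_slots=16)  # falls back to promiscuous mode
large = program_nic(vm_macs, unicast_slots=64)  # can still filter in hardware
print(frames_reaching_cpu(flood, *small), frames_reaching_cpu(flood, *large))
```

In this toy model the NIC that had to fall back to promiscuous mode passes all 1000 flooded frames to the CPU, while the NIC with enough filter entries passes none of them.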

The use of promiscuous mode has made hypervisor hosts slightly more vulnerable to flooding storms. In the pre-virtualization days, the links in the network could become overloaded after a forwarding loop, but the Ethernet NICs would do most of the damage control (unless, of course, you experienced a broadcast storm, in which case nothing could save you). In a network with primarily virtualized servers running in hypervisors, the host CPU has to deal with all the looped frames. Don’t ask me whether this fact is relevant for your network or not – you might decide you have bigger problems than CPU overload if you experience a forwarding loop, or you might feel that this fact alone should push you toward smaller subnets (and thus broadcast domains).

More information

If you’re interested in details of VMware networking, check my VMware Networking Deep Dive webinar (register here or buy a recording). If you want to learn more about modern data center architectures, buy a recording of my Data Center 3.0 for Networking Engineers webinar and check the Data Center webinar roadmap. Both webinars are also part of the yearly subscription package.

8 comments:

  1. Another nail in the coffin of L2 DCI :-D

  2. Alexandra Stanovska, 06 July, 2011 13:33

    Nah, I am sure manufacturers will come up with better, improved NICs that will "Optimize your Cloud experience by running promisc mode natively, thus offloading CPU" or something along those lines ;)

  3. Hyper-V may not be in the same class as VMware but it is worth mentioning that I think this is not true for that product... I recently tried to deploy a Wireshark VM on Hyper-V only to find that it would not work because the virtual NIC was not promiscuous. I also recently implemented a VOIP recording package that did not support Hyper-V and the manufacturer said it was because of the same. I might be missing something but I think Hyper-V may in fact differ in this regard.

  4. Dan (different one), 07 July, 2011 06:15

    Do you mean that the vNICs are running in promiscuous mode?

    AFAIK, a bridge's ports run in promiscuous mode. A switch is just a multi-port bridge.

    I would be very much surprised to find out that vNICs are _not_ running in promiscuous mode, since they are actually links to the internal virtual switch.

    So there is nothing new here and Ethernet 101 is still valid :)

    Replies
    1. Dan,
      your statement is correct, but the term is slightly wrong, which could cause confusion. A "vNIC" is the virtual network card used by VM guests, and vNICs are not in promiscuous mode by default. A "vmnic" is a physical NIC port, which, as you say, connects to the internal vSwitch and needs to deliver all frames from the physical network up into the virtual switch; for that reason the vmnic must be promiscuous.

  5. Hi Dan,
    Ivan's not talking about vNICs. Getting promiscuity working within a vSwitch is well documented in the intertubes.

    Hi Ivan,
    I'm glad you find the observation interesting enough to share.

    FWIW, I don't think (m)any NICs are able to listen to just "a few" *multicast* addresses, because multicast filtering is typically done with a hash bucket scheme. Unfiltering one group unfilters many groups.
    I've blogged about the situation here:
    http://www.fragmentationneeded.net/2010/10/vmware-runs-in-promiscuous-mode.html

    The NIC filtering is a topic of interest to me because I used to run big-scale multicast applications in environments where I didn't control the L2 topology. On occasion, multicast groups I wasn't interested in slipped through the hardware filtering and crushed the OS.

    I'm only aware of one NIC with lots of filtering capability:
    http://www.lhcomp.com/vendors/neterion/NeterionXframeIISunFireDataSheet.pdf
    • Unicast/Multicast Rx frame filtering for up to 256 address/mask pairs

  6. I forgot to mention.

    STP TCN messages may now have *serious* implications for the health of your hypervisor because they un-cork the last remaining hardware-based frame filter.

    Enabling STP edge mode (portfast) on server ports is probably more important than ever. Don't forget the 'trunk' keyword where it's required.

    The same goes for IGMP snooping and (often overlooked) querying.

    ...and one of my favorite problems: asymmetric routing with mismatched arp/mac timeout.

  7. Emre Sumengen, 14 July, 2011 15:50

    I like both the idea AND the pickup-line :)



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.