Open vSwitch Under the Hood

Hatem Naguib claimed that “the NSX controller cluster is completely out-of-band, and never handles a data packet” when describing VMware NSX Network Virtualization architecture, preemptively avoiding the “flow-based forwarding doesn’t scale” arguments usually triggered by stupidities like this one.

Does that mean there’s no packet punting in the NSX/Open vSwitch world? Not so fast.

First, to set the record straight: the NVP OpenFlow controller (NSX controller cluster) does not touch actual packets. There’s no switch-to-controller punting; NVP has enough topology information to proactively download OpenFlow flow entries to Open vSwitch (OVS).

However, Open vSwitch has two components: the user-mode daemon (process switching in Cisco IOS terms) and the kernel forwarding module, which implements per-flow forwarding and corresponding actions, not the full complement of OpenFlow matching rules.

There's a third component present in every OVS environment: the ovsdb (OVS database) daemon, but it's not relevant to this discussion, so we'll conveniently ignore it.

Whenever the first packet of a new flow passes through the Open vSwitch kernel module, it’s sent to the Open vSwitch daemon, which evaluates the OpenFlow rules downloaded from the OpenFlow controller, accepts or drops the packet, and installs the corresponding per-flow forwarding rule into the kernel module.
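The first-packet behavior described above can be sketched as a toy model in Python. This is only an illustration of the concept, not the actual Open vSwitch code: the class names (`KernelDatapath`, `UserDaemon`, `Rule`), the three-field flow key, and the action strings are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Simplified, hypothetical microflow key: (src_ip, dst_ip, dst_port).
FlowKey = Tuple[str, str, int]


@dataclass
class Rule:
    """Wildcard rule held by the user-mode daemon (None means 'match anything')."""
    priority: int
    src_ip: Optional[str]
    dst_ip: Optional[str]
    dst_port: Optional[int]
    action: str  # illustrative strings such as "output:2" or "drop"

    def matches(self, key: FlowKey) -> bool:
        fields = (self.src_ip, self.dst_ip, self.dst_port)
        return all(want is None or want == got for want, got in zip(fields, key))


class UserDaemon:
    """Slow path: evaluates the full rule table for a flow it has not seen."""

    def __init__(self, rules: List[Rule]) -> None:
        # Highest priority first, so the first match wins.
        self.rules = sorted(rules, key=lambda r: r.priority, reverse=True)
        self.upcalls = 0  # how many packets were punted to user space

    def classify(self, key: FlowKey) -> str:
        self.upcalls += 1
        for rule in self.rules:
            if rule.matches(key):
                return rule.action
        return "drop"  # assumption of this sketch: no matching rule means drop


class KernelDatapath:
    """Fast path: exact-match per-flow cache, no wildcard matching at all."""

    def __init__(self, daemon: UserDaemon) -> None:
        self.daemon = daemon
        self.flow_cache: Dict[FlowKey, str] = {}

    def receive(self, key: FlowKey) -> str:
        action = self.flow_cache.get(key)
        if action is None:
            # Cache miss: punt the packet to the daemon, then install
            # the resulting per-flow entry for all subsequent packets.
            action = self.daemon.classify(key)
            self.flow_cache[key] = action
        return action
```

In this model only the first packet of each flow reaches `UserDaemon.classify`; every later packet with the same key is handled from `flow_cache` without leaving the fast path.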

Does this sound similar to Multi-Layer Switching or the way Cisco’s VSG and Nexus 1000V VEM work? It’s exactly the same concept, implemented in kernel/user space of a single hypervisor host. There really is nothing new under the sun.

I would strongly recommend you read the well-written developer documentation if you want to know the dirty details.

This approach keeps the kernel module simple and tidy, and allows the Open vSwitch architecture to support other flow programming paradigms, not just OpenFlow. You can use OVS as a simple learning bridge supporting VLANs, sFlow and NetFlow (not hard once you’ve implemented per-flow forwarding), or you could implement your own forwarding paradigm while leveraging the stability of the Open vSwitch kernel module, which has been included in the Linux kernel since version 3.3 and has already made its way into standard Linux distributions.
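To illustrate why a learning bridge is easy to build once a user-mode component decides what to do with the first packet of each flow, here is a minimal sketch of VLAN-aware MAC learning. It is a generic textbook version under simplifying assumptions (no aging, no trunk/access port handling), and the `LearningBridge` class is hypothetical rather than anything taken from OVS.

```python
from typing import Dict, Iterable, List, Tuple


class LearningBridge:
    """Toy VLAN-aware learning bridge: one MAC table keyed by (vlan, mac)."""

    def __init__(self, ports: Iterable[int]) -> None:
        self.ports = list(ports)
        self.mac_table: Dict[Tuple[int, str], int] = {}

    def forward(self, in_port: int, vlan: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the list of output ports for one frame."""
        # Learn (or refresh) where the source MAC lives in this VLAN.
        self.mac_table[(vlan, src_mac)] = in_port

        out_port = self.mac_table.get((vlan, dst_mac))
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination sits behind the ingress port: nothing to forward.
            return []
        return [out_port]
```

In an architecture like the one described above, a decision of this kind would be made once per flow in user space, and the result would be cached as a per-flow entry in the kernel module.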

Just to give you an example: Midokura chose to use the Open vSwitch kernel module in combination with their user-mode daemon in the MidoNet product – you can install MidoNet on recent Linux distributions without touching the kernel. Smart move ;)

2013-08-09: Changed the description of ovs-vswitchd. According to the recent list of OVS features the only control-plane protocol it runs is LACP.

4 comments:

  1. The Open vSwitch kernel module technique, of sending the first packet of each microflow to userspace, works really well in a variety of situations, but there are still some where we need better performance. We're working on a couple of different approaches for Open vSwitch 1.12 and later. My talk at HackerDojo in March covered some of this toward the end (you can find the slides on openvswitch.org's Documentation page and I think that video is on youtube somewhere as well).
  2. Great post. Cleared things up for me. I had been assuming standard MAC learning was still being done local to a tenant's OVS kernel module.

    On the other hand, you compare it to VSG. One is sold as a [light weight] FW and one is not although Nicira does have "Cloud Network Security" on their website. Interesting.

    I suspect after reading this post that VMware will go from offering a per-host user space FW like vShield App to a full blown stateful distributed FW in kernel space that dynamically builds its policy as new flows are checked. Their technology can do this already - just need to add inspection engines.

    -Jason
  3. OVS does not yet support the concept of Groups as described in the OpenFlow 1.1 onward specifications. Using Groups one can forward packet to multiple ports I think. I want to send a packet to the controller as well as forward it to a different port on the switch. I am using OVS as my switch. Now since I can only think of using Groups as the method to forward pkts to both the controller and the port and since OVS does not support Groups, is there a way to do this in the current version of OVS?
    Replies
    1. There are a few mailing lists dedicated to OVS and monitored by OVS developers. Don't you think that would be a better place to ask such a detailed question?