VXLAN and EVB questions

Wim (@fracske) De Smet sent me a whole set of very good VXLAN- and EVB-related questions that might be relevant to a wider audience.

If I understand you correctly, you think that VXLAN will win over EVB?

I wouldn’t say they are competing directly from the technology perspective. There are two ways you can design your virtual networks: (a) smart core with simple edge (see also: voice and Frame Relay switches) or (b) smart edge with simple core (see also: Internet). EVB makes option (a) more viable, VXLAN is an early attempt at implementing option (b).

When discussing virtualized networks I consider the virtual switches in the hypervisors the network edge and the physical switches (including top-of-rack switches) the network core.

Historically, option (b) (smart edge with simple core) has been proven to scale better ... the largest example of such architecture is allowing you to read my blog posts.

Is it correct that EVB isn't implemented yet?

Actually it is – IBM has just launched its own virtual switch for VMware ESX (a competitor to Nexus 1000V) that has limited EVB support (the way I understand the documentation, it seems to support VDP, but not the S-component).

But VXLAN has its limitations – for example, only VXLAN-enabled VMs will be able to speak to each other.

Almost correct. VMs are not aware of VXLAN (and are thus not VXLAN-enabled). From the VM NIC's perspective, the VM is connected to an Ethernet segment, which could be implemented (within the vSwitch) with VLANs, VXLAN, vCDNI, NVGRE, STT or something else.

At the moment, the only implemented VXLAN termination point is Nexus 1000V, which means that only VMs residing within ESX hosts with Nexus 1000V can communicate over VXLAN-implemented Ethernet segments. Some vendors are hinting they will implement VXLAN in hardware (switches), and Cisco already has the required hardware in Nexus 7000 (because VXLAN has the same header format as OTV).
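The header Cisco's hardware already knows how to parse is tiny: per the VXLAN IETF draft, it's eight bytes, consisting of a flags byte (0x08 when the VNI is valid), a 24-bit VXLAN Network Identifier, and reserved fields. A minimal sketch of building it (the function name is mine, for illustration only):

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header from the IETF draft:
    flags byte (0x08 = VNI is valid), 3 reserved bytes,
    24-bit VNI, 1 reserved byte."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!B3x", 0x08) + struct.pack("!I", vni << 8)

print(vxlan_header(5000).hex())  # → 0800000000138800
```

A hardware termination point would prepend outer MAC, IP and UDP headers in front of this header and carry the original Ethernet frame behind it.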

VXLAN encapsulation will also take some CPU cycles (thus impacting your VM performance).

While VXLAN encapsulation will not impact VM performance per se, it will eat CPU cycles that could be used by VMs. If your hypervisor host has spare CPU cycles, the VXLAN overhead shouldn’t matter; if you’re pushing the host to its limits, you might experience a performance impact.

However, the elephant in the room is TCP offload. It can drastically improve I/O performance (and reduce CPU overhead) of network-intensive VMs. The moment you start using VXLAN, TCP offload is gone (most physical NICs can’t insert the VXLAN header during TCP segmentation), and the overhead of the TCP stack increases dramatically.

If your VMs are CPU-bound you might not notice; if they generate lots of user-facing data, lack of TCP offload might be a killer.
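To put numbers on the encapsulation overhead: assuming IPv4 transport and no outer VLAN tag, VXLAN adds 50 bytes in front of every inner Ethernet frame (outer MAC + outer IP + outer UDP + VXLAN header). A back-of-the-envelope sketch of the core-link MTU you’d need to avoid fragmenting full-size VM packets:

```python
def required_core_mtu(vm_mtu: int = 1500) -> int:
    """Minimum IP MTU on core links to carry a full-size VM packet
    inside VXLAN without fragmentation (assumes IPv4 transport,
    untagged inner frame, no outer VLAN tag)."""
    inner_eth = 14   # inner Ethernet header carried as payload
    vxlan = 8        # VXLAN header
    outer_udp = 8
    outer_ipv4 = 20
    # The outer Ethernet header doesn't count against the IP MTU.
    return vm_mtu + inner_eth + vxlan + outer_udp + outer_ipv4

print(required_core_mtu())      # → 1550
print(required_core_mtu(9000))  # → 9050
```

In other words, either the transport network carries baby giant (or jumbo) frames, or the guest VM MTU has to be lowered accordingly.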

I personally see VXLAN as an end-to-end solution where we can’t interact with the network infrastructure anymore. For example, how would these VMs be able to connect to the first-hop gateway?

Today you can use VXLAN to implement “closed” virtual segments that can interact with the outside world only through VMs with multiple NICs (a VXLAN-backed NIC and a VLAN-backed NIC), which makes it perfect for environments where firewalls and load balancers are implemented with VMs (example: VMware’s vCloud with vShield Edge and vShield App). As said above, VXLAN termination points might appear in physical switches.

With EVB we would still have full control and could do the same things we’re doing today on the network infrastructure, and the network would be able to automatically provision the correct VLANs on the correct ports.

That’s a perfect summary. EVB enhances today’s VLAN-backed virtual networking infrastructure, while VXLAN/vCDNI/NVGRE/STT completely change the landscape.

Is then the only advantage of VXLAN that you can scale better because you don't have the VLAN limitation?

VXLAN and other MAC-over-IP solutions have two advantages: they allow you to break through the VLAN barrier (but so do vCDNI, Q-in-Q or Provider Backbone Bridging), and they scale better because the core network uses routing, not bridging. With MAC-over-IP solutions you don’t need novel L2 technologies (like TRILL, FabricPath, VCS Fabric or SPB), because they run over an IP core that can be built with existing equipment using well-known (and well-tested) designs.
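The VLAN-versus-VNI arithmetic behind the “VLAN barrier” is simple: a 12-bit VLAN ID gives you at most 4094 usable segments (IDs 0 and 4095 are reserved), while the 24-bit VXLAN Network Identifier gives you roughly 16 million:

```python
VLAN_ID_BITS = 12
VNI_BITS = 24

vlan_segments = 2**VLAN_ID_BITS - 2  # VLAN IDs 0 and 4095 are reserved
vni_segments = 2**VNI_BITS           # 24-bit VNI space

print(vlan_segments)  # → 4094
print(vni_segments)   # → 16777216
```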

More information

If you need to know more about network virtualization and data center technologies, you might find these webinars relevant:

And don’t forget: you get access to all these webinars (and numerous others) if you buy the yearly subscription.

8 comments:

  1. You've answered a question I couldn't find an answer for: connecting over VXLAN to a gateway. Using a VM with two vNICs to bridge to traditional VLAN access, or using it as the gateway itself, also implies that you have to design with an extra layer of complexity in mind: inefficient bridging is easily introduced on existing infrastructure, because data flows from one VM to the next, and most likely back into a layer-2 fabric towards its destination. That may increase bandwidth consumption too.

    All in all it doesn't seem like a stable, working concept to me right now, except in the niche cases you've mentioned (virtual firewalls).

  2. Ivan,

    There seem to be more problems than just TCP offload -

    1. Number of multicast groups in the physical network. The number of VXLAN segments you support increases the number of multicast groups your networking gear needs to support.

    2. When you are using multicast, the convergence time after a VM move is still a function of your physical network's convergence.

    3. Secure group joins and PIM-BiDir support in the majority of today's networking gear. My guess is the security part will be swept under the carpet until it becomes a real issue, and PIM-BiDir support will become common only if VXLAN catches on.

    4. TCP offload details.

    Each of the features that save CPU cycles is gone, or you need a new NIC:

    a. LSO, LRO
    b. IP, UDP and TCP checksums - both generation and verification. Again, my guess is this will be swept under the carpet.
    c. Path MTU - this will probably be dealt with by pre-configuring the MTU in guest VMs, and swept under the carpet.

    5. VXLAN still aspires to provide multiple VLAN-like constructs to guest VMs running on multiple servers, but the details of how the network is emulated and which networking protocols must be supported are left open to interpretation.

    6. This one has now been addressed by Embrane, but there was a lack of load balancers and firewalls to go along with the VXLAN solution; an IPsec gateway is another example. However, I think these are opportunities if the market really catches on to this trend.

    The VDP-based solution avoids most of these issues, so I'm not sure why someone would want to use VXLAN in an already-deployed data center, where it will result in lower performance and throughput.

    I see that STT avoids some of the TCP offload issues, but it seems like a clever hack. NVGRE avoids the reliance on multicast in the network, but still has the same TCP offload problems.

    I think that without a NIC that supports VXLAN (Cisco will surely build one to differentiate their servers and disrupt the market), moving to VXLAN will be a disaster for customers.

    Again opinionated, but would like to know your thoughts on each of these...

    Thanks,

    Suhas

  3. Great post, Ivan - it helped me clear up some questions I had in my head. And great questions, Wim. If you could do more posts like this, contrasting the different technologies and standards, that would be great.

  4. Nice article. Thanks, Ivan.

    Keep in mind that VXLAN can be implemented in physical switches. This way, you can continue to use your paravirtualized TCP-offload NIC, and still get the scalability benefits of VXLAN.

    VXLAN improves scalability in several ways. It gets you past the 4k vlan limit, and also avoids scaling limits in core MAC tables, provides a multi-path fabric, avoids spanning tree, and reduces the scope of broadcasts.

    Finally, to route out of a VXLAN segment, you can either go through a multi-VNIC guest (as identified in the article), or, your friendly neighborhood top-of-rack switch can serve as the default gateway for a VXLAN and route unencapsulated traffic up and out, for extremely high performance. Of course, if you need FW/LB/NAT, then your friendly neighborhood top-of-rack switch might need an L4-7 education.

    -Ken

    Kenneth Duda
    CTO and SVP Software Engineering
    Arista Networks, Inc.

  5. Ivan Pepelnjak, 19 March 2012, 07:44

    Thanks for the feedback, Ken!

    Am I right in understanding that your "VXLAN in physical switches helps you retain TCP offload" statement refers to a design where the hypervisor hosts would use VLANs and the VXLAN encapsulation would be done in the switches? That's definitely an interesting proposal, but faces the same "lack of control plane" problems as any other non-EVB proposal.

    And I'm anxiously waiting for a public announcement of VXLAN support in physical switches 8-)

    Ivan

  6. It seems a lot of these so-called "new" schemes are invented by people who don't really have in-depth knowledge of networking; they see some problems, immediately come up with solutions and call them revolutionary, when in fact they are not well thought out, but convoluted and complex. The sad thing is that the rest of the crowd worships them. The VXLAN RFC did a good job of describing the problem space, but the proposed solution falls far short of expectations, a total letdown. It continues down the path of extending VLANs even while recognizing that the underlying infrastructure has to be IP. VLAN was not a good technology to begin with; it was a simple-minded layer-2 solution to broadcast storms, and now they are still twisting arms and legs to continue down that path.

  7. Hi Ivan,

    Just to let you know that EVB (with VEPA and VDP support) has been implemented in Juniper Networks' Junos 12.1.

    Greetz,
    Frac

    1. Thank you. I know - I was so pleasantly surprised when doing research for the Data Center Fabrics Update webinar. Time to write a blog post ...



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.