It Doesn’t Make Sense to Virtualize 80% of the Servers

A networking engineer was trying to persuade me of the importance of hardware VXLAN VTEPs. We quickly agreed physical-to-virtual gateways are the primary use case, and he tried to illustrate his point by saying “Imagine you have 1000 servers in your data center and you manage to virtualize 80% of them. How will you connect them to the other 200?” to which I replied, “That doesn’t make any sense.” Here’s why.

How many hypervisor hosts will you need?

Modern servers have ridiculous amounts of RAM and CPU cores as I explained in the Designing Private Cloud Infrastructure webinar. Servers with 512 GB of RAM and 16 cores are quite common and becoming relatively inexpensive.

Assuming an average virtualized server needs 8 GB of RAM (usually they need less than that), you can pack over 60 virtualized servers into a single hypervisor host. The 800 virtualized servers thus need fewer than 15 physical servers (for example, four Nutanix appliances), or 30 10GE ports – less than half a ToR switch.
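The sizing arithmetic above can be reproduced with a short Python sketch. It takes the figures from the text (512 GB of RAM per host, 8 GB per virtualized server, 800 servers to virtualize) and assumes two 10GE uplinks per host, which is what 15 hosts needing 30 ports implies. It looks at RAM only and ignores hypervisor overhead, CPU limits and failover headroom; the function name `hypervisor_sizing` is purely illustrative.

```python
import math

def hypervisor_sizing(host_ram_gb, vm_ram_gb, vm_count, uplinks_per_host):
    """Return (VMs per host, hosts needed, 10GE ports needed), RAM-bound only.

    Assumption: RAM is the only constraint; no hypervisor overhead,
    CPU oversubscription limit or N+1 spare host is modeled.
    """
    vms_per_host = host_ram_gb // vm_ram_gb        # whole VMs that fit in host RAM
    hosts = math.ceil(vm_count / vm_ram_gb * vm_ram_gb / vms_per_host)  # round up to whole hosts
    ports = hosts * uplinks_per_host               # assumed dual-homed hosts
    return vms_per_host, hosts, ports

# Figures from the text: 512 GB hosts, 8 GB VMs, 800 servers, 2 uplinks per host
vms_per_host, hosts, ports = hypervisor_sizing(512, 8, 800, 2)
print(f"{vms_per_host} VMs per host, {hosts} hosts, {ports} 10GE ports")
```

With the full 64 VMs per host this gives 13 hosts and 26 ports; derating to 60 VMs per host (for example, `hypervisor_sizing(480, 8, 800, 2)`) gives 14 hosts and 28 ports, still within the "fewer than 15 servers, 30 ports" estimate.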

Back to the physical world

The remaining 200 physical servers need 400 ports, most commonly a mixture of everything from Fast Ethernet to 1GE and (rarely) 10GE. Mixing that hodgepodge of legacy gear with high-end hypervisor hosts and line-rate 10GE switches makes no sense.

What should you do?

I’ve seen companies doing network refreshes without virtualizing and replacing the physical servers. They had to buy almost-obsolete gear to get the 10/100/1000 ports required by existing servers, and thus closed the door to 10GE deployment (because they won’t get new CapEx budget for the next 5 years).

Don’t do that. When you’re building a new data center network or refreshing an old one, start with its customers – the servers: buy new high-end servers with plenty of RAM and CPU cores, virtualize as much as you can, and don’t mix the old and the new world.

This does require synchronizing your activities with the server and virtualization teams, which might be a scary and revolutionary thought in some organizations; we’ll simply have to get used to talking with other people.

Use one or two switches as L2/L3 gateways, and don’t even think about connecting the old servers to the new infrastructure. Make it abundantly clear that the old gear will not get any upgrades (the server team should play along) and that the only way forward is through server virtualization… and let the legacy gear slowly fade into obsolescence.

Designing a new data center network?

You’ll get design guidelines and technology deep dives in various data center, cloud computing and virtualization webinars, and you can always use me to get design help or a second opinion.


  1. That's fine if you only have x86 servers but there are also Solaris and IBM Power servers that should reside in the same networks/security zones.
    Also hardware appliances for network services like firewalls, load balancers, IPS and also Citrix access gateways are often cheaper than their virtual licenses.
    1. Of course you're right but:

      A) Those exceptions usually don't represent 20% of the servers (or ports)
      B) It still doesn't make sense to mix them with the hypervisor hosts on the same ToR switches.
  2. Great post Ivan!

    Can you expand on why it doesn't make sense to mix those appliances and legacy servers that absolutely can't be virtualized onto the same ToR switches as the hypervisor hosts?
  3. Hardware VXLAN VTEPs are still important for connecting external feeds, e.g. an MPLS VPN service from a service provider or private GE links. A cloud operator needs to support bridging the physical and virtual environments for a customer. The thing is, you cannot connect a physical cable directly to a virtual appliance.
  4. But software VTEPs run at wire speed at this point. Except in some very niche situations, where you might have tens of gigabits of hardware SSL in an F5 or a large AIX system, there is no need for hardware VTEPs for Internet/WAN connections in the vast majority of firms.
  5. The number of physical servers could be quite large, especially if the customer is a big Oracle shop. Most of the applications that use clustering to achieve HA (Oracle RAC, MySQL Cluster, etc.) or applications with heavy-duty I/O requirements have to stay on dedicated physical boxes. How would you then connect them to the VXLAN network?
    1. Using switches that support hardware VXLAN VTEPs. You can bridge VXLAN to a VLAN or to a port.
    2. Yes, I understand that. But Ivan was suggesting not putting physical servers on the same ToR switches as hypervisor hosts, and I was wondering why. Sorry, my question in the previous reply was not clear.
    3. Simply route to the physical subnet. Assuming these heavy iron DBs are in a separate network segment for security reasons, it would be quite simple to route the appropriate traffic.

      More than likely, your virtual farm and big iron are going to be separately racked and cabled anyway.
    4. Or you put an on-ramp pair of switches someplace in the DC which have some form of clustering so they can support multi-switch teaming - then they talk L3 to the fabric and L2 to WAN connections/appliances/non-x86 iron.
  6. Ivan,

    I am not fully convinced by the basic assumption of "We quickly agreed physical-to-virtual gateways are the primary use case". I would rather look at the problem from the controller's scalability and performance point of view, that is, where one would deploy VTEPs: on hypervisors or on ToR switches.

    Consider a different use case with 50K VMs: at 60 VMs per physical host, that's ~825+ physical hosts (all virtualized). Assuming 5 VMs per VNI, there are about 10K VNIs, and each VM of a given tenant resides on a different physical host.

    If one were to have VTEPs on the hypervisors in this use case, the performance numbers would be as follows:

    1) 2 TCP connections with each hypervisor: one for OVSDB and another for OF (with NSX or with ODL). So the controller has to handle 1500+ TCP connections just for managing the hypervisors.

    2) If OF 1.0 is used, the number of virtual ports created on a single physical host is 60 * 5 = 300, so the controller has to handle 300 * 825 ~ 250K virtual ports. I agree this number is reduced when OF 1.3 is used, but at this point I don't have numbers showing by how much.

    3) The number of flows the controller has to program also increases, as all flows are programmed by the controller.

    4) The controller has to manage 825+ physical hosts to distribute VM routes.

    On the other hand, if the VTEPs are deployed on ToR switches with 30 10GE ports each:

    A) The number of TCP connections to the controller is 25+. We only need the OVSDB connection and don't require OF, as solutions like NSX leave the programming of flows to the hardware vendor instead of using OF.

    B) As there is no OF in the picture, the controller need not bother with creating virtual ports, handling flow entries, etc.

    C) The controller has to manage only 25+ hardware VTEPs to distribute VM routes.

    So, to summarize: controller scalability becomes an important factor when choosing hardware VXLAN gateways.
    1. 1) Last time I checked, web servers happily worked with 10K concurrent TCP connections. No reason a cluster of controllers couldn't do the same.

      2) You don't need virtual ports like you think you do. Read

      and comments to it.

      3) Number of forwarding entries isn't that different from the VTEP case, and the forwarding entries cost you less than the hardware ones.

      4) So what? What's the number of changes-per-second?

      Finally, with all the questions you're asking, I think it's time for full disclosure: who are you working for?
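The controller-load arithmetic in comment 6 is easy to recheck. The following Python sketch simply reuses the commenter's inputs (50K VMs, 60 VMs per host, 5 VMs per VNI, two controller sessions per hypervisor, the "60 * 5" OpenFlow 1.0 port estimate) and assumes each ToR switch serves 30 hosts over one 10GE port each. The function name `controller_load` is illustrative and is not an NSX or OpenDaylight API.

```python
import math

def controller_load(total_vms=50_000, vms_per_host=60, vms_per_vni=5,
                    sessions_per_host=2, of10_ports_per_host=300, hosts_per_tor=30):
    """Recompute the comment's controller-load estimates for hypervisor vs ToR VTEPs.

    All defaults are the commenter's assumptions: 2 sessions = OVSDB + OpenFlow
    per hypervisor, 300 = the "60 * 5" OpenFlow 1.0 virtual ports per host,
    30 = hosts attached to one ToR switch (assumed one 10GE port each).
    """
    hosts = math.ceil(total_vms / vms_per_host)   # round up to whole hosts
    tors = math.ceil(hosts / hosts_per_tor)       # round up to whole switches
    return {
        "hosts": hosts,
        "vnis": total_vms // vms_per_vni,
        # VTEPs on hypervisors: every host keeps its own controller sessions
        "hypervisor_vtep_sessions": hosts * sessions_per_host,
        "hypervisor_vtep_of_ports": hosts * of10_ports_per_host,
        # VTEPs on ToR switches: one OVSDB session per switch
        "tor_vteps": tors,
    }

for name, value in controller_load().items():
    print(f"{name:>26}: {value:,}")
```

With these inputs the sketch gives 834 hosts, 10,000 VNIs, 1,668 controller sessions and about 250,000 OpenFlow 1.0 virtual ports for hypervisor VTEPs, versus 28 ToR VTEPs. These are static counts only; as the reply points out, the more telling load metric is the number of changes per second, which this sketch does not model.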