Virtual Appliance Performance Is Becoming a Non-Issue
Almost exactly two years ago I wrote an article describing the benefits and drawbacks of virtual appliances, where I listed virtualization overhead as one of the major sore spots (still partially true). I also wrote: “Implementing routers, switches or firewalls in a virtual appliance would just burn the CPU cycles that could be better used elsewhere.” It’s time to revisit this claim.
The Easy Ones
A few data points are obvious:
- The $0.02 CPU used in SoHo routers is good enough for speeds up to ~10 Mbps (see also: OpenWrt), and reasonably-sized x86 platforms are good enough for anything between 100 Mbps and 1 Gbps, depending on the functionality you need and your definition of _reasonable_.
- High-speed packet forwarding (e.g. a ToR switch pushing 1+ Tbps) is way cheaper to implement in hardware.
- High-end packet forwarding gear (CRS-1, MX-960) will stay hardware-based for a very long time.
- Hardware encryption is still faster than software encryption, but at least AES is included in the instruction set of recent Intel and AMD processors (RSA is still a CPU burner). Is hardware-assisted SSL offload cheaper than throwing more cores at the problem? I don’t know; shop around and do the math (the sketch after this list is one way to get your own numbers).
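If you want a quick feel for the AES-versus-RSA gap on your own CPU, here’s a minimal sketch using the Python cryptography library (which calls into OpenSSL, so it picks up AES-NI where available). The data sizes and iteration counts are arbitrary choices of mine; treat the output as a ballpark, not a benchmark.

```python
# Quick-and-dirty comparison of bulk AES throughput (hardware-assisted via
# AES-NI on recent CPUs) with RSA private-key operations. Illustrative only;
# run it on your own boxes before doing the math.
import os
import time

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Bulk encryption: push 256 MB through AES-128-GCM
key = AESGCM.generate_key(bit_length=128)
aesgcm = AESGCM(key)
chunk = os.urandom(1024 * 1024)                  # 1 MB of random data
start = time.perf_counter()
for _ in range(256):
    aesgcm.encrypt(os.urandom(12), chunk, None)  # fresh nonce per message
elapsed = time.perf_counter() - start
print(f"AES-128-GCM: {256 * 8 / elapsed:.0f} Mbps")

# Asymmetric crypto: RSA-2048 signatures, roughly the expensive
# private-key operation a TLS handshake costs the server
priv = rsa.generate_private_key(public_exponent=65537, key_size=2048)
start = time.perf_counter()
for _ in range(100):
    priv.sign(b"x", padding.PKCS1v15(), hashes.SHA256())
elapsed = time.perf_counter() - start
print(f"RSA-2048 sign: {100 / elapsed:.0f} ops/sec")
```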
Vanilla Virtual Appliances
Virtual appliances are clearly good enough for low-volume loads. VMware claims the firewalling performance of the vShield Edge Compact (1 vCPU) appliance is ~3 Gbps. Probably true under ideal conditions (I got similar results testing an older version of vShield Edge with netperf).
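For the curious, here’s a minimal sketch of the kind of netperf-based test I mean: run a few parallel TCP_STREAM tests through the appliance and sum the results. The target address is a placeholder; it assumes netperf is installed locally and a netserver is listening behind the appliance under test.

```python
# Run several parallel netperf TCP_STREAM tests and report aggregate
# throughput. TARGET is a placeholder; point it at a netserver instance
# behind the appliance you're testing.
import subprocess
from concurrent.futures import ThreadPoolExecutor

TARGET = "192.0.2.10"   # netserver behind the appliance under test
STREAMS = 4
DURATION = 30           # seconds per stream

def run_stream(_):
    # -P 0 suppresses the banner; the last field of the remaining
    # output line is throughput in 10^6 bits/sec
    out = subprocess.run(
        ["netperf", "-H", TARGET, "-t", "TCP_STREAM",
         "-l", str(DURATION), "-P", "0"],
        capture_output=True, text=True, check=True).stdout
    return float(out.split()[-1])

with ThreadPoolExecutor(max_workers=STREAMS) as pool:
    results = list(pool.map(run_stream, range(STREAMS)))

print(f"aggregate: {sum(results):.0f} Mbps across {STREAMS} streams")
```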
HTTP load balancing performance of the vShield Edge Large (2 vCPU) appliance is ~2.2 Gbps. F5 claims its BIG-IP LTM VE can do up to 3 Gbps in a 2 vCPU vSphere-hosted VM. Either one should be good enough unless you plan to push most of your data center traffic through a single virtual appliance (hint: don’t ... although I’ve heard the F5 VE license isn’t exactly cheap).
Aiming for higher speeds? A10 claims its SoftAX virtual appliance can push up to 8 Gbps of load-balanced traffic. I have no idea what’s required to get that number; the hardware requirements are in the installation guide, which is hidden behind a regwall. It seems A10 is yet another one of those companies that never learn.
Getting Beyond 10 Gbps
What about even higher speeds? It’s possible to push 50 Gbps through the Linux TCP stack, and if you do smarter things (a custom stack that bypasses the kernel entirely, or Intel’s DPDK, or the 6WIND equivalent) you can get the same performance with lower overhead.
However, none of the figures quoted in the previous paragraph include the virtualization tax (the performance loss). Getting comparable performance from a VM typically requires some sort of hypervisor bypass that lets the VM work directly with the physical NICs, but that approach usually requires dedicated NICs (not really useful) and disables live VM mobility. You can get rid of both problems with Cisco’s VM-FEX combined with VMware’s vMotion support for VMDirectPath, but that’s the only combo I’m aware of that gives you “physical” interfaces (which you need to avoid the hypervisor overhead) on a migratable VM.
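If you’re wondering whether your hosts could even do the bypass trick, here’s a hedged sketch (assuming a Linux hypervisor host and the standard sysfs layout) that lists which NICs support SR-IOV, the usual building block for handing a VM a slice of the physical NIC:

```python
# List NICs with SR-IOV support and how many virtual functions (VFs)
# are currently enabled. Uses standard Linux sysfs paths; interfaces
# without a PCI device (e.g. loopback) are skipped.
from pathlib import Path

for dev in sorted(Path("/sys/class/net").iterdir()):
    total = dev / "device" / "sriov_totalvfs"
    if not total.exists():
        continue  # no PCI device or no SR-IOV support
    enabled = (dev / "device" / "sriov_numvfs").read_text().strip()
    print(f"{dev.name}: {enabled}/{total.read_text().strip()} VFs enabled")

# To carve out e.g. 4 VFs for VMs (as root):
#   echo 4 > /sys/class/net/<nic>/device/sriov_numvfs
```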
Good news: the hypervisor landscape seems to be changing rapidly. 6WIND is demonstrating DPDK-accelerated Open vSwitch at the Open Networking Summit, and they claim they can accelerate both the OVS data plane and VXLAN encapsulation, resulting in 50 Mpps on a 10-core server. The IMIX traffic profile should be pretty relevant when evaluating load balancers and firewalls, and at the IMIX average packet size of ~340 bytes, 50 Mpps translates into more than 130 Gbps of L2 virtual switching throughput. Good enough, I’d say ;)
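Here’s the packets-to-bits conversion behind that claim, in case you want to plug in your own pps figures and packet sizes:

```python
def pps_to_gbps(pps: float, avg_packet_bytes: int) -> float:
    """Convert packets/second at a given average packet size to Gbps."""
    return pps * avg_packet_bytes * 8 / 1e9

# 50 Mpps at the ~340-byte IMIX average packet size:
print(pps_to_gbps(50e6, 340))   # 136.0 -> "more than 130 Gbps"
```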
Finally, Intel just announced their reference architecture (using, among other things, DPDK-accelerated OVS): the hardware is available now, and DPDK-accelerated OVS is expected in Q3 of this year. The Open Networking Platform server is scheduled to enter alpha testing in the second half of the year.
Summary: In a year or two, we’ll have plenty of software solutions and/or generic x86 hardware platforms capable of running very high-speed virtual appliances. I would strongly recommend considering that in your planning and purchasing process. Obviously some firewall/load balancing vendors will adapt (the major load balancing players already did) while others will stick to their beloved hardware and slowly fade into oblivion.
www.networkworld.com/reviews/2013/022513-cisco-virtual-router-test-266658.html
showing Vyatta pulling off 500 Mbps on a single core (even on a Cisco UCS server ;-)
(yes yes, self-serving post, but it is still true =)
For Vyatta (the open-source version was used in the test) they did that 500 Mbps on just ONE core; for the Cisco test it took FOUR cores to do just 50 Mbps. Something is really odd there...
What about services like IPS and load balancers with rules that include L4-L7 parsing? Do you think it’s feasible to implement those in software?
Good post as usual
Cristiano
Cristiano, most advanced load balancers are implemented primarily in software. For IDS data points, read the erratasec blog posts I linked to.
Do you want a system that can easily be updated with software and scale out/up as processors get faster, or do you want a limited set of features that works really fast in hardware? There are pros and cons to both; let’s review a decade from now and see where L4-L7 services end up. Many arguments can be made one way or the other, but I surely would not bet against the software & Intel combo myself...
Thanks for the interesting post, as usual ;-)
So, in the near future, do you think all of today’s standalone physical appliances will become virtual and distributed, each holding just the portion of state and rules relevant to the local bunch of VMs (say, one appliance per hypervisor), with rules and state migrating along with the VM?
Thanks,
Ariel.
Also, do you really want to keep forklifting your firewall/LB networking gear for the next rev of contract-manufactured hardware, or does it make sense to align with "Moore's Law networking" on commodity servers? The server upgrade cycle is 2-3 years; contract-manufactured L4-L7 appliances typically have a lifecycle of 5-7 years. Open your 5-year-old top-end firewall and there is a good chance your desktop processor is faster...
I've seen so many blogs and read so many books that harp on not sending packets to the CPU in a hardware switch/router because it will impact performance, and on ternary CAM (TCAM) being needed for large IP tables, various ACLs, etc.
What I really see when discussing virtual switches/appliances is a set of basic features that the 2600s of ye olde would handle. If you're comfortable running your network on a 2600, virtualize. If not........
That said, there are many hardware appliances that far outstrip their virtual brethren simply because of hardware acceleration. In the case of load balancing, SSL stripping and re-encryption CANNOT be done at line rate without specialized hardware. The same goes for HTTPS inspection in UTM, IPS, and firewall traffic. Packet forwarding is a quaint topic; claiming high throughput for routing pretty much means nothing, as those functions are increasingly commodity and being stuffed into devices that are capable of much more. If you think stateful firewalls still make your network safe, you need to lift the rock you have been living under.
Sorry to be so harsh, but some of these articles are very myopic and don't really address the issues of a modern network. That is why the state-funded phrackers (my word, a play on water-fracking for natural gas, because it's analogous to how modern 'hackers' mine data from networks) pwn you.
You might also want to read the follow-up blog post: http://blog.ipspace.net/2013/05/dedicated-hardware-in-network-services.html - x86 silicon is slower, but also cheaper (per Gbps) than whatever awesomesauce your vendor is selling you.
You might not like my conclusions (most hardware vendors don't) but price lists speak for themselves.
As for perspective problems - I always love constructive feedback, and since you wrote "__some__ of these articles are very myopic" I assume you're a regular reader, and would appreciate a list of articles you disagree with (and why).