Dedicated Hardware in Network Services Appliances? Meh!
Francesco made an interesting comment to my Virtual Appliance Performance blog post:
Virtual Appliance Performance is comparable to the equivalent Physical Appliance until the latter use its own ASICs (for a good reason), e.g. Palo Alto with its new generation Firewall...
Let’s do a bit of math combined with a few minutes of Googling ;)
Palo Alto has numerous hardware models. The low-end ones provide ~1 Gbps of throughput, the flagship models go up to 20 Gbps of throughput. They also offer a VM-based product that they claim has 1 Gbps of throughput when running on four CPU cores.
Cisco’s high-end C-series servers have 4 processors with up to 8 or 10 cores per processor, with prices “starting @ $7,950”. Pushing 1 Gbps of “Palo Alto throughput” through that server requires one half of one processor or less, and you can push up to 10 Gbps of “next-generation firewalled” traffic (40 cores at four cores per Gbps) through a fully-loaded box (which just happens to have two 10 Gbps interfaces).
The HP ProLiant DL980 G7 server has 80 processor cores, enough for 20 Gbps of “next-generation firewalled” throughput ... and a very reassuring price tag of “starting @ $33,585”. Nah, I’d go for two C460-M2s.
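The back-of-the-envelope math above can be sketched in a few lines of Python. This is only a rough estimate: it takes the vendor's claim of 1 Gbps per four cores at face value and assumes throughput scales linearly with core count, which real deployments may not achieve. The function and variable names are made up for illustration.

```python
# Assumption taken from the post: the VM-based firewall delivers ~1 Gbps
# when running on four CPU cores. Linear scaling is an optimistic guess.
CORES_PER_GBPS = 4


def estimated_gbps(total_cores: int, cores_per_gbps: int = CORES_PER_GBPS) -> float:
    """Estimate firewalled throughput for a server with total_cores cores."""
    return total_cores / cores_per_gbps


# Core counts as quoted in the post.
servers = {
    "Cisco C-series (4 processors x 10 cores)": 4 * 10,
    "HP ProLiant DL980 G7 (80 cores)": 80,
}

for name, cores in servers.items():
    print(f"{name}: ~{estimated_gbps(cores):.0f} Gbps")
```

The same function shows why 1 Gbps needs "one half of one processor or less": four cores are half of an 8-core CPU and less than half of a 10-core one.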
Interestingly, the licensing fee you have to pay for the Palo Alto VM appliances isn’t exorbitantly high. According to an online price list I found, some people get the VM-200 for $4,050, while the list price for the full-blown PA-5060 on the same web site exceeds $130,000. Even when buying twenty individual VM-200 licenses I’d have ~$50,000 left to buy the hardware (and there seems to be a volume bundle with 25 licenses at ~$70,000, leaving ~$60,000 for the hardware). Hmmm, maybe I could afford the DL980 G7 after all.
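For anyone who wants to check the budget arithmetic, here is a minimal sketch using the prices quoted above. The PA-5060 figure is a lower bound ("exceeds $130,000"), the bundle price is approximate, and the helper name is hypothetical.

```python
# Prices as quoted in the post (USD); treat them as rough figures.
PA_5060_PRICE = 130_000      # lower bound for the hardware appliance
VM_200_LICENSE = 4_050       # single VM-200 license
BUNDLE_25_LICENSES = 70_000  # approximate price of the 25-license bundle


def hardware_budget(appliance_price: int, license_cost: int) -> int:
    """Money left for servers after paying for the VM licenses."""
    return appliance_price - license_cost


# Twenty licenses bought individually (20 x 1 Gbps to match the PA-5060).
individual = hardware_budget(PA_5060_PRICE, 20 * VM_200_LICENSE)

# The 25-license volume bundle.
bundle = hardware_budget(PA_5060_PRICE, BUNDLE_25_LICENSES)

print(f"20 individual licenses leave ${individual:,} for hardware")
print(f"25-license bundle leaves ${bundle:,} for hardware")
```

The individual-license case works out to $49,000, which is the "~$50,000" mentioned above.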
Someone pointed out in one of my Interop presentations that you have to consider the increased power consumption when using multiple Intel servers instead of a single appliance - and he was absolutely right. For a complete picture, also consider the related cooling and rack space costs.
Summary: From my naively ignorant perspective there’s no good reason for dedicated ASICs in network services appliances ... unless you want to filter a single 10Gbps stream, in which case you probably have a design problem to start with.
How would you do something like a Fortigate 3600C using software/VMs? 60 Gbps of firewalling throughput in 3RU of space.
Hardware probably still has a place in the middle of a network, but at the edges, where bandwidth is much lower, software can be fine.
With Moore's Law, coupled with the newer efficiencies of Server virtualisation, that is no longer the case. There is a good reason that F5 bought LineRate. Software is eating hardware because it's cheaper & better & more agile.
Thanks for mentioning the power :)
I also wanted to point out a few things:
1. There are solutions that can be deployed with less than 4 cores, but def a good comparison for NGFWs. If NGFWs aren't needed in the DC, other solutions that require fewer resources can be used.
2. There are benefits of having it in software for snapshots, rollbacks, templates, etc.
3. How often is 20G REALLY NEEDED day 1? Scale out designs, pay as you grow, rapid deployment are also benefits of the software/NFV model. Likely, an environment will never even hit the capacity of the HW device - it's deployed "just in case" and for a 3-5 year life cycle.
4. Casado often says (as I do now) N simple devices vs. 1 complex device. Definitely another advantage here.
You made the point that pricing is lower in the example, but there are so many other benefits to point out b/c some ppl may dwell on the negatives :)
If you don't need the hardware-accelerated IPS, buying Fortigate 800Cs is the best bang for your buck - it's the same chip as in the 3600C (NP4 with 2x10G), with only one per chassis, but at 1/6th the cost. If the 3600C is anything like the 3140B I've used, there is no load-balancing fabric in front of those 3 chips; instead, the 12x10G ports on the front are split evenly (4 each) across the NP4 processors.
I'm still a big fan of them, there are just a number of caveats with Fortinet that you don't realize until you try and actually implement it.
When you talk of "pure firewall throughput", does that include full TCP session validation (including fragment reassembly if needed)?
As far as TCP validation goes, I believe that is included, but I don't know if it can do reassembly in hardware. The general flow is that all new sessions go through the x86 CPU (which varies from a 4-core i5 in the 800C to an 8-core Xeon in the 3600C), which profiles the session and then hands it off to the NP4 for flow through the system. Adding in AV or IPS slows that down, although it looks like Palo Alto is not including IPS in their throughput numbers either. Various models have other processors (CP8 for AV, SP2 for IPS) that can help accelerate those as well.