LineRate Proxy: Software L4-7 Appliance With a Twist

Buying a new networking appliance (be it a VPN concentrator, firewall or load balancer … aka Application Delivery Controller) is a royal pain. You never know how much performance you’ll need in two or three years (and your favorite bean counter will not allow you to scrap it in less than 4-5 years). You do know you’ll never get the performance promised in the vendor’s data sheets … but you don’t always know which combination of features will kill the box.

Now, imagine someone offers you a performance guarantee – you’ll always get what you paid for. That’s what LineRate Systems, a startup just exiting stealth mode, is promising.

Where’s the trick? It’s simple – they don’t sell hardware; their software-only appliance (LineRate Proxy) runs on run-of-the-mill x86 servers. If you don’t get the performance you want (up to the licensed bandwidth), just add more servers.

Additional benefit: you don’t have to buy dedicated hardware – whenever you need more bandwidth, you can take another server from your compute pool and repurpose it as a networking appliance.

Isn’t CPU-based packet processing expensive? Sure is … if you use TCP stacks written for single-core CPUs and NE1000 interface cards from the 1990s. Like 6WIND, LineRate Systems developed their own TCP stack that replaces the one in the FreeBSD kernel. They claim their LineRate Operating System (LROS) can reach 20-40 Gbps of throughput on dedicated commodity x86 hardware with built-in NICs.
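
To put the claimed throughput in perspective, here’s a quick back-of-the-envelope calculation in Python; the 500-byte average packet size is my assumption, not a LineRate figure:

    # Rough packets-per-second math for a claimed bandwidth. The average
    # packet size is an assumption, not something LineRate published.
    def pps(bandwidth_gbps, avg_packet_bytes):
        return bandwidth_gbps * 1e9 / 8 / avg_packet_bytes

    for gbps in (20, 40):
        print("%d Gbps at 500-byte packets = %.1f Mpps" % (gbps, pps(gbps, 500) / 1e6))
    # Prints ~5 and ~10 Mpps -- rates a modern multi-core x86 server with a
    # well-written stack can plausibly sustain (see the comments below).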

The performance numbers are good, but not exactly mind-blowing. vShield Edge can reach 4 Gbps with a single vCPU (single core) and Juniper’s Virtual Gateway can deliver up to 30 Gbps on a high-end server.

They also claim LROS can handle 100K+ session setups per second, 2M active flows and 4,000 tenants (virtual network appliances) on a single server with 24 GB of RAM. Although the numbers are not too far away from what’s realistically possible, I’d still run my own tests before deploying such a configuration (and just in case you need a great traffic generator, watch the Spirent videos from Networking Tech Field Day).
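
A minimal sanity check of those numbers against the advertised 24 GB of RAM – straight division, nothing more, since I know nothing about the actual LROS data structures:

    # Naive division of the advertised RAM across the advertised object counts;
    # it ignores the OS, code and packet buffers, so treat it as an upper bound.
    ram_bytes = 24 * 2**30
    flows = 2 * 10**6
    tenants = 4000

    print("budget per active flow: %.1f KiB" % (ram_bytes / float(flows) / 1024))    # ~12.6 KiB
    print("budget per tenant:      %.1f MiB" % (ram_bytes / float(tenants) / 2**20)) # ~6.1 MiB

Roughly 12 KiB per flow and 6 MiB per tenant – feasible for mostly-idle flows, but generous TCP buffers would blow through that budget quickly, which is another reason to test with your own traffic mix.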

What about SSL offload? Well, that’s where the going gets tough. You will definitely reach the performance you paid for, but you’ll have to throw a lot of CPU power at the problem compared to dedicated SSL hardware (after all, there’s a good reason we’re doing SSL offload in load balancers instead of running SSL code on web servers). Get a demo, run your own tests, do the math.
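
If you want a rough feel for the CPU cost before the demo, a few lines of Python (using the cryptography package) will tell you how many RSA-2048 private-key operations – the dominant per-handshake cost – one of your cores can sustain. This is my own quick-and-dirty sketch, not a LineRate benchmark:

    # Time RSA-2048 private-key operations on a single core. A classic TLS
    # handshake with RSA key exchange needs one such operation per connection
    # (a decryption rather than a signature, but the modular-exponentiation
    # cost is comparable).
    import time
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    data = b"x" * 32

    n = 500
    start = time.time()
    for _ in range(n):
        key.sign(data, padding.PKCS1v15(), hashes.SHA256())
    print("~%.0f RSA-2048 private-key ops/second on this core" % (n / (time.time() - start)))

Compare the result (typically somewhere between a few hundred and a couple of thousand operations per second per x86 core) with the tens of thousands of handshakes per second a dedicated SSL ASIC handles, and you’ll see why the math matters.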

How good is the feature set? Honestly, no idea. The list of features supported by LineRate Proxy in the currently shipping release is long and includes load balancing (with server health monitoring), content switching and filtering, SSL termination, ACL/IP filters, TCP optimization, and DDoS blocking.

They promised to give me access to documentation; after reading it, I might be able to tell you more (or at least ask Tony Bourke a non-trivial question). However, I do suspect there’s a reason their solution’s “entry cost is 70% lower than competing HW purchase”.

We all love bashing hardware networking companies for their extravagant margins, while they’re actually hiding software development costs by not including them in the cost of goods sold.

Multi-tenant isolation. As I already mentioned, a single LineRate Proxy appliance supports multiple tenants. To do that, it has to provide isolation between tenant networks, at least on the inside interfaces. So far, the only virtual networking technology they support is simple VLANs – no Q-in-Q, no PBB, no MAC-over-IP.

High availability. They claim you can configure their appliances in an N+1 high-availability cluster, using CARP (a FreeBSD feature) to move the outside IP addresses between the hosts.

Other goodies? LineRate Systems got two other things right:

  • Contrary to some other vendors (large and small), they do acknowledge that IPv6 exists – LineRate Proxy has had a dual stack (IPv4 and IPv6) from day one, including load balancing between the two protocols.
  • Like Juniper, they took advantage of a clean-slate approach and implemented device configuration and provisioning the way it should be done in 2012 – the management interface is a REST API (which you can tie directly into your cloud orchestration platform), with the CLI and GUI implemented as API clients (see the sketch after this list).
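
To illustrate why the REST API matters, the sketch below shows what provisioning a virtual server from an orchestration script generally looks like. The URL, resource names and payload are my inventions – I haven’t seen LineRate’s actual API – so treat it purely as an illustration of the model:

    # Hypothetical example only: the endpoint, resource names and payload are
    # made up, not LineRate's documented API. The point is that creating a
    # virtual server becomes a single authenticated HTTP call that a cloud
    # orchestration platform can issue directly.
    import requests

    API = "https://lb.example.com:8443/api/v1"      # made-up management URL

    payload = {
        "name": "web-vip-1",
        "virtual_ip": "192.0.2.10",
        "port": 80,
        "real_servers": ["10.1.1.11:8080", "10.1.1.12:8080"],
    }

    resp = requests.post(API + "/virtual-servers", json=payload,
                         auth=("admin", "secret"), verify=False, timeout=10)
    resp.raise_for_status()
    print("created virtual server:", resp.json())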

Use cases. LineRate Proxy is probably best suited to cost-sensitive environments with unpredictable traffic growth and basic feature requirements – they claim their cost-per-tenant is less than $50/year, while Amazon’s Elastic Load Balancing starts at $18/month. IaaS and SaaS providers immediately come to mind; not surprisingly, photobucket.com is one of their reference customers.
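
Normalizing the two quoted price points to the same unit (and deliberately ignoring ELB’s per-GB traffic charges, which depend entirely on your volume):

    # Per-tenant, per-year comparison of the quoted entry prices only.
    linerate_per_tenant_per_year = 50   # claimed by LineRate
    elb_per_month = 18                  # quoted ELB entry price
    print("ELB entry price per year: $%d" % (elb_per_month * 12))                           # $216
    print("ratio: %.1fx" % (elb_per_month * 12 / float(linerate_per_tenant_per_year)))      # ~4.3x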

Finally - what is Software-Defined Network Services? Ah, that one? Glad you asked. You see, everything has to be hitched to the SDN train these days or you don’t get VC funding. A creative marketer could call any load balancer with an API SDNS, but between us – just because you have a software-only product that happens to have an API doesn’t mean that you just created a whole new category of solutions.

Summary

LineRate Proxy seems to be an interesting product worth testing, primarily due to their no-nonsense licensing (the high performance doesn’t hurt either). If you’re building your own cloud infrastructure and are willing to consider startup vendors as part of it, give them a try.

Full disclosure

I got the information presented in this blog post during a phone briefing with Steve Georgis, LineRate’s CEO. I was not able to test their product or read the product documentation.

17 comments:

  1. It would be really cool if they ran Open vSwitch on the southbound interfaces and partnered with Nicira and/or Big Switch, so that the appliance could be used as a gateway in overlay-based clouds such as, um, Rackspace.
  2. Yeah, that would be cool, but I wonder what their TCP stack performance would be with extra MAC-over-IP encapsulation that's not supported by the NIC hardware – so it would be either STT or lower performance (assuming they rely on TCP offload to get their numbers, which remained a gray area during my briefing).
  3. I'm very wary about the SSL performance, as that's one aspect for which we've relied heavily on SSL ASICs for many years. With the migration to 2048-bit certs, which eat up about 5x as many CPU cycles as 1024-bit keys, I'm going to need proof that generic x86 CPU cycles can scale out with 2048-bit SSL keys.

    Each new SSL connection requires an asymmetric operation, and that's the part that has blasted CPUs before. Putting that onto an ASIC has helped greatly over the years. Most load balancing vendors have that, usually as a PCI card from Cavium.

    Admittedly, though, it's been a while since I've known anyone who has tested SSL performance with modern hardware. I'm willing to challenge the "need an ASIC" assumption, but I'm going to want proof.
  4. It'd be cool if the SSL implementation used the AES-NI engine, which can offload AES instructions to the host CPU (even from a VM), and if the load balancer were configured to favor AES-based cipher suites.
  5. I wonder if a GPU could do this job...
  6. This is Manish at LineRate. We don't use any special TCP offload hardware. We typically measure peak performance using Intel's 82599 NICs, so pretty standard stuff.
  7. Brad, I spoke with Radware at their booth at ONS, and I think they are sort of accomplishing what you are referring to here. They have always had their Service Delivery Controller managing their own ADCs, but now they are integrating it with an OpenFlow controller that in turn controls the OF switches in the environment. With this integration between the OF and Radware controllers, they are able to map tenants to specific virtual or physical appliances. A nice way to do L4-7 insertion.

    This kind of controller-to-controller integration will be paramount for dynamic gateway and L4-7 insertion. There's no reason to OpenFlow-enable everything; just integrate the head ends on each side.
  8. 20-40 Gbps of throughput on a single CPU? That is just not possible. The highest-performance single-socket Sandy Bridge can do only about 25 Gbps of plain packet in/out, without anything useful running, on bare metal. If this throughput is achieved on multiple CPUs, why stop at 20-40 Gbps – shouldn't this be able to scale out, like, indefinitely (at least for a while)?
  9. Obviously they need multiple cores to reach that throughput. As for "indefinite scaling", you hit a number of other limitations (listen to one of the recent Packet Pushers podcasts on server architectures). At a certain point, it makes more sense to deploy a second box than to increase the performance of a single server.
  10. 82599 can do TCP segmentation (including Receive Side Coalescing) and checksum offload. Not a full TCP offload, but the time-consuming functionality is implemented in hardware.
  11. Service insertion: we need a shiny new protocol because the old ones (like WCCP) wouldn't ever work and because things like MPLS warp space-time continuum. Makes me sick.
  12. Unfortunately, AES-NI can only help with the symmetric encryption. The hard part (and CPU-blasting part) of an SSL connection is the asymmetric part, which is required for each new SSL connection, and AES-NI doesn't help with asymmetric encryption.
  13. Maybe the real money is simply in a clean and easy UI and use "old" protocols such as MPLS. :)
  14. I probably didn't make myself clear. "Single CPU" implies a multi-core CPU, but a single socket. Achieving 20-40 Gbps of throughput on a single socket with services enabled is not possible. If you run on multiple sockets, those sockets don't have to reside in the same physical machine – they are most likely independent machines connected by the network. If these software-only solutions are to have real value and beauty, they should be able to run on multiple machines and truly scale out – like, indefinitely :-)

    BTW, how many companies are doing this kind of stuff? There's Zeus from Riverbed, Embrane (did I spell that right?) and this LineRate. Any more?
  15. Those throughput numbers are very much possible on a single CPU. New software architectures coupled with Sandy Bridge EN hardware allow for orders-of-magnitude increases in performance. On those systems we are seeing 11+ Mpps per core (not per CPU) for L3 forwarding, even with a full LPM lookup on every packet. Linear scalability is limited by other factors in the system, but the ceiling is much higher than 40 Gbps. Packet rates decrease as the complexity of the services increases, but they stay in the Mpps-per-core range. Even more interesting is packet latency: compared to a standard Linux or BSD stack, we see min/avg/max values of something like 6/12/200 microseconds, while equivalent max times in Linux are sometimes 500 ms under load.
  16. I can confirm those values, and a single CPU can do much more. With our networking acceleration platform we get 11 Mpps per core, with linear scalability as you add cores. On a dual-socket Sandy Bridge system we demonstrate 162 Mpps, independent of packet size.