Does It Make Sense to Build Your Own Networking Solutions?
One of my readers was listening to the Snabb Switch podcast and started wondering “whether it’s possible to leverage and adopt these bleeding-edge technologies without a substantial staff of savvy programmers?”
Short answer: No. Someone has to do the heavy lifting, regardless of whether you have programmers on-site, outsource the work to contractors, or pay vendors to do it.
Also, someone has to support, troubleshoot, maintain and improve the product over the years to come. This is also the most-overlooked aspect of any project, particularly homegrown automation solutions. When you roll your solution into production you’re not done – your job has just started for real.
My reader continued his email by explaining how they work today:
Whenever we adopt bleeding-edge technologies, it’s innovation with training wheels. We try new things, but usually with someone holding our hand. There was a recent debate about whether to attempt OpenStack internally or via a third party such as Mirantis.
In most cases, innovation with training wheels is the only thing that makes sense, regardless of whether it’s personal growth (read: attending training courses) or the introduction of new technologies… unless, of course, you’re doing something fundamentally new.
Reinventing the wheel (as opposed to having a helping hand that supplies the training wheels) is usually a waste of time - I’ve seen several teams waste years (real-time years, not man-years) trying to get OpenStack working from a vanilla distro instead of buying a ready-to-use product.
On the other hand, unless you want to work with your consultants (or system integrator or vendor) for years, you must be ready to take over and remove the training wheels eventually.
Products like Cumulus-on-whitebox and Snabb Switch bring opportunities for highly customized needs or even cost savings compared to big-name network vendors. But the argument inevitably comes back to whether the cost savings are worth the risks and headaches of programming and support.
It depends on how big you are. You’ll pay the vendor’s R&D and support costs (+ sales and marketing), or you’ll pay your own programmers or contractors. If you’re big enough, it makes sense. If you have special needs (high-frequency trading comes to mind) or operate at the very bleeding edge, it might be the only option you have. If you only need a few switches, it’s a waste of time.
Also, keep in mind that infrastructure inevitably becomes a commodity over time. We’re seeing the early stages of that process in networking - data center switches running Linux with a device driver that controls the switching silicon. We’ve had open-source infrastructure (including vendor support) for reasonably-high-speed x86 packet processing for years (PF_RING, netmap…). The same process has started in bleeding-edge x86-based networking (solutions like 6WIND or Snabb Switch).
Reinventing that infrastructure clearly doesn’t make sense - it’s like writing your own CRM instead of using Salesforce (or another similar product).
My reader concluded his email by saying:
However, I refuse to fall into that narrow-minded and lazy mindset of just going with the usual big-name, incumbent vendor over and over (even after the CIO asks if there are opportunities to reduce costs).
If you need a few instances of a product, go with an incumbent vendor. It’s not worth risking your whole data center just to get a slightly cheaper pair of load balancers or firewalls. If you need a few thousand boxes, it makes sense to look around. In between there’s the usual gray area.
Also keep in mind that reducing CapEx (= buying a product) often increases OpEx (= building a product), and you have to be very careful and realistic in your estimates. The costs of building a product are usually underestimated by an order of magnitude, particularly by people who have never built one (see also: Dunning-Kruger effect), and the costs of maintaining a product (particularly syncing your fork as the underlying codebase evolves) are usually conveniently ignored.
Finally, doing the same thing over and over and expecting reduced costs is close to the well-known definition of insanity. The real cost savings come from changing your application development and deployment processes and removing the unnecessary kludges and reverse engineering that happen between Dev and Prod.
a) a few engineers who know exactly what they are doing.
b) management who understands (and supports) the problems the engineers are trying to solve + shields them from politics.
c) A and B rarely come together. When they do, the result is companies like Google, FB, etc.
I don't think anyone can say OpEx will be systematically higher until someone actually does it. So far everyone is just too scared to even try.
Also, I never claimed OpEx will be "systematically higher", I just said most people have no idea what it will be.
As for Barefoot Networks - so far it's still vaporware (= technical whitepaper), and programmable ASICs (= NPUs) have been around for ages... they just tend to be more expensive than stripped-down versions with fixed pipeline.
Eolo is a medium-sized SP. No one is suggesting that SMEs build their own network gear, but large enterprises and SPs can, IMO. Eolo may have deployed over a thousand boxes, but I am pretty sure they developed their solution on only a handful of them. Anyone up to the challenge can ask OEMs to develop hardware to given specs for a fraction of those 1000 boxes. An order of 30-50 units is perhaps large enough for an OEM and certainly not a mind-boggling one for most SPs to afford.
SDN and network hardware abstraction go beyond DC applications. We should not deter anyone from building their own network gear for whatever application they see fit. Some smart folks in Silicon Valley understood that a while ago and now run pretty successful businesses (acquired or as vendors). Building your own gear is not for everyone, but engineers with the right skills can do it. Eventually Eolo will be just one of many examples instead of being the exception. I am confident there will be more and more guests on your podcasts talking about their ideas.
Barefoot Networks is a little special. The founder is Nick McKeown. He is a Stanford professor, one of the founders of Nicira and arguably one of the fathers of SDN (together with Scott Shenker and Martin Casado). Nicira was no vaporware as we all know.
Silicon Valley is surely home to a lot of vaporware these days, but that’s not to say it is all vapour. Time will tell. Let’s wait a few more months to see what Barefoot’s ASIC is really made of (ASIC-like or NPU-like performance/cost). I cross my fingers and hope for some serious innovation. Their use of the P4 language certainly made me curious.
Nick McKeown mentioned that Barefoot Networks is working on a flexible-function ASIC. Traditionally, ASICs were custom-developed for specific functions (match + action, e.g. match a specific header field and replace it). If I understood it correctly, Nick is suggesting performing pipeline insertion & manipulation in a way that is protocol-independent (i.e. arbitrary bit-by-bit manipulation instead of protocol-hierarchy-specific pipelines). A significant amount of complexity in ASIC development also comes from microcode programming, which Nick seems to suggest we can all forget about: Protocol Independent Forwarding and higher-level abstraction languages like P4 will be used to code pipelines in an abstracted (and easier) manner.
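The protocol-independent match-action idea can be sketched in a few lines of Python. This is purely illustrative - a real P4 program compiles to a hardware pipeline, and none of the names below come from any real API - but it shows the key point: a pipeline stage only knows bit offsets and widths, never protocol names.

```python
# Illustrative sketch of a protocol-independent match-action stage.
# The table knows nothing about protocols, only bit offsets and
# widths, so any header layout can be matched and rewritten.
# All names here are made up for the example, not a real P4 API.

def extract_bits(packet: bytes, offset: int, width: int) -> int:
    """Read an arbitrary bit field from a packet (big-endian)."""
    value = int.from_bytes(packet, "big")
    total_bits = len(packet) * 8
    return (value >> (total_bits - offset - width)) & ((1 << width) - 1)

def set_bits(packet: bytes, offset: int, width: int, new: int) -> bytes:
    """Rewrite an arbitrary bit field and return the new packet."""
    total_bits = len(packet) * 8
    shift = total_bits - offset - width
    mask = ((1 << width) - 1) << shift
    value = int.from_bytes(packet, "big")
    value = (value & ~mask) | ((new << shift) & mask)
    return value.to_bytes(len(packet), "big")

class MatchActionTable:
    """One pipeline stage: match on a bit field, run the bound action."""
    def __init__(self, offset: int, width: int):
        self.offset, self.width = offset, width
        self.entries = {}                    # field value -> action callable

    def add_entry(self, key: int, action):
        self.entries[key] = action

    def apply(self, packet: bytes) -> bytes:
        key = extract_bits(packet, self.offset, self.width)
        action = self.entries.get(key, lambda p: p)   # default: no-op
        return action(packet)

# Example: match the first 4 bits (the "version" field in IPv4, but the
# table doesn't care) and rewrite the next 4 bits when the value is 4.
table = MatchActionTable(offset=0, width=4)
table.add_entry(4, lambda p: set_bits(p, 4, 4, 0xF))

pkt = bytes([0x45, 0x00])       # bits: 0100 0101 0000 0000
print(table.apply(pkt).hex())   # 4f00
```

Chaining several such tables gives you a pipeline whose behavior is defined entirely by table configuration - the rough analogue of what a P4 compiler would lay out in silicon.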
Nick also makes some excellent points:
a) The vast majority of source code to run network functions is open source and most modern Network Operating Systems run on Linux.
b) Performance ratio of ASIC:NPU:CPU=100:10:1
c) All sorts of new applications could be enabled at 100G line rate per port.
d) Software will eat the world (Marc Andreessen).
e) Network vendors have been hindering technology development for years, and a lot of things will change in our industry once the network hardware layer is fully abstracted. Google, FB, Spotify and other companies with competent technology-focused leadership understood that a long time ago.
Of course price-performance ratio will be the key aspect but I think Barefoot is going to surprise everyone.
> Eolo may have deployed over a thousand boxes but
> I am pretty sure they developed their solution only
> on a handful of them.
No, virtually *all* their radio towers host one of those SDN boxes discussed in the podcast. It's cheaper and simpler to have a homogeneous network. Multiple thousands of these boxes have already been built and deployed.
I agree with point d) above. Also, I'd add:
f) Do not aim for line rate all the time. This, combined with the realistic traffic patterns of moderately multiplexed traffic, allows you to use CPU-only processing. This is exactly what we did in that Eolo project.
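To see why relaxing the line-rate requirement matters so much, here is a back-of-the-envelope cycle-budget calculation. The clock speed, packet sizes and utilization figures are illustrative assumptions, not numbers from the Eolo project:

```python
# Back-of-the-envelope cycle budget for software packet processing.
# All numbers are illustrative assumptions, not measurements.

CPU_HZ = 3.0e9          # one 3 GHz core (assumed)

def cycles_per_packet(link_gbps: float, avg_pkt_bytes: int,
                      load: float = 1.0) -> float:
    """CPU cycles available per packet at a given link speed,
    average packet size, and link utilization."""
    # Ethernet adds 20 bytes per frame (preamble + inter-frame gap)
    pps = (link_gbps * 1e9 * load) / ((avg_pkt_bytes + 20) * 8)
    return CPU_HZ / pps

# Worst case: 10 Gbps line rate with 64-byte packets (~14.88 Mpps)
print(round(cycles_per_packet(10, 64)))        # 202 cycles/packet

# Realistic: ~400-byte average packets at 50% link utilization
print(round(cycles_per_packet(10, 400, 0.5)))  # 2016 cycles/packet
```

Roughly 200 cycles per packet is barely enough for a cache miss or two, while roughly 2000 cycles is a comfortable budget for forwarding, filtering and accounting - which is why designing for realistic traffic instead of worst-case line rate makes CPU-only processing viable.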
Previously I wrote that Eolo deployed over a thousand boxes but only developed on a handful of them. This bit may have been misunderstood: I used the word developed to refer to the development stage after design/research (R&D). I would expect they first made a small order from the OEM and tested the solution for a while before going into full-blown deployment and ordering over a thousand units. The point I was trying to make is that one does not need to order 1000 units to get an OEM to build something to custom specs. In other words, companies smaller than Eolo could well try to build their own gear too. Even better if your reference design is published.
Unfortunately, line-rate performance is used by vendors to set themselves in a different league, and sadly, many (if not most) customer decision makers can only compare boxes by datasheet/marketing numbers instead of their actual use/need. Fair enough: if line-rate performance is what one is after (surely DCs and carrier networks are), then an ASIC is the only way to deliver Tbps worth of processing power. Custom and expensive ASIC designs were blamed for high development costs, and that reason is likely to disappear once Barefoot releases what I think they are planning to.
I fully agree with you that CPU-based solutions can fit most use cases, and Barefoot is not going to change that. However, Barefoot may add to it by enabling something quite new and extraordinary: an ASIC could be flexibly programmed by a controller to perform all sorts of advanced tasks even at the edge or access layer (e.g. DPI as in NG-FWs, or DDoS detection & filtering), and I don’t technically need a vendor to code that if an open-source community takes on the task of writing that code for everyone (see OpenStack as an example).
More clues in this 2014 paper published by Stanford in collaboration with Barefoot Networks, Google, Intel, Microsoft and Princeton University:
I have worked briefly with some "visionary" leadership over the years; they were not only open-minded, they were informed and looking at innovative solutions on their own. It was refreshing, but short-lived.
"Who needs dynamic routing?"
"OSPF? Static routing works, too."
"Who needs IPv6?"
"Firewall HA cluster? We can put a second piece of hardware beside the first one."
"Regular software updates? Why? That switch/router/firewall/system has been running for 1000+ days without problems."