You Want Your Network to Be like Google’s? Really?
This article was initially sent to my SDN mailing list. To register for SDN tips, updates, and special offers, click here.
During one of my SDN workshops, an attendee working for a mid-sized European ISP asked me this question:
Our management tells us we should build our network like Google does, including building our own switches. Where should we start?
The only answer I could give him was “You don’t have a chance.”
The Problem
Building your own network operating system is still a major undertaking. LinkedIn recently described their journey toward their first Tomahawk-based switch. It took them a year (~6 man-years of work) to build a prototype with the functionality they need in their network and test it in a pilot network… and I’m positive they used standard building blocks (for example, Quagga to run BGP).
Also consider that:
- LinkedIn applications were probably designed from the ground up to be well-behaved. Your networking gear might have to support all sorts of extra kludges to cope with broken application stacks;
- Unless you decide to use Broadcom’s OpenNSL API or buy your hardware from a vendor that ships their boxes with Linux device drivers, you’ll have to deal with Broadcom’s NDA procedures, so you’ll have to be big enough to matter to them.
With all this in mind, do the ROI calculations, and don’t forget to include the costs of the ongoing software maintenance (either vendor support or your own team). It might turn out you’re not big enough to make it work.
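To make the ROI point concrete, here is a back-of-the-envelope sketch. Every number in it is a made-up assumption for illustration (the only data point from the article is LinkedIn's ~6 man-years of development effort); plug in your own figures.

```python
# Back-of-the-envelope DIY-vs-vendor cost comparison.
# All figures below are illustrative assumptions, not real data.

def diy_vs_vendor(years,
                  diy_dev_cost,            # one-off NOS development effort
                  diy_maint_per_year,      # ongoing software maintenance team
                  diy_hw_per_year,         # whitebox hardware spend
                  vendor_capex_per_year,   # vendor gear
                  vendor_support_per_year):
    """Total cost of each option over the given horizon."""
    diy = diy_dev_cost + years * (diy_maint_per_year + diy_hw_per_year)
    vendor = years * (vendor_capex_per_year + vendor_support_per_year)
    return diy, vendor

# Example: ~6 man-years of development (the LinkedIn data point) at an
# assumed fully-loaded $200k per engineer-year, over a 5-year horizon:
diy, vendor = diy_vs_vendor(
    years=5,
    diy_dev_cost=6 * 200_000,
    diy_maint_per_year=2 * 200_000,   # two engineers kept on maintenance
    diy_hw_per_year=500_000,
    vendor_capex_per_year=800_000,
    vendor_support_per_year=150_000,
)
# With these (invented) numbers DIY costs $5.7M vs $4.75M for vendor gear --
# the ongoing maintenance team, not the initial build, dominates the total.
```

Note how the recurring maintenance cost, not the one-off development effort, decides the outcome; that's exactly the line item the article warns you not to forget.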
But wait, that’s not all
The ISP I mentioned at the beginning might have been big enough to make the ROI arithmetic work, but I’m positive they would face another problem: lack of talent.
Building a network operating system (even when using standard components) is not trivial, and it’s hard to get engineers with the necessary experience unless you’re a Silicon Valley startup or one of the popular big guys.
Starting from scratch and growing your own talent sounds like a feasible Plan B, but unfortunately people considering this approach usually underestimate the complexity of the project. I know companies that wasted years trying to build their own OpenStack-based public clouds from the OpenStack sources instead of using a commercially supported distribution… and likewise you might be better off buying Cumulus Linux licenses (or another commercial alternative) and slowly building your competence while already running a production network.
I'd like to hear from you!
Disagree? Please write a comment! Want to hear what I think about your SDN deployment plans? I’m usually available for short online consulting.
Want to know more about SDN? Watch the SDN and network automation webinars on ipSpace.net – if you're serious about advancing your career I’m positive you already have the subscription that gives you full access to all of them.
(Software Gone Wild Episode 48 - OpenSwitch Deep Dive)
Things may have changed since you last checked: I was able to download and run OpenSwitch in VirtualBox just fine, and the build system instructions appear pretty complete (http://www.openswitch.net/develop/develophome).
Last, I have to agree on the current HCL, but I'm sure that won't be much of an issue for long.
On the other hand, standalone network operating systems such as Cumulus and Pica8 are also good commercial options.
CWB
At the current time we have many whitebox/britebox hardware vendors (Edge-Core/Accton, Quanta, Dell, HP, etc.) and also multiple open-source and commercial network operating systems (Big Switch, Cumulus, Pica8, OpenSwitch, Open Network Linux, OcNOS, Dell OS10, Nuage, Pluribus, ...)
In fact, if Google had been asleep and suddenly woke up today, they wouldn't go build their own Jupiter, Firehose, etc.; they could use an existing technology.
LinkedIn just started building their own. That's the big question for me about LinkedIn's announcement: why not use and contribute to an existing NOS project?
* How much R&D budget does your company have?
* How much are you spending on your networking gear, support, etc.?
* Will you save money in the long run if you build your own technology?
* Are you willing to take the financial risk if the innovation doesn’t pay off?
Any mid-to-large company can do it, but it all depends on how the figures tally on the books. Projects like this depend more on the business side than the technical one. You do need technical expertise, but there must be enough funds to support the work. In the long run, you have to justify that the running cost of your own innovation has major advantages over off-the-shelf technology.
I guess, because of these factors, only companies with big R&D budgets can afford it.
We're also a "mid-sized European ISP" and we do build our own network gear, although not L2 datacenter switches but L3 edge boxes. We install them in thousands of remote POPs with no out-of-band management.
We decided on an overall SoC architecture and hired an OEM to assemble our custom hardware (at the moment 2*10G+24*1G and 8*10G) in a few thousand units.
Of course we used "standard building blocks", as you call them, e.g. Quagga for OSPF, ExaBGP, Open vSwitch, lldpd, a custom PPPoX/L2TP daemon, etc.
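The "standard building blocks" approach really can be this simple to get started with. As an illustration (not this ISP's actual configuration, and with made-up interface names and addresses), a minimal Quagga ospfd.conf for one of those edge boxes could look like:

```
! Illustrative ospfd.conf -- interface name, router ID and prefix are invented
hostname edge-pop1
!
interface ge0
 ip ospf network point-to-point
 ip ospf cost 100
!
router ospf
 ospf router-id 192.0.2.1
 network 192.0.2.0/24 area 0.0.0.0
!
```

The hard part is not this configuration; it's the automation, monitoring and lifecycle tooling around it, which is exactly what the comment goes on to describe.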
We've opted to do the dataplane in a DPDK-like way on top of a great userspace stack we licensed from 6WIND (http://www.lightreading.com/carrier-sdn/sdn-technology/italian-sp-deploys-homemade-sdn-appliance/d/d-id/713802).
Also, we built our own centralised network automation tool, scripting tools, a monitoring GUI, and even defined our own "CLI" with its own syntax.
We've got a rather meshed backbone, so we're using OpenFlow rules to load-balance MPLS customer traffic on all the available routes, based on opportunistic traffic-class determination and real-time link capacity (most of the links are microwave, i.e., time-varying), through a Mixed Integer Programming algorithm that uses the libraries of a general-purpose commercial solver to find the optimal routing strategy. On top of that, we proactively provision backup paths and use BFD for fast reroute.
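The optimization objective described above (spread demand over parallel, capacity-varying links so no single link saturates first) can be sketched in a few lines. This is a toy stand-in for the real MIP formulation, not this ISP's code: instead of calling a commercial solver it brute-forces integer splits, and the capacities and demand are invented numbers.

```python
# Toy sketch of the load-balancing objective: split a traffic demand across
# parallel paths so that the maximum link utilization is minimized.
# A real deployment would hand this objective to a MIP solver; here we
# brute-force integer splits to keep the example dependency-free.

def best_split(demand_mbps, capacities_mbps, step=1):
    """Return (utilization, shares): the integer split of the demand,
    one share per path, that minimizes the maximum path utilization."""
    n = len(capacities_mbps)

    def splits(remaining, parts):
        # Enumerate all ways to split `remaining` into `parts` shares.
        if parts == 1:
            yield (remaining,)
            return
        for x in range(0, remaining + 1, step):
            for rest in splits(remaining - x, parts - 1):
                yield (x,) + rest

    best = None
    for shares in splits(demand_mbps, n):
        util = max(x / c for x, c in zip(shares, capacities_mbps))
        if best is None or util < best[0]:
            best = (util, shares)
    return best

# Invented example: 9 Mbps of demand over a 10 Mbps and a 5 Mbps
# (microwave, hence time-varying) link:
util, shares = best_split(9, [10, 5])
# -> shares (6, 3), both links at 60% utilization
```

When capacities change (microwave links fading), re-running the optimization with fresh capacity figures and pushing the new splits as OpenFlow rules is essentially the control loop the comment describes, with the brute force replaced by a proper MIP solver.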
For fun, we're rewriting part of that central controller on GPU (on a pair of Nvidia Tesla K2s).
All this with a staff of 3 people, over the last 1.5 years.
If we were to start this adventure again, we would honestly do a number of things differently, but I think we would follow the same path.
I completely agree with the "lack of talent" you talked about! IMHO networking is still a software-averse field: it's extremely hard to find networking people with a solid software mentality.
If you'd like more details for a chat, feel free to get in touch.
Giacomo Bernardi ([email protected])
Giacomo, cool stuff!