One of the comments I got on my Lego Bricks & BFT blog post was “well, how small should those modular Lego bricks be?”
The only correct answer is “It should be Lego bricks all the way down” or (more formally) “Modularity is a concept that should be applied at every level of the architecture.”
Today let’s focus on how much easier life would be if we could take apart network operating systems instead of treating them as glued-together Death Stars.
The architectural differences between modern server and network operating systems are minimal. Both commonly run on some variant of Unix and use a number of independent processes (daemons) to get the job done. The real difference between the two is in packaging.
Linux is usually distributed as a zillion packages (all of them hopefully supported by the vendor who sold you the distribution). You can get most network operating systems only as a single humongous all-or-nothing image (in the spirit of Matt Oswalt’s blog post, let’s call them BFI).
The only major exception I’m aware of is Cumulus Linux; Arista might have something similar (if that’s the case, please write a comment), and Juniper started shipping non-core functionality as packages outside of Junos on their QFX10K switches.
Do you want to have LibreOffice on your Linux web server? Probably not, and so nobody forces you to install it. Do you want to have VoIP support on your Internet edge router? Probably not, but you can’t get rid of that code, because it’s tightly coupled (or at least packaged together) with the code you need.
Before anyone tries to tell me how impossible it is to support hundreds of independent packages making up a network operating system, let me point out that the same model works pretty well for Red Hat and a few other companies. It’s not that it cannot be done, it’s simply that your networking vendor cannot do it.
Let’s move from initial deployment to troubleshooting. All software has bugs, and sometimes you have to restart a daemon that has sprung a memory leak. No big deal on a server operating system. Mission Impossible on most network operating systems. How would you restart the OSPF or BGP daemon on Cisco IOS? How about the IPv6 RA daemon?
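For comparison, here is roughly what "restart one daemon" amounts to on a systemd-based Linux server. It's a minimal Python sketch, not something any vendor ships: the unit name `ospfd` is a hypothetical placeholder, and by default the function only prints the commands instead of running them.

```python
import subprocess


def restart_daemon_commands(unit):
    """Return the systemctl commands that restart one service and check it."""
    return [
        ["systemctl", "restart", unit],    # restart only this daemon
        ["systemctl", "is-active", unit],  # exit code 0 if it is running again
    ]


def restart_daemon(unit, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for argv in restart_daemon_commands(unit):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "ospfd" is a hypothetical unit name used for illustration only.
    restart_daemon("ospfd")
```

Nothing else on the box is touched: every other daemon keeps running while the leaky one comes back.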
Occasionally, you might have to replace the buggy code that’s been giving you headaches. Doing that on a typical Linux distribution is (relatively) easy: you download a new version of that package (and its dependencies), install it in a test environment, check whether it works, and roll out the changes into the production environment.
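The workflow described above could be sketched like this in Python, assuming a Debian-style distribution with `apt-get` and systemd. The host names, package name, version string, and service name are all hypothetical, and the sketch only builds the plan (test hosts first, production second); the "check whether it works" step between the two stages is left to you.

```python
STAGES = ("test", "production")  # always touch test hosts before production


def upgrade_commands(package, version, service):
    """Commands that upgrade one package to a pinned version and restart its daemon."""
    return [
        ["apt-get", "update"],  # refresh the package index
        ["apt-get", "install", "--only-upgrade", "-y", f"{package}={version}"],
        ["systemctl", "restart", service],  # pick up the new code
    ]


def rollout_plan(hosts, package, version, service):
    """Return (stage, host, command) tuples in the order they should be run."""
    plan = []
    for stage in STAGES:
        for host in hosts.get(stage, []):
            for argv in upgrade_commands(package, version, service):
                plan.append((stage, host, " ".join(argv)))
    return plan


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    hosts = {"production": ["edge1", "edge2"], "test": ["lab1"]}
    for step in rollout_plan(hosts, "example-routing-daemon", "1.2.3-1",
                             "example-routing-daemon"):
        print(*step, sep=" | ")
```

The point of the sketch is the scope of the change: one package and one daemon restart per host, not a new image for the whole box.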
In the networking world, the vendors expect us to download a whole new version of the whole operating system (including all the other wonderful bugs … oops, features … they introduced in the meantime) just to fix a simple bug in one process. Nobody in their right mind would do that just because the vendor TAC told them to do so; in environments that take networking seriously you’d have to go through a whole release validation and bug scrubbing process before deploying the new image.
It’s hard to replace the whole operating system with a new version without reloading the whole box, which (due to the potentially significant disruption) triggers all sorts of SNAFU-avoidance procedures (aka maintenance windows). The usual fix for the problem: additional complexity, this time in the form of ISSU. Wouldn’t it be easier (and less error-prone) to give us the tools to patch the problematic software components without crashing the whole box?
Supposedly some vendors got the message and allow you to download bug fixes as small patches, but AFAIK Cumulus is the only one that fully embraced the Linux model and started packaging their operating system the way it should have been done: as independent Debian packages available for download from an online repository. Is anyone else doing the same thing? Please write a comment!
You can also get monolithic Cumulus images instead of individual packages. You’d obviously use the monolithic images for initial installations, and might prefer them for major upgrades… but at least you have options.
Finally, keep in mind that what I just described has nothing to do with the “horrors of monolithic vertically integrated stack” that SDN evangelists like to ramble about; it’s a simple consequence of a 20-year-old way of delivering software (going all the way back to shipping EPROMs to the customers) that never changed.
Unfortunately, as I see startups launching new products using the same BFI approach, it seems we’ll be stuck with this nightmare for a long time. It looks like almost everyone working for a networking vendor (regardless of how many vendors or startups they have worked for in their career) considers this outdated methodology best current practice.