Whitebox Switching and Industry Pundits

Industry press, networking blogs, vendor marketing whitepapers and analyst reports are full of grandiose claims of benefits of whitebox switching and hardware disaggregation. Do you ever wonder whether these people actually practice their theories?

There’s a simple litmus test: look at the laptop they’re using. Is it a no-name x86 clone or a brand name laptop? Is it running Linux or Windows… or are they using a MacBook with OSX because it works better than any alternative (even Windows runs better on a MacBook than on a low-cost craptop). Finally, are they using Android on a low-cost ODM phone or an iPhone?

If you're using Apple products while promoting whitebox switching, it would be nice to hear your reasoning. Please leave a comment!

43 comments:

  1. I'm typing this on a Macbook Air, but I'm effectively required to use Apple products at work, otherwise I'd use a bring-your-own craptop (I used to do that before I became a manager). Where I have the choice I try to buy the cheap stuff as long as the hardware is decent - in many segments this opportunity exists - and then make it work with Free/Open software where that exists in good-enough quality - also possible in many areas.

    So my phone *is* an Android phone (I'm too much of a coward to go all the way to Chinese no-name stuff that only runs Cyanogenmod, and too much of a cheapskate to buy a "truly open" Firefox-or-whatever phone).

    And yes, I fully expect my next ToR switches will be called "Edge-Core" or "Quanta" rather than Arista, Cisco or Juniper...

    Replies
    1. "I fully expect my next ToR switches will be called "Edge-Core" or "Quanta" rather than Arista, Cisco or Juniper..." - and I'm looking forward to a chat about your experiences ;))

      All the best!

    2. Like Simon, I am obliged to use OSX on a Mac at work. However, all my servers, either on branded metal or virtualized, are running either BSD or Linux, as do my private laptops.
      And in our Lab, we have installed our first 10 GE Edge-corE switch which runs ONIE and various Linux-based network OSes. So far, I am quite happy with what we experienced with it, though hoping that Open Network Linux will soon include a forwarding plane that is apparently in the works with the support of Broadcom and Big Switch, among others.

  2. (disclaimer: I work at Cumulus Networks)

    It's a pets and cattle argument. My phone and laptop are pets, my network should be cattle.

    Replies
    1. ... and this is probably the best possible answer. It also provides a nice rule of thumb: can you treat your network like cattle? If YES, all options are open; if NO, stick with what you have. Would you agree?

      Thank you!

    2. (continued disclaimer: I work at Cumulus)
      I think there are more shades of grey, but in a sense, yes. Whitebox value comes in flexibility, price and ease of automation. If you are running EIGRP on Cat6k and managing with Cisco Security Manager, whitebox probably isn't a good fit for your org.

      The company needs both the technical drivers as well as the staff will to make the change.

  3. I have been raising this question about whitebox / britebox switches and their penetration in the market with the Big-5 network vendors.

    I get a typical Steve Jobs answer: the hardware and software should be owned by the same vendor/organization. I was told that the vendor managing hardware and software do well in terms of silicon/chip failures from cosmic radiation. Apparently cosmic radiation causes downtime and the Big-5 have some secret to prevent it, which they claim the whitebox vendors will be unable to replicate.

    I was even presented mathematics of how whitebox vendors are losing money and how their business models are not good.

    Do you have any comparative study describing the pros and cons of using britebox/whitebox switches?

    Replies
    1. "I was told that the vendor managing hardware and software do well in terms of silicon/chip failures from cosmic radiation" - which is totally bogus.

      I would love to see a comparative study, but doubt we'll see an unbiased one in a long long time. We could, however, use Linux and its acceptance/success in various market segments as an approximation (keeping in mind Linux development started in 1991 ;).

    2. (Disclaimer: I am co-founder/CTO at Cumulus)

      The cosmic radiation thing is hilariously bogus.

      Several of the name-brand switch vendors have their ODM partner ship directly from the factory to the customer. In this case, the only differences between the branded version and the Open Networking version are:
      1) ONIE preinstalled vs vendor OS preinstalled
      2) Whether or not the vendor logo is applied.

    3. Here is one official TAC response I had saved from a long time ago:

      =====
      Apparently the cause of the ACE failure was due to an environmental
      condition causing a 1-bit flip which the ACE detected as a parity error.
      The probability of a bit flipping has to do with the level of background
      radiation, which is a function of the height above sea-level, the amount
      of concrete around, and many other environmental factors.

      This issue cannot be fixed until Cisco either does a significant Itasca
      re-design or moves the data-path to Miltons (which do not have SRAM).
      This re-design is about a year away.

      Currently a reboot is needed to resolve the problem, which the ACE did
      on its own. The ACE failed over as expected and is configured to
      maintain active sticky connections.
      =====
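
The parity check mentioned in the TAC response can be illustrated with a minimal Python sketch. It is a generic even-parity example, not a description of how the ACE hardware implements it; the word value and the flipped bit position are arbitrary assumptions chosen for illustration.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the number of set bits in `word` is odd, else 0."""
    return bin(word).count("1") % 2


def flip_bit(word: int, position: int) -> int:
    """Simulate a single-bit upset by inverting the bit at `position`."""
    return word ^ (1 << position)


# Hypothetical 8-bit memory word; the parity bit is computed when it is written.
stored_word = 0b10110010
stored_parity = parity_bit(stored_word)

# One bit flips while the word sits in memory (position 5 is arbitrary).
corrupted = flip_bit(stored_word, 5)

# On read, the recomputed parity no longer matches the stored parity bit.
error_detected = parity_bit(corrupted) != stored_parity
print(error_detected)
```

A single parity bit only detects an odd number of flipped bits. It cannot say which bit changed, so the data cannot be repaired from the parity bit alone, and two simultaneous flips go unnoticed.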

  4. To be fair, IT industry press is always full of hype about the next big fad, and whitebox switching is nothing if not that.

  5. I don't think it's a fad, even Cisco are making greater use of merchant silicon; they have to in order to be able to compete. I think the trend is clearly towards commodity hardware but the white box ecosystem is too immature for most organisations to derive any benefit from it just now.

    Replies
    1. Merchant silicon != whitebox switching. Just saying ;)

    2. I am aware of that. My point (perhaps poorly made) was that at a high level the trend appears to be away from hardware being a differentiator as evidenced by Cisco reducing its reliance on custom silicon.
      This trend seems to happen to almost all areas of technology: once cheap readily available hardware reaches a certain level of performance/capability, hardware itself becomes less of a differentiator and the software becomes the focus instead.
      Whitebox switching is starting to become a reality as the necessary hardware is on the cusp of reaching that level of capability (although as mentioned there are many areas in this ecosystem that need to develop for it to become more mainstream).

  6. Ivan has a good point. Why don't some of the more vocal proponents eat their own dog food? (Pete had a very good counter point too). I've actually encountered a sales engineer running an unfamiliar brand of laptop with various OSes, VMs, and shells. It was impressive from a geek perspective, but to Ivan's point, it wasn't anywhere near as seamless or as polished as, say, a Mac or Windows experience. It required MORE technical geek prowess on the part of the operator and yes, there was some embarrassment when things didn't work. In a nutshell, it made the poor guy look unprofessional and unprepared.

    If your business is completely risk-averse, it's unlikely you'll do more than look at whitebox. We are currently heavy users of vblock. It's all off the shelf stuff and familiar to our staff... could we have built vblocks on our own? Technically, yes... realistically, no. Our business process, our organization, and our risk aversion hampered a smooth adoption of private cloud, so we went the "managed" route. However... the experience has opened up minds and imagination. We have gradually assumed more responsibility for the vBlocks and reduced our support dependency. Whitebox/britebox is now mentioned in every roadmap discussion... and that's a good thing.

    My opinion: even if you aren't Facebook - with a laser-focused design goal and top notch support staff - you can still consider it. It doesn't always make sense to put Production-grade infrastructure in your non-production environments, so that is a good candidate to kick the tires on whitebox/britebox. If it works well, obviously the next step is to use it in production. The cost savings are pretty apparent, but there is absolutely some technical debt.

  7. I'll bring my twitter comments to a forum that allows more than 140 characters.

    You assume two things falsely:
    1) Whitebox hardware is == to "craptop" hardware.
    2) Desktop OSs are consumed the same as infrastructure OSs.

    The reality:
    1) Most popular WB hardware comes off of the same assembly lines that comes off of and with the same ASIC. They just cut out the middle man and deliver the hardware directly to you. I've had WB switches running 24/7 for > 365 days with 0 component failures.
    2) Infrastructure-focused operating systems are not about presenting things as "magic". Yes there are abstractions, but the engineer should understand and be able to manipulate these abstractions. The end user should see the network as magic (as my mom views her iPhone and Mac), but not the engineer.

    Replies
    1. Apparently I can't edit my post but it replaced some gt/lt signs I had for a single space. Between "assembly lines that" and "comes off of" I had 'insert your favorite vendor here'.

    2. Matt, I find user stories to be very valuable, often more so than vendor pitches and use cases. Are you able to share what WB solutions you chose? Who else was considered? What were the technical requirements and constraints?

      We are also considering WB for a few point solution roles (Cumulus/Dell, Pluribus, and a couple others) and we are hoping that they could eventually find their way into Production. Any experiences you can share would be helpful.

    3. You should listen to the Software Gone Wild podcast ;) Here's the one with Matt sharing his experience:

      http://blog.ipspace.net/2014/10/cumulus-linux-in-real-life-on-software.html

    4. Nice, he answered my questions in that podcast, more or less. I highly recommend it to anyone that hasn't listened to it yet. Thanks!

  8. (Disclaimer: I am co-founder/CTO at Cumulus)

    I run Linux on my laptop. Though it is a Lenovo...

    Replies
    1. Right answer ;) See you in a few hours!

  9. I want Cumulus to do port1.100.200 bridging to port2.200! qq

  10. Taking all the marketing, hype, and other kind of c..p away, you're always looking at the lifecycle => You have to buy sth - implement that - and live with (support) that.
    I.e. it all boils down to
    - how much money do you want to spend on the equipment ?
    - what kind of support do you need/get from whoever/whatever you're using (and how well do you trust that/depend on and are willing to try)?
    - do you have skills (time/people) to do more or less on your own ?

    Best practice - try/implement new thingies in small/controlled environment and then scale if they prove their value (which is used here as a VERY wide term).
    And I agree, I see a lot of "fancy-gadget" owners preaching sth completely different. As if they've never heard of "practice what you preach".
    Anyhow, time will tell...
    Or as Heraclitus once said "The only constant in life is change".
    And that is the beauty :) but that's another story

  11. Great Post Ivan! Here Here!!!!

    I like the cumulus linux example. Run that server OS as a network OS using server boxes and nics(to make a switch) or white box switches.

    Didn't we do that with Novell Netware, 3Com, and Banyan with server OS and server nics years ago? We even had Netframe custom solutions until the performance of that collapsed model peaked thus requiring purpose built devices/silicon from Bay, Cisco, etc of that time separating server, routing and switching out of that single box. Yes the x86 platform and memory is much cheaper and faster today but so is our thirst for bandwidth.
    So I will build a cheap network of white box x86 servers with standard nics and run my enterprise on it. uh okay. It is so cheap I can swap anything out at will without even troubleshooting.

    History repeating itself with Moore's Law in a different dimension. LOL!

    Replies
    1. I'm guessing you don't realize that every box on the Cumulus HCL uses a Broadcom ASIC for hardware forwarding? Just like Brocade, Arista, Nexus 9k for Cisco, etc. I also assume you don't realize that EOS, IOS-XE, JunOS etc are built on top of the very "Server OS" you keep referring to (Linux).

    2. Yes, I do, - purpose built vs. whitebox was the point- from a Novell type to an IOS type monolithic based platform then to Linux and what's added on top for non monolithic purpose built platforms. IOS-XE on ASR hardware. Now to that Linux distro with the generic network kernel modules/daemons, the off the shelf box, throwing in some nics and away we go.


      There are going to be those special - read purpose built "bakes" - and if Cumulus gets the sauce right then they aren't quite white box and fall in with the rest of the purpose built, ASR, Juniper HW etc. I made reference to Cumulus because their cute video played it out that simply: a Linux OS, NICs and away we go.

      Is that IOS-XE Linux kernel, since it is open sourced tweaked or changed for the hardware or do they just grab a "ubuntu like distro" lol and load it up on ASR HW and throw on some networking kernel modules?

  12. In the world of PC's, laptops and rack-mount servers are kind of at the opposite extremes of customization and coordination of OS & hardware.

    Laptops have always had more deep customization and special hardware or custom hardware which requires more software integration.

    Servers are typically sold without any software on them, and customers can install their own VMware, Windows or Linux on them to do what they want.

    A more relevant question would be to go into any datacenter (including your own, Ivan!) and look at the servers and ask whether they are running some custom version of OS that came with the server hardware. In the old days, this _was_ the case (with Sun, SGI, HP servers, IBM servers, mainframes, DEC, etc.). I know you love the lessons of history!

    Today, server hardware is interchangeable, and the key insight is not whether they come from some name-brand server company (HP, Dell, etc.) or white box vendor, but that the software and hardware are bought and managed separately, and that this model is far superior to the old one where vendors tried to sell you both.

    Server companies are not selling you software, and they are not building you software, at least not enterprise-grade server OS's. Ask yourself why that is, and then ask why networking is fundamentally different to justify a perpetual connection between hardware and software.

  13. For apps that have been built from the ground up to tolerate and expect failures, whitebox is great. I think this is the way things should be done. Plan for failure. For the rest of the market where near perfection (or in many cases perfection itself) is expected/demanded from the network while shoddy apps are allowed to sit in pride of place, nothing but the high end major brands is going to cut it.

    If your apps are right then the hardware is going to matter a lot less than it has in days past.

    Replies
    1. Another great summary ;) Thank you!

    2. Second that: the old "do you architect your applications around the network, or architect the network around your applications" conundrum.

    3. Why will only "high end major brands" cut it? Are they more reliable? If so what data are you using to back up this statement? Is it because of features? If so which features?

    4. This is the same baloney that the Sun/Solaris guys said 15 years ago about Linux/VMware/Windows ... today, Sun and Solaris are pretty much gone, and 99% of the datacenters are running Linux/VMware/Windows on commodity x86 servers.

      The whole notion that there is something shoddy about the hardware or software vs. the legacy vendors is weird, given the quality of the software coming from them. Solaris was a way better engineered OS than anything coming from Cisco today, and Solaris is still gone today because Linux and Windows overtook it.

    5. My earlier reply went missing. It was a bit long but I swore I published it.

    6. Blogger thought you were a spammer - probably because you mentioned Novell ;)) Restored the comment.

  14. That is a good point Anonymous, remember Windows and Linux are mainly for x86 architecture(RISC and ARM different story) written for specific register sets of the CPU, BIOS interrupt calls, chipset interrupt calls etc. for a standard computing platform. The network OS was tied to purpose built HW that have chipsets/asics with specific buffer, bus, fabric arbitrations algorithms and holding spaces, different interrupts calling mechanisms, queues sizes, algorithms, virtual input and output queues, parsers, latency buffers, arbiters all for L2 or L3 address lookup/ hit miss cycles etc. for specific link types for specific expected service/performance results. You take that purpose built HW and SW for it connect/architect those together and your network will perform a specific way as planned. You are not going to put a POS interface in an x86 box and run that as a core Linux router?

    That is why server OS runs well on commodity server hw - it was designed for it.
    Now on one hand you can build a network with all off the shelf x86 HW and NICs, throw/compile a Linux distro on them for a routed or switched solution and it will operate consistently since all the nics and buffers et al. are the same (as long as you get the exact cards). There you go, but how well will it perform vs. purpose built network components and how much can you expand if you wanted to introduce other features? There are some of those large clusters using x86 commodity and in one way they are purpose built out of commodity HW/SW but in bulk scale to achieve a figure of merit at a lower cost than that of a smaller but more expensive solution. Some call it HPCC but some of those use specialized HW too.

    I guess my point is now it is true that purpose built network HW and overbuilt network OS(read features you purchased but never use) does contribute to the cost but the decoupling via virtualization, SDN and the use of purpose built HW for those performance features but with non monolithic module Network OS so you can “bolt on” features when you need them provide a decent balance today.

    Remember Novell NLMs, they had an IP one, SNA GW one et. al. – seems familiar.

    You are not going to put a POS interface in an x86 box and run that as a core Linux router?

    A trading floor network won’t do this.

    Can you build an 802.11 wireless network this way too, yes but why would you. In 03, I used x86 as a low cost 802.11b 10mb PtP solution for a client that provided a link between buildings of a small business and it handled VoIP. I used commodity HW/SW off the shelf wifi nics, did the RF/Spectrum analysis, Fresnel zone testing, a couple of cheap directional(almost coffee/Pringles canned it too;) )
    It worked and it worked well, it met its figure of merit on performance and cost but that was all you would get out of it, no scalability outside of upgrading nics to a higher 802.11 spec. or adding multiple PtP boxes which would get clumsy compared to IR or purpose built Wifi solutions.

    Look at the Summary statement in one of Ivan's earlier posts about Brocade to see what I mean. There is a reason why whitebox is not the panacea.

    http://blog.ipspace.net/2014/03/per-packet-load-balancing-interferes.html

  15. Buying based on emotions leads to very different results than buying based on reasoning. Laptops and phones need to be pretty, sexy, smart and crafted. Switches need low TCO and minimum headache.

    TCO to headache ratio of whitebox will be better than that of the incumbents. Whitebox already wins in web-scale scenarios and the other use cases will be won soon enough since the pace of open source innovation will make them catch and overrun the incumbents.
    --
    Sent from my iPad

  16. As a related tangent, a great example of someone that practices exactly what they preach: https://stallman.org/stallman-computing.htm

  17. I really meant: https://stallman.org/stallman-computing.html =)

  18. Nick I love it. LOL, Ivan thanks, yes well it could have been MS Lan-Manager or Worse IBMs Net server or OS/2

    But to add fodder for your initial topic saw this on linked in today
    Facebook just fired another big shot at Cisco — and dissed it a little, too

    Read more: http://www.businessinsider.com/facebook-releases-6-pack-switch-in-shot-at-cisco-2015-2#ixzz3RaM1oKm0


    http://www.businessinsider.com/facebook-releases-6-pack-switch-in-shot-at-cisco-2015-2?utm_source=linkedin-ticker&utm_medium=referral

  19. Why bother about hardware quality when you have designed your network/application to tolerate failures? As long as it does the job, meets your requirements, the MTBF is acceptable and the price is right, why do you care about the vendor? We shouldn't be scared of failures; after all, we spent good time planning our network to not have a SPOF. Or maybe the problem here is that we don't trust our protocols/designs?
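
The arithmetic behind that argument can be sketched in a few lines of Python. This is an illustrative model only: the MTBF and repair-time figures are made-up assumptions, not vendor data, and the formula assumes the redundant devices fail independently, which shared software bugs, power feeds or configuration errors can easily violate.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one device: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent devices when any one of them suffices."""
    return 1.0 - (1.0 - single) ** copies


# Hypothetical numbers: a cheaper switch with a shorter MTBF and a slower
# replacement, versus a pricier switch with a longer MTBF.
cheap = availability(mtbf_hours=50_000, mttr_hours=48)
premium = availability(mtbf_hours=200_000, mttr_hours=24)

# Two cheap switches in a design without a single point of failure.
cheap_pair = redundant_availability(cheap, copies=2)

print(f"single cheap switch:   {cheap:.6f}")
print(f"single premium switch: {premium:.6f}")
print(f"redundant cheap pair:  {cheap_pair:.9f}")
```

Under these assumptions the redundant pair of cheaper devices is more available than a single premium device, which is the point of designing out the SPOF instead of paying for per-box reliability.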

  20. I'm running Quanta whitebox switches in a large production network, all running a mix of Cumulus Linux 2.2/2.5. Each of the switches costs less than $5k, with 48 10gigs, and 6 40gig ports. Automation-wise, we deploy and control our configs using Ansible. There's absolutely no way to achieve anything close to this port density from the big vendors for this much money.

    Not just that, but I don't think your comparison of Juniper/Cisco being Apple, and the Quantas being a no-name, is valid. The value-add in disaggregating the hardware and software is that you get a platform (Debian Linux) that's actually more stable than the proprietary flavors still running on modern network hardware.

    Not just that, but the chipset running most of the modern Nexus gear (as an example) is the same Broadcom Trident 2 you find in the Quanta/Penguin switches.

    Replies
    1. I don't think you got my point. Macbooks are running on x86 just like Windows craptops.

