Pluribus Networks… 2 Years Later

Thursday, November 16, 2017 09:12 +0100

I first met Pluribus Networks 2.5 years ago during their Networking Field Day 9 presentation, which turned controversial enough that I was advised not to wear the same sweater during NFD16 to avoid jinxing another presentation. (I also admit to having been a bit biased in those days because of the marketing deja-moo from a Pluribus sales guy I’d been exposed to during a customer engagement.)

Pluribus' NFD16 presentations were better; here’s what I got from them:

The rest of this blog post is based on information Pluribus shared during their NFD16 presentation. If I misunderstood or missed something, please write a comment.

They pivoted from “we’re having this great hardware with server-like capabilities on every switch” to “we’re having this fabric solution that can run on whitebox switches”, losing most of their potentially-interesting traffic analysis capabilities (the path between the switching ASIC and the CPU is laughably slow in most whitebox switches, and the CPUs are most often ridiculously underpowered to get the cheapest possible product).
They are implementing a proprietary edge fabric using VXLAN encapsulation on top of a generic layer-3 underlay. Think NSX, but implemented in a ToR switch instead of in the hypervisor (where it belongs). Not something that would get me excited when everyone else works on having eventually interoperable EVPN.
As with every proprietary solution (including Cisco ACI, VMware NSX, or Juniper Virtual Chassis Fabric or QFabric), it's an all-or-nothing game regardless of how you want to spin it. Either you're using an all-Pluribus edge or you need a Pluribus-to-outside gateway. I don’t remember them mentioning any interoperability feature like an EVPN gateway.
The control plane architecture is pretty traditional: each device is an independent router/bridge, and they use an anycast gateway to support VM mobility (par for the course these days). Nothing to see here (which is a good thing).
The only value-add feature I saw is fabric-wide management and provisioning. That's a really hard problem (Brocade got burned on this one and needed years to get it working with VCS Fabric), and in Pluribus' case it requires a 3-phase commit and puts the fabric into read-only mode on partitioning. Compare that to Cisco ACI, where only one part of the fabric goes into read-only mode and syncs with the read-write part of the fabric once the partitioning is removed.
I hated how they tried to avoid answering the "what happens when fabric partitions" question for most of the presentation, and tried to get around it with the "use redundant links" red herring.
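
To make the "read-only mode on partitioning" behavior concrete, here's a minimal Python sketch of a fabric that refuses any configuration change while one of its members is unreachable. It's a toy model based only on what was said in the presentation, not Pluribus code: the `Fabric` class, its `commit` method, and `FabricReadOnlyError` are hypothetical names, and the 3-phase commit is reduced to a single all-or-nothing update.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class FabricReadOnlyError(RuntimeError):
    """Raised when a change is refused because the fabric is partitioned."""


@dataclass
class Fabric:
    """Toy model (hypothetical, not vendor code) of fabric-wide provisioning."""

    members: set[str]      # every switch that belongs to the fabric
    reachable: set[str]    # switches we can currently talk to
    config: dict[str, dict[str, str]] = field(default_factory=dict)

    def is_partitioned(self) -> bool:
        # The fabric is partitioned as soon as one member is unreachable.
        return not self.members <= self.reachable

    def commit(self, key: str, value: str) -> None:
        # Whole-fabric read-only mode: refuse the change if anyone is missing.
        if self.is_partitioned():
            missing = sorted(self.members - self.reachable)
            raise FabricReadOnlyError(f"fabric is read-only, unreachable: {missing}")
        # All members are reachable: apply the same object on every switch.
        for switch in self.members:
            self.config.setdefault(switch, {})[key] = value


fabric = Fabric(members={"leaf1", "leaf2", "spine1"},
                reachable={"leaf1", "leaf2", "spine1"})
fabric.commit("vlan10-gateway", "10.0.10.1/24")   # succeeds on all switches

fabric.reachable.discard("leaf2")                 # simulate a partition
try:
    fabric.commit("vlan20-gateway", "10.0.20.1/24")
except FabricReadOnlyError as err:
    print(err)                                    # change is refused everywhere
```

In this model a single unreachable leaf blocks changes on every switch, which is exactly why the "what happens when the fabric partitions" question matters.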

Why would I want to go down a proprietary path and risk being locked out of the fabric just to get a single-point-of-configuration (which is also a single-point-of-disaster when fat fingers strike)? The only reason I see is that I'm not good at standardizing and automating stuff but want to manage my fabric the traditional way, through manual CLI. That makes them interesting to the mid-range market (people who can't figure out how to automate while still having more money to spend on boxes than cheap FTEs to burn configuring them), which is also the craziest part of the market in terms of expectations and featuritis, and probably the toughest to get into if you're a startup.

Explore!

Interested in different viewpoints? Try these blog posts from fellow NFD16 delegates:

Want to know more about other data center switching vendors? You’ll find an in-depth overview of what the major ones are doing in the Data Center Fabric Architectures webinar.

Want to learn how to build a leaf-and-spine fabric? You’ll find all the details you need, plus a vibrant support community and hands-on exercises, in the Designing and Building Data Center Fabrics online course.

How about mastering more than just data center networking? Check out the Building Next-Generation Data Center online course.

Finally, a disclosure: I attended the Pluribus presentation as part of the Networking Field Day 16 event.

1 comment:

Unknown 28 November 2017 17:03

Hi Ivan,

This is Marco Pessi with Pluribus. First of all, thanks for your article!
I do not think you have misunderstood anything from the NFD16 session. The content and presenters were chosen to provide a comprehensive overview of our different technologies to the broad NFD community, as we thought that, after a long 2.5-year break, there was a need to start over.

We were also aware that this would lead to some information gaps, and unfortunately the technical documentation on our website is too incomplete (and I really have to apologize for that) to be used to complement the information you grasped during our presentation.

Let me make two quick comments: (1) we do have an EVPN gateway on our roadmap and (2) we support read-write partitions for our fabric.
I will post a new comment when more documentation is available, and/or hopefully we will be able to demo/discuss these topics at the next NFD.

Let me now make two longer comments:
(1) Regarding your statement that CPUs are “laughably slow”… please note that the new generation of white boxes based on Broadwell has server-grade operating systems and can provide flow telemetry for 5000 sessions per second. We also provide our own brite-boxes with slightly different components that enable the additional CPU lines available on certain merchant silicon ASICs. This results in additional capabilities: L4-L7 flow analytics and, on the networking side, increased virtualization capacity, for example the number of containers (router instances).

(2) I am not sure I follow your comment about the Pluribus fabric being useful only to those who are not good at automating. The Pluribus fabric presents a high-level abstraction of network objects that can be managed with either the CLI or the REST API. For example, a subnet with an anycast gateway is a single object (provisioned with either a single configuration line or a single REST API command), versus a single object on every switch in the majority of other vendor implementations. Programming a network of 40 switches and 400 subnets with Pluribus requires provisioning 400 objects; with other vendors it is 16000 objects. If you allow me the analogy, it is a bit like scripting versus programming in Assembly. I agree I can do more damage with a higher level of abstraction, but my point is that handling exceptions and troubleshooting inconsistencies with a huge provisioning state is no fun either and results in higher operational costs.
My opinion is that good automation can hardly solve the problems of poorly programmable components: automation works well if it is holistically implemented across all components and layers. One of our goals at Pluribus is to provide a consistent and programmable network layer using white/brite boxes.
Still, to your point, if customers have already invested in an automation framework that relies on box-by-box management and do not see enough value in the fabric, we can provide a cheaper Netvisor license that does not include the fabric.
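
The object counts in point (2) can be checked with a few lines of Python. This is only an illustration of the arithmetic in the comment: the function name and its `fabric_wide` flag are made up for the example and are not part of any Pluribus API.

```python
def objects_to_provision(switches: int, subnets: int, fabric_wide: bool) -> int:
    """Count provisioning objects under the two models described above."""
    # Fabric-wide abstraction: one object per subnet, whatever the switch count.
    # Box-by-box management: one object per subnet on every switch.
    return subnets if fabric_wide else switches * subnets


print(objects_to_provision(40, 400, fabric_wide=True))    # 400
print(objects_to_provision(40, 400, fabric_wide=False))   # 16000
```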

Thanks for reading my comments. I look forward to hosting you again at our offices (and please feel free to wear anything you like!)

Best!
~Marco
