Tail-f Network Control System – the First Impressions

One of the most pleasant surprises of the recent Interop show was Tail-f's Network Control System (NCS). I've “known” Carl Moberg (of NETCONF and YANG fame) for a long time and had the privilege of meeting him in person just before the SDN Buyer's Guide panel that I co-hosted with Kurt Marko (who did an excellent job putting the buyer's guide together). Anyhow, what Carl presented during the panel totally blew me away.

Tail-f built a service provisioning platform. Yeah, I know, that's a boring topic, we've seen so many of them - they are either too simplistic to be useful, or too expensive and require as much customization as a typical SAP deployment (read: it stays forever in the "almost done" state and you never get rid of the consultants). What's interesting is the way Tail-f approached the problem.

Network Control System describes the services you want to offer in your network in YANG. It has a large library of device models (routers, switches, firewalls, load balancers …) also described in YANG. Their "magic glue" ties the two together: when you deploy a new service for a customer, NCS automatically figures out what needs to be done on individual devices. No surprises there; I would use the same architecture.
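To make the service-model-to-device-model idea a bit more concrete, here is a minimal Python sketch. It is not how NCS works internally (NCS uses YANG models and its own mapping logic); `VlanService`, `TEMPLATES`, `INVENTORY` and `render` are hypothetical names invented for this illustration, and the per-platform templates are simplified stand-ins for real device models.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VlanService:
    """Hypothetical service model: one customer VLAN on a set of devices."""
    customer: str
    vlan_id: int
    devices: Tuple[str, ...]


# Hypothetical per-platform templates standing in for device models.
TEMPLATES: Dict[str, str] = {
    "ios": "vlan {vlan_id}\n name {customer}",
    "junos": "set vlans {customer} vlan-id {vlan_id}",
}

# Hypothetical inventory: device name -> platform.
INVENTORY: Dict[str, str] = {"sw1": "ios", "sw2": "junos"}


def render(service: VlanService, inventory: Dict[str, str]) -> Dict[str, str]:
    """Translate one service instance into per-device configuration text."""
    result = {}
    for device in service.devices:
        platform = inventory[device]
        result[device] = TEMPLATES[platform].format(
            customer=service.customer, vlan_id=service.vlan_id
        )
    return result
```

Calling `render(VlanService("acme", 100, ("sw1", "sw2")), INVENTORY)` returns one configuration snippet per device, each in that device's own syntax. The operator only ever describes the service; the platform-specific details live in the templates.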

Tail-f realized they have to live in the real world if they want to make real-life revenue from real service providers. NCS thus doesn't depend on OpenFlow or any other emerging technology; it supports a very wide range of device configuration mechanisms, including OpenFlow, NETCONF, SNMP (yes, there are still boxes out there using the Wellfleet model of SNMP-based configuration) and CLI. No real surprises there either; they're smart, realistic Swedes.

As one would expect, NCS offers a web-based UI and numerous northbound APIs (NETCONF, REST, Java) … but also a network-wide CLI. Imagine being able to configure the services on the whole network (not just on the switches like you can do with QFabric) through a single CLI management point, and being able to do diffs to figure out what changed - network wide. How cool is that?
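A network-wide diff is easy to picture once all device configurations live in one place. The following Python sketch uses the standard library's `difflib`; the function name `network_diff` and the dictionary-of-strings representation of the network are assumptions made for this example, not anything NCS exposes.

```python
import difflib
from typing import Dict


def network_diff(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, str]:
    """Return a unified diff for every device whose configuration changed."""
    diffs = {}
    for device in sorted(set(before) | set(after)):
        old = before.get(device, "").splitlines()
        new = after.get(device, "").splitlines()
        if old != new:
            diffs[device] = "\n".join(
                difflib.unified_diff(
                    old,
                    new,
                    fromfile=f"{device} (before)",
                    tofile=f"{device} (after)",
                    lineterm="",
                )
            )
    return diffs
```

Devices that did not change are left out of the result, so the output is a compact answer to "what changed, and where?" across the whole network.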

However, what really astonished me was a single implementation detail: once you create a new service (using whichever northbound configuration mechanism), NCS configures the network devices in an all-or-nothing (ACID) transaction using two-phase commit and doing full rollback if a single device configuration fails. Network-wide ACID transactions? Wow.
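The all-or-nothing behavior can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Tail-f's implementation: `Device` is a hypothetical in-memory stand-in for a box with a candidate and a running configuration, and `apply_transaction` is an invented name. A production two-phase commit also has to deal with failures during the commit phase itself, which this sketch ignores.

```python
from typing import List, Tuple


class Device:
    """Hypothetical in-memory device with candidate and running configs."""

    def __init__(self, name: str, fail_on_prepare: bool = False):
        self.name = name
        self.running: List[str] = []
        self.candidate = None
        self.fail_on_prepare = fail_on_prepare

    def prepare(self, lines: List[str]) -> None:
        """Phase 1: validate and stage the change without activating it."""
        if self.fail_on_prepare:
            raise RuntimeError(f"{self.name}: configuration rejected")
        self.candidate = self.running + list(lines)

    def commit(self) -> None:
        """Phase 2: activate the staged configuration."""
        self.running, self.candidate = self.candidate, None

    def abort(self) -> None:
        """Discard the staged configuration; running config is untouched."""
        self.candidate = None


def apply_transaction(changes: List[Tuple[Device, List[str]]]) -> bool:
    """Apply changes to all devices or to none. Returns True on commit."""
    prepared = []
    try:
        for device, lines in changes:
            device.prepare(lines)
            prepared.append(device)
    except RuntimeError:
        # One device refused the change: roll back everything staged so far.
        for device in prepared:
            device.abort()
        return False
    for device in prepared:
        device.commit()
    return True
```

If any single device rejects its part of the change during the prepare phase, none of the devices ends up with a modified running configuration, which is the property that makes the network-wide transaction so appealing.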

Want to know more? Me too. I'm currently waiting for Carl to send me more details; expect more blog posts once I digest them.

6 comments:

  1. I saw their website a while back, and checked out some of their introduction videos on YouTube, but there's still a foggy cloud surrounding their product to me. Imagine an SP network with 100+ routers: how do they keep the web interface to-the-point and understandable? The way the WebUI looks to me right now is that it's something for ~10 devices. Apart from that, how does it handle multiple users working on configurations at the same time, either through their WebUI or directly from the device CLI?

  2. Have mixed feelings about Tail-f. To me their solution looks like a device API normalizer built on YANG. What's still missing is network services modelling and by network service I mean service that spans more than one device. You need to understand topology, and service overlays. One of the attempts to model network services is Quantum in OpenStack, but it's still very rudimentary...

  3. Dear Anonymous #1,

    Actually, we do have NCS up and running in real networks with 100+ routers without any concerns from the customers regarding the usability of the web interface. Now, if you've looked closer at NCS (and if you haven't, I'll be happy to show you!) you know that the Web UI is driven by the device and service models, providing a comfortable development environment based on the latest and greatest tools for web developers (e.g. Bootstrap, Backbone) so they can tailor the look and feel to their heart's content.

    Happy also that you bring up management of change collisions through the northbound APIs (including across CLI, REST, Java API, etc). This is a first-class feature in any transaction-oriented system like NCS. We have pretty cool demos that show live collision detection across all the interfaces, allowing for controlled remediation.

    Managing out-of-band changes is also fundamental to our customers and a key part of NCS. We use very clever (if I may say so) check-sync features combined with the ability to remediate configuration changes to or from the network. This is a point of very strong opinion among most network engineers, and the ability to allow the manager to overwrite the network OR the opposite is key. But it gets better. Since we have a strong association between network device configuration and service instances, we can also trivially show service impact information and let that drive the decision of whether to overwrite the network or take the changes into the system. That's actually one of the highlights of our current demo and normally gets people to lean forward :-)

    Again; I'll be more than happy to show you this in more detail! Find me at calle@tail-f.com or @cmoberg.

  4. Dear Anonymous #2,

    When you write:

    "What's still missing is network services modelling and by network service I mean service that spans more than one device. You need to understand topology, and service overlays."

    ...you're actually hitting a pretty decent description of what NCS does.

    We have large service providers and data centers running these types of services in production. We have gone through a very large SP's whole service portfolio (ranging from triple play for consumers to VPNs for businesses) and encoded them in YANG. It was *very* interesting I'll tell you :-) And it worked very well. And we have several other examples including BGP peering/transit services and datacenter multi-tenancy/service chaining (yes, including service insertion and retraction in real time across multiple appliance vendors).

    OpenStack Networking's (we're supposed to remind each other not to call it "Quantum" any more, right :-) ) view of the network world is currently a little too constrained in my mind. And it gets complicated in that it's a part of how OSN is designed at the core. There are some activities going on in this area (e.g. the ML2 blueprint) and we are working on contributing solutions here as well. Stay tuned.

    And my invitation to demo is of course applicable generally. You too should feel free to find me at calle@tail-f.com or @cmoberg.

  5. calle@tail-f.com or @cmoberg, you guys must be mighty Swedes! :-)
    "We have gone through a very large SP's whole service portfolio ... It was *very* interesting I'll tell you".

    I would have loved to have been there through that exercise. I did it with a telco and their SONET infrastructure 10 years ago and we never got anything of much value out of it.

    Definitely an intriguing product, hope to see it in action some day.

  6. Congratulations to Tail-f & Cisco on the acquisition
