Model-Driven Telemetry Isn’t as New as Some People Think

During the Campus Evolution with Cat9K presentation (I hope I got it right - the whole event was an absolute overload) the presenter mentioned the benefits of brand-new model-driven telemetry, which immediately caused me to put my academic hat on and state that we've had model-driven telemetry for at least 30 years.

Don’t believe me? Have you ever looked at an SNMP MIB description? Did it look like random prose to you or did it seem to have some internal structure?

As I explained in the Data Models and Data Stores part of the Building Network Automation Solutions online course and in the Intent-Driven Networking and Data Models part of the Network Automation Use Cases webinar (included in the ipSpace.net webinar subscription), a data model is nothing more (and nothing less) than a description of a data structure, and the first data models we had in networking were SNMP MIBs, currently described in a data description language called SMIv2.
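To make "a description of a data structure" concrete, here is a minimal Python sketch. It is neither SMIv2 nor YANG syntax; the model, the field names, and the `conforms` helper are all made up for illustration. The point is only that the model is separate from the data and that data can be checked against it.

```python
# A toy "data model": field name -> expected Python type.
# Field names are hypothetical; a real MIB or YANG module carries far more detail.
INTERFACE_MODEL = {
    "name": str,
    "admin_up": bool,
    "in_octets": int,
    "out_octets": int,
}


def conforms(instance, model):
    """Return True if instance has exactly the modeled fields with the modeled types."""
    if set(instance) != set(model):
        return False
    # Compare exact types so that a bool is not accepted where an int is modeled.
    return all(type(instance[key]) is expected for key, expected in model.items())


if __name__ == "__main__":
    good = {"name": "Gi1/0/1", "admin_up": True, "in_octets": 10, "out_octets": 20}
    bad = {"name": "Gi1/0/1", "admin_up": True, "in_octets": "10", "out_octets": 20}
    print(conforms(good, INTERFACE_MODEL), conforms(bad, INTERFACE_MODEL))
```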

Long story short: please stop talking about the beauties of model-driven telemetry in 2017. It makes you look silly to anyone who actually understands what you’re talking about.

So what was the presenter trying to tell us? The real change from the days of SNMP is the delivery paradigm - we moved from polling to streaming. Sounds like Latin? It’s really easy:

Polling - the network management system wants to know the values of one or more variables (example: interface counters) known to the networking device. It sends a request listing the variables it’s interested in, the device parses the request, collects the values, and returns them in a reply. The whole process is repeated every time the network management system feels it’s time to get a new set of values.

Streaming - the network management system knows it periodically needs values of a set of variables, so it tells the device what it needs, and the device sends the values without being asked (the streaming part) either when the variables change or at regular intervals.

Streaming telemetry reduces the load on both the network management system (because the periodic polling is gone) and the network device (because it doesn't have to parse the same request for the same set of variables every few seconds).
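The difference between the two delivery models can be sketched with a toy simulation. Everything here (the `Device` class, the counter names, the traffic increments) is a hypothetical stand-in, not any real SNMP or telemetry API. The sketch only counts how many times the device has to parse a request to deliver the same series of samples.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy network device holding two counters (hypothetical names)."""
    counters: dict = field(default_factory=lambda: {"in_octets": 0, "out_octets": 0})
    requests_parsed: int = 0
    subscription: tuple = ()

    def tick(self):
        # Pretend some traffic arrived during one sampling interval.
        self.counters["in_octets"] += 1000
        self.counters["out_octets"] += 500

    def handle_poll(self, names):
        # Polling: every single request is parsed before values are returned.
        self.requests_parsed += 1
        return {name: self.counters[name] for name in names}

    def subscribe(self, names):
        # Streaming: the variable list is parsed once, at subscription time.
        self.requests_parsed += 1
        self.subscription = tuple(names)

    def publish(self):
        # Streaming: values are pushed without a new request.
        return {name: self.counters[name] for name in self.subscription}


def run_polling(intervals, names):
    device = Device()
    samples = []
    for _ in range(intervals):
        device.tick()
        samples.append(device.handle_poll(names))
    return samples, device.requests_parsed


def run_streaming(intervals, names):
    device = Device()
    device.subscribe(names)
    samples = []
    for _ in range(intervals):
        device.tick()
        samples.append(device.publish())
    return samples, device.requests_parsed


if __name__ == "__main__":
    names = ["in_octets", "out_octets"]
    polled, poll_requests = run_polling(10, names)
    streamed, stream_requests = run_streaming(10, names)
    # Same data either way; only the number of parsed requests differs.
    print(polled == streamed, poll_requests, stream_requests)
```

In this sketch the management system ends up with identical samples in both cases, while the device parses one request per interval when polled and a single subscription request when streaming.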

Other fringe benefits: YANG has richer data structures than SMIv2 (not everything is a tree), making a few things a bit easier.

Which standards do you want to use today?

Streaming telemetry is becoming standardized, and as is often the case, the beauty of having standards is that there are so many to choose from:

  • The telemetry data is usually described with a YANG data model. In the ideal world that model would be defined by an independent organization (where you can choose between IETF and OpenConfig). In reality, a lot of stuff uses vendor data models (think vendor-specific MIBs);
  • The encoding of data could be done using JSON, XML (both are text formats) or Protocol Buffers (a binary format);
  • The data could be sent over NETCONF, RESTCONF or gRPC transport.
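To get a feel for the text-versus-binary encoding choice in the list above, here is a rough Python sketch that encodes one made-up sample three ways. The sample, its field names, and the binary layout are assumptions for illustration. In particular, the `struct` packing is only a stand-in for a binary encoding; it is not the Protocol Buffers wire format.

```python
import json
import struct
import xml.etree.ElementTree as ET

# A made-up telemetry sample; field names and values are hypothetical.
sample = {
    "interface": "GigabitEthernet1/0/1",
    "in_octets": 123456789,
    "out_octets": 987654321,
}


def encode_json(data):
    """Text encoding: field names travel with every sample."""
    return json.dumps(data).encode("utf-8")


def encode_xml(data):
    """Text encoding: field names appear in opening and closing tags."""
    root = ET.Element("sample")
    for key, value in data.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root)


def encode_binary(data):
    """Stand-in binary encoding: 1-byte name length, name, two unsigned 64-bit counters."""
    name = data["interface"].encode("utf-8")
    return struct.pack(f"!B{len(name)}sQQ", len(name), name,
                       data["in_octets"], data["out_octets"])


def decode_binary(payload):
    """Reverse of encode_binary; the receiver must already know the field layout."""
    name_len = payload[0]
    name, in_octets, out_octets = struct.unpack(f"!{name_len}sQQ", payload[1:])
    return {"interface": name.decode("utf-8"),
            "in_octets": in_octets, "out_octets": out_octets}


if __name__ == "__main__":
    for label, encoder in (("json", encode_json), ("xml", encode_xml),
                           ("binary", encode_binary)):
        print(label, len(encoder(sample)), "bytes")
```

The binary form is smaller because it carries no field names, which is also why the receiver needs the layout in advance; with a real binary format that knowledge comes from a shared schema rather than a hand-written format string.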

Sounds like another incarnation of SIP? You’re not exactly wrong ;)

Does it matter? Absolutely. More about that in another blog post. In the meantime (as I started with a Cisco Live presentation), it’s worth noting that Cisco IOS XE supports NETCONF now, RESTCONF soon, and gRPC sometime in the future.

8 comments:

  1. I knew it as pushing/pulling (streaming/polling)
  2. In regards to "Cisco IOS XE supports NETCONF now", have there been any recent developments around that? Last time I looked at NETCONF on XE, it was pathetically incomplete - dumb things like getting completely unstructured config text back when requesting the configuration. But it has been a couple of years. At the time, it felt very much like Cisco needed to check a "supports NETCONF" box in an RFP, so they threw together the bare minimum function and called it good.
    Replies
    1. Worse still the Yang schema (carried over netconf) for XE is different to XR. If there was ever a protocol/language that should have resolved the issues around OS compatibility within the same vendor, Netconf Yang is what should have addressed this. As a customer I'd recommend you message that to your Cisco points of contact and the wider industry because it won't get fixed until people escalate.

      There are reasons this happens and there are also reasons it should not happen.
    2. NETCONF on XE got much better in the meantime as they ported Tail-f's confd to IOS XE.

      http://blog.ipspace.net/2017/04/netconf-agents-on-cisco-ios-xe-16x.html

      Still not much better than lipstick on a pig; for example, there's no candidate config and commit because you cannot get that done on IOS at all.

      http://blog.ipspace.net/2017/03/netconf-transactional-consistency-on.html

  3. Whilst I don't disagree with your statements on old vs new regarding telemetry, I think it's worth pointing out that telemetry is a publisher & subscriber model. What that means for the average Joe is that you don't need to configure the device for each MIB feature; you ask it via an API to send specific or generic feeds on events/data. Now that also is not that new, but what does change is the need for essentially a telemetry proxy/cache. Whilst traditional SNMP trap or NetFlow feeds limit the number of destination targets, telemetry does not. There will normally be multiple subscribers to the same telemetry data, and this could result in race conditions. I think the next big change you'll see messaged in this area is the move to a central cluster of telemetry cache systems. Kafka is one example of these. The main point is that the systems using telemetry won't subscribe directly to the device; they'll need to request it via the proxy/gateway. This is something all the vendors are still coming to terms with. It may also unfortunately result in periods of vendor lock-in until the market forces better standards and the standards groups start to catch up with the industry. One of the great things about SNMP and other legacy MANO protocols is they have a reasonable level of software neutrality. That won't be the case for a while in the telemetry world.
    I'd also point out that the server and platform SaaS vendors have been doing telemetry for years...nee gRPC Openconfig. The network industry is in catch-up mode and not really seeing customers flex their muscle with requests to conform to more generic telemetry capture tools and standards.
  4. The word "telemetry" inherently means a push or streaming model of delivery instead of polling. Most already realize "model-driven" isn't new, like you said SNMP has been using data model structures for 20+ years. It's the "telemetry" portion of "Model-Driven Telemetry" or "Juniper Telemetry Interface" that's important and new. Of course that word risks the same kind of vague definitions and dilution of meaning as "sdn" and "cloud" before it.

    The gNMI spec covers a standard encoding method, at least when using GPB for telemetry data. Vendors are working on implementing that now and at least we may have a key/value format that is standard across vendors even if the paths and values are not. Standard models like OC or IETF solve some of those issues where possible.
    Replies
    1. Telemetry: the science or process of collecting information about objects that are far away and sending the information somewhere electronically

      https://dictionary.cambridge.org/dictionary/english/telemetry

      Not sure everyone would understand that to imply push or streaming.
  5. People disregard SNMP traps and only compare SNMP polling with telemetry.

    The comparison between the two protocol types needs to cover more than push versus poll.

    An SNMP trap is a push mechanism, so framing the difference this way really confuses people.