Network Automation RFP Requirements

After finishing the network automation part of a recent SDN workshop I told the attendees “Vote with your wallet. If your current vendor doesn’t support the network automation functionality you need, move on.”

Not surprisingly, the next question was “And what shall we ask for?” Here’s a short list of ideas; please add yours in the comments.

The Pass/Fail information included below was collected in 2016/2017 to the best of my knowledge with extensive help from Jason Edelman, Nick Buraglio and David Barroso (THANK YOU!). If you feel the information is incorrect, please point me to public documentation that proves me wrong. Also, do provide information on other platforms.

Programmable Interface (API)

The device MUST have an on-device programmable API (NETCONF or REST) that allows an external script to:

  • Get device configuration
  • Get operational data
  • Change device configuration.

I don’t want to hear about “solutions” that insert layers of kludges between my script and the device I want to manage. If I can’t access the device itself using NETCONF or REST I’m no longer interested. After all, my calendar is showing 2016.

Pass: Most networking vendors, at least in recent software releases.

Fail: List your grievances in the comments ;)

Structured Operational Data

The device MUST return operational data as structured data (JSON or XML format) not as text printouts wrapped in XML or JSON envelopes.

I had enough of screen scraping in the 30 years I had to deal with networking devices. I don’t want to write another Expect script or TextFSM definition. My calendar is still showing 2016.
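To make the difference concrete, here is a minimal Python sketch of what working with structured operational data could look like. The JSON reply and its field names (`interfaces`, `name`, `oper_status`, `in_errors`) are invented for illustration and do not follow any vendor's actual schema.

```python
import json

# Hypothetical structured reply from a device API; the field names are
# invented for this example and do not follow any vendor's real schema.
SAMPLE_REPLY = """
{
  "interfaces": [
    {"name": "Ethernet1", "oper_status": "up",   "in_errors": 0},
    {"name": "Ethernet2", "oper_status": "down", "in_errors": 17},
    {"name": "Ethernet3", "oper_status": "up",   "in_errors": 3}
  ]
}
"""


def interfaces_needing_attention(reply_text):
    """Return names of interfaces that are down or report input errors."""
    data = json.loads(reply_text)
    return [
        intf["name"]
        for intf in data["interfaces"]
        if intf["oper_status"] != "up" or intf["in_errors"] > 0
    ]


if __name__ == "__main__":
    print(interfaces_needing_attention(SAMPLE_REPLY))
```

There is no regular expression or column counting anywhere in the sketch. With a text printout wrapped in a JSON envelope, the same check would need a parser for every command and every software release.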

Pass: Junos, Nexus OS, Arista EOS, ALU/Nokia

Fail: Cisco IOS

Nice try: Cisco IOS XE with REST API (it returns a minimalistic set of operational data, see also feature parity below).

The importance of structured operational data is covered in module 2 (Easy Wins) of the Building Network Automation Solutions online course.

Device Configuration in Structured Format

The device SHOULD return its configuration in structured format (JSON or XML) with meaningful structure (for example, ACL lines should be within the ACL).

I don’t know why I should write another configuration scraping program to figure out what BGP neighbors a device has if I could do the same thing with a simple walk down the return object. I had enough Perl Regexps for one life.
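As a rough illustration of that "simple walk", the following Python sketch pulls BGP neighbors out of a structured configuration. The XML document and its element names (`routing`, `bgp`, `neighbor`, `address`, `remote-as`) are assumptions made up for this example, not any vendor's data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured configuration; element names are invented for
# illustration and are not taken from any vendor's schema.
SAMPLE_CONFIG = """
<configuration>
  <routing>
    <bgp>
      <local-as>65000</local-as>
      <neighbor>
        <address>192.0.2.1</address>
        <remote-as>65001</remote-as>
      </neighbor>
      <neighbor>
        <address>192.0.2.5</address>
        <remote-as>65002</remote-as>
      </neighbor>
    </bgp>
  </routing>
</configuration>
"""


def bgp_neighbors(config_xml):
    """Return a list of (neighbor address, remote AS) tuples."""
    root = ET.fromstring(config_xml)
    return [
        (nbr.findtext("address"), int(nbr.findtext("remote-as")))
        for nbr in root.findall("./routing/bgp/neighbor")
    ]


if __name__ == "__main__":
    for address, remote_as in bgp_neighbors(SAMPLE_CONFIG):
        print(address, remote_as)
```

Because each neighbor's attributes sit inside its own `neighbor` element, the script never has to track which configuration context a line belongs to.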

Pass: Junos, ALU/Nokia, Cisco IOS XE release 16.

Mostly there: Cisco IOS and IOS-XE (prior to release 16).

Atomic Configuration Changes

Changes to device configuration MUST be atomic, more so if the device supports NETCONF – either all the submitted changes are accepted or none is.

I really don’t care if I can get that done in a NETCONF session with commit capability or as a single huge REST call, but I don’t want to be cut off from the box once again because the box accepted only half the ACL.

Pass: Junos, IOS XR, Arista EOS

Almost: Cisco IOS XE. The REST interface is atomic within a single call, as is the NETCONF implementation in release 16.x, which implements rollback-on-error.

Fail: Cisco IOS, Nexus OS
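For readers who have not seen it, this is roughly what asking for all-or-nothing behavior looks like in NETCONF. The Python sketch below only builds the `<edit-config>` request with the standard `rollback-on-error` error option defined in RFC 6241; it does not talk to a device. The function name, the default message ID, and the ACL payload with its `http://example.com/ns/acl` namespace are hypothetical, and a real device has to advertise the rollback-on-error capability before this option can be used.

```python
import xml.etree.ElementTree as ET

# Base NETCONF namespace from RFC 6241.
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_edit_config(config_fragment, target="running", message_id="101"):
    """Build an <edit-config> RPC that requests rollback-on-error handling."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    edit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}edit-config")

    # Datastore to modify, for example <running/> or <candidate/>.
    tgt = ET.SubElement(edit, f"{{{NETCONF_NS}}}target")
    ET.SubElement(tgt, f"{{{NETCONF_NS}}}{target}")

    # Ask the device to undo everything if any part of the change fails.
    err = ET.SubElement(edit, f"{{{NETCONF_NS}}}error-option")
    err.text = "rollback-on-error"

    # The configuration payload itself (device-specific data model).
    cfg = ET.SubElement(edit, f"{{{NETCONF_NS}}}config")
    cfg.append(ET.fromstring(config_fragment))

    return ET.tostring(rpc, encoding="unicode")


# Hypothetical payload; the namespace and element names are made up.
SAMPLE_ACL = (
    '<acl xmlns="http://example.com/ns/acl">'
    "<name>edge-in</name>"
    "<entry><seq>10</seq><action>permit</action></entry>"
    "</acl>"
)

if __name__ == "__main__":
    print(build_edit_config(SAMPLE_ACL))
```

On platforms with a candidate datastore the same effect is usually achieved by editing the candidate configuration and committing it in one step.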

Configuration Rollback

The device MUST support rollback to a previous configuration.

If I made a mistake, I want to be able to go back to a previous configuration without spending hours hand-crafting the differences between the mess I made and the configuration that worked before I started messing it up.

Pass: Junos, IOS XR, Arista EOS, Cisco IOS, Nexus OS, ALU/Nokia

Configuration Replace

The device MUST support replacing current configuration with a new configuration without a reload.

Sometimes I really don’t want to waste my time calculating the differences that have to be made to get the device to do what I want, particularly when I create the whole configuration with a template.

Pass: Junos, IOS XR, Arista EOS, Cisco IOS/XE, Nexus OS

The importance of configuration replace functionality is the focus of module 4 (Changing Network Configuration or State) of the Building Network Automation Solutions online course.

Configuration Diff

The device SHOULD be able to create a list of configuration commands needed to transform one configuration into another.

It’s great if you can point out the differences between two configurations to the engineer who has to approve the change. Oh, and I’m looking for the list of commands to get from A to B. I can run a diff on Linux myself.

Pass: Junos, Cisco IOS

Fail: Most everyone else. Many platforms use standard Linux diff instead of considering configuration context.

For example, if I change the BGP AS number, I’d like to see no router bgp A followed by router bgp B followed by the whole BGP configuration.
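The BGP example can be sketched in a few lines of Python. This is a deliberately naive illustration of a context-aware diff, not how any device implements it: it assumes an IOS-style configuration with a single level of indentation, treats every non-indented line as a section header, negates lines by prefixing them with `no`, and ignores ordering, so it would not handle ACLs or nested sections correctly. The function names and sample configurations are made up.

```python
def parse_sections(config_text):
    """Split an indented, IOS-style configuration into {header: [child lines]}."""
    sections = {}
    current = None
    for line in config_text.splitlines():
        if not line.strip() or line.strip() == "!":
            continue  # skip blank lines and section separators
        if line[0].isspace():
            if current is not None:
                sections[current].append(line.strip())
        else:
            current = line.strip()
            sections.setdefault(current, [])
    return sections


def commands_to_transform(old_text, new_text):
    """Return the commands that turn the old configuration into the new one."""
    old, new = parse_sections(old_text), parse_sections(new_text)
    commands = []

    # Remove top-level objects that no longer exist.
    for header in old:
        if header not in new:
            commands.append(f"no {header}")

    for header, children in new.items():
        old_children = old.get(header)
        if old_children is None:
            # New object: emit the header followed by its whole content.
            commands.append(header)
            commands.extend(f" {child}" for child in children)
            continue
        removed = [c for c in old_children if c not in children]
        added = [c for c in children if c not in old_children]
        if removed or added:
            commands.append(header)
            commands.extend(f" no {child}" for child in removed)
            commands.extend(f" {child}" for child in added)
    return commands


OLD_CONFIG = """\
hostname R1
router bgp 65000
 neighbor 192.0.2.1 remote-as 65001
 neighbor 192.0.2.1 description upstream
"""

NEW_CONFIG = """\
hostname R1
router bgp 65100
 neighbor 192.0.2.1 remote-as 65001
 neighbor 192.0.2.1 description upstream
"""

if __name__ == "__main__":
    print("\n".join(commands_to_transform(OLD_CONFIG, NEW_CONFIG)))
```

For the sample configurations the sketch emits the removal of the old BGP process, then the new `router bgp` line, then both neighbor commands. A plain text diff would show only the one changed line, and applying just that line would leave the new BGP process without its neighbors.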

Support for Industry-Standard Models

The device SHOULD support industry-standard configuration data models (IETF and/or OpenConfig).

We waited long enough to get them. I don’t want to wait another decade for the vendors to implement them.

Pass: Junos, Arista EOS (OpenConfig), Nexus OS (OpenConfig), IOS XE (IETF), IOS XR (OpenConfig)

Warning: While most vendors support some industry standard, always check out what can be configured through the standard models.

The benefits of industry-standard data models are described in module 3 (Data Models) of the Building Network Automation Solutions online course.

Feature Parity

Paraphrasing Ron Broersma: All functionality requested in the RFP must be fully supported by the device API and meet the above requirements.

Anything Else?

I probably forgot a few critical requirements. Please list them in the comments.

Want to Know More?

Check out the network automation webinars and the Building Network Automation Solutions online course.

Revision History

2023-03-12
Removed all references to Brocade VDX which has been obsolete for years.


23 comments:

  1. What I'm actually missing today from network operating systems is the ability to seamlessly run automation software agents, for instance chef-client, puppet slave, etc. OpenSwitch tried that, but it's still missing. Cumulus doesn't have this either; only agentless operation is supported. Of course, you can make it work, but it would be a workaround, not a built-in solution.
    Replies
    1. Donatas, you know that for big networks, having agents won't make automation scale. Agentless is still better in that it uses the built-in/existing protocol. Also, from a cost perspective, agents can be expensive.
    2. There are a few problems with this request in that you are asking network vendors to
      A.) Support "CM" tools that were created for server systems and their state, not configuration
      B.) Use these tools to provide "CM" for configuration - which is what you are mostly concerned with on network devices.
      C.) Idempotency for configuration, without context, is much harder than idempotency for state - i.e. is service X running, is pkg Y installed, etc.
      D.) As Leke mentioned, scaling the disposition and logistics of these agents is very tough when you get into the thousands of devices.

      All that being said, there are several vendors that support both Puppet and Chef in one form or another. Junos supports them on all but the SRX and PTX platforms as of their latest release.

      Both Ansible and Saltstack are much better ways of attacking network state/configuration management than Puppet and Chef. Ansible doesn't require agents at all and SaltStack has a great proxy system that allows for their proxy agents to work without much configuration at all and work against any device you want it to, regardless of vendor. You just have to have the modules - the same way you do for Ansible anyway.

      This is a case of missing the forest for the trees. You are narrowing down your focus for automation based on the set criteria of a product. And, that product wasn't even meant to do what you want in the first place. To really get the most out of automation, what Ivan has put together is a base set of must haves or don't even attempt it. If you are not looking at the capabilities of the vendor OS from the perspective of what you need to accomplish business (not technical) objectives via automation, then you are missing said forest.
  2. So from this list Junos seems the most mature solution in 2016, and Cisco IOS falls behind as usual.
  3. No intermediate abstract software from the OEM for automation. We tried automating Alteon SLB using vDirect, but the catch is that you have to create templates residing in vDirect, while the API calls have to be built in our automation software. Moreover, the templates have to be written in a specific language, Apache Velocity. This counter-intuitively complicates the entire automation process, so we rolled back to CLI-based screen-scraping automation.
  4. * Well documented libraries to work with those APIs. I don't have to use ncclient and python requests to work with Netconf and Rest
    * Seconding Donatas above - network OSs have to support standard CM clients not just in powerpoint (even though cumulus does work with standard chef client)
    * Network OSs should have x86 virtual replicas of major hw products. It's about time network engineers had their own dev and staging environments, just like programmers
    Replies
    1. Juniper does all 3 of those. What are you using?
    2. Something other than Juniper obviously :)
      Unfortunately ENT campus/DC are dominated by other vendors
    3. To quote the article - “Vote with your wallet. If your current vendor doesn’t support the network automation functionality you need, move on.”

      If the dominant vendor doesn't support the features you need, don't support them.
  5. In addition to support standard models it must at least provide schema models with documentation, if possible in YANG since it seems to be current way.
    Also we are talking about devops, telemetry is to be supported with support of http push and at least one protocol for asynchronous messaging (amqp, xmpp, mqtt, Kafka...)
  6. I came across this, while preparing something similar, but at a higher level and had posted requesting comments in the network to code slack team:

    ✓ The device should lend itself to virtualization, as deployed in production - so, if it is a firewall, it should be 'virtualizable' with all production features (multi-context, transparent firewall etc)
    ✓ The device should be created with an api-first approach (especially if it is a closed vendor). If there is a feature on the product, it should be accessible via an API
    ✓ If the enterprise intends to manage the device using a centralized controller (BigIQ, CSM etc), every feature on that management platform should be available as a north-bound API, consumable by automation tools


    ====
    However, since then, I have been rethinking the API piece. Given that we look up to our *nix pioneers as standard bearers for system automation, why do we demand it? I am now more inclined to think, that the API mandate should only be if the vendor OS is a closed system. If an open system vendor, creates APIs for applications running on their system (say for BGP configs) - kudos to them, but no longer think that should be mandated. Something like Ansible could be the 'API broker' for higher level workflow tools, to interact with the services on that platform....

    Thoughts?
  7. Cisco is big time into Model Driven Manageability, with support for well defined data models for both configuration and operational data. Cisco provides development kits that allow for manipulation of the data models programmatically.

    Cisco does support retrieval of Structured Operational data on IOS-XR and Nexus platforms in the recent releases. The operational data can be streamed out from the router and received by a client with a push model, rather than the pull model normally supported with SNMP. The telemetry stream can be formatted in JSON, Google Protocol Buffers or Google KeyValue Protocol Buffer formats. The streaming telemetry is supported using the non-proprietary Open Config Telemetry model for subscribing to the operational data that the user is interested in. Most of the Open Config models are supported and Cisco native models are supported for other areas that don't have either OC models or the OC models are still being worked out. The subscriptions can themselves be made over Netconf/XML or Google RPC session to the router.

    Cisco also supports structured configuration data by the way of IETF/Open Config/Cisco Native Yang Models over a Netconf session.
    Cisco also supports Google RPC mechanism to push a config change structured as a JSON object to the router.

    Cisco also has built and open sourced a framework called YDK (Yang Development Kit) that allows a user to compile the yang models into objects in a language like python (other language bindings are being worked on). The user is then able to manipulate the config on the router by programmatically setting attributes on the config objects and performing a CRUD operation to write the data to the router to have the config take effect.
    Replies
    1. Dear Anonymous,

      Thanks for a marketing manifesto ;) If you'd have shared your contact details or contacted me offline, we could add IOS XR to the lists. Alas...

      Now for the details:

      "Cisco is big time into Model Driven Manageability" << what counts for me is what's shipping and documented. Big-time statements and visions are nice, executing on them is even better.

      "Retrieval of Structured Operational data on IOS-XR and Nexus" << Nexus OS is in the list. See above for IOS-XR.

      Streaming telemetry - interesting, but not the topic of this blog post.

      Open Config and IETF models - mentioned.

      Structured configuration data - Cisco has at least four different network operating systems, so please specify which one(s) support it. The last time I checked Nexus OS didn't even have "get-config" NETCONF command. I know that has been added, but I haven't tested what it returns yet. Checking how XML configuration looks in latest versions of IOS XE is already on my to-do list.
    2. Only a lowly engineer, who labors on the code and is not authorized to speak for the company, and hence the anonymity. If you want to disparage an honest source of info, so be it.

      Looks like you are a paid shill for Brocade based on the quote earlier in your blog "The Pass/Fail information included below was collected to the best of my knowledge with extensive help from Jason Edelman, Nick Buraglio, David Barroso and several Brocade engineers (THANK YOU!)." .

      This is the last post from me.
    3. Response here: http://blog.ipspace.net/2016/11/breaking-news-im-vendor-shill.html
    4. Ivan, you must work on your Brocade shilling. You had them failing most of the categories.
  8. Partial: IOS XE (IETF), IOS XR (OpenConfig)
    This says a lot about Architecture and Standards in Vendors...if you can't get it right within a vendor, how are you going to adapt to market standards.
  9. Great post Ivan!
  10. Hello Ivan! First of all thank you for the post. It is very good idea to make the structured list of requirements to classify the current situation with the vendors gear. And it is important to the community to have got the minimally adequate list of the requirements to be able further to automate device configuration in the right manner.

    DISCLAIMER: I do not represent or work for any equipment vendor. This is just my experience. If any of my thoughts are wrong, maybe I used the wrong tool or I don't have enough information about the vendor devices, because it is not publicly available.

    So having netconf is good. But devices also must publish all the configuration modules, as the capabilities in the netconf hello messages. I have tried IOS-XR and JunOS. It is not true f.e. for JunOS. But works for IOS-XR.

    The next key thing is the ability to get YANG modules out from the device for declared capabilities (ietf-netconf-monitoring RFC6022). it works well for both JunOS and IOS-XR.

    And the last but not least. The obtained YANG modules must be able to be compiled ;-). F.e. with publicly available pyang, but more than that better with "pyang --ietf". So this not true for both of them IOS-XR for several modules. And for JunOS YANG models gotten out from device.

    So industry is heading further for the bright future and this good for all of us. But clear some marketing hype sometimes also very important.
  11. Are equipment such as Huawei & ZTE also tested?
    Replies
    1. I'm positive someone must be testing them ;)
  12. We need much more. NETCONF is not designed for real-time control loops. As usual network devices are so bad compared to real-life needs that we would need a hybrid SDN architecture to have the possibility of correcting what the vendors are not motivated to do. But none of the big vendors support this. They have some fake implementations of programmability. But when you really want to use them, you recognize it is unfinished, not fast enough, not scalable, etc. In safety critical network it is still better to stick to TDM (PDH/SDH) and wait for some reasonable solutions to come out. New generation of SD-WAN is a step into the right direction, but still a lot to do...
  13. I disagree that streaming telemetry/statistics aren't relevant to this hypothetical RFP. In recent years, network devices have become far less responsive as API's are decoupled from OS's, and OS's are decoupled from ASICs. For example, CLI/SNMP counters are no longer real-time but instead refreshed every 5-15 seconds. Meanwhile, SNMP TRAP & Syslog events are lazily dispatched (though their embedded timestamps are largely accurate). This loss of "tactility" can be a real problem for a modern network tuned for millisecond control plane. It's worth pressing your vendor on this topic.