Making Cisco ACI REST API Transactional

Thursday, April 18, 2019 18:07 +0200

Making Cisco ACI REST API Transactional

This is a guest blog post by Dave Crown, Lead Data Center Engineer at the State of Delaware. He can be found automating things when he's not in meetings or fighting technical debt.

In a recent blog post, Ivan postulated “You’d execute a REST API call. Any one of those calls might fail. Now what? ... You’ll have absolutely no help from the orchestration system because REST API is not transactional so there’s no rollback.” Well, that depends on the orchestration system in use.

The promise of controller-based solutions (ACI, NSX, etc.) is that your unicorn powered network controller should be an all seeing, all knowing platform managing your network. We all have hopefully learned about the importance of backups very early on our careers. Backup and, more importantly, restore should be table stakes; a fundamental feature of any network device, let alone a networking system managed by a controller imbued with magical powers (if the vendor is to be believed).

If we look at a traditional solution built around Junos, the steps should be pretty simple.

Determine our blast radius for the rollback
SSH in (or use a nifty ansible play) and run rollback 1

This works pretty well due to the way configs work on Junos. We could do the same thing with other platforms, by backing up the config first, attempt a change, determine the rollback blast radius, and then restore it. Under the covers, ACI is doing the same thing as a Junos rollback. The APIC (the controllers for ACI) compares the current state with selected config snapshot, then pushes out the changes to bring the fabric to the requested state. But for an omnipotent, magical controller, this shouldn’t be hard.

This is actually pretty simple to do in ACI with Ansible due to the aci_config_snapshot and aci_config_rollback modules. At the beginning of a play, call aci_config_snapshotwithstate: present` and that will take a backup and, and store it on the controller. Then on failure, it’s as simple as:

Get the config snapshots using aci_config_snapshot with state: query and register the results to a variable
Use set_fact to store your latest snapshot in a second variable
Call aci_config_rollback with your latest snapshot with import_type: replace

Putting an “on any failure do X” in an ansible play isn’t feasible. I put this into a single play that can be executed from the command line on failure. Generally I don’t like to have processes for people to follow that have conditional logic in them, but a condition of “any issue” and a single recovery step is acceptable. If you want to have Ansible Tower run your plays to configure ACI, breaking out the rollback is a necessity so that workflow can run it on any failure condition.

One thing I know I glossed over is how to find your latest snapshot. I did this with a custom filter plugin. Custom Jinja filter plugins are a very powerful tool in your Ansible toolbox. There are things that Ansible is very good at, and other things it is not. One thing it falls short on is handling complex data structures. Custom filters plugins let you drop into Python and bang out code without resorting to complex Jinja statements. This is especially nice when your only other option is an overly complex jquery statement embedded in Jinja. Another upside of using custom Python filters is moving the more “complex” logic into a single place to maintain that logic.

You may be wondering, “Why don’t I worry about the roll back blast radius?” Simple: I start with the assumption that the running state of the fabric is one I found to be acceptable. The network was running well enough before I decided to change it. As ACI is managed as a single logical device from one control point, I don’t need to determine which individual entities need to changed. Plus, we’re talking about a physical data center fabric, where I shouldn’t have a very high volume of configuration changes. The end result is I can treat the entire batch of changes as a single transaction, and rollback the entire batch at once.

I’m not saying Rest API driven platforms are great and are able to provide feature parity with platforms like Junos. I’d like to think The Rolling Stones were right, “You can’t always get what you want, but if you try sometimes, you might find you get what you need.” Understand and work with what your platforms and tools have available, and you can get close enough for government work.

Please note that Cisco ACI does not provide atomic transactions as every API changes the state of the running system, and that you can do rollbacks mentioned by Dave only because ACI REST API implements rollback functionality. That functionality is not present in most other REST API implementations. - Ivan

On a totally different note, if you’d like to start automating your network but don’t know how/where to start, check out our Building Network Automation Solutions online course.

5 comments:

Anonymous 18 April 2019 20:11

I think the difference is that REST as a system (or architecture) itself doesn't have the transaction capability whereas NETCONF for example has it built in. Of course you can rely on the backup and restore functionality of your device (or black box). But that's not the point here.

Albert Siersema 19 April 2019 17:45

> Putting an “on any failure do X” in an ansible play isn’t feasible

I think I understand where you're going with this, after all you say 'not feasible' and emphatically not 'not possible'.
For completeness sake, it's somewhat possible with e.g. 'any_errors_fatal: true', 'serial: 1' combined with 'block' and 'rescue'.
Depending on your device there is the complexity of ordering on which I touched upon in another comment somewhere on Ivan's blog.

Ivan Pepelnjak 19 April 2019 17:47

In ACI case it’s actually a bit simpler as you’re only dealing with a single device unless you’re combining ACI configuration with services deployment.

Albert Siersema 19 April 2019 17:58

Sorry, I should have made it more clear that I meant the comment not specifically for the ACI use case, but rather as a pointer for the more generic Ansible use cases. Kind of how I read the sentence I quoted.

Anyway, good read and insight as per usual.

Albert Siersema 19 April 2019 17:54

> where I shouldn’t have a very high volume of configuration changes

and learn something more from software development workflows, CI/CD, version control: small individual changes. Release smaller changes, more often. Pinpointing the culprit is much easier and rollbacks less painful.

Latest blog posts in CLI versus API series

Recent posts in the same categories

automation

ACI

5 comments: