Repost: What's Wrong with Network Automation

Responding to my Infrastructure as Code Sounds Scary blog post, Deepak Arora posted an interesting (and unfortunately way too accurate) list of challenges you might encounter when trying to introduce network automation in an enterprise environment.

He graciously allowed me to repost his thoughts on my blog.

Why don’t we agree that:

  1. Automation never really took off despite all the efforts from vendors and communities. My best guess is that even today close to 80% of network infrastructure is deployed and managed manually. The ratios might vary a bit here and there, and I can imagine North America as a region being ahead of everyone else.

  2. Most automation engineers I have interviewed in the past couple of years had absolutely no idea how to sell it to management when seeking investment. At best, they would throw another set of marketing slides at them.

  3. Let’s admit that when we ask people what network automation really means and can do, 90% of people will just spell out “Python, DevOps, Ansible and Terraform”.

  4. How many engineers in the network automation community treat it as an architecture in itself rather than as a very tactical thingy?

Automation is a great tool like many others available to the network engineering community, but most of them are still looking at it as if it is a silver bullet, and being very tactical in their approach.

I was trying to address #4 (and a bit of #2) in automation webinars and the network automation online course, but according to Deepak’s feedback it looks like I wasn’t making much of an impact.


  1. I think there is a more fundamental reason than the (in my opinion simplistic) lack-of-skills argument. As someone mentioned on Twitter:

    "Rules make it harder to enact change. Automation is essentially a set of rules."

    We underestimated the fact that infrastructure is a value differentiator for many and that customization and rapid change don't go hand in hand with automation.

    1. > We underestimated the fact that infrastructure is a value differentiator for many and that customization and rapid change don't go hand in hand with automation.

      Which, if sold right, can be a good thing. I've called this "inflexible flexibility". We can do absolutely anything, but it'll take some effort to add it to the data model and tool chain that generates the configurations. After that, it'll be 100% supported and won't ever be something that only the one engineer that implemented the one instance knows about, catching someone else off guard at 2am when it breaks. It means no more one-offs, and that changes are inherently documented at the very least by merit of being part of a codebase with a commit that can be tracked down.

      I have sometimes gotten some pushback on how this is inherently less nimble since it takes longer than just throwing a manual config at a problem, but so far I haven't had any issues with the counterargument that we don't want untested configuration changes, and that developing a new feature for the automation process doesn't take much longer than properly testing a new configuration in a lab before rolling it out. The additional time is easily made up by the increased reliability of the change being executed a second time (where I mean reliability in the sense of "the customer expected the change to work the first time"). Of course, this partly depends on your exact automation process. Ours is amenable to fairly rapid changes, because our campus network is quite dynamic.

      Yes, having to run everything through an automation pipeline is less rapid than just ssh'ing into a box and typing away. But it doesn't have to be much less rapid if your staff is skilled, and raising the bar to entry for a change can often mean an opportunity for evaluation on whether the work should be done, which is a very different question than whether it can be done.

    2. If the problem is well known, you can apply rules to it (automation). The problem with networking is that it results in a huge number of cases that are not known in advance. And I don't mean only the stuff you add/remove to fix operational problems. A friend in one of the biggest private clouds was saying that more than 50% of transport services are customized (a static route here, a PBR there, etc.) or require customization during their lifecycle (e.g. add/remove a knob). Telcos are "worse" and for good reasons.

      It is no coincidence that one of the coolest features claimed by network provisioning tools is the ability to reconcile with the ground truth.

      "More automation", "fix your pipelines", "fix your culture", "FAANGs do it" is the usual reaction, but I doubt it takes people 10 years (or more since this conversation started) to learn to write YAML templates and playbooks :)
