How Do I Start Automating Network Device Configurations in an Existing Network?
I get a “how do I get started with network automation” question every other week, and when I wrote a lengthy reply to one about configuration templating of an existing snowflake network on the networktocode Slack channel, I decided it’s time to turn my replies into a blog post.
Go for easy wins. Periodically store configurations into a source control repository. Use RANCID, Oxidized, or something as simple as my Configuration-to-Git Ansible playbooks.
Start small. Abstract common variables in a data model, and use templates to build simple things (NTP servers, syslog servers, DNS servers, VTY lines…).
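To make the data model idea concrete, here is a minimal sketch in plain Python. It uses the standard-library string.Template as a stand-in for the Jinja2 templates you would normally use with Ansible. The variable names, the documentation-range IP addresses, and the Cisco IOS-style command syntax are all assumptions made for illustration.

```python
from string import Template

# Hypothetical data model: values shared by every device in the network.
COMMON = {
    "ntp_servers": ["192.0.2.1", "192.0.2.2"],
    "syslog_servers": ["192.0.2.10"],
    "dns_servers": ["192.0.2.53"],
}

# One template per configuration line (IOS-style syntax is an assumption).
LINE_TEMPLATES = {
    "ntp_servers": Template("ntp server $address"),
    "syslog_servers": Template("logging host $address"),
    "dns_servers": Template("ip name-server $address"),
}


def render_common_config(model):
    """Return the configuration text generated from the data model."""
    lines = []
    for key, template in LINE_TEMPLATES.items():
        # A key missing from the model simply produces no lines.
        for address in model.get(key, []):
            lines.append(template.substitute(address=address))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_common_config(COMMON))
```

Changing an NTP server then means editing one value in the data model and re-rendering, not touching every device by hand.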
Check the proposed changes. Use Ansible check mode (the --check option of ansible-playbook) to identify the changes your templates would make to the network devices before deploying them. Collect those changes into a change report, get it approved, and then re-run the same playbook without check mode.
It’s a bit tricky to collect those changes when running Ansible in check mode until you figure out how the check_mode parameter works (hat tip to David Barroso and his awesome NAPALM presentation). Here’s an example till I find time to write a proper blog post.
Start compliance reports. Checking your templated configurations against actual device configurations is a great way to ensure nothing bad happened to the device configurations.
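A compliance check of this kind boils down to a diff between the intended (templated) configuration and whatever the device is actually running. The following is a rough sketch using Python's difflib; the hostname and configuration snippets are made up, and a real report would probably need to normalize the device output first (line ordering, default commands, timestamps), which this sketch does not attempt.

```python
import difflib


def compliance_report(hostname, intended, actual):
    """Return unified-diff lines showing how `actual` deviates from `intended`.

    An empty list means the two configurations are identical.
    """
    diff = difflib.unified_diff(
        intended.splitlines(),
        actual.splitlines(),
        fromfile=f"{hostname} (intended)",
        tofile=f"{hostname} (actual)",
        lineterm="",
    )
    return list(diff)


if __name__ == "__main__":
    # Hypothetical snippets; in practice `intended` comes from your templates
    # and `actual` from the configuration collected from the device.
    intended = "ntp server 192.0.2.1\nntp server 192.0.2.2\nlogging host 192.0.2.10"
    actual = "ntp server 192.0.2.1\nlogging host 192.0.2.10\nlogging host 198.51.100.7"
    report = compliance_report("rtr1", intended, actual)
    print("\n".join(report) if report else "rtr1: compliant")
```

Lines starting with a minus sign are in the template but missing on the device; lines starting with a plus sign were added on the device outside of the automation.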
Grow one configuration object at a time. After fixing the common configuration snippets, continue with more challenging concepts like routing protocols or VLANs. Yet again, you might find my MPLS deployment or VLAN services playbooks useful. They’re both pretty complex – I spent hours explaining the VLAN services solution in the Building Network Automation Solutions online course.
Add the snowflakes. After a while, when you manage most things with Ansible, use the brownfield trick from David Barroso to include device-specific configurations (source code on GitHub, videos are part of the Ansible for Networking Engineers webinar).
That should bring you to the stage where you control the whole configuration with an automation script, but have unstructured per-device exceptions. Next step: figure out what those exceptions are, why you made them in the first place, and abstract the snowflakes (per-user, per-service, per-site, per-whatever). I wrote about that challenge almost exactly a year ago.
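One possible way to abstract per-site or per-device exceptions is to layer the data model: network-wide defaults first, then site-level values, then whatever remains truly device-specific. The sketch below is only an illustration of that idea; the layer names, keys, and addresses are hypothetical, and the recursive merge is a design choice of this example rather than the behavior of any particular tool.

```python
def merge(base, override):
    """Return a new dict with `override` layered on top of `base`.

    Nested dictionaries are merged recursively; any other value in
    `override` replaces the value from `base`.
    """
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def device_model(defaults, site, device):
    """Build the final data model: defaults, then site, then device."""
    return merge(merge(defaults, site), device)


if __name__ == "__main__":
    # Hypothetical layers; all names and addresses are made up.
    defaults = {"ntp": {"servers": ["192.0.2.1"]}, "syslog": {"host": "192.0.2.10"}}
    site = {"syslog": {"host": "198.51.100.10"}}   # per-site exception
    device = {"ntp": {"source": "Loopback0"}}      # remaining per-device snowflake
    print(device_model(defaults, site, device))
```

Every value that moves from the device layer into the site layer or the defaults is one snowflake fewer to track.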
Finally – if you’d like to get a head start, consider attending a training like my Building Network Automation Solutions course.
This blog post was initially sent to the subscribers of my SDN and Network Automation mailing list.
The crucial thing is: we automate typical tasks from the pre-defined task repository (a task is related to adding or removing a part of a system).
But I guess IT is somewhat different. Could you share some examples from the IT world where automation really helps? I would like to understand the broader context, as the automation (I guess) is just a part of the change we want to make in the system. Someone needs to decide what to do, someone needs to accept the change, and finally the automation is used.
I ask the question because (for me) the final part of this process is not as important as the decision about the change, risk analysis, mitigation plan, etc.
That's why I wonder how the automation itself improves the overall process. As I do not understand the IT processes, automation is a buzzword for me. From the system development perspective automation is fine, as the tasks we automate are well-defined and carefully prepared in advance. The automation 'gain' comes from the repeatability of the same task, which was carefully thought through earlier.
Is what I wrote really stupid?
I started looking into your webinars (as I am a subscriber); maybe you can suggest one where this topic is covered in detail. I want to see a real-life IT use case where the automation is worth spending time on. A statement like "Do you still configure VLAN using CLI?" does not convince me, because for me more time is spent on deciding if the VLAN is really needed... For me automation was always related to the economy: the balance between the effort spent on automation and the time saved...
As for "time spent" - you won't cut down on the design/impact analysis time, unless you go for orchestrated infrastructure where users can request VLANs on demand with no additional justification/analysis. However, even if you automate non-self-service processes, you get consistency, and (sometimes) deployment in real time once you manage to persuade everyone involved that the risk of outage is low enough.
The "are you still configuring VLANs" question is there to trigger a discussion (see, mission accomplished) and prompt people to start considering why they're still doing something so boring and fundamental using a manual process.
On the other hand, it is difficult to understand something without knowing the problem you are trying to solve.
What I wanted to say: I am afraid that 'automation' is a buzzword for me (sorry for that). Automation is not something you need to be talked into doing. Automation is rather a need (the problem comes first, before the solution). In the '90s I automated some log collection and configuration changes on 700+ routers using TeraTerm (funny), because the need for doing it was clear. Also, in my current job we automate a lot of system provisioning tasks, both to avoid mistakes (here we agree) and because the tasks are repeatable.
So I am really interested in what it looks like in the pure IT field.
PS. I am the guy you wrote to during the "ipspace content leak affair" (legal subscriber and your fan)