Why is Network Automation So Hard?
This blog post was initially sent to the subscribers of my SDN and Network Automation mailing list. Subscribe here.
Every now and then someone asks me “Why are we making so little progress on network automation? Why does it seem so hard?”
There are some obvious reasons:
- Tightly-coupled components and humongous blast radius;
- Lack of good tools and programming interfaces;
- Lack of transactional consistency (in some cases even simple commits).
However, there’s a bigger elephant in the room: every network is a unique snowflake.
You can buy dozens of network management products and download numerous open source tools, and yet you won’t be a single step closer to offering a service-level abstraction of your network to your users. It’s impossible to develop a tool that caters to the idiosyncrasies of every single network designed by an engineer with a MacGyver mentality (because the needs of his company couldn’t possibly be identical to the needs of hundreds of similar businesses around him). It’s thus impossible to develop a simple network automation tool (similar to vCenter or System Center) that would cater to the needs of the mid-range market.
The best you could do today is to go down the SAP route: develop a highly customizable tool (example: Cisco NSO) and deploy an army of consultants who will customize the tool to the specific needs of the target network. That's a fantastic undertaking if you happen to be the consultant and a pretty good fit for a service provider looking to fully automate their services, but not exactly what a reasonably-sized organization that needs a network to support its business might be looking for.
Alternatively, you can build your own solution with low-level tools like Ansible, and integrate it with an off-the-shelf or custom-built orchestration system. Should you wish to go down this route, you might find the Building Network Automation Solutions course highly useful.
Or you could give up, say “automation is not for me” and keep making random mistakes because you’re sick and tired of menial work. Before deciding that giving up is the right thing to do, please read this.
Unfortunately, things won’t change for the better until we give up the “car parts” mentality and start deploying cookie-cutter networks based on a standard design. Products like Cisco’s ACI fabric are definitely a step in the right direction… until reality intervenes and clutters a clean design with legacy integration options.
What about the economics?
I have been working in system development engineering for 13+ years and we have always used automation. Why? Because we wanted to save time & effort on repeatable tasks.
For this reason we see automation in other non-IT businesses.
I do not think the lack of an API is a barrier. We do use the CLI a lot.
Why don't pure economics make IT automate things in places where we can save money?
I am against a religious attitude. I am talking about money.
What are your thoughts about Cisco's ACI and SDN?
Two different beasts but it seems that there will be synergy between them "on the horizon".
However, taking a more relaxed view of SDN as being "something that provides a layer of abstraction useful to consumers of network services" (which is as useful as Cloud), ACI is definitely an SDN product.
What I meant was Cisco's take on Software Defined Access. :)
Who cares about SDN..Marketing..
Networks are all Snowflakes..I hear that pervasively..that is a cop out..and limits our ability to get stuff done..
In Soft Products..a good Application does not care about the Snowflakes because of the Abstractions built into such tools..
My 2¢...
best infrastructure wins baby!
However, not automating your infrastructure (or at least application deployment) inherently makes you way slower and prone to catastrophic failures, so making sure you automate repetitive stuff is also a way of making your business adapt faster (assuming you rely on IT in some significant way to get your business done).
We started our automation journey two and a half years ago.
We have 6 datacenters; each was built by different engineers with a different approach to work.
We have 6 different network vendors.
As you all mentioned, it was very, very hard; we didn't know where to start.
It took us 3 months of research before we even started to write the first line of code.
Even after the first line of code, we went back and forth so many times.
And this is how we did it:
We took two approaches:
1. General tasks.
2. Full enforce - for ToR switches, which behave almost the same across all datacenters.
General automated tasks - We started to use Ansible to align all the configuration we could, MGMT ACLs, SNMP, Logging and more.
Full enforce - We started with an Ansible inventory in a hierarchy as follows:
Top level - The common information across all datacenters, such as domain-names, snmp community and more.
DC level - The datacenter specific information, such as DNS server, NTP server and more.
Rack level - The rack information which is the interfaces configuration of the rack.
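To illustrate the idea behind the three levels, here is a minimal Python sketch of how common, datacenter and rack data could be layered so that the more specific level wins. It is not how Ansible resolves variables internally, and every name and value in it (merge_levels, the example domain, addresses and interface data) is hypothetical.

```python
from __future__ import annotations

from typing import Any


def merge_levels(*levels: dict[str, Any]) -> dict[str, Any]:
    """Merge dictionaries left to right; later (more specific) levels win."""
    merged: dict[str, Any] = {}
    for level in levels:
        for key, value in level.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are dictionaries: merge them key by key.
                merged[key] = merge_levels(merged[key], value)
            else:
                # Scalars and lists from the more specific level replace earlier ones.
                merged[key] = value
    return merged


# Hypothetical data for illustration only.
TOP = {"domain_name": "example.net", "snmp": {"community": "example-ro"}}
DC = {"dns_servers": ["192.0.2.53"], "ntp_servers": ["192.0.2.123"]}
RACK = {"interfaces": {"Ethernet1": {"description": "server-01", "vlan": 100}}}

# Top level first, then datacenter, then rack.
device_vars = merge_levels(TOP, DC, RACK)
```

The resulting device_vars dictionary holds everything a per-vendor template would need for one rack.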
We created templates for each vendor using all this information, comparing and pushing the configuration using NAPALM.
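The render-then-compare step can be sketched with the Python standard library alone. This is only an illustration under assumptions: string.Template stands in for whatever template engine was actually used, difflib stands in for the comparison NAPALM performs against the device, and the template text, function names and sample configuration are all hypothetical.

```python
from __future__ import annotations

import difflib
from string import Template

# Hypothetical single-vendor interface template.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " switchport access vlan $vlan\n"
)


def render_interfaces(interfaces: dict[str, dict]) -> str:
    """Render the intended interface configuration from rack-level data."""
    blocks = [
        INTERFACE_TEMPLATE.substitute(name=name, **attrs)
        for name, attrs in sorted(interfaces.items())
    ]
    return "".join(blocks)


def config_diff(running: str, intended: str) -> str:
    """Return a unified diff; an empty string means nothing would change."""
    diff = difflib.unified_diff(
        running.splitlines(),
        intended.splitlines(),
        fromfile="running",
        tofile="intended",
        lineterm="",
    )
    return "\n".join(diff)


# Hypothetical rack data and a made-up running configuration.
RACK_INTERFACES = {"Ethernet1": {"description": "server-01", "vlan": 100}}
RUNNING = (
    "interface Ethernet1\n"
    " description server-01\n"
    " switchport access vlan 10\n"
)

intended = render_interfaces(RACK_INTERFACES)
diff_text = config_diff(RUNNING, intended)
```

In a real workflow the diff would come from the device itself: NAPALM drivers expose load_replace_candidate(), compare_config(), commit_config() and discard_config() for that purpose, so the change is pushed only after the difference has been reviewed.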
Now, most of our network devices are fully enforced from Ansible using NAPALM.
On the old devices and devices which are not similar to any other, we're managing only what we can while working on the alignment of the devices.
It was not easy, but I can say it's worth it!
Hope it helps...