Single-Image Systems or Automated Fabrics
In the Network Automation 101 webinar and Building Network Automation Solutions online course I described one of the biggest challenges networking engineers face today: moving from thinking about boxes and configuring individual devices to thinking about infrastructure and services, and changing data models that in turn result in changed device configurations.
The $1B question is obviously: how do we get from here to there?
My ideas were heavily influenced by the Our Traditional Interop Breakfast discussions I had with Terry Slattery in times when Interop still found it worthwhile to invest in good speakers, and the fantastic blog post by Jeremy Stretch. Thank you both!
There are two fundamental ways to solve this challenge:
Build or buy a network automation solution that translates a high-level data model (marketers prefer to use the word Intent) into device configurations that are then pushed to traditional network devices;
Buy a $vendor solution that pretends a whole network (for example, a data center fabric) is a single device. This trend started with shared-control-plane approaches like stackable switches, Cisco VSS and HP IRF, and was made way more scalable with reasonable architectures like Juniper QFabric, Cisco ACI, or Pluribus Networks Netvisor.
The major difference between the two: how much you can do when the thing breaks down (here’s what Douglas Adams had to say on this topic).
If you’re using a network automation solution (doesn’t matter whether it’s a bunch of Perl scripts, Ansible playbooks, or Cisco NSO), you can still log into individual devices, check their configuration, figure out how to fix stuff, and get the network patched together till a proper fix is implemented. Obviously, this approach only works if:
You thought about a big red button that can stop the automation from overwriting your attempts to get the network up and running;
You haven’t fired all your network engineers due to a misplaced unwavering belief in the Easter Bunny and $vendor promises.
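A minimal sketch of the big red button idea, assuming the automation funnels every push through a single function. The names `push_config`, `AutomationPaused`, and the kill-switch file are hypothetical, and `send` stands in for whatever transport you actually use (SSH, NETCONF, a vendor API):

```python
from pathlib import Path
from typing import Callable


class AutomationPaused(RuntimeError):
    """Raised when the kill switch is engaged and pushes must not happen."""


def push_config(
    device: str,
    config: str,
    send: Callable[[str, str], None],
    kill_switch: Path,
) -> None:
    """Push config to a device unless an operator has engaged the kill switch."""
    # Check right before every push, not once at start-up, so that
    # engaging the switch also halts a run that is already in progress.
    if kill_switch.exists():
        raise AutomationPaused(f"{kill_switch} exists, not touching {device}")
    send(device, config)
```

An engineer fixing the network by hand would create the file first (for example with `touch`), and remove it only after the data model has been updated to match the manual fix.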
If you decided to go with the single-image solution from a $vendor, you’re stuck with a broken network till they get their act together and figure out how to fix it, because in most cases you can’t troubleshoot what’s going on behind the scenes, and have no way of fixing it even if you could figure out what’s wrong.
At this moment pundits with more opinions than operational experience should start telling me how totally outdated I am and how every industry eventually moves from tinkerers to integrated solutions, citing modern cars as the prime example.
I totally agree with that sentiment, and would be perfectly fine with that approach… as soon as those same pundits could tell me where I can get a rental network when mine sits at a clueless mechanic for a month because the $vendor-approved testing equipment cannot identify what’s wrong with my network. No takers? I thought so.
Of course, you can make a counter-argument that the almighty $Cloud provides all the answers, and that might be a good-enough approach for many smaller organizations, but do keep in mind that you still need a network to get the data in and out of the cloud.
Everyone else has to decide whether they want to be able to mend the network should $vendor introduce a feature that doesn’t work as advertised, or drink awful free coffee and wait patiently whilst the business is on fire. Just remember that there is no silver bullet, no unicorn tears, and rarely some silver lining.
Disagree? Please let me know!
In any case, you need a user interface that does transactions on the services data model, which, combined with the infrastructure data model, gets translated into device data model(s), which then get translated into device configurations.
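To make that chain concrete, here is a rough sketch rather than a recommended design. Everything in it is an assumption for illustration: the `Service` class, the `INFRASTRUCTURE` dictionary, the function names, and the two platform labels. The rendered lines are only loosely modeled on common CLI styles and are not meant to be pasted into a real device.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Service:
    """Services data model: one VLAN delivered on a set of (device, port) pairs."""
    name: str
    vlan: int
    ports: Tuple[Tuple[str, str], ...]


# Infrastructure data model (hypothetical): which devices exist and what they run.
INFRASTRUCTURE = {
    "leaf1": {"platform": "ios-like"},
    "leaf2": {"platform": "junos-like"},
}


def validate(services: List[Service], infrastructure: Dict[str, dict]) -> None:
    """Reject a candidate service list that is inconsistent."""
    vlans, ports = set(), set()
    for svc in services:
        if svc.vlan in vlans:
            raise ValueError(f"VLAN {svc.vlan} used twice")
        vlans.add(svc.vlan)
        for device, interface in svc.ports:
            if device not in infrastructure:
                raise ValueError(f"unknown device {device}")
            if (device, interface) in ports:
                raise ValueError(f"{device} {interface} used twice")
            ports.add((device, interface))


def commit(current: List[Service], change: Service,
           infrastructure: Dict[str, dict]) -> List[Service]:
    """Transaction: return the new service list, or raise and leave
    `current` untouched."""
    candidate = [*current, change]
    validate(candidate, infrastructure)
    return candidate


def build_device_models(services: List[Service],
                        infrastructure: Dict[str, dict]) -> Dict[str, dict]:
    """Services + infrastructure data models -> per-device data models."""
    devices = {
        name: {"platform": facts["platform"], "vlans": {}, "ports": {}}
        for name, facts in infrastructure.items()
    }
    for svc in services:
        for device, interface in svc.ports:
            devices[device]["vlans"][svc.vlan] = svc.name
            devices[device]["ports"][interface] = svc.vlan
    return devices


def render_config(model: dict) -> str:
    """Device data model -> device configuration text (illustrative syntax)."""
    vlans, ports = model["vlans"], model["ports"]
    lines: List[str] = []
    if model["platform"] == "ios-like":
        for vlan, name in sorted(vlans.items()):
            lines += [f"vlan {vlan}", f" name {name}"]
        for interface, vlan in sorted(ports.items()):
            lines += [f"interface {interface}",
                      " switchport mode access",
                      f" switchport access vlan {vlan}"]
    elif model["platform"] == "junos-like":
        for vlan, name in sorted(vlans.items()):
            lines.append(f"set vlans {name} vlan-id {vlan}")
        for interface, vlan in sorted(ports.items()):
            lines.append(f"set interfaces {interface} unit 0 "
                         f"family ethernet-switching vlan members {vlans[vlan]}")
    else:
        raise ValueError(f"no renderer for {model['platform']}")
    return "\n".join(lines)


if __name__ == "__main__":
    web = Service("web", 100, (("leaf1", "Gi0/1"), ("leaf2", "ge-0/0/1")))
    services = commit([], web, INFRASTRUCTURE)
    for name, model in build_device_models(services, INFRASTRUCTURE).items():
        print(f"--- {name}\n{render_config(model)}")
```

The operator only ever touches the services list; the per-device models and the configuration text are derived from it, which is why one service definition can end up as two different syntaxes on two platforms.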
As for tools: some people build their front-end stuff, or use Ansible Tower (or AWX) as an approximation, others use all sorts of orchestration tools from vRealize to the HP orchestration tool to Cisco NSO (and there are, I'm guessing, a few using Apstra).
The back-end depends on what you need to get done. Ansible, Salt, running Ansible playbooks from vRealize... tons of options.
And yes, a lot of that is in my courses.
"if multivendor:"
Build or buy a network automation solution that translates a high-level data model (marketers prefer to use the word Intent) into device configurations that are then pushed to traditional network devices;
"else:"
Buy a $vendor solution that pretends a whole network (for example, a data center fabric) is a single device. This trend started with shared-control-plane approaches like stackable switches, Cisco VSS and HP IRF, and was made way more scalable with reasonable architectures like Juniper QFabric, Cisco ACI, or Pluribus Networks Netvisor.