Rant: Cisco ACI Complexity
A while ago Antti Leimio wrote a long Twitter thread describing his frustrations with the Cisco ACI object model. I asked him for permission to repost the whole thread, as those things tend to get lost, and he graciously allowed me to do it, so here we go.
I took a 5-day Cisco DCACI course. This is all new to me. I’m confused. Who is ACI for? The capabilities and completeness of features are fantastic, but how do you manage this complex system?
Everything is based on objects. I thought Junos was policy heavy, but this is the ultimate. There are no proper tools to create and manage all these objects and policies. Doing it manually through the GUI seems impossible even at small scale. So you’d need external automation tools and inventory.
Object names can’t be changed after creation. Do things right for the first time or do it several times by trial and error. Logical structure and consistency is hard.
The APIC GUI is overwhelming. The config hierarchy is very deep and hard to navigate. You can’t list or find all objects at once; you have to pick each one from a different config hierarchy.
Take leaf access interface configuration blocks, for example. An interface policy has 20 drop-down menus to define the policies it uses. A simple access port configuration takes about 10 different policy definitions and gluing them together.
L3 interfaces are also complex to manage. OSPF configuration, for example, is spread across multiple config hierarchies.
Every GUI config page has tens of config options. You have to check what each one is and whether you need to set it. Very complex and time-consuming to operate. Most options are best to just ignore in the first round.
That’s just basic connectivity at the switchport level. Along with the VLAN pools, physical domains, attachable entity profiles, bridge domains, VRFs, and L3outs, you need contracts between endpoints to let traffic flow. You can skip this and allow all traffic, but then you lose a lot of ACI.
Verification and troubleshooting still rely on the CLI. The GUI has a lot of visibility, but finding simple things like what is configured and what the protocol status is can be frustrating via the GUI. Lower-level network verification ends up with logging in to the device CLI and running show commands.
I hate NX-OS syntax. It’s a combination of Linux and IOS, but a worse combination than either one alone. Even the industry-standard “sh” command doesn’t work unless you write it out completely. Argh…
Overall ACI was impressive with its comprehensive features and capabilities. But operations using GUI are frustrating and almost impossible to handle. You need huge amount of config structure and feature understanding and planning. Hard to see it going right first time.
That’s why you want to use an external single source of truth where you can create and manipulate objects and push new configuration to APIC.
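To make the single-source-of-truth idea concrete, here is a minimal sketch, assuming the inventory is a plain Python dict. The inventory layout and the names in it are hypothetical. The class names (`fvTenant`, `fvCtx`, `fvBD`, `fvRsCtx`) are taken from the APIC object model as I understand it and should be verified against the APIC documentation for your release. The sketch only renders the nested object tree; the authenticated POST that would push it to APIC is deliberately left out.

```python
import json

# Hypothetical source-of-truth record; every name below is illustrative only.
INVENTORY = {
    "tenant": "demo",
    "vrf": "demo-vrf",
    "bridge_domains": ["web-bd", "db-bd"],
}


def build_tenant_payload(inv):
    """Render one tenant, its VRF and its bridge domains as a nested object tree."""
    children = [{"fvCtx": {"attributes": {"name": inv["vrf"]}}}]
    for bd in inv["bridge_domains"]:
        children.append({
            "fvBD": {
                "attributes": {"name": bd},
                # Relation object tying the bridge domain to its VRF by name.
                "children": [
                    {"fvRsCtx": {"attributes": {"tnFvCtxName": inv["vrf"]}}}
                ],
            }
        })
    return {"fvTenant": {"attributes": {"name": inv["tenant"]}, "children": children}}


payload = build_tenant_payload(INVENTORY)
print(json.dumps(payload, indent=2))
```

Because object names can’t be changed after creation, keeping them in one reviewed inventory like this is also where naming mistakes get caught before they reach the fabric.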
Also you may want to standardize and simplify your connectivity and services before putting it all in ACI. Which is only a good thing.
This is what you end up with when you teach networking to coders.
Unless they get the basic point: they need to create an abstraction layer that simplifies consumption. Breaking everything down into the smallest possible objects for the sake of programmability, flexibility, and extensibility makes sense in automation and coding terms, but it doesn't mean everyone should have to go down to that level when putting the system to work.
Network engineers learning automation are no different, though, unless they do it full time and get to taste its real flavors.
At some point I just hope that people will stop believing in both sets of unicorns and instead invest in a simple operational model that brings the two together and makes it work while keeping it simple.
ACI took a departure from the classic CLI-driven loosely-coupled network of boxes that's been predominant in most networks for the past 20 years. Instead, ACI is architected as a declarative controller front-ending a number of switches. You tell the controller what you want, and it ensures your desired configuration is rendered on the appropriate switches.
If the complexity of configuring and operating a classic loosely-coupled network of boxes is expressed as O(n*m), where n represents the number of features and m the number of boxes, then I would say ACI is expressed as O(n). It removes the number-of-boxes variable from the equation.
To do that, ACI models everything (VRFs, subnets, interfaces, etc.) as an object and creates relations between objects. Why does this matter? With this model, you can configure 1000 interfaces across 50 switches with just two objects. Imagine you want all interfaces to have LLDP enabled tx/rx, CDP disabled, and accept endpoints in VLANs 100-200: you bind the VLAN pool to the interface selector, and you bind the interface selector to a switch selector. Done. OK, there are a few intermediate objects in the picture (physical domain, interface policy group and AAEP), but at the end of the day the philosophy is to build reusable building blocks you can apply to an arbitrarily large number of objects. This lends itself quite well to pattern-based data center network architectures.
One of ACI's strengths is that everything (all CRUD operations) uses a well-documented REST API. The CLI on APIC (ACI's controller) is nothing but a client of that REST API.
The API allowed us to build modules for popular configuration management and infrastructure provisioning tools such as Ansible and Terraform in record time. We can also easily document objects that have been modified or deprecated between releases. We published a Python SDK that auto-generates code based on JSON payloads exchanged between the web client and APIC itself.
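The selector fan-out described above can be sketched as a toy model. This is an illustration of the idea only, not the APIC object model: the class names below are hypothetical Python dataclasses, and the intermediate objects (physical domain, AAEP) are left out. The leaf IDs and the 20 ports per leaf are assumed numbers chosen to reach the 1000 interfaces on 50 switches from the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class InterfacePolicyGroup:
    """Reusable bundle of port settings (illustrative, not an APIC class)."""
    lldp: bool
    cdp: bool
    vlans: range


@dataclass(frozen=True)
class InterfaceSelector:
    ports: range
    policy: InterfacePolicyGroup


@dataclass(frozen=True)
class SwitchSelector:
    leaf_ids: range
    interfaces: InterfaceSelector


def render(selector):
    """Expand the selectors into one (leaf, port) -> policy entry per interface."""
    return {
        (leaf, port): selector.interfaces.policy
        for leaf, port in product(selector.leaf_ids, selector.interfaces.ports)
    }


policy = InterfacePolicyGroup(lldp=True, cdp=False, vlans=range(100, 201))
fabric = SwitchSelector(
    leaf_ids=range(101, 151),                            # 50 leaf switches
    interfaces=InterfaceSelector(range(1, 21), policy),  # 20 ports per leaf
)
rendered = render(fabric)
print(len(rendered))  # 50 leaves x 20 ports = 1000 interfaces
```

Adding a 51st leaf means widening one range; the policy itself is never restated, which is the O(n) argument in miniature.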
The concept of tenants coupled with the REST API tremendously simplifies A/B testing, versioning (it's very simple to roll back to a functioning version of the configuration) and role-based access-control (RBAC).
But I feel this isn't immediately germane to the actual topic. The OP appears a bit torn over not being able to do things the way he's always done them. For instance, he writes "Do things right for the first time or do it several times by trial and error. Logical structure and consistency is hard". I don't know if I agree that logical structure and consistency are "hard". I would say they are critical in building systems that scale while remaining simple. That is certainly not unique to ACI. On the same topic, we read "You need huge amount of config structure and feature understanding and planning". Isn't that applicable to most fields in engineering, at least if you expect a positive or predictable outcome to a project?
Another comment, "Most options are best to just ignore in the first round", caught my attention: precisely. Particularly in recent versions, the intention is for the GUI to present default values that work for most people. I think the OP actually gets the philosophy of ACI. At the end of his article, he writes "you may want to standardize and simplify your connectivity and services before putting it all in ACI. Which is only a good thing."
ACI can come across as a kick in the anthill that is traditional CLI-driven loosely-coupled networking of boxes. However, ACI's intent is not to alienate classic networkers. The GUI is constantly being improved and simplified. There is an NX-OS-like CLI if you really want to use the CLI. More day-2 capabilities are being built either directly into APIC or into adjacent platforms.
That said, you will probably reap the most benefits through automation. And that requires proper planning and the adoption of a new skillset. I have seen CLI ninjas converted to Ansible and Terraform who are now handling infrastructure as code. And they're not going back.
Steve Mullaney (Aviatrix, formerly Nicira and even at Cisco) recently gave an interview titled "There’s No Going Back To ‘Cisco Model Of The 90s’": https://www.crn.com/slide-shows/networking/aviatrix-s-steve-mullaney-there-s-no-going-back-to-cisco-model-of-the-90s-
From page 4: "No one’s going to go back to the Cisco, horribly complex operational model of the 90s. In the old days, if you were a business unit or a developer and you went to your IT team, the answer before you could even ask what you wanted was “no.” Or maybe the next question from the IT staff would be: “What year would you like that?” That forced Shadow IT [and] people went to the cloud. That’s all now being brought back in [and] enterprises don’t want to go back to that model. Now, we have to have that DevOps mentality where we have to be able to say yes -- we have to adopt the cloud model of simplicity and automation."
It's all glittery when people talk about how simple it is to configure 1000 ports with ACI as opposed to the old way, but what we forget to answer is how quickly someone would be able to troubleshoot a simple issue in the two worlds.
Imagine someone calling you and asking: can you check whether port A is learning the MAC on VLAN X, whether it has an ARP entry, and whether this IP has a route within ACI to the outer L3 domain? These questions don't have a straightforward answer. At best, people try a mix of CLI and GUI when troubleshooting in ACI.
What Cisco forgot (or dumped) is that it had built a technical workforce to support its products once they were installed on customer premises; to an extent, Cisco TAC support was only called on for breakdowns. With ACI, Cisco is ignoring those well-established skill sets and pushing programmability instead (imagine being asked to run MO queries to get VLAN IDs or other simple things while you are troubleshooting a live issue). That suggests Cisco wants its customers to sign support contracts, and I am pretty sure Cisco will see more support tickets for simple things now that everything is ACI.
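For readers who have not seen one, the kind of MO query mentioned above looks roughly like the following. This is a minimal sketch that only builds the URL for a class-level endpoint lookup; it does not log in or send anything. The hostname is hypothetical, and the `fvCEp` class, its `mac` and `encap` attributes, the `vlan-<id>` encap format, and the upper-casing of the MAC are assumptions based on my reading of the APIC object model, so verify them against your release before relying on them.

```python
from urllib.parse import quote


def endpoint_query_url(apic, mac, vlan=None):
    """Build a class-level query URL for endpoints matching a MAC (and optionally a VLAN)."""
    # Assumption: MAC addresses are stored upper-case in the endpoint objects.
    flt = f'eq(fvCEp.mac,"{mac.upper()}")'
    if vlan is not None:
        # Assumption: the VLAN encapsulation is stored as "vlan-<id>".
        flt = f'and({flt},eq(fvCEp.encap,"vlan-{vlan}"))'
    # Percent-encode quotes and colons; keep the filter's own punctuation readable.
    encoded = quote(flt, safe="(),")
    return f"https://{apic}/api/class/fvCEp.json?query-target-filter={encoded}"


# "apic.example.com" is a placeholder hostname.
url = endpoint_query_url("apic.example.com", "00:50:56:aa:bb:cc", vlan=100)
print(url)
```

Whether typing that during a live outage is better or worse than a show command is exactly the disagreement in this thread.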
Is software-defined networking a new thing? Improvements in hardware capabilities paved the way to move away from hardware-specific packet switching.
Was the Cisco orchestrator a failed attempt at a GUI-based implementation? It's not bad to learn something new, but at the same time, trying to reinvent the wheel may not be the best solution. At least give the user both options.