Worth Reading: ACI Terraform Scalability
Using Terraform to deploy networking elements with an SDN controller that cannot replace the current state of a tenant with the desired state specified in a text file (because nobody ever wants to do that, right?) sounds like a great idea… until you try to do it at scale.
Noël Boulene hit interesting scalability limits when trying to provision VLANs on Cisco ACI with Terraform. If you’re thinking about doing something similar, you REALLY SHOULD read his article.
There's one advantage to doing it with Terraform: because the configuration is declarative, you know pretty well how your environment is configured. There's no need to first export configurations out of the SDN controller (via API) into an Excel spreadsheet :D
We've had success working around the constraint in workaround 3 in the article by associating both a "tagged" and an "untagged" physical domain with each EPG that requires both tagged and untagged interfaces in the same VLAN. The tagged domain maps the VLAN to the EPG on the AEP. This allows us to trunk all of these EPGs/VLANs to all tagged interfaces on all ports with no static bindings. We create static bindings just for the dot1p access ports, and we try to enforce tagged interfaces on all hosts as much as possible to keep the static bindings to a minimum.
Depending on the pruning needs or how granular one wants to get with AEPs, domains, and VLAN pools, this may not work for everyone, but it keeps our object creation count low in our automation workflows for EPG and switchport rollouts. Deploying dozens, if not hundreds, of EPGs or switch ports is trivial. We do have scenarios where we opt to prune VLANs and use static bindings, but 99% of our switchport add activities require zero static bindings.
This is all in a brownfield environment where we must support the migration of workloads from a legacy environment. Granted, we don't use Terraform, but that seems more or less irrelevant to the constraint in the workaround.
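To get a feel for why the approach in the comment above keeps the object count low, here is a rough back-of-the-envelope sketch in Python. It is a toy counting model, not ACI or Terraform code: the `Port` class, the function names, and the port/EPG numbers are all made up for illustration, and it assumes one object per (EPG, port) static binding and one AEP-level mapping per EPG carried on tagged ports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """Hypothetical switch port: its name, whether it is tagged, and the EPGs it carries."""
    name: str
    tagged: bool   # True = tagged trunk, False = untagged (dot1p) access port
    epgs: tuple


def objects_with_static_bindings(ports):
    """Static bindings everywhere: one object per (EPG, port) pair."""
    return sum(len(p.epgs) for p in ports)


def objects_with_aep_mapping(ports):
    """One AEP-level mapping per EPG seen on tagged ports,
    plus static bindings for the untagged ports only."""
    tagged_epgs = {epg for p in ports if p.tagged for epg in p.epgs}
    untagged_bindings = sum(len(p.epgs) for p in ports if not p.tagged)
    return len(tagged_epgs) + untagged_bindings


# Made-up example: 100 EPGs trunked to 40 tagged ports,
# plus 8 untagged access ports carrying a single EPG each.
epgs = tuple(f"epg-{i}" for i in range(100))
ports = [Port(f"eth1/{i}", True, epgs) for i in range(1, 41)]
ports += [Port(f"eth1/{i}", False, (epgs[0],)) for i in range(41, 49)]

print(objects_with_static_bindings(ports))  # 4008
print(objects_with_aep_mapping(ports))      # 108
```

In this made-up scenario the per-port approach needs 4008 binding objects, while the AEP-based approach needs 108, and adding another tagged port adds nothing to the second number. The real savings depend on how your AEPs, domains, and VLAN pools are laid out.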