Dealing with Cisco ACI Quirks

Sebastian described an interesting Cisco ACI quirk they had the privilege of chasing around:

We’ve encountered VM connectivity issues after VM movements from one vPC leaf pair to a different vPC leaf pair with ACI. The issue did not occur immediately (due to ACI’s bounce entries) and only sometimes, which made it very difficult to reproduce synthetically, but due to DRS and a large number of VMs it occurred frequently enough that it was a serious problem for us.

Here’s what they figured out:

The problem was that sometimes the COOP database entry (ACI’s separate control plane for MAC and host addresses) was not updated correctly to point to the new leaf pair.

That definitely sounds like a bug, and Erik mentioned in a later comment that it was probably fixed in the meantime. However, the fun part was that things worked for almost 10 minutes after the VM migration:

After the bounce entry on the old leaf pair expired (630 seconds by default), traffic to the VM was mostly blackholed, since remote endpoint learning is disabled on border leafs and traffic is always forwarded to the spines’ underlay IP address for proxying.

A bounce entry seems to be something like MPLS/VPN PIC Edge – the original switch knows where the MAC address has moved to, and redirects the traffic to the new location. Just having that functionality makes me worried – unlike in MPLS/VPN networks, where you could have multiple paths to the same prefix (and thus know the backup path in advance), you need a bounce entry for a MAC address only when:

  • The original edge device knows which new switch the moved MAC address is attached to;
  • Other fabric members haven’t realized that yet;
  • The interim state persists long enough to be worth the extra effort.
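To make the failure mode easier to reason about, here is a minimal Python sketch of the data-plane behavior described above. It is a toy model, not ACI’s actual forwarding logic: the leaf-pair names, the FabricState/forward() helpers, and the lost COOP update are invented for illustration; only the 630-second default bounce-entry timer comes from the quote above.

```python
from dataclasses import dataclass

BOUNCE_TIMEOUT = 630  # seconds – default bounce-entry ageing quoted above

@dataclass
class FabricState:
    coop_location: str           # leaf pair the spine COOP database points to
    bounce_target: str = ""      # where the old leaf pair redirects traffic
    bounce_active: bool = False  # bounce entry installed on the old leaf pair

def forward(state: FabricState, t: float, actual_location: str) -> str:
    """Where does traffic arriving at the *old* leaf pair end up t seconds
    after the VM moved? A toy model of the behavior described above."""
    if state.bounce_active and t < BOUNCE_TIMEOUT:
        return f"bounced to {state.bounce_target}"      # old leaf still redirects
    # Bounce entry expired: traffic goes to the spine proxy, which trusts COOP
    if state.coop_location == actual_location:
        return f"proxied to {actual_location}"          # healthy case: COOP was updated
    return "blackholed (stale COOP entry points to the old leaf pair)"

# VM moved from leaf pair A to B, bounce entry installed on A,
# but the COOP update was lost (the bug) and still points to A.
state = FabricState(coop_location="leaf-pair-A",
                    bounce_target="leaf-pair-B", bounce_active=True)
for t in (10, 600, 700):
    print(f"t={t:>3}s -> {forward(state, t, actual_location='leaf-pair-B')}")
# t= 10s -> bounced to leaf-pair-B
# t=600s -> bounced to leaf-pair-B
# t=700s -> blackholed (stale COOP entry points to the old leaf pair)
```
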
On a tangential note, now I understand why Cisco had to build Network Assurance Engine – a reassuringly expensive software solution that seemed to have one job when we first heard about it during Cisco Live Europe 2018: making sure an ACI fabric works as expected.

Anyway, the organization facing that problem decided to “solve” it by limiting VM migration to a single vPC pair:

In the end we gave up and limited the VM migration domain to a single vPC leaf pair. VMware recommends a maximum of 64 hosts per cluster anyway.

Having a high-availability vSphere cluster stretched across more than two leaf switches and then limiting the HA domain to a single pair of leafs definitely degrades the resilience of the overall architecture – unless they merely limited DRS (automatic VM migrations) to a subset of cluster nodes with VM affinity rules while keeping the high-availability cluster stretched across multiple leaf pairs. It’s sad that one has to go down such paths to avoid vendor bugs caused by too much unnecessary complexity.
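If you want to keep the stretched HA cluster while constraining DRS, a “should run on hosts in group” VM/Host rule is one way to do it. Here is a hedged pyVmomi sketch, assuming you already have a connected vCenter session and have looked up the cluster, host, and VM managed objects; the function name, group names, and rule name are made up, and the exact rule design (soft vs. hard rules, group membership) is obviously site-specific.

```python
from pyVmomi import vim

def pin_vms_to_leaf_pair(cluster, hosts, vms):
    """Add a DRS VM/Host 'should run' rule keeping a set of VMs on the hosts
    attached to one leaf pair, while HA can still restart them elsewhere.
    `cluster`, `hosts`, and `vms` are pyVmomi managed-object references."""
    host_group = vim.cluster.HostGroup(name="hosts-leaf-pair-101-102", host=hosts)
    vm_group = vim.cluster.VmGroup(name="vms-leaf-pair-101-102", vm=vms)
    rule = vim.cluster.VmHostRuleInfo(
        name="keep-vms-on-leaf-pair-101-102",
        enabled=True,
        mandatory=False,  # a "should" rule: HA failover to other hosts stays possible
        vmGroupName=vm_group.name,
        affineHostGroupName=host_group.name,
    )
    spec = vim.cluster.ConfigSpecEx(
        groupSpec=[
            vim.cluster.GroupSpec(info=host_group, operation="add"),
            vim.cluster.GroupSpec(info=vm_group, operation="add"),
        ],
        rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")],
    )
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```

Note that this only constrains where DRS places the VMs; whether the smaller failure domain is an acceptable trade-off is exactly the judgment call discussed above.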

Want to Know More About Cisco ACI? Cisco ACI Introduction and Cisco ACI Deep Dive Webinars are waiting for you ;)

1 comment:

  1. ACI could be a lot simpler if it used OpenFlow instead of a tangle of distributed protocols. ;-)

    Replies
    1. OTOH: how many OpenFlow-based production-grade data center fabrics have you seen, and how many of them are still around? The ancients weren't stupid when they decided to go with the distributed protocols.