I'm involved in a Nexus 9500 (NX-OS) migration project, and one bug recently caused vPC-connected Catalyst switches to err-disable (STP channel-misconfig) their port-channel members (CSCvg05807), effectively shutting down the network for our campus during what was supposed to be a "non-disruptive" ISSU upgrade.
Weird, right? Wait, there’s more…
Here's the explanation of that behavior as experienced by my frustrated reader:
Apparently, the Nexus 9K was using another vendor's OUI in the source MAC address of the BPDUs it sent (CSCvd99364). This was changed to a Cisco OUI in 7.0(3)I6(1), but the release notes did not mention the change. During an upgrade (one 9K at a time), one 9K would therefore source BPDUs from the old MAC address while the other 9K sourced them from the new one.
Furthermore, Cisco's "solution" has been to change the OUI back to what it was before... likely causing one more outage for anyone who mistakenly upgraded to 7.0(3)I6(1) or 7.0(3)I7(1).
Note: I'm told a special knob has been added so the new or old OUI can be hardcoded for future upgrade resilience, but I think this is a band-aid fix.
There are some serious underlying issues here:
- Why is Cisco using a non-Cisco OUI for anything in the first place?
- Initially there was no mention of this change in the 7.0(3)I6(1) or 7.0(3)I7(1) release notes.
- Cisco's response has been slow and quiet while this gets swept under the rug. For instance, the release notes for 7.0(3)I7(2) do not mention the change back to the old OUI, nor do they include any warning for customers upgrading from 7.0(3)I6(1) or 7.0(3)I7(1).
In all honesty, Cisco should defer 7.0(3)I6(1) and 7.0(3)I7(1), but these releases remain readily available on CCO.
Anyway, here’s the BPDU Source OUI Cheat Sheet:
- Releases prior to 7.0(3)I6(1) - non-Cisco OUI
- 7.0(3)I6(1) and 7.0(3)I7(1) - Cisco OUI
- 7.0(3)I6(2) and 7.0(3)I7(2) - back to non-Cisco OUI
Upgrading between any of these groups will cause the BPDU source MAC to change without warning, causing all L2-connected devices that do proper BPDU checks to err-disable the ports.
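
If you want to check an upgrade path before a change window, the cheat sheet can be turned into a few lines of Python. This is only a rough sketch of the list above, not anything Cisco ships: the function names (`parse_release`, `bpdu_oui`, `upgrade_changes_bpdu_oui`) are made up for illustration, it only understands release strings shaped like `7.0(3)I6(1)`, and it reports releases later than the ones listed as unknown, because the cheat sheet says nothing about them. It also compares the OUI on both ends of the upgrade, so a direct jump from a pre-7.0(3)I6(1) release to 7.0(3)I6(2) or 7.0(3)I7(2) is reported as "no change"; that is my reading of the sheet, so verify it against the bug notes before relying on it.

```python
import re

# Release groups copied from the cheat sheet above.
CISCO_OUI_RELEASES = {"7.0(3)I6(1)", "7.0(3)I7(1)"}
REVERTED_RELEASES = {"7.0(3)I6(2)", "7.0(3)I7(2)"}
FIRST_CHANGED = (7, 0, 3, 6, 1)  # 7.0(3)I6(1), the first release with the Cisco OUI

# Matches strings such as "7.0(3)I6(1)" only (assumption: no other formats).
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)I(\d+)\((\d+)\)$")


def parse_release(release):
    """Turn '7.0(3)I6(1)' into a comparable tuple (7, 0, 3, 6, 1)."""
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unsupported release string: {release!r}")
    return tuple(int(part) for part in match.groups())


def bpdu_oui(release):
    """Return 'cisco', 'non-cisco' or 'unknown' for a release."""
    name = release.strip()
    key = parse_release(name)
    if name in CISCO_OUI_RELEASES:
        return "cisco"
    if name in REVERTED_RELEASES or key < FIRST_CHANGED:
        return "non-cisco"
    return "unknown"  # the cheat sheet does not cover this release


def upgrade_changes_bpdu_oui(current, target):
    """True/False if the OUI differs between releases, None if unknown."""
    before, after = bpdu_oui(current), bpdu_oui(target)
    if "unknown" in (before, after):
        return None
    return before != after


if __name__ == "__main__":
    for src, dst in [("7.0(3)I5(2)", "7.0(3)I6(1)"),
                     ("7.0(3)I6(1)", "7.0(3)I7(2)"),
                     ("7.0(3)I6(1)", "7.0(3)I7(1)")]:
        print(src, "->", dst, ":", upgrade_changes_bpdu_oui(src, dst))
```

A `True` result means the two vPC peers would send BPDUs with different source OUIs while only one of them has been upgraded, which is exactly the window in which the downstream switches err-disabled their port-channel members.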