One of my readers sent me an intriguing challenge based on the following design:
- He has a data center with two core switches (C1 and C2) and two Cisco Nexus edge switches (E1 and E2).
- He’s using static default routing from core to edge switches with HSRP on the edge switches.
- E1 is the active HSRP gateway connected to the primary WAN link.
The following picture shows the simplified network diagram:
All four devices are in the same VLAN, resulting in the following logical connectivity:
He wanted to test the backup WAN link, shut down the primary link without changing the active HSRP gateway, and discovered that the core switches are no longer reachable from the outside world. Changing the HSRP gateway solved the problem. Adding another transit link between E1 and E2, and running a routing protocol on that link instead of on the current VLAN also fixed it.
I had no clue what might have gone wrong, even though the root cause was so obvious in hindsight: ICMP redirects.
C1 and C2 had no idea about the changed routing landscape. When they continued sending the outgoing packets toward E1, E1 sent them ICMP redirects, desperately trying to tell them to send the traffic to E2 instead. There were just a few tiny little problems:
- Linecard hardware cannot send ICMP redirects. All packets that trigger a redirect (packets sent out through the same interface they arrived on) must be punted to the main CPU.
- Control Plane Protection – protecting the main CPU – dropped most of those packets.
- IP routers (aka layer-3 switches) ignore ICMP redirects anyway.
Disabling ICMP redirects on the Nexus switches with the no ip redirects interface configuration command magically solved the problem.
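To make the trigger condition concrete, here is a minimal Python sketch of the decision a router makes before generating a redirect. It assumes the three conditions usually quoted from RFC 1812 (same ingress and egress interface, source and next hop on the same subnet, no source routing) and is a simplified model, not the NX-OS implementation. The `Forwarding` class, the `would_send_redirect` function, the interface names, and all addresses are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Interface


@dataclass(frozen=True)
class Forwarding:
    """One forwarding decision, reduced to what matters for redirects (hypothetical model)."""
    in_interface: str          # interface the packet arrived on
    out_interface: str         # interface selected by the route lookup
    source: IPv4Address        # IP source address of the packet
    next_hop: IPv4Address      # next hop selected by the route lookup
    out_subnet: IPv4Interface  # router's own address/prefix on out_interface
    source_routed: bool = False


def would_send_redirect(f: Forwarding, redirects_enabled: bool = True) -> bool:
    """Return True when the simplified conditions for sending a redirect are all met."""
    if not redirects_enabled:  # models an interface with redirects turned off
        return False
    same_interface = f.in_interface == f.out_interface
    network = f.out_subnet.network
    same_subnet = f.source in network and f.next_hop in network
    return same_interface and same_subnet and not f.source_routed


# C1 (10.0.0.11) sends a packet to E1 (10.0.0.2/24), whose best next hop is now E2 (10.0.0.3).
c1_to_e1 = Forwarding(
    in_interface="Vlan100",
    out_interface="Vlan100",
    source=IPv4Address("10.0.0.11"),
    next_hop=IPv4Address("10.0.0.3"),
    out_subnet=IPv4Interface("10.0.0.2/24"),
)
print(would_send_redirect(c1_to_e1))                           # True
print(would_send_redirect(c1_to_e1, redirects_enabled=False))  # False
```

In this model every packet C1 sends toward E1 while the primary WAN link is down matches the condition, which is why all of that traffic ends up on the path to the CPU.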
Considering the impact of this SNAFU, one has to wonder about the Nexus OS default settings:
- ICMP redirects are rarely useful
- Ignoring ICMP redirects on hosts is often considered a “security best practice” – they are almost as good as IPv6 Router Advertisements if you want to snatch someone’s traffic.
- Sending ICMP redirects is a performance killer.
And still, a modern network operating system has an obsolete 40-year-old technology enabled by default (still true on Nexus OS 9.3.8). Mindboggling.
On a tangential note, the current design suffers from traffic trombones: C1 and C2 send outgoing traffic to E1, which forwards it to E2 when the primary WAN link is down. That particular glitch would be easy to fix with anycast gateway or active-active VRRP. The proof is left as an exercise for the reader.
It’s All IETF’s Fault
Christopher Hart quickly pointed out that an IPv4 router must send ICMP Redirects as mandated by Section 5.2.7.2 of RFC 1812 (Requirements for IP Version 4 Routers).
Routers MUST be able to generate the Redirect for Host message (Code 1)
In the weird world of corporate marketing, other vendors' marketing teams would have a field day (according to Christopher) if NX-OS were not an RFC-compliant IPv4 router due to disabled ICMP redirects, even if disabling this behavior by default were a universally good thing.
Cisco IOS initially disabled ICMP redirects on interfaces that had HSRP enabled – Jeroen van Bemmel sent me a link to the relevant page in the Cisco IOS in a Nutshell book, which was unfortunately last updated over 15 years ago.
It seems that they decided to change that behavior in IOS release 12.1, and added yet another nerd knob in a later IOS release to make it even more complex.
A router SHOULD send a redirect message, subject to rate limiting, whenever it forwards a packet that is not explicitly addressed to itself (i.e., a packet that is not source routed through the router) in which: [..a list of requirements that cause an ICMP redirect to be sent…]
Want to know more? You’ll find way too many details in Christopher’s ICMP Redirects - How Data Plane Traffic Can Become Control Plane Traffic blog post.
But Wait, It Gets Worse
I focused on the sending router, but what happens when a router receives an ICMP redirect? RFC 1812 (same section) contains an interesting loophole (emphasis mine):
A router using a routing protocol (other than static routes) MUST NOT consider paths learned from ICMP Redirects when forwarding a packet.
But what happens to the locally-generated control-plane traffic? Could a router with hardware forwarding (aka a switch) listen to ICMP redirects and install them in the operating system forwarding table which is used for control-plane traffic but not in the forwarding hardware? Dmytro Shypovalov claims he’s seen that in real life:
It gets even uglier if the router accepts ICMP redirects (it shouldn’t by default, but I’ve seen some do). Redirect kernel cache entry is not installed in RIB or FIB, so local and transit traffic to the same destination take different paths.
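The split Dmytro describes can be illustrated with a toy Python model: a FIB used for transit traffic, plus a separate redirect cache that only locally originated traffic consults. This is a rough sketch of the behavior as described in the quote, not any vendor's implementation. The `ToyRouter` class, its methods, and all addresses are hypothetical.

```python
from ipaddress import IPv4Address, IPv4Network


class ToyRouter:
    """Hypothetical router whose host stack keeps redirects outside the RIB/FIB."""

    def __init__(self, fib):
        self.fib = dict(fib)      # prefix -> next hop, used by the forwarding hardware
        self.redirect_cache = {}  # destination host -> next hop, used by the host stack only

    def accept_redirect(self, destination, new_next_hop):
        # The redirect lands in the cache; the FIB is left untouched.
        self.redirect_cache[destination] = new_next_hop

    def _fib_lookup(self, destination):
        # Longest-prefix match over the FIB.
        matches = [prefix for prefix in self.fib if destination in prefix]
        if not matches:
            return None
        return self.fib[max(matches, key=lambda prefix: prefix.prefixlen)]

    def next_hop(self, destination, locally_originated):
        # Only control-plane (locally originated) traffic sees the redirect cache.
        if locally_originated and destination in self.redirect_cache:
            return self.redirect_cache[destination]
        return self._fib_lookup(destination)


router = ToyRouter({IPv4Network("0.0.0.0/0"): IPv4Address("10.0.0.2")})
target = IPv4Address("192.0.2.1")
router.accept_redirect(target, IPv4Address("10.0.0.3"))

print(router.next_hop(target, locally_originated=True))   # 10.0.0.3
print(router.next_hop(target, locally_originated=False))  # 10.0.0.2
```

A ping sourced from the box and a packet forwarded through it now take different next hops toward the same destination, which is exactly the kind of behavior that makes troubleshooting miserable.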
One has to wonder: how crazy can it get?
- Added feedback by Christopher Hart, Darrell Root, Dmytro Shypovalov, and Jeroen van Bemmel.
Because nothing ever dies in IETF world