Directed ARP and ICMP Redirects

One of my readers sent me this question:

When I did my ***redacted*** I encountered a question about Directed ARP. The RFC (https://tools.ietf.org/html/rfc1433) is in the "experimental" stage, and I found it really weird from ***** to include such a hidden gem in the ***redacted***.

Directed ARP is clearly one of those weird things that people were trying out in the early days of networking when packet forwarding and bandwidth were still expensive (read the RFC for more details), but I kept wondering “what exactly is going on when a host receives an ICMP redirect?” Time for a hands-on test.

Setup

I built a very simple VIRL topology with a router and two servers connected to the same segment (you’ll find the VIRL topology file and setup instructions in my VIRL Github repository).

The relevant part of the router configuration is here:

interface GigabitEthernet0/1
 description to L2
 ip address 10.0.1.1 255.255.255.0 secondary
 ip address 10.0.0.1 255.255.255.0

Getting the redirects

In theory, the router should send an ICMP redirect whenever two servers that try to communicate through it could communicate directly. In practice it doesn’t. It looks like Cisco IOS behavior changed recently (make that “in the last 10 years or so”), as I remember seeing ICMP redirects in a similar setup sometime in the previous millennium.

Time for a dirty trick: define a /16 subnet with two IP addresses on the router and keep the /24 subnets on the hosts. The router will think both hosts are in the same subnet and the two hosts will think they have to communicate across the router.

interface GigabitEthernet0/1
 description to L2
 ip address 10.0.1.1 255.255.0.0 secondary
 ip address 10.0.0.1 255.255.0.0

Bingo! debug ip icmp displays ICMP redirects being sent to the hosts:

ICMP: redirect sent to 10.0.0.5 for dest 10.0.1.5, use gw 10.0.1.5
ICMP: redirect sent to 10.0.1.5 for dest 10.0.0.5, use gw 10.0.0.5

Lesson learned: Cisco IOS will send ICMP redirects only when the source and destination IP addresses are in the same subnet.
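
To make the lesson concrete, here is a minimal Python sketch of the check as I observed it (not the actual Cisco IOS logic, which I can't see). The function name router_sends_redirect and the two interface lists are made up for illustration; the addresses are the ones used in this lab.

```python
import ipaddress

def router_sends_redirect(src, dst, router_interfaces):
    """Model of the observed behavior: a redirect is sent only if the source
    and destination fall into the same connected subnet on the router."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for iface in router_interfaces:
        # ip_interface("10.0.1.1/16").network yields 10.0.0.0/16
        net = ipaddress.ip_interface(iface).network
        if src_ip in net and dst_ip in net:
            return True
    return False

# Primary and secondary address with the original /24 masks
original = ["10.0.0.1/24", "10.0.1.1/24"]
# The same addresses after the "dirty trick" (/16 masks on the router)
dirty_trick = ["10.0.0.1/16", "10.0.1.1/16"]

print(router_sends_redirect("10.0.0.5", "10.0.1.5", original))     # False
print(router_sends_redirect("10.0.0.5", "10.0.1.5", dirty_trick))  # True
```

With the /24 masks the two servers sit in different subnets from the router's perspective, so no redirect is generated; with the /16 masks both land in 10.0.0.0/16.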

You might expect to see the redirect entries on the Linux servers. Guess what – you won’t:

cisco@S1:~$ ip route
default via 10.255.0.1 dev eth0
10.0.0.0/24 dev eth1  proto kernel  scope link  src 10.0.0.5
10.0.0.0/8 via 10.0.0.1 dev eth1
10.255.0.0/16 dev eth0  proto kernel  scope link  src 10.255.0.64

You have to know that there should be a redirect entry in the cache, and once you know that, you can display it:

cisco@S1:~$ ip route get 10.0.1.5
10.0.1.5 via 10.0.1.5 dev eth1  src 10.0.0.5
    cache <redirected>

Makes troubleshooting extra simple, right? Just FYI: here’s another epic struggle with the redirect cache.

2016-06-17: Mathieu Millet suggested using the ip route show cache command to display the whole redirect cache. Unfortunately that command doesn't work on the Ubuntu 14.04.2 LTS included with VIRL.

ARPing around

OK, we got the ICMP redirect into the Linux IPv4 route cache. Now let’s see what happens when the two servers try to communicate directly.

Step#1: Start tcpdump on S2 to capture the ARP packets

cisco@S2:~$ sudo tcpdump -n -i eth1 arp &

VIRL uses eth0 on Linux servers to communicate with the outside world, and tcpdump uses the lowest-numbered interface by default, so I had to specify the interface name.

Step#2: Clear the ARP cache

cisco@S2:~$ sudo ip neighbor flush all

Step#3: Ping S2 from S1 and inspect the tcpdump printout on S2

09:41:27.200279 ARP, Request who-has 10.0.1.5 tell 10.0.0.5, length 28
09:41:27.200314 ARP, Reply 10.0.1.5 is-at fa:16:3e:9f:d0:87, length 28
09:41:27.200552 ARP, Request who-has 10.0.1.1 tell 10.0.1.5, length 28
09:41:27.203598 ARP, Reply 10.0.1.1 is-at fa:16:3e:63:f3:48, length 46

Lesson learned: The Linux kernel has no problem ARPing for any IP address (even if it’s outside of the interface subnet) as long as the IP routing table or the route cache claims the IP address is directly reachable.

And now for some extra weirdness

S1 created the redirect entry in its route cache while pinging S2, but S2 didn’t, even though the router was continuously sending ICMP redirects to S2.

root@S2:~# ping 10.0.0.5
PING 10.0.0.5 (10.0.0.5) 56(84) bytes of data.
From 10.0.0.1: icmp_seq=1 Redirect Host(New nexthop: 10.0.0.1)
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=2.90 ms
From 10.0.0.1: icmp_seq=2 Redirect Host(New nexthop: 10.0.0.1)
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=5.13 ms
From 10.0.0.1: icmp_seq=3 Redirect Host(New nexthop: 10.0.0.1)
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=10.5 ms
From 10.0.0.1: icmp_seq=4 Redirect Host(New nexthop: 10.0.0.1)
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=7.48 ms
From 10.0.0.1: icmp_seq=5 Redirect Host(New nexthop: 10.0.0.1)
64 bytes from 10.0.0.1: icmp_seq=5 ttl=64 time=11.2 ms
From 10.0.0.1: icmp_seq=6 Redirect Host(New nexthop: 10.0.0.1)

2016-06-17: Jim Hand sent me a nice email explaining the above problem: the ICMP redirect is coming from the wrong source IP address. More in another blog post.
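
A rough Python sketch of that explanation follows. It is a simplified model of the "only accept a redirect from the gateway you're currently using" rule, not the Linux kernel code. S1's route is taken from the ip route printout above; S2's route via 10.0.1.1 is an assumption (inferred from S2 ARPing for 10.0.1.1), as is the assumption that the router sources all redirects from its primary address 10.0.0.1. The function names are hypothetical.

```python
import ipaddress

def current_gateway(routes, dst):
    """Longest-prefix match over (prefix, gateway) pairs; returns the gateway or None."""
    dst_ip = ipaddress.ip_address(dst)
    best = None
    for prefix, gateway in routes:
        net = ipaddress.ip_network(prefix)
        if dst_ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    return best[1] if best else None

def accepts_redirect(routes, redirect_source, dst):
    """Simplified host-side check: the redirect must come from the gateway
    the host currently uses to reach dst."""
    return current_gateway(routes, dst) == redirect_source

s1_routes = [("10.0.0.0/8", "10.0.0.1")]  # from the ip route printout on S1
s2_routes = [("10.0.0.0/8", "10.0.1.1")]  # assumption: S2 uses the secondary address

print(accepts_redirect(s1_routes, "10.0.0.1", "10.0.1.5"))  # True
print(accepts_redirect(s2_routes, "10.0.0.1", "10.0.0.5"))  # False
```

In this model S1 accepts the redirect because it comes from its own gateway, while S2 ignores it because 10.0.0.1 is not the gateway S2 uses.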

It’s also interesting that ping claimed the next hop in the ICMP redirect is 10.0.0.1 – I even ran tcpdump to verify Cisco IOS sent the correct next hop in the ICMP redirect:

IP (tos 0x0, ttl 64, id 24602, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.1.5 > 10.0.0.5: ICMP echo request, id 1520, seq 1, length 64
IP (tos 0x0, ttl 255, id 200, offset 0, flags [none], proto ICMP (1), length 56)
    10.0.0.1 > 10.0.1.5: ICMP redirect 10.0.0.5 to host 10.0.0.5, length 36
IP (tos 0x0, ttl 63, id 24602, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.1.5 > 10.0.0.5: ICMP echo request, id 1520, seq 1, length 64
IP (tos 0x0, ttl 64, id 25789, offset 0, flags [none], proto ICMP (1), length 84)
    10.0.0.5 > 10.0.1.5: ICMP echo reply, id 1520, seq 1, length 64

2016-06-17: This seems to be a bug in the ping utility. See comment by Anonymous below.
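
For reference, the next hop that ping should have printed sits in bytes 4 to 7 of the ICMP redirect message (RFC 792: type 5, code, checksum, gateway address, followed by the original IP header and the first 8 bytes of its payload). The Python sketch below packs and unpacks such a message. The helper names are made up, the checksum is left at zero, and the original datagram is a dummy 28-byte placeholder, so treat it as an illustration of the layout rather than a packet you could send.

```python
import socket
import struct

ICMP_REDIRECT = 5   # ICMP type: Redirect
REDIRECT_HOST = 1   # ICMP code: redirect datagrams for the host

def build_redirect(gateway, original_datagram):
    """Pack type, code, checksum (left at 0 in this sketch) and gateway address,
    then append the original IP header plus 8 payload bytes (28 bytes when the
    IP header carries no options)."""
    header = struct.pack("!BBH4s", ICMP_REDIRECT, REDIRECT_HOST, 0,
                         socket.inet_aton(gateway))
    return header + original_datagram[:28]

def parse_redirect(message):
    """Return (code, gateway address) from an ICMP redirect message."""
    icmp_type, code, _checksum, gateway = struct.unpack("!BBH4s", message[:8])
    if icmp_type != ICMP_REDIRECT:
        raise ValueError("not an ICMP redirect")
    return code, socket.inet_ntoa(gateway)

# Dummy stand-in for the echo request that triggered the redirect
message = build_redirect("10.0.0.5", bytes(28))

print(len(message))             # 36
print(parse_redirect(message))  # (1, '10.0.0.5')
```

The 36-byte result matches the "length 36" tcpdump reported for the redirect, and the gateway field carries 10.0.0.5, not the 10.0.0.1 that ping displayed.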

Takeaway ;)

While you can observe a lot by just watching (or googling), you’ll definitely learn more by getting your hands dirty.

11 comments:

  1. ICMP Redirects are something that cause packets to be punted to CPU on the Brocade MLXe platform, and in almost every situation should be globally disabled. Brocade has interpreted the RFC to indicate that if a packet arrives and leaves on the same physical interface, a redirect should be sent, regardless of whether different VLANs, ve's, logical interfaces, etc. are involved. It's not something needed or wanted in modern networks. (http://puck.nether.net/pipermail/foundry-nsp/2006-December/000784.html)
    Replies
    1. "ICMP redirects cause packets to be punted to CPU" << That's true for every platform I know. It's really expensive to build hardware that would be able to send ICMP redirect replies.
  2. About the ip route command on Linux, the default behaviour is to show the main routing table.
    you can view other table with ip route show table xxxx (example : xxxx = default) or ip route show cache ...
    Replies
    1. More here: http://linux-ip.net/html/tools-ip-route.html
    2. Thank you... and this is how I'll slowly build my Linux skills ;))
  3. Sure looks like there is a bug in iputils/ping in ubuntu trusty:

    http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/trusty/iputils/trusty-updates/view/head:/ping.c#L1237

    Seems to me like, once pr_addr is called, it always returns the same buffer contents, so if it's called to render the source address of the ICMP packet, then when it's called to render the icp->un.gateway address, you get back the source address of the ICMP packet again

    --buck
    Replies
    1. Thank you! Any ideas what the "now you listen to redirects, now you don't" behavior might be caused by?
    2. There's mention of the behavior in

      http://www.cymru.com/gillsr/documents/icmp-redirects-are-bad.pdf

      which references Stevens, TCP/IP Illustrated Volume 2, which explains the behavior in 22.11 in the subsection "Redirects and Raw Sockets", but that's in reference to a BSD stack. Given that was the working standard, however, Linux probably exhibits compatible behavior
  4. Great post.

    I recall using this in the late 90s as the poor man's HSRP.
    Two routers on the same user subnet (can run RIP etc.).

    Set your PC's default gateway IP to itself (it ARPs for everything).

    PCs will ARP, get a response from the routers, and use that router to forward traffic off subnet. If the router the PC is using has its links toward the off-subnet destinations down, the router sends an ICMP redirect telling the PC to go to the other router for the off-subnet destination.

    I remember early-day clients with two routers on the same subnet, with their primary off-subnet destination on one router and another off-subnet service on the other. In the traces you would see all the redirect traffic for each respective session redirected between routers: extra traffic, usually fixed with FHRP or destination parity on both routers. But this is a good case of the ICMP protocol doing its job.
  5. "Cisco IOS will send ICMP redirects only when the source and destination IP addresses are in the same subnet."

    I think that's the correct behavior, isn't it?

    RFC1009(A.2):

    A gateway will generate an ICMP Redirect if and only if the
    destination IP address is reachable from the gateway (as
    determined by the routing algorithm) and the next-hop gateway is
    on the same (sub-)network as the source host.

    RFC1122 (3.2.2.2):

    A Redirect message SHOULD be silently discarded if the new
    gateway address it specifies is not on the same connected
    (sub-) net through which the Redirect arrived

    This is an interesting corner. RFC1122 also requires hosts to discard redirects sourced from something other than the default router. In this case (interface with secondary addresses configured), how is the router able to determine which IP address should be stamped on the redirect source IP field? HSRP/VRRP handle this by keeping track of which MAC address the objectionable (mis-routed) frame was destined for. How does it work in this case?

    Some digging I once did in this area: http://www.fragmentationneeded.net/2011/06/to-redirect-or-not-to-redirect-that-is.html
    Replies
    1. "How does it work in this case?"

      Oops. I commented before I finished reading. I'm guessing it *didn't* work, and that's why S2 on the 'secondary' subnet refused to cache the redirect: It was sourced from something other than the default gateway.