Process, Fast and CEF Switching and Packet Punting

I’m probably flogging a fossilized skeleton of a long-dead horse, but it seems I never wrote about this topic before, so here it is (and you might want to read this book for more details).

Process switching is the oldest, simplest and slowest packet forwarding mechanism. Packets received on an interface trigger an interrupt, the interrupt handler identifies the layer-3 protocol based on layer-2 packet headers (example: Ethertype in Ethernet packets) and queues the packets to (user mode) packet forwarding processes (IP Input and IPv6 Input processes in Cisco IOS).

Once the input queue of a packet forwarding process becomes non-empty, the operating system schedules it. When there are no higher-priority processes ready to be run, the operating system performs a context switch to the packet forwarding process.

When the packet forwarding process wakes up, it reads the next entry from its input queue, performs destination address lookup and numerous other functions that might be configured on input and output interfaces (NAT, ACL ...), and sends the packet to the output interface queue.
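The three steps above can be sketched as a toy model. This is not IOS code: the function names, the frame dictionaries and the `lookup` callback are all made up for illustration, and the scheduler is reduced to "someone eventually calls `forwarding_process`".

```python
from collections import deque

# Hypothetical toy model; none of these names come from Cisco IOS.
ETHERTYPE_TO_QUEUE = {0x0800: "IP Input", 0x86DD: "IPv6 Input"}

input_queues = {name: deque() for name in ETHERTYPE_TO_QUEUE.values()}


def interrupt_handler(frame):
    """Classify the frame by Ethertype and queue it; no forwarding happens here."""
    queue_name = ETHERTYPE_TO_QUEUE.get(frame["ethertype"])
    if queue_name is None:
        return None  # unknown layer-3 protocol: ignored in this sketch
    input_queues[queue_name].append(frame)
    return queue_name


def forwarding_process(queue_name, lookup):
    """Runs only when the scheduler gets around to it; drains its input queue."""
    sent = []
    queue = input_queues[queue_name]
    while queue:
        frame = queue.popleft()
        out_interface = lookup(frame["dst"])  # full destination lookup per packet
        if out_interface is not None:
            sent.append((out_interface, frame["dst"]))
    return sent
```

The point of the sketch is the gap between the two functions: a packet can sit in the queue for as long as higher-priority processes keep the CPU busy.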

Not surprisingly, this mechanism is exceedingly slow ... and Cisco IOS is not the only operating system struggling with that – just ask anyone who tried to run high-speed VPN tunnels implemented in Linux user mode processes on SOHO routers.

Interrupt switching (packet forwarding within the interrupt handler) is much faster as it doesn’t involve context switching and potential process preemption. There’s a gotcha, though – if you spend too much time in an interrupt handler, the device becomes non-responsive, starts adding unnecessary latency to forwarded packets, and eventually starts dropping packets due to receive queue overflows (You don’t believe me? Configure debug all on the console interface of a Cisco router).

There’s not much you can do to speed up ACLs (which have to be read sequentially) and NAT is usually not a big deal (assuming the programmers were smart enough to use hash tables). Destination address lookup might be a real problem, more so if you have to do it numerous times (example: destination is a BGP route with BGP next hop based on static route with next hop learnt from OSPF). Welcome to fast switching.
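To see why repeated lookups hurt, here is a minimal sketch of recursive next-hop resolution for the BGP-over-static-over-OSPF example. The prefixes, next hops and interface name are invented, and the linear longest-prefix search is a simplification chosen for readability, not how a router would store its routing table.

```python
import ipaddress

# Hypothetical routing table: prefix -> (source, next hop, outgoing interface)
RIB = {
    "203.0.113.0/24": ("bgp", "10.0.0.1", None),
    "10.0.0.0/24": ("static", "172.16.0.2", None),
    "172.16.0.0/24": ("ospf", "192.168.1.2", None),
    "192.168.1.0/24": ("connected", None, "Gi0/0"),
}


def longest_match(rib, address):
    """Return (network, entry) for the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, entry in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    return best


def resolve(rib, destination, max_depth=8):
    """Return (interface, directly connected next hop, number of lookups)."""
    address, lookups = destination, 0
    while lookups < max_depth:
        match = longest_match(rib, address)
        lookups += 1
        if match is None:
            return None, None, lookups
        _, (source, next_hop, interface) = match
        if interface is not None:   # connected route ends the recursion
            return interface, address, lookups
        address = next_hop          # otherwise look up the next hop itself
    return None, None, lookups


print(resolve(RIB, "203.0.113.5"))  # ('Gi0/0', '192.168.1.2', 4)
```

Four full lookups for a single packet is exactly the kind of work you don't want to repeat inside an interrupt handler.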

Fast switching is a reactive cache-based IP forwarding mechanism. The address lookup within the interrupt handler uses a cache of destinations to find the IP next hop, outgoing interface, and outbound layer-2 header. If the destination is not found in the fast switching cache, the packet is punted to the IP(v6) Input process, which eventually performs full-blown destination address lookup (including ARP/ND resolution) and stores the results in the fast switching cache.
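The reactive behavior can be sketched in a few lines. The class below is a hypothetical illustration: it keys the cache on the exact destination address for simplicity (an assumption of this sketch), and `slow_lookup` stands in for whatever the IP Input process does on a miss.

```python
class FastSwitchingCache:
    """Toy demand-populated forwarding cache; not the IOS data structure."""

    def __init__(self, slow_lookup):
        self.slow_lookup = slow_lookup  # stand-in for the process-switching path
        self.cache = {}                 # destination -> forwarding result
        self.hits = 0
        self.punts = 0

    def forward(self, destination):
        entry = self.cache.get(destination)
        if entry is not None:
            self.hits += 1              # fast path: answered in the "interrupt handler"
            return entry
        self.punts += 1                 # cache miss: punt to the slow path
        entry = self.slow_lookup(destination)
        if entry is not None:
            self.cache[destination] = entry  # the next packet takes the fast path
        return entry
```

Only the first packet toward a destination pays the full price; everything after that is a dictionary lookup, as long as the destination stays in the cache.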

Fast switching worked great two decades ago (there were even hardware implementations of fast switching) ... until the bad guys started spraying the Internet with vulnerability scans. No caching code works well with miss rates approaching 100% (because every packet is sent to a different destination) and very high cache churn (because nobody designed the cache to have 100,000 or more entries).
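A quick simulation shows how badly a cache copes with a scan. The sketch below assumes a bounded cache with LRU eviction; the eviction policy, the cache size and the traffic patterns are all assumptions made for the example.

```python
from collections import OrderedDict


def hit_rate(destinations, cache_size):
    """Fraction of packets served from a bounded LRU cache of destinations."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for dst in destinations:
        total += 1
        if dst in cache:
            hits += 1
            cache.move_to_end(dst)         # mark as most recently used
        else:
            cache[dst] = True              # miss: this packet would be punted
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0


# "Normal" traffic: 1000 packets spread over 50 destinations.
normal = [f"198.51.100.{i % 50}" for i in range(1000)]
# Host scan: 1000 packets, every one to a different destination.
scan = [f"10.{i // 256}.{i % 256}.1" for i in range(1000)]

print(hit_rate(normal, 100))  # 0.95
print(hit_rate(scan, 100))    # 0.0
```

With the scan, every single packet misses, so the cache adds overhead on top of process switching instead of avoiding it.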

When faced with simple host-scanning activity, routers using fast switching with a high number of IP routes (read: Internet core routers) experienced severe brownouts because most of the received packets had destination addresses that were not yet in the fast switching cache, and so the packets had to be punted to process switching. Welcome to CEF switching.

CEF switching (or Cisco Express Forwarding) is a proactive, deterministic IP forwarding mechanism. The routing table (RIB), as computed by routing protocols, is copied into the forwarding table (FIB), where it’s combined with adjacency information (ARP or ND table) to form a deterministic lookup table.

When a router uses CEF switching, there’s (almost) no need to punt packets sent to unknown destinations to the IP Input process; if a destination is not in the FIB, it does not exist.
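The proactive approach can be sketched as follows. This is a simplified model under stated assumptions: the RIB entries already carry a directly connected next hop, the table layouts and function names are hypothetical, and the lookup is a linear scan over a sorted list where a real implementation would use a trie-like structure.

```python
import ipaddress


def build_fib(rib, arp_table):
    """Precompute one forwarding entry per RIB prefix, before any packet arrives."""
    fib = []
    for prefix, (next_hop, interface) in rib.items():
        fib.append({
            "network": ipaddress.ip_network(prefix),
            "interface": interface,
            "next_hop": next_hop,
            "dst_mac": arp_table.get(next_hop),  # None: adjacency not resolved yet
        })
    # Longest prefixes first, so the first match is the longest match.
    fib.sort(key=lambda entry: entry["network"].prefixlen, reverse=True)
    return fib


def cef_lookup(fib, destination):
    """Return the matching FIB entry, or None if the destination does not exist."""
    addr = ipaddress.ip_address(destination)
    for entry in fib:
        if addr in entry["network"]:
            return entry
    return None  # not in the FIB: drop the packet, there is nothing to punt
```

Notice that `cef_lookup` never calls back into a slow path; the cost of a lookup does not depend on whether the destination has been seen before.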

There are still cases where CEF switching cannot do its job. For example, packets sent to IP addresses on directly connected interfaces cannot be sent to destination hosts until the router performs ARP/ND MAC address resolution; these packets have to be sent to the IP Input process.

The directly connected prefixes are thus entered as glean adjacencies in the FIB, and as the router learns the MAC address of the target host (through an ARP or ND reply), it creates a dynamic host route in the FIB pointing to the adjacency entry for the newly discovered directly connected host.

Actually, you wouldn’t want to send too many packets to the IP Input process; it’s better to create the host route in the FIB (pointing to the bit bucket, /dev/null or something equivalent) even before the ARP/ND reply is received to ensure subsequent packets sent to the same destination are dropped, not punted – a behavior nicely exploitable by an ND exhaustion attack.
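Here is a rough sketch of the glean-then-drop-then-forward sequence for a single connected subnet. The class and its string return values are invented for the example, and it assumes that only the first packet to an unresolved host is punted; actual platform behavior may differ.

```python
import ipaddress


class ConnectedSubnet:
    """Toy model of a glean adjacency for one directly connected prefix."""

    def __init__(self, prefix):
        self.network = ipaddress.ip_network(prefix)
        self.host_routes = {}  # destination -> "drop" or a resolved MAC address
        self.punted = []       # packets handed to the (slow) IP Input process

    def forward(self, destination):
        if ipaddress.ip_address(destination) not in self.network:
            return "not-connected"
        state = self.host_routes.get(destination)
        if state is None:
            # Glean adjacency: punt the first packet so ARP/ND can start,
            # and install a temporary drop entry for everything that follows.
            self.punted.append(destination)
            self.host_routes[destination] = "drop"
            return "punt"
        if state == "drop":
            return "drop"      # still waiting for the ARP/ND reply
        return f"forward to {state}"

    def arp_reply(self, destination, mac):
        """ARP/ND reply received: replace the drop entry with a real adjacency."""
        self.host_routes[destination] = mac
```

A short usage run: the first packet to 192.0.2.10 on `ConnectedSubnet("192.0.2.0/24")` returns `"punt"`, the following ones return `"drop"`, and after `arp_reply("192.0.2.10", "00:11:22:33:44:55")` they return `"forward to 00:11:22:33:44:55"`.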

It’s pretty obvious that the CEF table must stay current. For example, if the adjacency information is lost (due to ARP/ND aging), the packets sent to that destination are yet again punted to process switching. No wonder the router periodically refreshes ARP entries to ensure they never expire.

Next time ... hardware switching.


3 comments:

  1. Actually, I'd recommend http://www.amazon.co.uk/Cisco-Express-Forwarding-Nakia-Stringfield/dp/1587052369 as it details all the switching types, discusses CEF in detail (on a per-platform basis) and talks about new(er) developments such as CSSR.

  2. Talk about blast from the past, check out
    http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6554/ps6599/ps6630/qa_C67-726299.html

  3. Bhargav Bhikkaji, 07 October, 2013 18:29

    I would probably define the control plane as follows:

    The control plane is the process by which certain states are derived to aid the data plane. Most protocols (BGP, OSPF...) derive their state independently of the data plane. Certain protocols, like multicast and ICMP, use the data plane itself to arrive at the state that aids the data plane.

    Strictly speaking, Control Plane Policing (CoPP) could probably be renamed to CPU policing, as it polices both control- and data-plane traffic.



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.