Whenever someone asks me about LISP, I answer, “it’s a nice idea, but cache-based forwarding never worked well.” Old-timers familiar with the spectacular failures of fast switching and various incarnations of flow switching usually need no further explanation. Unfortunately, that lore is quickly dying out, so let’s start with the fundamentals: how does packet forwarding work?
Packet forwarding used by bridges and routers (or Layer-2/3 switches if you believe in marketing terminology) is just a particular case of statistical multiplexing – a mechanism where many communication streams share the network resources by slicing the data into packets that are sent across the network. The packets are usually forwarded independently; every one of them must contain enough information to be propagated by each intermediate device it encounters on its way across the network.
Packet forwarding in an intermediate device is always composed of these steps:
- Receive a packet;
- Extract forwarding information from the received packet;
- Find the next hop or outgoing interface from a lookup table;
- Enqueue the forwarded packet into an output queue where it waits to be sent to the next device.
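The steps above can be condensed into a minimal forwarding loop. This is purely an illustration; the table contents, queue structure, and packet layout are made up for the sketch, not taken from any real implementation:

```python
from collections import deque

# Hypothetical lookup table: destination -> (next hop, outgoing interface)
FIB = {"10.0.0.1": ("192.0.2.1", "eth0")}

# One output queue per interface
output_queues = {"eth0": deque()}

def forward(packet):
    # Step 1: receive a packet (here: passed in as a dict)
    # Step 2: extract the forwarding information (the destination address)
    destination = packet["dst"]
    # Step 3: find the next hop / outgoing interface in the lookup table
    entry = FIB.get(destination)
    if entry is None:
        return None                      # no matching entry: drop the packet
    next_hop, interface = entry
    # Step 4: enqueue the packet on the output queue of that interface,
    # where it waits to be sent to the next device
    output_queues[interface].append((next_hop, packet))
    return interface
```

Everything interesting hides in step 3 (the `FIB.get` call), which is why the rest of this article focuses on the lookup.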
The lookup step is often the most complex part of packet processing:
- Exact matches of a small set of values (VLANs, MPLS labels) are straightforward: allocate a large enough table to accommodate all possible values and populate it with forwarding information.
- Exact matches of a large set of values (MAC addresses, IP addresses, IPv4 /32 prefixes…) usually use hash tables. A data-munging function condenses the (too large) value into a small number that is used as an index into the forwarding table. Sounds easy until you realize that the function could produce the same result for multiple input values. Welcome to the wonderful world of hash collisions and hacks like Cuckoo hashing.
- Lookups of network prefixes (using longest-prefix matching) traditionally used a tree-like data structure (example: a trie) that requires several memory accesses to get the answer, significantly reducing packet forwarding performance. Alternative solutions include hash tables for prefixes of a known length (example: /24 IPv4 prefixes) or Bloom filters (see Longest-Prefix Matching Using Bloom Filters for details).
- More complex lookups use TCAM – an extraordinarily versatile but also power-hungry and expensive bit of hardware – or dirty hacks like “let’s walk down an access control list and compare every line with the header of the packet we’re trying to forward.”
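To make the hash-collision bullet concrete, here’s a toy cuckoo hash: every key has two candidate slots (one per hash function), and a colliding insert evicts the current occupant into its alternate slot. This is a sketch of the idea only; real forwarding tables size and rebuild their tables far more carefully:

```python
SIZE = 16                                   # slots per table (toy-sized)

def h1(key): return hash(key) % SIZE
def h2(key): return (hash(key) // SIZE) % SIZE

tables = [[None] * SIZE, [None] * SIZE]     # one table per hash function

def insert(key, value, max_kicks=32):
    entry = (key, value)
    for _ in range(max_kicks):
        for t, h in ((0, h1), (1, h2)):
            slot = h(entry[0])
            if tables[t][slot] is None:
                tables[t][slot] = entry
                return True
            # Slot taken: evict the occupant and try to re-place it
            # in its alternate slot on the next pass
            tables[t][slot], entry = entry, tables[t][slot]
    return False        # too many kicks: a real table would be rebuilt

def lookup(key):
    # At most two probes, regardless of how full the table is
    for t, h in ((0, h1), (1, h2)):
        entry = tables[t][h(key)]
        if entry and entry[0] == key:
            return entry[1]
    return None
```

The selling point is the lookup path: two memory accesses worst-case, which is what a hardware forwarding pipeline needs to guarantee.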
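The longest-prefix bullet can be sketched as a binary trie walked one address bit at a time, remembering the longest prefix seen so far. Again, just an illustrative model (IPv4 only), not production code:

```python
import ipaddress

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]    # one child per bit value
        self.next_hop = None            # set when a prefix ends at this node

root = TrieNode()

def add_route(prefix, next_hop):
    net = ipaddress.ip_network(prefix)
    bits = int(net.network_address)
    node = root
    for i in range(net.prefixlen):
        bit = (bits >> (31 - i)) & 1    # walk from the most-significant bit
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.next_hop = next_hop

def longest_prefix_match(address):
    bits = int(ipaddress.ip_address(address))
    node, best = root, None
    for i in range(32):                 # one memory access per bit, worst case
        if node.next_hop is not None:
            best = node.next_hop        # remember the longest match so far
        node = node.children[(bits >> (31 - i)) & 1]
        if node is None:
            break                       # no longer prefix exists
    else:
        if node.next_hop is not None:
            best = node.next_hop        # a full /32 match
    return best
```

The per-bit walk is exactly why a naive trie is slow: up to 32 dependent memory accesses per IPv4 lookup, which is what the fixed-length hash tables and Bloom filters mentioned above try to avoid.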
Compared to the awkwardly slow way we were taught access control lists work, TCAM works almost like magic:
- The network operating system stores matching values and corresponding bit masks in TCAM;
- When presented with an input value, the best match (considering the match values, bitmasks, and matching priorities) pops up in a single lookup.
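The matching semantics can be modeled in software: an entry matches when the input, with the “don’t care” bits masked out, equals the stored value, and the highest-priority match wins. Mind the hedge: this toy model scans the entries linearly (the very thing we made fun of above), while the actual hardware compares the input against all entries in parallel:

```python
# Each entry: (priority, value, mask, result). A key matches when
# key & mask == value & mask; the hardware returns the matching
# entry with the highest priority in a single lookup.
tcam = []

def program_entry(priority, value, mask, result):
    tcam.append((priority, value & mask, mask, result))

def tcam_lookup(key):
    best = None
    for priority, value, mask, result in tcam:
        if key & mask == value:          # "don't care" bits masked out
            if best is None or priority > best[0]:
                best = (priority, result)
    return best[1] if best else None

# Longest-prefix matching expressed as value/mask pairs plus priorities:
# 10.0.0.0/8 (low priority) and 10.1.0.0/16 (high priority)
program_entry(8,  0x0A000000, 0xFF000000, "next-hop-A")
program_entry(16, 0x0A010000, 0xFFFF0000, "next-hop-B")
```

Because a prefix is just a value/mask pair, the same mechanism handles IP prefixes, ACL entries, and arbitrary header-field matches, which is what makes TCAM so versatile.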
While we usually associate TCAM with hardware-based packet forwarding, I’ve seen what seems to be an efficient software implementation of TCAM in Open vSwitch. It was elegant but so complex that I don’t want to look at it again ;)
Ready to deal with even more complexity? How about recursive lookups: from BGP prefixes to BGP next-hops (which can be BGP prefixes themselves, adding another layer of indirection) to IGP next-hops or static routes. Modern hardware can handle most of that complexity with a hierarchical forwarding table. In the good old days, complex hardware was too expensive, so the routers had to do the recursive walk through the routing table for every packet until someone got a bright idea: what if we cached the results of the recursive lookup?
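The recursive walk looks roughly like this toy model. The table contents and route types are made up, and each step uses an exact-key lookup for brevity where a real router would do a full longest-prefix match:

```python
# Toy routing table: destination -> (route type, next hop or interface)
routing_table = {
    "198.51.100.0/24": ("bgp", "203.0.113.5"),   # BGP prefix -> BGP next hop
    "203.0.113.5":     ("bgp", "192.0.2.9"),     # next hop is itself a BGP route
    "192.0.2.9":       ("igp", "eth0"),          # IGP route -> outgoing interface
}

def resolve(destination, max_depth=8):
    """Follow next hops until we reach a directly usable (IGP) route."""
    for _ in range(max_depth):           # guard against resolution loops
        route_type, target = routing_table[destination]
        if route_type == "igp":
            return target                # done: we know the outgoing interface
        destination = target             # another level of indirection
    raise RuntimeError("next-hop resolution loop")
```

Doing this chain of lookups for every packet is obviously expensive, which is where the caching idea comes in.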
Coming up next: cache-based forwarding