Living with Small Forwarding Tables

Monday, May 9, 2022 06:46 UTC

A friend of mine working for a mid-sized networking vendor sent me an intriguing question:

We have a product using an old ASIC that has 12K forwarding entries, and would like to extend its lifetime. I know you were mentioning some useful tricks, would you happen to remember what they were?

This challenge has no perfect solution, but there are at least three tricks I’ve encountered so far (as always, comments are most welcome):

Conversational learning
Virtual aggregation
Selective route download

Conversational Learning

Conversational learning is what you use when you haven’t learned the packet forwarding history lessons:

Build a forwarding table (Forwarding Information Base – FIB) in software
Start with an empty hardware FIB, and punt all forwarded packets to the CPU
Whenever a new packet arrives at the CPU, find the corresponding forwarding entry in the software FIB and install it in the hardware FIB

Congratulations, you reinvented cache-based forwarding, and you’ll have to deal with cache coherence, cache aging and eviction, and you’ll fail miserably when someone starts scanning the address space. It’s also a bit hard to implement a default route in the hardware FIB. The proof is left as an exercise for the reader.
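To make the failure mode a bit more tangible, here’s a toy Python model of the three steps above. It’s a minimal sketch built on my own assumptions: the hardware FIB is modeled as a fixed-size exact-match cache with one entry per destination address (not per prefix), misses are punted to a longest-prefix-match lookup in the software FIB, and the least recently used entry is evicted when the cache is full. The class name `ConversationalFib` and its parameters are hypothetical.

```python
import ipaddress
from collections import OrderedDict


class ConversationalFib:
    """Toy model of conversational learning (cache-based forwarding).

    Hypothetical sketch: the hardware FIB is a fixed-size exact-match cache
    of destination addresses, filled on demand from a full software FIB and
    evicted in least-recently-used (LRU) order.
    """

    def __init__(self, software_fib, hw_size):
        # Software FIB: every prefix we know about -> next hop
        self.sw = {ipaddress.ip_network(p): nh for p, nh in software_fib.items()}
        self.hw_size = hw_size
        self.hw = OrderedDict()  # destination address -> next hop, oldest first
        self.punts = 0           # packets that had to be handled by the CPU

    def _sw_lookup(self, addr):
        # Longest-prefix match in the software FIB (linear scan for clarity)
        matches = [p for p in self.sw if addr in p]
        if not matches:
            return None
        return self.sw[max(matches, key=lambda p: p.prefixlen)]

    def forward(self, dst):
        addr = ipaddress.ip_address(dst)
        if addr in self.hw:
            # Hardware hit: refresh the LRU position and forward
            self.hw.move_to_end(addr)
            return self.hw[addr]
        # Hardware miss: punt the packet to the CPU
        self.punts += 1
        next_hop = self._sw_lookup(addr)
        if next_hop is None:
            return None  # no matching route: drop the packet
        if len(self.hw) >= self.hw_size:
            self.hw.popitem(last=False)  # evict the least recently used entry
        self.hw[addr] = next_hop
        return next_hop


fib = ConversationalFib({"0.0.0.0/0": "isp-a", "192.0.2.0/24": "isp-b"}, hw_size=4)
for _ in range(100):
    fib.forward("192.0.2.10")     # one busy conversation
print(fib.punts)                  # 1

for host in ipaddress.ip_network("198.51.100.0/24").hosts():
    fib.forward(str(host))        # someone scans a /24
print(fib.punts)                  # 255

fib.forward("192.0.2.10")         # the busy conversation was evicted
print(fib.punts)                  # 256
```

A single busy conversation costs one punt; a scan of a single /24 costs a punt per packet and flushes the busy conversation out of the cache along the way.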

All that obviously doesn’t stop networking vendors from trying to reinvent this particular broken wheel whenever their hardware designers mess up (see also: Nexus 7000 F1 linecard), or whenever they have a bit of hardware they’re desperate to sell (see also: SmartSwitches).

Virtual Aggregation

Conversational learning or any other cache-based forwarding might work within a small network where the number of potential destinations is comparable to the hardware FIB size. Trying to use the same trick with the Internet Default-Free Zone (DFZ) is a recipe for disaster, as Cisco discovered ages ago when their fast switching mechanism caused severe brownouts in large ISPs.

Here’s another idea from MacGyver & Co:

Imagine an edge router connected to two ISPs that happens to have a small FIB and a full DFZ BGP table (for whatever crazy reason).

┌────────────┐      ┌────────────┐
│   ISP-A    │      │   ISP-B    │
└────────────┘      └────────────┘
        ▲                ▲
        │                │
        │ ┌────────────┐ │
        └─┤    EDGE    ├─┘
          └────────────┘

Now assume that one of the ISPs is the transit ISP, and point a default route toward it. Bonus points if the default route uses 1.1.1.1 or 8.8.8.8 as its next hop to cope with the ISP’s bad hair day¹.
Once you have a more-specific and a less-specific prefix pointing to the same next hop in your routing table, you don’t have to install the more-specific prefix in the hardware FIB².
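Here’s a minimal Python sketch of the second step, assuming the RIB is a simple prefix-to-next-hop dictionary containing IPv4 prefixes only (the function name `fib_entries` and the next-hop labels are hypothetical). A prefix is left out of the hardware FIB when its closest covering prefix in the RIB uses the same next hop; the sketch deliberately ignores the caveats hinted at in the footnote.

```python
import ipaddress


def fib_entries(rib):
    """Return the RIB entries that have to be installed in the hardware FIB.

    Hypothetical sketch: a prefix is suppressed when its closest covering
    prefix in the RIB has the same next hop. Traffic for a suppressed prefix
    then matches the covering entry (or that entry's installed ancestor, which
    by the same rule uses the same next hop). IPv4-only: subnet_of() raises
    TypeError when asked to compare IPv4 with IPv6 networks.
    """
    routes = {ipaddress.ip_network(p): nh for p, nh in rib.items()}
    fib = {}
    for prefix, next_hop in routes.items():
        # Find the closest (longest) less-specific prefix present in the RIB
        parent = None
        for candidate in routes:
            if candidate != prefix and prefix.subnet_of(candidate):
                if parent is None or candidate.prefixlen > parent.prefixlen:
                    parent = candidate
        if parent is None or routes[parent] != next_hop:
            fib[prefix] = next_hop
    return fib


# A default route toward ISP-A plus 200 more-specifics with the same next hop,
# and one prefix that is better reached through ISP-B
rib = {"0.0.0.0/0": "isp-a"}
rib.update({f"10.0.{i}.0/24": "isp-a" for i in range(200)})
rib["203.0.113.0/24"] = "isp-b"
print(len(fib_entries(rib)))      # 2

# Failure: the default route survives, but ISP-A loses everything else,
# so all the more-specifics now point to ISP-B
failed = {p: ("isp-a" if p == "0.0.0.0/0" else "isp-b") for p in rib}
print(len(fib_entries(failed)))   # 202
```

The second call shows how quickly the hardware FIB can grow when the primary ISP loses most of its routes while the default route stays up.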

Obviously, you’re trading FIB size for convergence time. For example, you cannot use Prefix Independent Convergence. You could also get into a situation where a particular failure scenario explodes the hardware FIB size beyond its capabilities. An example might be the primary ISP losing most of the DFZ BGP table while still announcing the prefix covering the IP address you use as the next hop of the default route.

Selective Route Download

The DFZ BGP table has almost a million entries, but you could safely ignore most of those prefixes. After all, do you really care about clients in Fiji or Madagascar trying to reach your e-commerce server when you don’t even ship to those countries? It turns out that in most cases a few thousand prefixes cover more than 90% of your Internet traffic. Combine that with a reliable default route and you’re done. Now for the tricky question: how do you select those prefixes?

You could rely on a good network design. For example, if you’re peering at an Internet Exchange Point (IXP) and use an upstream ISP, then you don’t care about any prefix not advertised by the IXP peers:

Set up a default route pointing to the upstream ISP
Filter out all other prefixes received from the upstream ISP
Set a limit on the number of prefixes accepted from every IXP peer³
Go have a well-deserved beer⁴.
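On a real router you’d implement this with routing policies and per-neighbor prefix limits; the following Python sketch only models the effect of the first three steps on the forwarding table. It assumes the received routes are a list of (peer, prefix) pairs, treats a peer that exceeds its limit as contributing no routes at all (roughly what happens when a BGP session is shut down after exceeding its maximum-prefix limit), and keeps the first advertisement of a duplicate prefix instead of running BGP best-path selection. All names are hypothetical.

```python
from collections import defaultdict


def selective_fib(received, upstream, ixp_peers, max_prefixes):
    """Model the upstream + IXP policy listed above (hypothetical sketch).

    received     -- iterable of (peer, prefix) pairs learned via BGP
    upstream     -- name of the upstream ISP
    ixp_peers    -- set of IXP peer names
    max_prefixes -- per-peer prefix limit
    Returns a dict mapping prefix -> peer used as the next hop.
    """
    # Step 1: default route pointing to the upstream ISP
    fib = {"0.0.0.0/0": upstream}

    per_peer = defaultdict(list)
    for peer, prefix in received:
        if peer == upstream:
            continue  # Step 2: filter out everything else from the upstream ISP
        if peer in ixp_peers:
            per_peer[peer].append(prefix)

    for peer, prefixes in per_peer.items():
        if len(prefixes) > max_prefixes:
            continue  # Step 3: limit exceeded, ignore all routes from this peer
        for prefix in prefixes:
            # Simplification: the first peer wins, no BGP best-path selection
            fib.setdefault(prefix, peer)
    return fib


received = [
    ("transit", "198.51.100.0/24"),  # filtered: only the default goes to transit
    ("peer-1", "203.0.113.0/24"),
    ("peer-2", "192.0.2.0/24"),
] + [("peer-3", f"10.{i}.0.0/16") for i in range(50)]  # fat-fingered leak
fib = selective_fib(received, "transit", {"peer-1", "peer-2", "peer-3"}, max_prefixes=10)
print(fib)
# {'0.0.0.0/0': 'transit', '203.0.113.0/24': 'peer-1', '192.0.2.0/24': 'peer-2'}
```

The resulting FIB size is bounded by the number of IXP peers times the per-peer limit (plus the default route), regardless of what the upstream ISP sends.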

Unfortunately it’s a bit hard to package that idea into a shipping product when you know your customers will try to misuse your software in every imaginable way (and a gazillion others). That’s when it’s time to fall back to traffic monitoring:

Using the forwarding table and whatever traffic monitoring technology you have available, identify the “hot” prefixes.
Create a prefix list and use it as a filter between the routing table and the hardware forwarding table.
Periodically update the prefix list to cope with shifts in traffic patterns.
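A Python sketch of the traffic-driven approach might look like this. It assumes you already have per-prefix byte counters for the last measurement interval (where they come from is platform-specific), always reserves one entry for the default route, adds the busiest prefixes until they carry the desired share of traffic or the FIB budget runs out, and smooths the counters between runs so a single traffic blip doesn’t reshuffle the prefix list. The function names, the 90% target, and the smoothing factor are illustrative assumptions, not something any vendor implementation is known to use.

```python
def smooth(previous, current, alpha=0.5):
    """Blend the latest counters into a running average to dampen churn."""
    prefixes = previous.keys() | current.keys()
    return {p: alpha * current.get(p, 0) + (1 - alpha) * previous.get(p, 0)
            for p in prefixes}


def build_prefix_list(byte_counters, fib_budget, coverage=0.9):
    """Select the "hot" prefixes to be downloaded into the hardware FIB.

    Hypothetical sketch: byte_counters maps prefix -> bytes forwarded during
    the last measurement interval. One FIB entry is reserved for the default
    route; the busiest prefixes are added until they carry `coverage` of the
    traffic or the FIB budget is exhausted.
    """
    total = sum(byte_counters.values())
    selected, carried = ["0.0.0.0/0"], 0
    for prefix, count in sorted(byte_counters.items(),
                                key=lambda kv: kv[1], reverse=True):
        if len(selected) >= fib_budget or (total and carried >= coverage * total):
            break
        selected.append(prefix)
        carried += count
    return selected


counters = {
    "203.0.113.0/24": 700,
    "198.51.100.0/24": 250,
    "192.0.2.0/24": 30,
    "100.64.0.0/10": 20,
}
print(build_prefix_list(counters, fib_budget=10))
# ['0.0.0.0/0', '203.0.113.0/24', '198.51.100.0/24']

# Next interval: traffic shifts toward 192.0.2.0/24
smoothed = smooth(counters, {"192.0.2.0/24": 1000})
print(build_prefix_list(smoothed, fib_budget=10))
# ['0.0.0.0/0', '192.0.2.0/24', '203.0.113.0/24', '198.51.100.0/24']
```

Rerunning `build_prefix_list` after every measurement interval and pushing the result into the routing-table-to-FIB filter covers the periodic update step.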

The selective route download⁵ functionality is available in (at least) Arista EOS, Junos, and Cisco IOS XE. If your favorite box supports it as well, please leave a comment.

For more details:

Listen to the SDN Router @ Spotify chat with David Barroso, and follow the related links.
In a follow-up episode, David described the operational experience (spoiler: it turned out that in most cases they didn’t have a problem at all).
I also covered the idea in the SDN Use Cases webinar.

¹ I wrote a ton of blog posts dealing with similar scenarios ages ago. Search for BGP blog posts written between 2006 and 2010.
² With a few caveats left for the reader to figure out. You could cheat and use RFC 6769 as an inspiration.
³ You don’t want them to dump the whole DFZ BGP table into your lap due to a fat-finger incident.
⁴ Or another beverage of your choice. You can even make it a non-alcoholic beer.
⁵ Selective Route Download usually works as a filter between the BGP table and the routing table (RIB), not between the routing table and the FIB. If that’s the case on your platform, you can only use it for BGP routes.
