If You Worry About 768K Day, You’re Probably Doing Something Wrong

A few years ago we “celebrated” 512K day: the size of the full Internet routing table exceeded 512K prefixes (for whatever value of K ;), overflowing the TCAMs in some IP routers and resulting in interesting brownouts.

We’re close to exceeding the 768K mark, and the “beware of 768K day” blog posts have already started appearing. While you (RFC 2119) SHOULD check the size of your forwarding table and the maximum capabilities of your hardware, the more important question should be: “Why do I need 768K forwarding entries if I’m not a Tier-1 provider?”
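A quick way to see how close you are is to compare the number of installed prefixes against the forwarding-table capacity. The sketch below assumes K = 1024 (the usual meaning for hardware table sizes, so 768K is 786,432 entries); the capacity and prefix count are hypothetical placeholders you'd replace with numbers taken from your own device.

```python
K = 1024  # assumption: "K" means 1024 entries, as with hardware table sizes

def fib_headroom(prefix_count: int, capacity_entries: int) -> float:
    """Return the fraction of forwarding-table capacity that is still free."""
    if capacity_entries <= 0:
        raise ValueError("capacity must be positive")
    return 1.0 - prefix_count / capacity_entries

if __name__ == "__main__":
    capacity = 768 * K   # hypothetical TCAM carve-out for IPv4 routes
    installed = 760_000  # hypothetical number of installed IPv4 prefixes
    print(f"capacity: {capacity} entries")  # 786432 entries
    print(f"headroom: {fib_headroom(installed, capacity):.1%}")
```

A negative result means the table no longer fits, which is exactly the situation that caused the 512K day brownouts.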

I wrote about this topic a long while ago and again around 512K day, and David Barroso proved you don’t need more than ~100K entries to cover 99.99% of the traffic of a very large content provider (and even fewer if you’re peering at an IXP). It seems people still don’t want to grasp the details, so let’s try again:

  • You need the full Internet routing table if you’re within the Default Free Zone (DFZ).
  • You have to be in the DFZ if you’re not getting transit from an upstream provider; otherwise it’s easier to use a default route toward your upstream.
  • The only exception to the previous bullet is dealing with multiple upstream providers that don’t want to talk to each other and play de-peering chicken games (see also: Cogent). If that’s the case, switch providers.
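The bullets above boil down to a simple decision. The function below is a rough encoding of that rule of thumb with hypothetical field names; it only makes the logic explicit and is not a substitute for an actual routing policy.

```python
from dataclasses import dataclass

@dataclass
class Upstreams:
    transit_providers: int        # upstreams selling you transit (i.e. a default route)
    providers_peer_cleanly: bool  # False if they play de-peering chicken games

def needs_full_table(u: Upstreams) -> bool:
    """Rule of thumb: do you really need the full Internet routing table?"""
    if u.transit_providers == 0:
        # No transit means you're in the DFZ and need every prefix.
        return True
    # With transit, a default route toward the upstream is enough, unless
    # several upstreams refuse to exchange traffic with each other
    # (in which case the better fix is switching providers).
    return u.transit_providers > 1 and not u.providers_peer_cleanly
```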

A very large majority of autonomous systems thus don’t need the full Internet routing table, but that doesn’t mean that you should blindly follow a default route toward your upstream provider. At the very minimum you should:

  • Accept prefixes belonging to your upstream providers’ customers directly from those upstream providers (hint: check whether the AS path contains two or three distinct AS numbers);
  • Use default routes that depend on the reachability of a third-party prefix (hint: 8.8.8.8, or 1.1.1.1 if you don’t trust Google).
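Both checks are easy to express as route-policy logic. The sketch below is an illustration under assumptions: AS paths are plain lists of ASNs with the upstream’s ASN first, received routes are prefix strings, and the function names are hypothetical. Counting distinct ASNs (rather than path length) keeps AS-path prepending from disqualifying a customer prefix.

```python
import ipaddress

def accept_from_upstream(as_path: list[int], upstream_asn: int,
                         max_distinct: int = 2) -> bool:
    """Accept a prefix only if it belongs to the upstream or one of its direct
    customers (max_distinct=2); use 3 to include the customers' customers."""
    if not as_path or as_path[0] != upstream_asn:
        return False
    # A set of ASNs ignores prepending such as [65001, 65010, 65010, 65010].
    return len(set(as_path)) <= max_distinct

def default_route_active(received: list[str], probe: str = "8.8.8.8") -> bool:
    """Keep the default route toward this upstream only while the upstream
    still advertises a specific (non-default) prefix covering `probe`."""
    addr = ipaddress.ip_address(probe)
    for prefix in received:
        net = ipaddress.ip_network(prefix)
        # prefixlen > 0 skips the default route itself; mixed IPv4/IPv6
        # membership tests simply return False.
        if net.prefixlen > 0 and addr in net:
            return True
    return False
```

On a real router you’d implement the first check as an AS-path filter in the inbound routing policy and the second one as conditional default-route origination or route tracking; the exact configuration is vendor-specific.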

Alternatively, you could decide to use a default route toward your primary upstream provider, and install only the prefixes received from other upstreams or peering partners into your forwarding table.

Your device must support filters between the BGP table, the IP routing table, and the forwarding table for this approach to work. Most decent network operating systems have such functionality.
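Conceptually, the filter keeps the BGP table complete and decides which subset of it reaches the forwarding table. The sketch below models that decision under stated assumptions: routes are tagged with a hypothetical neighbor label, the primary upstream supplies the default route, and best-path selection between overlapping routes is left out.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str    # e.g. "192.0.2.0/24"
    neighbor: str  # hypothetical BGP neighbor label, e.g. "upstream-a"

def is_default(prefix: str) -> bool:
    return ipaddress.ip_network(prefix).prefixlen == 0

def fib_candidates(bgp_table: list[Route], primary: str) -> list[Route]:
    """The BGP table stays complete; only this subset is installed in the FIB."""
    fib = []
    for route in bgp_table:
        if route.neighbor == primary:
            # From the primary upstream, install only the default route.
            if is_default(route.prefix):
                fib.append(route)
        elif not is_default(route.prefix):
            # From other upstreams and peers, install their specific prefixes.
            fib.append(route)
    return fib
```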

Finally, you could use the approach David Barroso pioneered (and a vendor or two later promoted as the best thing they ever figured out):

  • Use the full Internet routing table together with NetFlow data to figure out which autonomous systems matter to you.
  • Install those prefixes (plus potential exceptions) into the forwarding table and use the default route(s) for everything else.
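Here’s a rough illustration of the idea, not the code David Barroso or any vendor actually uses. It assumes each flow record already carries the destination’s origin AS and a byte count, and that the BGP table has been reduced to a prefix-to-origin-AS map; all function names are hypothetical. The heaviest ASes are selected greedily until they cover the target share of traffic.

```python
from collections import Counter

def bytes_per_origin(flows: list[tuple[int, int]]) -> Counter:
    """Sum traffic per destination origin AS; each flow is (origin_asn, octets)."""
    totals = Counter()
    for origin_asn, octets in flows:
        totals[origin_asn] += octets
    return totals

def select_origin_ases(totals: Counter, coverage: float = 0.9999) -> set[int]:
    """Greedily pick the heaviest ASes until they carry `coverage` of the traffic."""
    grand_total = sum(totals.values())
    chosen: set[int] = set()
    if grand_total == 0:
        return chosen
    covered = 0
    for asn, octets in totals.most_common():  # sorted by traffic, descending
        if covered / grand_total >= coverage:
            break
        chosen.add(asn)
        covered += octets
    return chosen

def prefixes_to_install(rib: dict[str, int], chosen: set[int]) -> list[str]:
    """rib maps each prefix to its origin AS (the last ASN in the AS path)."""
    return sorted(prefix for prefix, origin in rib.items() if origin in chosen)
```

Everything not returned by prefixes_to_install is left to the default route(s); manual exceptions would be added to that list separately.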

Of course it’s easier to request the full routing table from all upstreams, yammer when BGP convergence takes forever (because the $0.99 CPU in your ancient switch has a hard time dealing with the changes), or blame the vendor for the too-small TCAM.

Alternatively, you could decide to get the job done, in which case you might want to listen to the Software Gone Wild episodes SDN Router @ Spotify and SDN Internet Router Is in Production, and watch the Forwarding Optimizations part of the SDN Use Cases webinar.

3 comments:

  2. Do you mean a default route with a next hop of a public DNS server prefix? Is that good enough? Or would relying on IP SLA for a default route be better? Is there even a better way to test that we have "internet reachability"?
  3. Here is an interesting link: https://labs.ripe.net/Members/emileaben/768k-day-will-it-happen-did-it-happen
    They also discuss the meaning of "k".