Virtual aggregation: a quick fix for FIB/TCAM overflow

Quick summary for the differently-attentive: virtual aggregation solves TCAM overflow problems (high-level description of how it works).

During the Big Hot and Heavy Switches podcast, Dan Hughes complained that the Nexus 7000 switch cannot take the full BGP table. The reason is simple: its TCAM (FIB) has only 56,000 entries and the BGP table has almost 350,000 routes.

Nexus 7000 is a Data Center switch, so the TCAM size is not really a limitation (it would usually have a default route toward the WAN core), but the same problem is experienced by Service Providers all over the world – the TCAM/FIB size of their high-speed routers is limited.

It’s usually easy (and comparatively cheap) to upgrade a router’s main memory, which holds the BGP table and the IP routing table. Upgrading the SRAM or TCAM that holds the FIB is either expensive or mission-impossible, resulting in premature forced retirement of the high-speed gear (the difference between the IP routing table (RIB) and the IP forwarding table (FIB) is explained in my RIB/FIB blog post).

Faced with this problem, some very smart researchers proposed virtual aggregation (and named it ViAggre), a technique that allows you to reduce FIB/TCAM requirements while still having the full BGP table in your router. For an overview of virtual aggregation, read my SearchTelecom article; for more details, read their paper.
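
To make the FIB savings a bit more tangible, here is a toy Python sketch of the basic idea as I understand it: the address space is split into a few virtual prefixes, each router is the aggregation point for some of them and installs the specific routes only for those, and it reaches everything else through a single entry per virtual prefix pointing toward the responsible aggregation point. The routes, router names (`r1`, `r2`, `pe1` ...), the two /1 virtual prefixes and the `build_fib` helper are all invented for illustration; this is not the mechanism from the paper, which also has to deal with tunneling and redundancy.

```python
import ipaddress

# Hypothetical full BGP table: prefix -> egress router (all values invented).
FULL_TABLE = {
    "10.1.0.0/16": "pe1",
    "10.2.0.0/16": "pe2",
    "11.0.0.0/8": "pe3",
    "172.16.0.0/16": "pe1",
    "192.168.5.0/24": "pe3",
    "200.1.1.0/24": "pe2",
}

# Hypothetical virtual prefixes and the router acting as aggregation point for each.
VIRTUAL_PREFIXES = {
    "0.0.0.0/1": "r1",
    "128.0.0.0/1": "r2",
}


def build_fib(router, full_table, virtual_prefixes):
    """Return the (reduced) FIB a router would install in this toy model."""
    fib = {}
    for vp, aggregation_point in virtual_prefixes.items():
        vp_net = ipaddress.ip_network(vp)
        if aggregation_point == router:
            # We are the aggregation point: install every specific route
            # that falls inside this virtual prefix.
            for prefix, next_hop in full_table.items():
                if ipaddress.ip_network(prefix).subnet_of(vp_net):
                    fib[prefix] = next_hop
        else:
            # Someone else is responsible: one entry for the whole
            # virtual prefix, pointing toward that aggregation point.
            fib[vp] = "tunnel:" + aggregation_point
    return fib


if __name__ == "__main__":
    for name in ("r1", "r2"):
        fib = build_fib(name, FULL_TABLE, VIRTUAL_PREFIXES)
        print(name, len(fib), "FIB entries instead of", len(FULL_TABLE))
```

Every router still receives the full BGP table in this model; only the number of entries pushed into the FIB shrinks, at the cost of an extra hop through the aggregation point for some traffic.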

5 comments:

  1. I always wonder: why not just aggregate prefixes locally before installing them in the FIB? CEF and similar technologies use highly optimized data structures that should be easy to use for aggregating prefixes, especially for small and mid-sized operators whose edge routers have very few next hops. A really big part of the Internet is aggregates announced as specifics (for various purposes), but for most distant ASes they point to just one upstream. And you can always leave holes in the aggregates as specifics. A dirty approach to that concept is to just cut the full table at /23 (or /22 or /21) for an old router and point 0/0 to a big and powerful core (with MPLS). We did that about 8 years ago and there was a really small amount of suboptimal routing inside our AS.

  2. You just described another great idea. In most cases, it's more than enough to have full routing in the core and default routing on the edge (more so if you're not providing generic transit), but most people don't get it and think they will get suboptimal routing because they might send the traffic toward a /24 in Elbonia in the wrong direction.

    The next problem is the BGP customers: they want a full feed (maybe they bought too much RAM and/or care about Elbonia), and you either have the full BGP table on the access router or use multihop EBGP into your core (in which case you have support problems with some customers).

  3. For Cisco gear, there are not many devices with plenty of RAM that can handle the full table but have a very limited amount of TCAM; I can only remember some of the Cat 4500s and the non-XL versions of the 6500. So, FIB-only optimization is nice to have, but not enough in most cases. Also, it should greatly improve convergence time in some situations (when prefix independent convergence is not available).

  4. Nexus 7000 ... but it doesn't support MPLS (yet).

  5. Nexus 7K seems to have full table support with the newer XL cards.... I haven't heard of anyone using them yet but they are advertising it for service providers/internet exchange points.

    http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9402/data_sheet_c78-574928.html

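
The local aggregation idea from the first comment (drop a specific from the FIB when the covering prefix already points to the same next hop, and keep the "holes" that point elsewhere) could be sketched roughly like this. The `aggregate_fib` function, the sample routes and the next-hop names are made up for illustration, and this is only the most conservative form of FIB aggregation: it never invents new covering prefixes, it only suppresses redundant specifics.

```python
import ipaddress


def aggregate_fib(rib):
    """Drop every prefix whose nearest covering prefix has the same next hop.

    rib maps prefix strings to next hops; the result is the subset of
    entries that still has to be installed in the FIB.
    """
    nets = {ipaddress.ip_network(p): nh for p, nh in rib.items()}
    fib = {}
    for net, next_hop in nets.items():
        # Walk up the prefix tree to find the nearest covering prefix
        # that is present in the routing table.
        parent = None
        for plen in range(net.prefixlen - 1, -1, -1):
            candidate = net.supernet(new_prefix=plen)
            if candidate in nets:
                parent = candidate
                break
        if parent is not None and nets[parent] == next_hop:
            # Redundant: a lookup falling through to the covering prefix
            # ends up at the same next hop anyway.
            continue
        fib[str(net)] = next_hop
    return fib


# Hypothetical routing table of an edge router with very few next hops.
rib = {
    "0.0.0.0/0": "core",
    "10.0.0.0/8": "upstream-a",
    "10.1.0.0/16": "upstream-a",   # same next hop as 10.0.0.0/8
    "10.2.0.0/16": "upstream-b",   # a "hole" that must stay
    "10.2.3.0/24": "upstream-b",   # same next hop as 10.2.0.0/16
    "192.0.2.0/24": "core",        # same next hop as the default route
}

if __name__ == "__main__":
    for prefix, next_hop in aggregate_fib(rib).items():
        print(prefix, "->", next_hop)
```

Comparing against the nearest covering prefix of the original table is safe even when that prefix is itself suppressed, because it was only suppressed in favor of a covering prefix with the same next hop.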


Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.