How could we filter extraneous BGP prefixes?
Did you know that approximately 40% of BGP prefixes polluting your RIB and FIB are not needed, as they could be either aggregated or suppressed (because an aggregate is already announced)? We definitely need a “driver’s license for the Internet”, but that’s not likely to happen, and in the meantime everyone has to keep buying larger boxes to cope with people who cannot configure their BGP routing correctly.
Before you start writing a comment explaining how multiple prefixes are needed due to lack of traffic engineering capabilities in BGP – the report generated by Geoff Huston takes at least some of that into account.
Bill (the reader who wrote me about this issue) is facing a very painful problem: he cannot fit the full BGP routing information he’s receiving from two upstream ISPs into his Sup720-3B and would have to upgrade to Sup720-3BXL.
He noticed that in many cases a BGP prefix and one or more more-specific prefixes share the same next hop, so he could easily drop the more-specific prefixes without changing the forwarding behavior. He simulated this idea on the actual contents of his BGP table and figured out that he could safely drop around 40% of the prefixes he receives. He just needs the inbound filter.
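Bill's offline simulation is not shown here, but the idea can be sketched in a few lines of Python. This is a minimal illustration, not his actual script: the function name `drop_redundant_more_specifics` and the input format (a dict mapping prefix to next hop) are assumptions made for the example. A more-specific prefix is dropped only when the nearest covering prefix that is still kept points to the same next hop.

```python
import ipaddress


def drop_redundant_more_specifics(routes):
    """Return the subset of routes needed to keep forwarding unchanged.

    routes: dict mapping a prefix string to a next-hop string
    (hypothetical input format chosen for this sketch).
    """
    # Process shorter prefixes first so that covering prefixes are
    # already decided when their more-specifics are examined.
    parsed = sorted(
        ((ipaddress.ip_network(p), nh) for p, nh in routes.items()),
        key=lambda item: item[0].prefixlen,
    )
    kept = {}  # network object -> next hop
    for net, next_hop in parsed:
        covering_next_hop = None
        # Walk from the longest possible covering prefix to the shortest
        # and stop at the first one that is still in the kept table.
        for plen in range(net.prefixlen - 1, -1, -1):
            candidate = net.supernet(new_prefix=plen)
            if candidate in kept:
                covering_next_hop = kept[candidate]
                break
        if covering_next_hop == next_hop:
            continue  # same forwarding result without this prefix
        kept[net] = next_hop
    return {str(net): nh for net, nh in kept.items()}


table = {
    "10.0.0.0/16": "A",
    "10.0.1.0/24": "A",    # covered by the /16 with the same next hop
    "10.0.2.0/24": "B",    # different next hop, must stay
    "10.0.2.128/25": "A",  # nearest covering prefix points to B, must stay
}
print(drop_redundant_more_specifics(table))
```

Note that this works on a snapshot of the table. If the covering prefix is later withdrawn or changes its next hop, the dropped more-specifics would have to be reinstalled, which is exactly what a static inbound filter cannot do.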
There is no easy way to implement the “drop superfluous more-specific prefixes” filter in Cisco IOS. You could create multiple RIBs (with the neighbor soft-reconfiguration command) and implement scripting kludges that would generate inbound filters, but those kludges wouldn’t react to real-time changes in BGP tables.
An alternative might be a host-based BGP daemon (like Quagga) that would connect to the upstream ISPs, collect the BGP prefixes, and pass the minimum subset required to the Catalyst 6500. I was never really interested in BGP daemons and thus have no idea where to start looking for such a beast (OK, I do know how to use Google to find Quagga ;). Could you help Bill, preferably with pointers to solutions that already implemented what he’s looking for? Thank you!
http://www.data.proidea.org.pl/plnog/6edycja/materialy/prezentacje/Robert_Raszuk.pdf
https://puck.nether.net/pipermail/cisco-nsp/2012-January/083087.html
I assume that nothing longer than /24 is being received? You could take it one step further by filtering anything longer than a /23. Yes, you would lose some minor prefixes, but do you really need them? Maybe you could use a default route for those prefixes. Otherwise you could try to do something a bit more clever. If you know that you will never need to send traffic to Asia or some other part of the world, then you could try to filter the ranges allocated to APAC. It will not be 100% accurate, but close enough.
SUP720-3B is very old by now so it might be time for an upgrade anyway but I know it can be difficult getting money allocated.
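To get a feel for what the prefix-length approach would do to a given table, something like the following Python sketch could be run against a dump of the received prefixes. The function names and the `max_len` parameter are made up for this illustration; the default route stands in for the one received from (or pointed at) the upstream.

```python
import ipaddress


def filter_by_length(prefixes, max_len=23):
    """Split prefixes into (accepted, rejected) by prefix length."""
    accepted, rejected = [], []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        (accepted if net.prefixlen <= max_len else rejected).append(net)
    return accepted, rejected


def longest_match(table, destination):
    """Return the most specific network in table covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


received = ["198.51.100.0/24", "203.0.112.0/23", "10.0.0.0/8"]
accepted, rejected = filter_by_length(received, max_len=23)

# The default route catches traffic for everything that was filtered out.
table = accepted + [ipaddress.ip_network("0.0.0.0/0")]
print(len(rejected), "prefix(es) filtered")
print(longest_match(table, "198.51.100.7"))  # falls back to the default
```

Traffic toward a filtered prefix still leaves the box, just via the default route, which may not be the upstream the more-specific prefix would have selected.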
I would have a look at recent conference papers on the topic:
"On Route Aggregation"
http://conferences.sigcomm.org/co-next/2011/papers/1569470145.pdf
"SMALTA: Practical and Near-Optimal FIB Aggregation"
http://conferences.sigcomm.org/co-next/2011/papers/1569469057.pdf
http://conferences.sigcomm.org/co-next/2011/slides/Uzmi-SMALTA.pptx
"Making Routers Last Longer with ViAggre"
http://www.usenix.org/event/nsdi09/tech/full_papers/ballani/ballani.pdf
http://www.cs.cornell.edu/~hitesh/talks/talk-nsdi09-va.pdf
For the second case, you could probably mess around with some conditional advertisement. For the first case, some feature like "conditional automatic aggregation" would be a cool thing, but it doesn't exist afaik (at least in IOS). Maybe it's possible to do that in real-time in Junos with custom tools or scripts since it's basically a BSD system, but for IOS...
If he has full-table-capable routers near the 720s, he can originate a default-route from the router and suboptimal routing should stay near enough.
1. The best 'PC' based BGP implementation is currently OpenBGPD (run on OpenBSD). No serious BGP engineer can afford to ignore it these days. It even has MPLS & VRF support.
2. Not sure which of the 3B's resources you are specifically discussing, but soft-reconfiguration would double some memory requirements for each peer, so that is unlikely to be feasible anyway.
3. Standard tricks in this situation are to implement prefix length filtering e.g. drop > /24, if that still doesn't fit, then > /23 etc. Maybe throw in a static default to null, and then judiciously log non-RFC5735 packets that go there to see who's not aggregating.
4. In the end the 3B can't cut the mustard anymore, especially if it's holding v6 tables as well. If you have traffic to justify the peerings, then you can presumably afford the 3CXL (or whatever). Otherwise see (1.) for a cheaper alternative.
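The logging idea in point 3 could be post-processed along these lines. This is only a sketch: how the destination addresses hitting the null default get collected (ACL logs, NetFlow, or something else) is not covered, and the function names and the /24 grouping are assumptions made for the example. The block list is the set of special-use IPv4 ranges from RFC 5735.

```python
import ipaddress
from collections import Counter

# Special-use IPv4 blocks listed in RFC 5735
# (255.255.255.255/32 is already inside 240.0.0.0/4).
RFC5735_BLOCKS = [ipaddress.ip_network(block) for block in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24", "192.88.99.0/24",
    "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24",
    "203.0.113.0/24", "224.0.0.0/4", "240.0.0.0/4",
)]


def is_special_use(destination):
    """True if the address falls into one of the RFC 5735 blocks."""
    addr = ipaddress.ip_address(destination)
    return any(addr in block for block in RFC5735_BLOCKS)


def summarize_null_routed(destinations, group_len=24):
    """Count non-special-use destinations per covering prefix.

    destinations: iterable of IPv4 address strings seen on the null
    default route (hypothetical input for this sketch).
    """
    counts = Counter()
    for dst in destinations:
        if is_special_use(dst):
            continue  # expected to hit the null route, not interesting
        net = ipaddress.ip_network(f"{dst}/{group_len}", strict=False)
        counts[str(net)] += 1
    return counts.most_common()


log = ["10.1.1.1", "5.5.5.5", "5.5.5.9", "7.7.7.7", "192.168.1.1"]
for prefix, hits in summarize_null_routed(log):
    print(prefix, hits)
```

The prefixes at the top of the output are the ones whose traffic is being dropped by the length filter and that might deserve an explicit exception.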
I wrote about Virtual Aggregation two years ago (http://searchtelecom.techtarget.com/tip/Virtual-aggregation-Lifeline-for-exploding-Internet-routing-tables); it solves the edge FIB problem by using large-scale core routers. Although the idea seemed interesting, not much has been done in the meantime.
SMALTA seems interesting and they claim they have a Quagga/Zebra implementation. Probably a lot of extra glue would be needed to get it running ...
ftp://ftp-eng.cisco.com/cons/isp/security/Ingress-Prefix-Filter-Templates/
These will drastically reduce the number of prefixes loaded onto the SUPs. Also take defaults along with the full tables from providers just to cover any small cases that don't adhere to RIR policies.