Oversized AS Paths: Cisco IOS Bug Details

Numerous articles describing the widespread routing instabilities caused by the sloppy parser of a small router vendor (including posts at BGPmon, Renesys, Arbor Security and my blog) hinted that the unusual BGP update caused so many problems because the ISPs were using outdated Cisco IOS releases. This is definitely not the case; all classic IOS releases were affected.

Rodney Dunn from Cisco and I were quickly able to reproduce a previously unknown bug in Cisco IOS that occurs only when the inbound AS-path contains close to 255 AS numbers and the router does inbound or outbound AS-path prepending. The new bug is tracked as CSCsx73770 and affects downstream EBGP or IBGP sessions as follows:


Root Cause Analysis: Oversized AS Paths

The “BGP experiment” a small European ISP performed in February 2009 made quite a splash. Cisco discovered a new BGP bug that can be triggered only if you have a long enough AS-path and do outbound AS-path prepending (and a few of us learned more BGP intricacies than we ever wanted to know). Lots of people have (hopefully) discovered the importance of the bgp maxas-limit configuration command, and at least some ISPs have implemented the inbound prepending filters that I wrote about almost a year ago. However, most of us thought that the original problem arose due to inexperienced operators of a leaf AS.

Mikael Abrahamsson was the first to notice that the number of prepends matches the low-order 8 bits of the offending AS number. Further contributors to the NANOG mailing list confirmed that two autonomous systems with very long prepends are using BGP routers from Mikrotik. You configure those boxes with commands whose syntax is deceptively close to Cisco's, but the prepend command expects the number of times the AS number should be prepended, not the AS-path to prepend. Obviously, no range checking is done on the configuration parameter and its high-order 8 bits are ignored.
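The truncation described above can be sketched in a few lines of Python. This is a minimal illustration, not the vendor's actual code: the function name is made up, and 65532 is a hypothetical (private-range) AS number chosen only because its low-order byte is large.

```python
def prepend_count(configured_value: int) -> int:
    """Return the prepend count an unchecked 8-bit counter would end up with.

    Assumption: the box silently keeps only the low-order 8 bits of
    whatever number the operator typed.
    """
    return configured_value & 0xFF


# Hypothetical example: the operator enters the AS number itself
# instead of a small prepend count.
asn = 65532
print(prepend_count(asn))  # 65532 = 0xFFFC, low-order byte 0xFC = 252
```

An operator who meant "prepend my AS number" would thus get a path with roughly 250 copies of it, which is how an otherwise harmless typo turns into an oversized AS path.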

So it looks like the incident started with a box that accepts an invalid configuration parameter, used in an AS whose number has a very high value in its low-order 8 bits (quite improbable, but obviously not impossible). Numerous ISPs that did not limit the BGP updates they were propagating and an IOS bug did the rest.


Protect Your Network with BGP maxas-limit

In February 2009, a greenhorn ISP (they joined RIPE less than four months before the incident) in central Europe managed to generate a BGP update with too many AS numbers in the AS path, confusing older routers. You can find the details in an old Renesys blog post; at the peak of the instability, they were receiving over 100,000 BGP updates per second.

It’s very easy to protect yourself (and your downstream neighbors) from an operational error like this one. Cisco has implemented the AS-path length limiting code in IOS release 12.2. One would hope that the major ISPs would have started using this feature years ago; obviously that’s not the case, so here’s how you do it… just to make sure everyone understands what the bgp maxas-limit command does and hopefully implements it in this millennium.

The following text, written by Ivan Pepelnjak in 2009, was originally published on the CT3 wiki. That web site became unreachable in early 2019. We retrieved the original text from the Internet Archive, cleaned it up, updated it with recent information where necessary, and republished it on the ipSpace.net blog on December 5, 2020.

BGP allows numerous attributes (including AS-path, metrics, local preference and communities) to be attached to every advertised IP prefix. The total length of BGP attributes attached to a single IP prefix can be very large (up to 64K bytes). IP prefixes with an excessive amount of attribute data residing in the BGP table can result in significant memory utilization and trigger software bugs.

An AS-path attribute with more than 255 AS numbers is expressed as multiple AS_SEQUENCE segments. This unusual AS-path composition caused problems in older Cisco IOS releases and could result in a continuously flapping BGP session. Using bgp maxas-limit avoids this behavior unless route-map based AS-path prepending extends the AS-path length beyond 255 AS numbers.
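The segment limit follows from the BGP encoding: the per-segment AS count is a single octet, so one segment holds at most 255 AS numbers. The sketch below only illustrates that split; the function name is hypothetical and the code does not build real BGP messages.

```python
MAX_AS_PER_SEGMENT = 255  # the segment's AS-count field is one octet


def split_into_segments(as_path: list[int]) -> list[list[int]]:
    """Split a flat list of AS numbers into AS_SEQUENCE-sized chunks."""
    return [
        as_path[i:i + MAX_AS_PER_SEGMENT]
        for i in range(0, len(as_path), MAX_AS_PER_SEGMENT)
    ]


# A path of 255 AS numbers fits into one segment; one more prepend
# forces a second segment.
print(len(split_into_segments([65000] * 255)))
print(len(split_into_segments([65000] * 256)))
```

This is why a router that receives a path just under the limit and then prepends its own AS number can end up sending the unusual multi-segment path to its neighbors.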

The extended length bit in the BGP UPDATE message that has to be used when the AS-path length exceeds 128 AS numbers also caused errors in older IOS releases (Cisco bug ID CSCdr54230).
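The threshold can be checked with simple arithmetic. The sketch below assumes 2-byte AS numbers (4-byte AS numbers halve the figure), two octets of header per path segment, and a one-octet attribute length field unless the extended length bit is set. The function names are made up for this illustration.

```python
def as_path_attribute_bytes(as_count: int, as_size: int = 2) -> int:
    """Size of the AS_PATH attribute value in bytes.

    Each segment carries a 1-octet type and a 1-octet AS count,
    followed by the AS numbers themselves (at most 255 per segment).
    """
    segments = -(-as_count // 255)  # ceiling division
    return segments * 2 + as_count * as_size


def needs_extended_length(as_count: int, as_size: int = 2) -> bool:
    """True when the value no longer fits a one-octet length field."""
    return as_path_attribute_bytes(as_count, as_size) > 255


print(needs_extended_length(126))  # 2 + 126 * 2 = 254 bytes
print(needs_extended_length(127))  # 2 + 127 * 2 = 256 bytes
```

Under these assumptions the extended length bit becomes necessary at 127 AS numbers, in line with the approximate figure of 128 quoted above.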

Cisco IOS can limit the maximum length of the AS-path attribute with the bgp maxas-limit length router configuration command. It’s highly advisable that you use this command together with other BGP security measures to reduce the potential impact of oversized AS-path attributes on the operation of your network.

The maximum sensible length of the AS-path attribute depends on your position within the Internet. Core operators observe lower AS-path lengths than the edge points. Due to CSCdr54230, accepting AS-paths having more than approximately 100 AS numbers was best avoided; reasonable values are usually much lower… and as always, Geoff Huston published a measurement giving you the answer to that question.

Configuring the bgp maxas-limit command does not impact the regular BGP operation. The maxas-limit is checked during the inbound update processing. Prefixes with oversized AS-path length are simply ignored; BGP sessions are not disrupted.
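The inbound check can be modeled as a simple per-prefix filter. This is a sketch of the behavior described above, not the IOS implementation: the function names are hypothetical, and whether IOS compares with "greater than" or "greater than or equal" is an assumption here.

```python
def accept_update(as_path: list[int], maxas_limit: int) -> bool:
    """Accept the prefix only if its AS path is within the limit."""
    return len(as_path) <= maxas_limit


def process_updates(updates, maxas_limit: int) -> dict:
    """Return the prefixes that survive the check.

    Oversized prefixes are skipped; nothing else happens, which models
    the fact that the BGP session itself is not disrupted.
    """
    accepted = {}
    for prefix, as_path in updates:
        if accept_update(as_path, maxas_limit):
            accepted[prefix] = as_path
    return accepted


# Illustrative values: one prefix with a single AS in the path,
# one with four prepends added by the sender.
updates = [
    ("10.1.1.0/24", [65000]),
    ("10.1.2.0/24", [65000] * 5),
]
print(process_updates(updates, maxas_limit=3))
```

With a limit of 3, only the prefix with the short path is installed; the prepended one is ignored while the other prefix from the same neighbor is still accepted.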

Test bed description

The bgp maxas-limit functionality can be easily demonstrated in a test bed consisting of only two routers.

Configuration of R1 - the sender of long AS paths
hostname R1
!
ip cef
!
interface Loopback0
 ip address 10.0.1.1 255.255.255.255
!
interface Serial1/0
 description Link to R2 s1/0
 ip address 10.0.7.13 255.255.255.252
 encapsulation ppp
!
router bgp 65000
 no synchronization
 bgp log-neighbor-changes
 network 10.1.1.0 mask 255.255.255.0
 network 10.1.2.0 mask 255.255.255.0
 neighbor 10.0.7.14 remote-as 65100
 neighbor 10.0.7.14 route-map prepend out
 no auto-summary
!
ip classless
!
ip route 10.1.1.0 255.255.255.0 Null0
ip route 10.1.2.0 255.255.255.0 Null0
!
ip prefix-list prepend seq 5 permit 10.1.2.0/24
!
route-map prepend permit 10
 match ip address prefix-list prepend
 set as-path prepend 65000 65000 65000 65000
!
route-map prepend permit 20
!
line con 0
 exec-timeout 0 0
 privilege level 15
 logging synchronous
 transport preferred none
 stopbits 1
!
ntp logging
end

Configuration of R2 - the receiver of long AS paths
hostname R2
!
ip cef
!
interface Loopback0
 ip address 10.0.1.2 255.255.255.255
!
interface Serial1/0
 description Link to R1 s1/0
 ip address 10.0.7.14 255.255.255.252
 encapsulation ppp
!
router bgp 65100
 no synchronization
 bgp log-neighbor-changes
 bgp maxas-limit 3
 neighbor 10.0.7.13 remote-as 65000
 no auto-summary
!
line con 0
 exec-timeout 0 0
 privilege level 15
 logging synchronous
 transport preferred none
 stopbits 1
!
ntp logging
end 

Exception logging

The bgp maxas-limit functionality does not impact the regular BGP operation. Whenever an inbound BGP update is received with an oversized AS-path attribute, the router logs a warning message and ignores the update.

Log message generated after an inbound update has been ignored
%BGP-6-ASPATH: Long AS path 65000 65000 65000 65000 65000
 received from 10.0.7.13: More than configured MAXAS-LIMIT 

The AS-path length limiting functionality can also be observed with any of the debug ip bgp update commands. A sample printout is included below:

BGP debugging printout generated on R2
BGP(0): 10.0.7.13 rcv UPDATE w/ attr: nexthop 10.0.7.13, origin i,
  metric 0, originator 0.0.0.0, path 65000 65000 65000 65000 65000,
  community , extended community , SSA attribute
BGPSSA ssacount is 0
BGP(0): 10.0.7.13 rcv UPDATE about 10.1.2.0/24 -- DENIED due to:
  AS-PATH length over 4072;
BGP(0): 10.0.7.13 rcvd UPDATE w/ attr: nexthop 10.0.7.13, origin i,
  metric 0, path 65000
BGP(0): 10.0.7.13 rcvd 10.1.1.0/24...duplicate ignored 

Writing good exam questions

Readers who commented on some of my previous certification-related posts have complained about the vagueness of exam questions. I have to agree with them; I’ve seen my fair share of dubious questions in the exams I’ve taken. For example, when I was developing EIGRP and BGP courses for Cisco, my lowest scores on the CCIE recertification exams were in those two categories. I knew too many details and was confused by the vagueness of the questions.


Yellow journalism at work: Previously Unknown DNS Attacks

When I stumbled across the headline Porn site feud spawns new DNS attack on NetworkWorld’s web site, the urge to read the article was simply irresistible. The article starts with the following paragraph (emphasis mine):

A scrap between two pornographic Web sites turned nasty when one figured out how to take down the other by exploiting a previously unknown quirk in the Internet's DNS.

The link in the paragraph points to another article documenting a completely different DNS attack. The next paragraph contradicts the first one (emphasis yet again mine):

The attack is known as DNS Amplification. It has been used sporadically since December, but it started getting talked about last month when ISPrime, a small New York ISP, started getting hit hard with what's known as a distributed denial of service (DDoS) attack.

Off-topic: Workstation vulnerability — FUD at its best

Reading an interestingly titled article on InformIT, I stumbled across the following text:

The survival time is an estimate of how long an un-patched computer will remain uncompromised once it’s connected to the Internet. While the actual time varies, historically it tends to run between 4 and 20 minutes.

This is such obvious nonsense that I had to check the source, which is also full of alarming messages, but admits at the end that the problems described largely disappeared with XP SP2. Just to put things in perspective: XP SP2 was released in August 2004 and the graph in the alarming blog post displays data from 2008.

Next step: investigate the source of the graph. The »average survival time« is defined as the time between probes on numerous TCP or UDP ports, regardless of whether the port was actually enabled in the workstation and whether the probe was successful or not.

My personal conclusion: as most workstations include some sort of rudimentary firewall these days, the whole approach is bogus. More precisely, it measures an important parameter (average time between probes), but claims it represents something completely different (average survival time). Would you agree with my conclusion?

Lesson learned: Never trust alarming over-simplifying statements based on misunderstood data.


Dance around IOS bugs with Tcl and EEM

Recently, on an IPSec-based customer network, we installed one of the brand new platforms introduced by Cisco Systems. The initial software release had memory leaks (no problem, we all know these things happen), so we upgraded the box to the latest software. It works perfectly … until you reload it. The software we’re forced to use cannot get IPSec to work if the startup configuration includes interface-level crypto-maps. Interestingly, you can configure crypto-maps manually and they work … until you save them into the startup configuration and reload the box.


Things you cannot do with Tclsh

What would you think if you received three queries about the same (somewhat obscure) feature within six hours? It started with a nice e-mail from an engineer that I’ve corresponded with in the past. He wanted to send a Wake-on-LAN packet to a PC in a remote office. Usually you could use the ip directed-broadcast feature, but he wanted to use the remote office router to generate the packet.


Decent DNS, DHCP and HTTP server on an ISR router

Readers of my blog have probably noticed that I’m occasionally documenting the shortcomings of DNS and DHCP servers built into Cisco IOS (I will not even mention the HTTP server, this one gets constantly degraded). On the other hand, although you could centralize all these services, the centralization makes the branch offices completely dependent on the availability of WAN uplinks; without a working uplink, a branch office stops completely.


I need to slow down :)

I’ve just opened the January Technical Services News from Cisco. Nothing in there that would really interest me. Almost no routing protocols (one OSPF article), no BGP, no MPLS VPN. Based solely on this newsletter, one could get the feeling that I’m producing more documents covering core IP routing in a month than Cisco (I am positive that’s not the case).

But maybe Cisco’s engineers are refocusing on the new Support Wiki. Not really. After I filtered out sequential changes to a single document, there were only 11 significantly changed documents in the Support Wiki in the last 30 days.

So I’m left wondering … what’s going on? Has everything already been written about the core IP routing features and the productive minds have shifted to voice and wireless? Are the engineers focused on IP routing becoming the dinosaurs? What’s your perspective?

But one thing is clear: I need to slow down.


Interactions between IP routing and QoS

One of my readers sent me an interesting question a while ago:

I reviewed one of your blog posts "Per-Destination or Per Packet CEF Load Sharing?" and wondered if you had investigated previously on how MQC QoS worked together with the CEF load-sharing algorithm (or does it interact at all)? For example, let's say I have two equal cost paths between two routers and the routing table (as well as CEF) sees both links as equal paths to the networks behind each router. On each link I have the same outbound service policy applied with a simple LLQ, BW, and a class-default queues. Does CEF check each IP flow and make sure both link's LLQ and BW queues are evenly used?

Unfortunately, packet forwarding and QoS are completely uncoupled in Cisco IOS. CEF performs its load balancing algorithm purely on source/destination information and does not take into account the actual utilization of outbound interfaces. If you have bad luck, most of the traffic ends up on one of the links, and packets that would easily fit on the other link are dropped by the QoS mechanisms.
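A toy model makes the "bad luck" case concrete. The hash below (XOR of the two addresses, modulo the number of links) is an assumption made for illustration and is not the real CEF algorithm; the flow list and the rates are invented as well.

```python
from ipaddress import ip_address


def pick_link(src: str, dst: str, link_count: int = 2) -> int:
    """Choose an outbound link from the address pair only.

    Toy hash, NOT the actual CEF algorithm. Note that link utilization
    is not an input at all.
    """
    return (int(ip_address(src)) ^ int(ip_address(dst))) % link_count


def link_load(flows, link_count: int = 2) -> list[int]:
    """Sum the offered load (kbps) that lands on each link."""
    load = [0] * link_count
    for src, dst, kbps in flows:
        load[pick_link(src, dst, link_count)] += kbps
    return load


# Hypothetical flows: two heavy ones happen to hash onto the same link.
flows = [
    ("10.0.0.1", "10.1.0.1", 900),
    ("10.0.0.3", "10.1.0.1", 800),
    ("10.0.0.2", "10.1.0.1", 10),
]
print(link_load(flows))  # [1700, 10]
```

The service policy on the first link has to cope with 1700 kbps while the second link carries almost nothing, and nothing in the forwarding decision can correct that.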

You could use multilink PPP to solve the problem in low-speed environments. With MLPPP, CEF sends the traffic to a single output interface (the Multilink interface) and the queuing mechanisms evenly distribute packet fragments across the links in the bundle.

In high-speed environments, you can only hope that the number of traffic flows traversing the links will be so high that you’ll get a good statistical distribution (which is usually the case).


Flash-based DHCP database

Pete sent me an interesting question a while ago:

It might be interesting to write an article about ip dhcp database flash:dhcp-db command, documenting the pros of surviving a reboot versus cons of wear on the flash device.

I’ve already written about a few problems that can be solved with the DHCP database (but obviously a longer text is warranted … already stored in my to-do list) and it took me a while to find the time to dig out the relevant information on the flash device wear.


EBGP Multipath Load Sharing and CEF

When I was discussing the details of the BGP troubleshooting video with one of my readers, he pointed out that I should mention the need for CEF switching in an EBGP multipath scenario. My initial response was “Why would you need CEF? EBGP multipath is older than CEF” and his answer told me I should turn on my gray cells before responding to emails: “Your video as well as Cisco’s web site recommends CEF for EBGP multipath design… but interestingly, it does work without CEF”.

The real reason we need CEF in EBGP load sharing designs is the efficacy of load distribution. Without CEF, the router will send all traffic toward a single BGP prefix over one of the links (fast switching performs per-destination-prefix load sharing). With CEF, the load is distributed based on the source-destination IP address pair combinations. Even if multiple clients send the traffic toward the same server, the load is spread across available links.
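The difference between the two switching methods can be sketched with a toy model. Both selection functions below are simplifications invented for this example (neither is the real IOS algorithm), and the addresses are hypothetical.

```python
from ipaddress import ip_address


def per_destination_link(dst: str, link_count: int = 2) -> int:
    """Fast-switching style: the choice depends on the destination only."""
    return int(ip_address(dst)) % link_count


def per_pair_link(src: str, dst: str, link_count: int = 2) -> int:
    """CEF style: the choice depends on the source-destination pair."""
    return (int(ip_address(src)) ^ int(ip_address(dst))) % link_count


# Hypothetical scenario: four clients talking to the same server.
clients = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
server = "192.0.2.10"

fast_links = {per_destination_link(server) for _ in clients}
cef_links = {per_pair_link(client, server) for client in clients}

print(sorted(fast_links))  # one link carries everything
print(sorted(cef_links))   # both links are used
```

With the destination-only choice, every client's traffic toward the server uses the same link; once the source address is part of the decision, the same traffic is spread across both links.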


Generate HTTP(S) requests from Tcl shell

A few days ago, a reader sent me an e-mail titled “Telnet Automation from a Cisco Router” and complained that IOS Tcl does not support the expect commands (spawn, send and expect). Since Expect is a Tcl extension, not part of the core Tcl, it’s not included in Cisco IOS, which was the only answer I could give.

You might be able to port Expect to IOS as a Tcl package if it doesn’t require external libraries.