Blog Posts in February 2009
Anyone who has ever had the “privilege” of interviewing a certified individual with purely theoretical knowledge appreciates the value of hands-on tests. The creators of certifications in the IT industry (including Cisco Systems) have responded by including more and more hands-on exercises in the certification exams. Unfortunately, Cisco decided not to use the real equipment, but rather simulations. While this is definitely better than relying exclusively on multiple-choice tests, students can still work their way through the simulations without having a decent level of hands-on experience.
I got an unusual question a few days ago:
Does a loop (cable returning back to same switch) in one switch affect other switches? How can I detect that there is such a problem in a particular switch?
The correct answer to the first question is obviously “it depends”. To start with, it depends on whether the two ports will be able to communicate. With a crossover (switch-to-switch) cable (and assuming there are no negotiation issues), the physical layer will work correctly. If you’re using a standard RJ-45 patch cable, you’re “out of luck” unless the switch is too smart and has auto-MDI sensing (like the Linksys switches, now well hidden under obscure part numbers like Cisco SRW248G4). In this case, the two ports will become active even when connected with a patch cable.
Numerous articles describing the widespread routing instabilities caused by the sloppy parser of a small router vendor (including posts at BGPmon, Renesys, Arbor Security and my blog) hinted that the unusual BGP update caused so many problems because the ISPs were using outdated Cisco IOS releases. This is definitely not the case; all classic IOS releases were affected.
Rodney Dunn from Cisco and I were quickly able to reproduce a previously unknown bug in Cisco IOS that occurs only when the inbound AS-path contains close to 255 AS numbers and the router does inbound or outbound AS-path prepending. The new bug is tracked as CSCsx73770 and affects downstream EBGP or IBGP sessions as follows:
The “BGP experiment” a small European ISP performed in February 2009 has generated quite a splash: Cisco has discovered a new BGP bug that can be triggered only if you have a long enough AS-path and do outbound AS-path prepending (and a few of us learned more BGP intricacies we never wanted to know), lots of people have (hopefully) discovered the importance of the bgp maxas-limit configuration command and at least some ISPs have implemented inbound prepending filters that I wrote about almost a year ago. However, most of us thought that the original problem arose due to inexperienced operators of a leaf AS.
Mikael Abrahamsson was the first to notice that the number of prepends matches the low-order 8 bits of the offending AS number. Further contributors to the NANOG mailing list confirmed that two autonomous systems with very long prepends are using BGP routers from Mikrotik. You configure those boxes with commands whose syntax is deceptively close to Cisco's, but they expect the number of times the AS number should be prepended, not the AS-path to prepend. Obviously no range checking is done on the configuration parameter and the high-order 8 bits are ignored.
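The arithmetic behind that observation can be sketched in a few lines of Python. The function name and the sample AS number are illustrative assumptions, not values taken from the incident; the sketch only assumes, as described above, that the box keeps the low-order 8 bits of whatever number is entered where a prepend count is expected.

```python
def effective_prepend_count(configured_value: int) -> int:
    """Return the prepend count if only the low-order 8 bits are kept."""
    return configured_value & 0xFF


# Hypothetical example: an operator enters an AS number (64508, from the
# private range) where the box expects a prepend count.
print(effective_prepend_count(64508))  # prints 252
```

An AS number whose low-order byte happens to be small would produce only a handful of prepends and go unnoticed; a value like the one above produces an AS path that is close to the 255-entry segment limit.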
So it looks like the incident started with a box that accepts an invalid configuration parameter, used in an AS whose number has a very high value in its low-order 8 bits (quite improbable, but obviously not impossible). Numerous ISPs that did not limit the BGP updates they were propagating and an IOS bug did the rest.
In February 2009, a greenhorn ISP (they joined RIPE less than four months before the incident) in central Europe managed to generate a BGP update with too many AS numbers in the AS path, confusing older routers. You can find the details in an old Renesys blog post; at the peak of the instability, they were receiving over 100,000 BGP updates per second.
It’s very easy to protect yourself (and your downstream neighbors) from an operational error like this one. Cisco has implemented the AS-path length limiting code in IOS release 12.2. One would hope that the major ISPs would have started using this feature years ago; obviously that’s not the case, so here’s how you do it… just to make sure everyone understands what the bgp maxas-limit command does and hopefully implements it in this millennium.
BGP allows numerous attributes (including AS-path, metrics, local preference and communities) to be attached to every advertised IP prefix. The total length of BGP attributes attached to a single IP prefix can be very large (up to 64K bytes). IP prefixes with an excessive amount of attribute data residing in the BGP table can result in significant memory utilization and trigger software bugs.
An AS-path attribute with more than 255 AS numbers is expressed as multiple AS_SEQUENCE segments. This unusual AS-path composition caused problems in older Cisco IOS releases and could result in a continuously flapping BGP session. Using bgp maxas-limit avoids this behavior unless route-map based AS-path prepending extends the AS-path length beyond 255 AS numbers.
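The segment split can be illustrated with a minimal Python sketch. It assumes 2-byte AS numbers and the segment layout from RFC 4271 (one octet of segment type, one octet with the number of AS numbers, then the AS numbers themselves); the function names are made up for this example and do not reflect how any router implements the encoding.

```python
import struct

AS_SEQUENCE = 2             # segment type code from RFC 4271
MAX_ASNS_PER_SEGMENT = 255  # the segment length field is a single octet


def encode_as_path(asns):
    """Encode 2-byte AS numbers as one or more AS_SEQUENCE segments."""
    encoded = b""
    for start in range(0, len(asns), MAX_ASNS_PER_SEGMENT):
        chunk = asns[start:start + MAX_ASNS_PER_SEGMENT]
        # Segment header: type and number of AS numbers in this segment.
        encoded += struct.pack("!BB", AS_SEQUENCE, len(chunk))
        # Segment body: each AS number as a 2-byte big-endian value.
        encoded += struct.pack("!%dH" % len(chunk), *chunk)
    return encoded


def count_segments(encoded):
    """Walk the encoded attribute value and count its segments."""
    segments, offset = 0, 0
    while offset < len(encoded):
        _seg_type, seg_len = struct.unpack_from("!BB", encoded, offset)
        offset += 2 + 2 * seg_len
        segments += 1
    return segments


print(count_segments(encode_as_path([65000] * 255)))  # prints 1
print(count_segments(encode_as_path([65000] * 256)))  # prints 2
```

A path of 255 AS numbers still fits into a single segment; one more AS number forces a second segment, which is the unusual composition mentioned above.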
The extended length bit in the BGP UPDATE message that has to be used when the AS-path length exceeds 128 AS numbers also caused errors in older IOS releases (Cisco bug ID CSCdr54230).
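A rough Python sketch shows where the extended length bit comes into play. It assumes 2-byte AS numbers and the attribute flag values from RFC 4271 (0x40 for transitive, 0x10 for extended length); the function names are hypothetical. Under these assumptions a single-segment AS path needs the extended length bit from 127 AS numbers onward, which is in the same range as the figure quoted above.

```python
TRANSITIVE = 0x40       # well-known attributes such as AS_PATH are transitive
EXTENDED_LENGTH = 0x10  # attribute length is encoded in two octets, not one


def as_path_value_length(as_count, asn_size=2, max_per_segment=255):
    """Bytes in the AS_PATH value: 2-byte header per segment plus AS numbers."""
    segments = -(-as_count // max_per_segment)  # ceiling division
    return 2 * segments + asn_size * as_count


def needs_extended_length(as_count):
    """A one-octet attribute length field can describe at most 255 bytes."""
    return as_path_value_length(as_count) > 255


def as_path_flags(as_count):
    """Attribute flags octet for an AS_PATH with the given number of ASNs."""
    flags = TRANSITIVE
    if needs_extended_length(as_count):
        flags |= EXTENDED_LENGTH
    return flags


print(needs_extended_length(126))  # prints False
print(needs_extended_length(127))  # prints True
print(hex(as_path_flags(200)))     # prints 0x50
```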
Cisco IOS can limit the maximum length of the AS-path attribute with the bgp maxas-limit length router configuration command. It’s highly advisable that you use this command together with other BGP security measures to reduce the potential impact of oversized AS-path attributes on the operation of your network.
The maximum sensible length of the AS-path attribute depends on your position within the Internet. Core operators observe lower AS-path lengths than the edge points. Due to CSCdr54230, accepting AS-paths having more than approximately 100 AS numbers was best avoided; reasonable values are usually much lower… and as always, Geoff Huston published a measurement giving you the answer to that question.
Configuring the bgp maxas-limit command does not impact the regular BGP operation. The maxas-limit is checked during the inbound update processing. Prefixes with oversized AS-path length are simply ignored; BGP sessions are not disrupted.
Test bed description
The bgp maxas-limit functionality can be easily demonstrated in a test bed consisting of only two routers.
hostname R1
!
ip cef
!
interface Loopback0
 ip address 10.0.1.1 255.255.255.255
!
interface Serial1/0
 description Link to R2 s1/0
 ip address 10.0.7.13 255.255.255.252
 encapsulation ppp
!
router bgp 65000
 no synchronization
 bgp log-neighbor-changes
 network 10.1.1.0 mask 255.255.255.0
 network 10.1.2.0 mask 255.255.255.0
 neighbor 10.0.7.14 remote-as 65100
 neighbor 10.0.7.14 route-map prepend out
 no auto-summary
!
ip classless
!
ip route 10.1.1.0 255.255.255.0 Null0
ip route 10.1.2.0 255.255.255.0 Null0
!
ip prefix-list prepend seq 5 permit 10.1.2.0/24
!
route-map prepend permit 10
 match ip address prefix-list prepend
 set as-path prepend 65000 65000 65000 65000
!
route-map prepend permit 20
!
line con 0
 exec-timeout 0 0
 privilege level 15
 logging synchronous
 transport preferred none
 stopbits 1
!
ntp logging
end

hostname R2
!
ip cef
!
interface Loopback0
 ip address 10.0.1.2 255.255.255.255
!
interface Serial1/0
 description Link to R1 s1/0
 ip address 10.0.7.14 255.255.255.252
 encapsulation ppp
!
router bgp 65100
 no synchronization
 bgp log-neighbor-changes
 bgp maxas-limit 3
 neighbor 10.0.7.13 remote-as 65000
 no auto-summary
!
line con 0
 exec-timeout 0 0
 privilege level 15
 logging synchronous
 transport preferred none
 stopbits 1
!
ntp logging
end
The bgp maxas-limit functionality does not impact the regular BGP operation. Whenever an inbound BGP update is received with an oversized AS-path attribute, the router logs a warning message and ignores the update.
%BGP-6-ASPATH: Long AS path 65000 65000 65000 65000 65000 received from 10.0.7.13: More than configured MAXAS-LIMIT
The AS-path length limiting functionality can also be observed with any of the debug ip bgp update commands. A sample printout is included below:
BGP(0): 10.0.7.13 rcv UPDATE w/ attr: nexthop 10.0.7.13, origin i, metric 0, originator 0.0.0.0, path 65000 65000 65000 65000 65000, community , extended community , SSA attribute
BGPSSA ssacount is 0
BGP(0): 10.0.7.13 rcv UPDATE about 10.1.2.0/24 -- DENIED due to: AS-PATH length over 4072;
BGP(0): 10.0.7.13 rcvd UPDATE w/ attr: nexthop 10.0.7.13, origin i, metric 0, path 65000
BGP(0): 10.0.7.13 rcvd 10.1.1.0/24...duplicate ignored
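The behavior observed in the test bed can be modeled with a short Python sketch. This is not how IOS implements the check; it simply assumes a "longer than the limit" comparison, as the wording of the log message suggests, and the function name is invented for the example. The prefixes, AS paths and the limit of 3 are the ones used in the test bed.

```python
def filter_updates(updates, maxas_limit):
    """Split (prefix, as_path) pairs into accepted and ignored prefixes."""
    accepted, ignored = [], []
    for prefix, as_path in updates:
        if len(as_path) > maxas_limit:
            # Oversized AS path: the prefix is ignored, the session stays up.
            ignored.append(prefix)
        else:
            accepted.append(prefix)
    return accepted, ignored


# Updates R2 receives from R1: 10.1.2.0/24 carries four prepends on top of
# the originating AS, 10.1.1.0/24 is sent without prepending.
updates = [
    ("10.1.2.0/24", [65000] * 5),
    ("10.1.1.0/24", [65000]),
]
accepted, ignored = filter_updates(updates, maxas_limit=3)
print(accepted)  # prints ['10.1.1.0/24']
print(ignored)   # prints ['10.1.2.0/24']
```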
Readers who commented on some of my previous certification-related posts have complained about the vagueness of exam questions. I have to agree with them; I’ve seen my fair share of dubious questions in the exams I’ve taken. For example, when I was developing EIGRP and BGP courses for Cisco, my lowest scores on the CCIE recertification exams were in those two categories. I knew too many details and was confused by the vagueness of the questions.
When I stumbled across the headline Porn site feud spawns new DNS attack on NetworkWorld’s web site, the urge to read the article was simply irresistible. The article starts with the following paragraph (emphasis mine):
A scrap between two pornographic Web sites turned nasty when one figured out how to take down the other by exploiting a previously unknown quirk in the Internet's DNS.
The link in the paragraph points to another article documenting a completely different DNS attack. The next paragraph contradicts the first one (emphasis yet again mine):
The attack is known as DNS Amplification. It has been used sporadically since December, but it started getting talked about last month when ISPrime, a small New York ISP, started getting hit hard with what's known as a distributed denial of service (DDoS) attack.
Reading an interestingly-titled article on InformIT, I stumbled across the following text:
The survival time is an estimate of how long an un-patched computer will remain uncompromised once it’s connected to the Internet. While the actual time varies, historically it tends to run between 4 and 20 minutes.
This is such obvious nonsense that I had to check the source, which is also full of alarming messages, but admits at the end that the problems described largely disappeared with XP SP2. Just to put things in perspective: XP SP2 was released in August 2004 and the graph in the alarming blog post displays data from 2008.
Next step: investigate the source of the graph. The “average survival time” is defined as the time between probes on numerous TCP or UDP ports, regardless of whether the port was actually enabled in the workstation and whether the probe was successful or not.
My personal conclusion: as most workstations include some sort of rudimentary firewall these days, the whole approach is bogus. More precisely, it measures an important parameter (average time between probes), but claims it represents something completely different (average survival time). Would you agree with my conclusion?
Lesson learned: Never trust alarming over-simplifying statements based on misunderstood data.
Recently, on an IPSec-based customer network, we installed one of the brand new platforms introduced by Cisco Systems. The initial software release had memory leaks (no problem, we all know these things happen), so we upgraded the box to the latest software. It works perfectly … until you reload it. The software we’re forced to use cannot get IPSec to work if the startup configuration includes interface-level crypto-maps. Interestingly, you can configure crypto-maps manually and they work … until you save them into the startup configuration and reload the box.
What would you think if you received three queries about the same (somewhat obscure) feature within six hours? It started with a nice e-mail from an engineer that I’ve corresponded with in the past. He wanted to send a Wake-on-LAN packet to a PC in a remote office. Usually you could use the ip directed-broadcast feature, but he wanted to use the remote office router to generate the packet.