Blog Posts in February 2009
Anyone who has ever had the “privilege” of interviewing a certified individual with purely theoretical knowledge appreciates the value of hands-on tests. The creators of certifications in the IT industry (including Cisco Systems) have responded by including more and more hands-on exercises in the certification exams. Unfortunately, Cisco decided not to use the real equipment, but rather simulations. While this is definitely better than relying exclusively on multiple-choice tests, students can still work their way through the simulations without having a decent level of hands-on experience.
I got an unusual question a few days ago:
Does a loop (cable returning back to same switch) in one switch affect other switches? How can I detect that there is such a problem in a particular switch?
The correct answer to the first question is obviously “it depends”. To start with, it depends on whether the two ports will be able to communicate. With a crossover (switch-to-switch) cable (and assuming there are no negotiation issues), the physical layer will work correctly. If you’re using a standard RJ-45 patch cable, you’re “out of luck” unless the switch is too smart and has auto-MDI sensing (like the Linksys switches, now well hidden under obscure part numbers like Cisco SRW248G4). In this case, the two ports will become active even when connected with a patch cable.
Numerous articles describing the widespread routing instabilities caused by the sloppy parser of a small router vendor (including posts at BGPmon, Renesys, Arbor Security and my blog) hinted that the unusual BGP update caused so many problems because the ISPs were using outdated Cisco IOS releases. This is definitely not the case; all classic IOS releases were affected.
Rodney Dunn from Cisco and I were quickly able to reproduce a previously unknown bug in Cisco IOS that occurs only when the inbound AS-path contains close to 255 AS numbers and the router does inbound or outbound AS-path prepending. The new bug is tracked as CSCsx73770 and affects downstream EBGP or IBGP sessions as follows:
Tuesday's BGP experiment has generated quite a splash: Cisco has discovered a new BGP bug that can be triggered only if you have a long enough AS-path and do outbound AS-path prepending (and a few of us learned more BGP intricacies we never wanted to know), lots of people have (hopefully) discovered the importance of the bgp maxas-limit configuration command, and at least some ISPs have implemented inbound prepending filters that I wrote about almost a year ago. However, most of us thought that the original problem arose due to inexperienced operators of a leaf AS.
Mikael Abrahamsson was the first to notice that the number of prepends matches the low-order 8 bits of the offending AS number. Further contributors to the NANOG mailing list confirmed that two autonomous systems with very long prepends are using BGP routers from Mikrotik. You configure those boxes with commands that have syntax deceptively close to Cisco's, but they expect the number of AS numbers to prepend, not the AS-path. Obviously no range checking is done on the configuration parameter and the high-order 8 bits are ignored.
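The truncation described above is easy to illustrate. The following is a minimal sketch, assuming the box simply keeps the low-order 8 bits of whatever value is entered; the function name and the AS number (taken from the private range) are hypothetical and chosen only for illustration.

```python
def effective_prepend_count(configured_value: int) -> int:
    """Return the prepend count left after the high-order bits are ignored.

    Assumption: the parser keeps only the low-order 8 bits of the value,
    as described in the text; no range checking is performed.
    """
    return configured_value & 0xFF


# Hypothetical AS number from the private range, for illustration only.
asn_entered_by_mistake = 64764

count = effective_prepend_count(asn_entered_by_mistake)
print(f"AS number {asn_entered_by_mistake} entered as a prepend count "
      f"results in {count} prepends")
```

An operator who types the AS number where a count is expected ends up with a prepend count equal to the AS number modulo 256, which is large only when the low-order byte of the AS number happens to be large.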
So it looks like the incident started with a box that accepts an invalid configuration parameter, used in an AS with a very high value in the low-order 8 bits of its AS number (very improbable, but obviously not impossible). Numerous ISPs that did not limit the BGP updates they were propagating and an IOS bug did the rest.
One of my fellow engineers was working with a customer who wanted to have UMTS backup for his primary link. They got it to work, including IPSec+GRE running over UMTS to provide the necessary privacy/security. However, it turned out that all wireless solutions (at least as offered by nearby Service Providers) are plagued by ridiculous round-trip times. For example, UMTS transmission delay is equivalent to the time it takes to send an IP packet from Central Europe to the US West Coast.
Obviously UMTS technology is not ready to provide backup for VoIP traffic (which prompts an interesting question: how do the Service Providers plan to provide reasonable quality IP-based voice calls over UMTS?). The reality is way worse than that: a round-trip time of a few hundred milliseconds can kill web browsing performance if you have awfully-designed web sites (and a lot of intranets are not well-designed because the developers never considered the delay impact).
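To see why delay hurts badly designed web sites, consider a back-of-the-envelope estimate. This is a rough sketch that assumes the page needs a number of strictly sequential request/response exchanges and ignores bandwidth and server processing time; the round-trip counts and RTT values are made-up illustrative numbers, not measurements.

```python
def page_load_lower_bound(sequential_round_trips: int, rtt_ms: float) -> float:
    """Lower bound on page load time in seconds.

    Assumption: every exchange has to wait for the previous one to finish,
    so the total delay is simply (number of round trips) x (round-trip time).
    """
    return sequential_round_trips * rtt_ms / 1000.0


# Illustrative values only: a page that needs 40 sequential exchanges,
# loaded over a low-delay link and over a high-delay wireless link.
for rtt_ms in (20, 300):
    seconds = page_load_lower_bound(40, rtt_ms)
    print(f"RTT {rtt_ms} ms -> at least {seconds:.1f} s")
```

The same page that feels instantaneous on a LAN-like link takes many seconds once every exchange pays a few hundred milliseconds, regardless of the available bandwidth.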
What could you do if your MPLS VPN Service Provider offers BGP between PE and CE routers, while the layer-3 switches within your site support only OSPF or RIP? If you’re studying for your CCIE lab exam, you should be able to give me at least two answers in three milliseconds. Unfortunately, most enterprise engineers are not so familiar with BGP or multiple routing protocols in a single network.
To help them, I’ve prepared the next video in my “Introduction to BGP” series. It discusses two design options: two-way redistribution or default routing in the OSPF part of the network. The “Layer-3 switching in an MPLS VPN site” article in the CT3 wiki contains the high-quality video and the network diagram as well as the initial and final router configurations. You can also watch the video served from Vimeo.
If you’ve ever tried to design QoS in a large Service Provider network, you’ve probably realized that QoS is a zero-sum game: you can give some traffic preferential treatment only if you decide to drop or delay some other traffic. Furthermore, QoS is not linked to IP routing; interface queues cannot apply backpressure to IP routing tables or influence load sharing ratios.
The only technology that can shift the excess traffic in high-speed IP-only networks from congested links to underutilized alternate paths is MPLS traffic engineering. The “Using MPLS TE to avoid core network congestion” article I’ve written for SearchTelecom describes the basic interactions of MPLS TE and QoS and introduces autotunnel and autobandwidth concepts.
I was wrong about the details of yesterday's Internet brownout: older IOS releases don't recognize AS-paths having more than 128 AS numbers due to improper handling of the extended length flag in the BGP UPDATE message (CSCdr54230).
However, quick stress tests indicate that classic IOS releases (including 12.2SRC) can't handle AS-paths having more than 255 AS numbers. IOS is able to accept (and properly process) inbound updates with two AS_SEQUENCE segments, but does not generate a valid AS-path attribute in the outbound update when there are more than 255 AS numbers in the AS-path, resulting in a NOTIFICATION message and a continuously flapping BGP session. The only global protection you have against this behavior is the bgp maxas-limit router configuration command.
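The two limits mentioned above follow from the AS_PATH encoding in RFC 4271: the attribute length fits in one byte unless the extended length flag is set, and each path segment carries a one-byte count of AS numbers. The sketch below encodes an AS path with classic 2-byte AS numbers to show where both mechanisms kick in; the function name is hypothetical and the code illustrates the wire format only, not how IOS implements it.

```python
import struct

ATTR_TYPE_AS_PATH = 2        # path attribute type code for AS_PATH
SEGMENT_AS_SEQUENCE = 2      # path segment type for AS_SEQUENCE
FLAG_TRANSITIVE = 0x40       # well-known attributes set the transitive bit
FLAG_EXTENDED_LENGTH = 0x10  # attribute length is two bytes instead of one
MAX_AS_PER_SEGMENT = 255     # segment length is a single byte


def encode_as_path(as_numbers):
    """Encode a list of 2-byte AS numbers as a BGP AS_PATH attribute."""
    value = b""
    # A single segment cannot hold more than 255 AS numbers, so a longer
    # path has to be split into several AS_SEQUENCE segments.
    for start in range(0, len(as_numbers), MAX_AS_PER_SEGMENT):
        chunk = as_numbers[start:start + MAX_AS_PER_SEGMENT]
        value += struct.pack("!BB", SEGMENT_AS_SEQUENCE, len(chunk))
        value += struct.pack(f"!{len(chunk)}H", *chunk)

    flags = FLAG_TRANSITIVE
    if len(value) > 255:
        # The value no longer fits a one-byte length field.
        flags |= FLAG_EXTENDED_LENGTH
        header = struct.pack("!BBH", flags, ATTR_TYPE_AS_PATH, len(value))
    else:
        header = struct.pack("!BBB", flags, ATTR_TYPE_AS_PATH, len(value))
    return header + value


# 64512 is a private AS number used purely as filler.
for path_length in (100, 200, 300):
    attribute = encode_as_path([64512] * path_length)
    extended = bool(attribute[0] & FLAG_EXTENDED_LENGTH)
    print(f"{path_length} AS numbers: {len(attribute)} bytes, "
          f"extended length flag set: {extended}")
```

With 2-byte AS numbers the extended length flag becomes necessary well before the path reaches 255 entries, while the second AS_SEQUENCE segment appears only beyond 255 entries, which matches the two separate failure modes described in the text.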
I've also updated the Wiki article.
Two 3725 “core” routers: $15.000
A layer-3 switch to connect them: $3.000
Connectivity to two upstream ISPs: $3000/month (estimate)
Have you noticed how slow the Internet was yesterday? I almost blamed my kids (sometimes they manage to overload my WAN link), but it turned out to be a global problem. It looks like a greenhorn ISP (they joined RIPE less than four months ago) in central Europe managed to generate a BGP update with too many AS numbers in the AS path, confusing older routers. It’s my wild guess that those routers did not anticipate two AS_SEQUENCE attributes in the BGP update message. You can find the details in the Renesys blog; at the peak of the instability, they were receiving over 100.000 BGP updates per second.
It’s very easy to protect yourself (and your downstream neighbors) from an operational error like this one. Cisco has implemented the AS-path length limiting code in IOS release 12.2. One would hope that the major ISPs would have started using this feature years ago; obviously that’s not the case. I wrote an article in the CT3 Wiki describing the “intricate” details of this obviously ignored IOS feature just to make sure everyone understands what the bgp maxas-limit command does (and hopefully implements it in this millennium).
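The idea behind AS-path length limiting can be sketched in a few lines. This is a conceptual illustration only, assuming a limit expressed as the maximum number of AS numbers in the path; the function names, the data layout and the limit of 50 are hypothetical, and the exact IOS semantics are those documented in the wiki article, not reproduced here.

```python
def accept_route(as_path, max_as_limit):
    """Accept a route only if its AS path is not longer than the limit."""
    return len(as_path) <= max_as_limit


def filter_updates(updates, max_as_limit):
    """Drop every received route whose AS path exceeds the limit."""
    return [u for u in updates if accept_route(u["as_path"], max_as_limit)]


# Hypothetical updates; prefixes come from documentation ranges and the
# AS numbers from the private range.
received = [
    {"prefix": "192.0.2.0/24", "as_path": [64512, 64513, 64514]},
    {"prefix": "198.51.100.0/24", "as_path": [64512] + [64600] * 252},
]

accepted = filter_updates(received, max_as_limit=50)
print([u["prefix"] for u in accepted])
```

Dropping the oversized path at the edge protects the local router and also keeps the update from propagating to downstream neighbors.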
One of the readers of our forums was looking for an interesting solution: he would like to be able to display interface configuration while configuring the same interface. Obviously you could always use the do show running interface name command, but he was looking for a single command without parameters that would display the configuration of the currently selected interface.
Boštjan Šuštar devised a brilliant solution based on EEM Tcl policies and expanded it to include the routing protocol configuration. After installing two EEM policies and adding a few aliases, you can use the config command in interface or routing protocol configuration modes to see the current configuration of the object you’re configuring.
Isaac was trying to implement the HTTP PUT solution I’ve described in the IP Corner article Using a Web Server to Manage Your Router Configurations. He couldn’t get IIS or Apache to work and finally discovered that he was facing an IOS problem. When he upgraded the routers to IOS release 12.4(20)T or 12.4(22)T, the copy running-config http://host/path command worked as expected.
Isaac, thanks for the feedback!
Readers who commented on some of my previous certification-related posts have complained about the vagueness of exam questions. I have to agree with them; I’ve seen my fair share of dubious questions in the exams I’ve taken. For example, when I was developing EIGRP and BGP courses for Cisco, my lowest scores on the CCIE recertification exams were in those two categories. I knew too many details and was confused by the vagueness of the questions.
When I stumbled across the headline Porn site feud spawns new DNS attack on NetworkWorld’s web site, the urge to read the article was simply irresistible. The article starts with the following paragraph (emphasis mine):
A scrap between two pornographic Web sites turned nasty when one figured out how to take down the other by exploiting a previously unknown quirk in the Internet's DNS.
The link in the paragraph points to another article documenting a completely different DNS attack. The next paragraph contradicts the first one (emphasis yet again mine):
The attack is known as DNS Amplification. It has been used sporadically since December, but it started getting talked about last month when ISPrime, a small New York ISP, started getting hit hard with what's known as a distributed denial of service (DDoS) attack.
A few hours ago, Internetwork Expert launched an ingenious solution: the Poly-Lab Assessments. The basic idea is brilliantly simple: if you’re using a common hardware and L2/L3 topology for all labs (and hundreds of tasks within these labs), it’s possible to mix individual tasks into a customized lab tailored to the needs of a CCIE candidate. The mix-and-match approach allows you to customize the difficulty of selected tasks (and technologies from the CCIE blueprint) to the candidate’s capabilities.
The fourth article in the IPSec series written by Boštjan Šuštar deals with Dynamic Multipoint VPN (DMVPN). Boštjan describes the design and implementation aspects of DMVPN (without going into too many unnecessary details), the behavior of RIP, OSPF and EIGRP over DMVPN clouds, and the resilience (multiple hubs), performance and scalability considerations of DMVPN.
Reading an interestingly-titled article on InformIT, I stumbled across the following text:
The survival time is an estimate of how long an un-patched computer will remain uncompromised once it’s connected to the Internet. While the actual time varies, historically it tends to run between 4 and 20 minutes.
This is such obvious nonsense that I had to check the source, which is also full of alarming messages, but admits at the end that the problems described largely disappeared with XP SP2. Just to put things in perspective: XP SP2 was released in August 2004 and the graph in the alarming blog post displays data from 2008.
Next step: investigate the source of the graph. The “average survival time” is defined as the time between probes on numerous TCP or UDP ports, regardless of whether the port was actually enabled in the workstation and whether the probe was successful or not.
My personal conclusion: as most workstations include some sort of rudimentary firewall these days, the whole approach is bogus. More precisely, it measures an important parameter (average time between probes), but claims it represents something completely different (average survival time). Would you agree with my conclusion?
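The difference between the two metrics can be made concrete with a small sketch. The probe log below is entirely made up, and the function names and the notion of an "exploit works" flag are assumptions introduced for illustration; the point is only that the average gap between probes says nothing about when (or whether) a host is actually compromised.

```python
from statistics import mean


def mean_time_between_probes(probe_times):
    """Average gap between consecutive probes (what the graph measures)."""
    gaps = [later - earlier
            for earlier, later in zip(probe_times, probe_times[1:])]
    return mean(gaps)


def survival_time(probes, open_ports):
    """Time of the first probe that could actually compromise the host.

    probes is a list of (time, port, exploit_works) tuples. A probe counts
    only if it hits a reachable port and the exploit succeeds; returns
    None if the host is never compromised.
    """
    for time, port, exploit_works in probes:
        if port in open_ports and exploit_works:
            return time
    return None


# Made-up probe log: (minutes since connecting, destination port, success).
probe_log = [(0, 135, True), (10, 445, True), (20, 1434, False), (30, 135, True)]

print(mean_time_between_probes([t for t, _, _ in probe_log]))
# A workstation behind a basic firewall exposes no ports at all.
print(survival_time(probe_log, open_ports=set()))
```

With the same probe log, an unprotected host exposing port 135 would be compromised by the very first probe, while a firewalled host survives indefinitely; the average time between probes is identical in both cases.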
Lesson learned: Never trust alarming over-simplifying statements based on misunderstood data.
Finally I found time to organize the various interesting Tclsh bits-and-pieces I’ve blogged about in a comprehensive Tclsh on Cisco IOS tutorial.
It’s not a classic tutorial; I’m assuming you know what Tcl is and how to write Tcl programs. The articles in the tutorial document the implementation details and discrepancies that are not (to my knowledge) documented anywhere else.
You might be aware that there are two ways of executing IOS CLI commands from IOS tclsh: either you insert the IOS commands directly in the Tcl code (tclsh passes any unknown commands to the CLI parser) or you execute them with the exec command. There are subtle differences between the two methods, described in the “Executing IOS commands from Tcl shell” article in the CT3 wiki.
Recently, on an IPSec-based customer network, we installed one of the brand new platforms introduced by Cisco Systems. The initial software release had memory leaks (no problem, we all know these things happen), so we upgraded the box to the latest software. It works perfectly … until you reload it. The software we’re forced to use cannot get IPSec to work if the startup configuration includes interface-level crypto-maps. Interestingly, you can configure crypto-maps manually and they work … until you save them into the startup configuration and reload the box.
What would you think if you received three queries about the same (somewhat obscure) feature within six hours? It started with a nice e-mail from an engineer that I’ve corresponded with in the past. He wanted to send a Wake-on-LAN packet to a PC in a remote office. Usually you could use the ip directed-broadcast feature, but he wanted to use the remote office router to generate the packet.
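For readers unfamiliar with the packet itself: a Wake-on-LAN "magic packet" payload is six 0xFF bytes followed by sixteen repetitions of the target's MAC address. The sketch below builds that payload and sends it as a UDP broadcast from an ordinary host; it illustrates the packet format only, not the router-based solution the reader was after. The MAC address is a placeholder, and UDP port 9 is a common convention rather than a requirement.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN payload: 6 x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (port 9 is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


# Placeholder MAC address, for illustration only.
packet = build_magic_packet("00:11:22:33:44:55")
print(len(packet), "bytes")
```

To reach a PC in a remote subnet from a central host, the `broadcast` argument would have to be that subnet's directed broadcast address, which is exactly why the ip directed-broadcast feature is the usual approach.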
Very recent IOS releases (12.2SRC and 12.4(22)T) have a handy command: show running vrf name, which displays VRF, interface and routing protocol configurations of the specified VRF. It does not, however, include referenced access-lists or route-maps.
Davor Koncic solved this problem years ago: he wrote a Tcl script that does the same job better; his script displays most configuration parameters related to a VRF (he’s missing the MQC or IPSec parts). Great job (and a nice illustration of the power of Tclsh on IOS).