Sometimes the path is more important than the destination ...

I received an interesting comment on one of my knowledge/certification-related posts:

I used to think that certifications were a useful indicator of knowledge or at least initiative, but I’m changing my mind. [...] I feel like I’ve gotten a lot out of studying for certifications, especially CCIE, but I’m starting to wonder if that’s the exception.

I guess a lot of prospective internetworking engineers are thinking along the same lines, so here’s my personal perspective on this issue.

This is why I don’t trust “independent experts”

Network World recently published a story describing the results from an independent security product testing lab, which discovered (surprise, surprise) that adding security features to Cisco routers “presents a tremendous bottleneck” and “can turn a 60G router into a 5G one or even a 100M bit/sec device”.

The test results haven’t been published yet; I’ve got all the quotes from the NW story, so they might be the result of an ambitious middleware.

We don’t need “independent experts” for that. Anyone who has ever configured VPNs in a high-speed environment can tell you how to kill the performance. The basics are always the same: make sure the dedicated silicon can’t handle the job, so the packets have to be passed to the CPU. Here are a few ideas:

  • Configure GRE over IPSec and make sure you don’t tweak the MTU on the GRE tunnel. This will result in IP fragmentation, and the receiving router will have to process every fragment in the process-switching path. A sure killer for any box, not just the 6500/7600.
  • Make sure you configure features for which you have no hardware accelerator installed in the high-end boxes and watch the performance fall (at least) 100x.
  • Even if you’ve managed to install an accelerator, configure the network in a way that effectively disables the hardware. For example, configure multiple GRE tunnels terminating on the same loopback interface.
  • Design your test so that all the traffic has to pass through a bottleneck. FWSM with its 3-5 Gbps throughput is an ideal candidate.
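
To make the first idea concrete, here is a minimal Python sketch of the packet-size arithmetic behind the GRE-over-IPSec fragmentation. The 24-byte GRE overhead (20-byte outer IP header plus the 4-byte basic GRE header) is standard; the 60-byte IPSec overhead and the function names are assumptions made for illustration, as the real IPSec overhead depends on the mode and transforms you use.

```python
GRE_OVERHEAD = 24  # 20-byte outer IP header + 4-byte basic GRE header


def encrypted_size(inner_len, ipsec_overhead):
    """Size of the packet after GRE encapsulation and IPSec encryption."""
    return inner_len + GRE_OVERHEAD + ipsec_overhead


def is_fragmented(tunnel_ip_mtu, path_mtu=1500, ipsec_overhead=60):
    """True if a full-sized tunnel packet no longer fits into the path MTU.

    ipsec_overhead=60 is an assumed, illustrative value, not a measured one.
    """
    return encrypted_size(tunnel_ip_mtu, ipsec_overhead) > path_mtu


# Untweaked tunnel (IP MTU = 1500 - 24): the encrypted packet exceeds 1500 bytes.
print(is_fragmented(1476))
# Reduced tunnel IP MTU leaves room for the IPSec overhead.
print(is_fragmented(1400))
```

With these assumed numbers, the untweaked tunnel produces oversized encrypted packets, while a tunnel IP MTU of 1400 bytes keeps them within a 1500-byte path MTU.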

What these tests prove to me is that someone who doesn’t understand what he’s doing can destroy the performance of almost any device … but we don’t need independent tests to prove that. Am I missing something? Please let me know.

IS-IS over partially meshed Frame Relay

A member of NIL’s forums wanted to run IS-IS over a hub-and-spoke Frame Relay network without using subinterfaces. I hope the question is not related to a production network; running IS-IS over a generic partially-meshed multi-access WAN network is not a good idea.

More details are available in the CT3 wiki.

Stuffing the polls: the adventures of a convoluted mind

You might remember that the last polls I did using Blogger all resulted in every option having exactly the same number of votes. At that time, I blamed Google ... and I have to apologize. It was obviously someone who has nothing better to do in his life. The log files I've collected indicate he's coming from Poland, and I would appreciate it if my Polish readers could help me persuade this troubled individual that he should spend his time doing something else (details in the rest of the post).

I decided to use another polling service for the current set of polls, just to make sure it was not a Blogger problem. The polls went smoothly and displayed an expected spread of votes, but yesterday morning I noticed that the numbers of votes for each option were getting more and more equal. Fortunately, the new polling service allows me to track votes by IP address, so I was quickly able to discover that someone using the IP address 88.220.105.18 was stuffing the ballot box. I cleared the votes and hoped he'd realize he'd been discovered and stop.

Well, this individual realized he'd been discovered ... and moved over to a proxy server belonging to a system integrator (88.250.50.210; wawproxy.solidex.com.pl) and a private DSL connection (83.24.94.111; dnm111.neoplus.adsl.tpnet.pl). By this morning he had submitted over 600 votes, and now he has moved back to 88.220.105.18 (that could be where he works, as the IP address is just one hop away from the POS interface of a Telenergo router).

As said above, anything you can do to help me would be much appreciated (I would prefer this over writing complaining e-mails to the postmaster and abuse aliases of the affected networks). Thanks!

Tailor the certification training to your needs

Let’s assume that you’re the manager of the internetworking team for a large enterprise network. You’ve just decided to migrate less-critical sites in your network from traditional (expensive) WAN offerings to IPSec running over the public Internet. Your internetworking architect has worked with the vendors to select the best technology and chose dynamic multipoint VPN (DMVPN) with a CA server running on a router. The proof-of-concept lab has been built and now you’re ready to order the new boxes and start the deployment. But there’s a major roadblock in this otherwise rosy scenario. Your engineers have to be trained on the new technology before the rollout; otherwise, you can expect interesting fallouts when the first problems inevitably start to appear.

An offer you should not refuse

Just stumbled across this: Amazon is offering the MPLS and VPN Architectures, MPLS and VPN Architectures, Volume II and Internet Routing Architectures (2nd Edition) (Networking Technology) (from Sam Halabi) for a total of $160.

Online sessions in December 2008: please vote!

The post describing my ideas about interactive online sessions resulted in a few comments and several off-line suggestions. Unfortunately most of the suggestions you’ve made in the comments are too generic. Remember, I was talking about 30-60 minute sessions and some suggestions would easily fill a week’s worth of training at the level of detail I’m aiming at. Running high-level introductory sessions is not my idea of fun; you could get as many of them as you want at Networkers.

Several suggestions are still “in the pipeline”: I have to envision how to structure them to make them manageable. In the meantime, the rest of the post lists the topics we can definitely cover. Please vote on them; the most popular one will be featured in the December session.

Building a transit autonomous system with no BGP in the core

This idea came from the discussion in the CCIE Journey blog: how do I pass packets across a network that does not run BGP on every router (for example, from X1 to X2 in the following diagram)? The solution in the CCIE Journey blog used GRE tunnels between edge routers; we’ll use MPLS.

Dynamic routing across a firewall

This topic started as a simple question: “How can I achieve dynamic failover to a disaster recovery site if my security engineer refuses to configure dynamic routing on the firewall?” We’ll solve the problem in a simple network shown in the following diagram:

Reducing the size of the BGP table

Anyone who uses a hardware-based layer-3 switching device (which is almost any high-speed router these days) for a core router could be hit by this problem: as the number of routable prefixes in the Internet increases, you might run out of hardware lookup entries (TCAM, for example). How do you reduce the size of the IP routing table without losing too much flexibility? What are the drawbacks and the caveats?

BGP Autonomous System split

What happens if your BGP autonomous system splits in half due to a link failure? Can you patch it together? What are the caveats?

How should I cover ACE XML Gateway and Web Application Firewall?

I was delighted when I got access to Cisco's ACE XML Gateway/Web Application Firewall (WAF) box. This box is the perfect intersection of three fields I'm really interested in: networking, security and web programming, so I'll work with it quite a lot in the future and post interesting tips and tricks about its usage.

As this blog is currently focused exclusively on Cisco IOS, I'm wondering how to cover these new products. I won't create another blog; it simply doesn't make sense to build another blog from the ground up, but there are a few other options. Please help me select the best one by voting in the poll.

Annotate your router sessions

The November Technical Services News from Cisco included the Annotating Troubleshooting Sessions document from Cisco’s support wiki. The document describes two well-hidden features of Cisco IOS:

  • The send log exec-level command writes a line in the syslog, allowing you to delineate logging or debugging outputs.
  • The exclamation mark used as the first character in any IOS command line (not just in the configuration) serves as a comment. If you’re logging the TTY session, you can use these comments to document the session.

3 reasons why I would like to have DNS lookups in IOS access lists

When I chose the word “unfortunately” in my post describing how Cisco IOS performs a DNS lookup when you enter a host name in an access list, I triggered several responses that disagreed with my choice of words. Here’s why I still think IOS ACLs could be improved with dynamic DNS lookups:

  • Things change. If you have to match a specific host in your ACL, there’s no guarantee that the host’s IP address will stay the same indefinitely. If the host is within your network and your ACL breaks because the host’s IP address was changed, it’s your problem (you should have kept better documentation and implemented proper change management procedures). When you have to use an external IP address (for example, the ISP’s SMTP gateway), you’ll notice it has changed when the phones start to ring.
  • Self-documentation. If the hostnames remained in the ACLs and the router performed a lookup as needed, the access lists would be self-documenting. When the hostnames get replaced by IP addresses, you have to perform a reverse lookup manually to figure out what host the IP address is referring to.

You could use remark commands in access-lists to document what you’re doing. Although you can use multiple remark commands in the same ACL, they cannot be edited like the filtering lines in the ACL.

  • Reverse lookup problems. The IP address entered in the ACL does not necessarily translate back into the host name you’ve used. In some cases (hosted applications), the reverse lookup might give you a host name in a completely different domain, making your deciphering job even harder (assuming, of course, that your predecessor left no documentation behind).

There are, of course, numerous minor issues that would need to be addressed, for example:

  • Load balancing. Properly implemented DNS-based load balancers return numerous randomly mixed IP addresses as a response to the A query. The IOS could convert multiple returned addresses into a network object group automatically.
  • TTL issues. In most cases, the DNS zone files contain meaningful TTL values (the IP addresses stay valid for minutes or hours). Even if the router performed the DNS lookup for every packet (which would be total nonsense), it would usually get the same results on every query due to a cache somewhere in the chain between the router and the final DNS server. The DNS lookup thus only makes sense when the DNS A record expires.
  • Short TTL issues. Sometimes the responses returned by the DNS server contain very low TTL values (TTL might also be set to zero to disable caching). In these cases, IOS could provide a minimum TTL parameter and warn the operator when a hostname is used that results in a response with TTL below the threshold.
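
The TTL handling described in the last two bullets could be sketched roughly as follows in Python. This only illustrates the idea and says nothing about how IOS works or would work; the `CachedHostEntry` class, the injected `resolver` callable (returning a list of addresses and a TTL) and the `min_ttl` parameter are all hypothetical names introduced for this example.

```python
import time


class CachedHostEntry:
    """Hypothetical ACL host entry that re-resolves only when the TTL expires."""

    def __init__(self, hostname, resolver, min_ttl=60, clock=time.monotonic):
        self.hostname = hostname
        self.min_ttl = min_ttl          # operator-configured minimum TTL (seconds)
        self.low_ttl_warning = False    # set when the DNS TTL is below min_ttl
        self._resolver = resolver       # callable: hostname -> (addresses, ttl)
        self._clock = clock
        self._addresses = frozenset()
        self._expires = None

    def addresses(self):
        """Return the cached address set, refreshing it after expiry."""
        now = self._clock()
        if self._expires is None or now >= self._expires:
            addrs, ttl = self._resolver(self.hostname)
            # Flag responses with a TTL below the threshold for the operator.
            self.low_ttl_warning = ttl < self.min_ttl
            # All returned addresses are kept, like an automatic object group.
            self._addresses = frozenset(addrs)
            self._expires = now + max(ttl, self.min_ttl)
        return self._addresses
```

The lookup happens once per expiry interval rather than per packet, multiple A records end up in a single address set, and a very short TTL is clamped to the configured minimum while raising a warning flag.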

In any case, the saddest part of the story is that the IOS already supports the same functionality in a different part of the code: dynamic DNS lookups are used in zone-based firewall policies to identify masquerading applications like MSN and Yahoo messenger (see Chapter 5 of the Deploying Zone-Based Firewalls digital book).

Network Address Translation of DNS responses

I “always knew” that Cisco IOS supports NAT translations between local and global addresses in DNS replies … until I wanted to use this functionality in one of my sample configurations and discovered it doesn’t work as expected.

A few tests later, I discovered the true story: DNS requests and responses are translated if and only if you define IP-level NAT translations using either the ip nat inside source static or the ip nat inside source list pool configuration command. The translations should not use any additional filters (do not use the route-map keyword) and cannot result in PAT translations (do not use the overload keyword).

You can find more details in the “Network address translation of DNS responses” article in the CT3 wiki.

Using hostnames in IP access lists

When I was configuring the access list that should prevent spammers from misusing my workstations, I obviously had to figure out the IP address of the ISP’s SMTP server (access lists and object groups accept IP addresses). I almost started nslookup on my Linux workstation, but then decided to try entering a hostname in an IOS ACL … and it works. Unfortunately, IOS performs a DNS lookup when you enter the hostname (assuming you have configured the ip name-server) and stores the resulting IP address in the ACL definition:

rtr(config)#ip access-list extended InsideList
rtr(config-ext-nacl)#permit tcp any host smtp.example.com eq smtp
Translating "smtp.example.com"...domain server (192.168.0.1) [OK]
rtr(config-ext-nacl)#do show access-list InsideList
Extended IP access list InsideList
    10 permit tcp any host 192.168.2.3 eq smtp

You can enter hostnames in ACLs or network object groups. In both cases, the name is immediately translated into an IP address.
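
If you want to keep the hostnames as the source of truth, one option is to generate the ACL offline and re-apply it whenever the addresses change. The following Python sketch assumes exactly that workflow; `render_acl` and `resolve_ipv4` are hypothetical helper names, and the generated lines simply mimic the ACL syntax shown in the printout above, with a remark line documenting each hostname.

```python
import socket


def resolve_ipv4(hostname):
    """Resolve a hostname to a sorted list of unique IPv4 addresses."""
    return sorted(set(socket.gethostbyname_ex(hostname)[2]))


def render_acl(name, hostnames, port="smtp", resolver=resolve_ipv4):
    """Render an extended ACL that permits TCP traffic to the given hosts."""
    lines = [f"ip access-list extended {name}"]
    for host in hostnames:
        # Keep the original hostname as documentation next to its addresses.
        lines.append(f" remark {host}")
        for addr in resolver(host):
            lines.append(f" permit tcp any host {addr} eq {port}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A fixed resolver keeps the example independent of real DNS.
    print(render_acl("InsideList", ["smtp.example.com"],
                     resolver=lambda host: ["192.168.2.3"]))
```

The resolver is injectable so the rendering can be tried without touching DNS; with the default resolver the script performs the same one-time translation that IOS does when you enter the hostname.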

The best way to learn: solve a hard challenge

We’ve spotted some of our best engineers when they were in the final years of their undergraduate studies. To continue the trend, NIL offers a student-engagement program that attracts highly promising candidates each year. The students get CCNA training (after which they have to pass the exam), a few weeks of hands-on instructor-led introductory bootcamps and the first CCNP course. These training courses should give students a solid foundation and a framework that they can expand on their own, which is the point where it's time to stress-test them with advanced bootcamps.

MPLS QoS: Implementing the best model for guaranteed service

My MPLS QoS: Implementing the best model for guaranteed service article published by SearchTelecom gives you a high-level overview of the pipe and hose QoS models in the MPLS VPN environment. I also describe the basic DiffServ QoS mechanisms available in an MPLS backbone.

If you’re new to IP QoS, you should start with the IP QoS: Two generations of class-of-service tools article.

Interactive online sessions: your input is highly appreciated

In mid-December, I’ll do my first IOS Hints Online Session. These sessions will be short (30-60 minutes), very interactive (I hope, but that’s your choice) and focused on an interesting design/deployment aspect. The description of the design/deployment challenge addressed by the session will be available well in advance at the time when you’ll be able to register.

Each session will start with a few diagrams explaining the proposed solution to the session’s topic and continue with hands-on explanation on actual devices. Each session will be limited to ~15 participants who will be able to actively participate, ask questions, propose alternative solutions or even discuss their actual issues (assuming they are somewhat related to the primary topic of the session).

I have a “few” ideas about what could be covered in these sessions, but having a real-life challenge coming from the readers of my blog would be much better. If you have a good idea that could fit into this concept, please send me a short description before Friday, November 21st. I’ll collect the best ones, publish short descriptions in a blog post and you’ll prioritize which ones you’d like to see first.

ACL object groups

I always thought that there was no need to restrict outbound sessions across a firewall in low-security environments. My last encounter with malware has taught me otherwise; sometimes we need to protect the rest of the Internet from our clumsiness. OK, so I decided to install an inbound access-list on the inside interface of my SOHO router that will block all SMTP traffic not sent to a well-known SMTP server (and let the ISP’s SMTP server deal with relay issues).

This is the point where my laziness kicked in: if I want to add another SMTP server in the future, I wouldn’t like to hack my ACL. I might also need to enter the SMTP server addresses in multiple ACLs, and it would be annoying if I added the server in one ACL but forgot all the other related ACLs (because, you know, we don’t really need documentation). Fortunately, IOS release 12.4(20)T provides just the tool I need: the ACL object groups. I can define a group of host addresses and use them as an object in my ACL:

object-group network SMTP_Server
 description ISP SMTP server
 host 192.168.0.2
 host 172.16.2.3
!
ip access-list extended Inside
 permit tcp any object-group SMTP_Server eq smtp
 deny   tcp any any eq smtp log
 permit ip any any
!
interface Vlan1
 ip access-group Inside in

IOS implements network and service object groups. Network object groups can include hosts, IP prefixes or ranges. Service object groups define TCP, UDP or ICMP services (including all ACL options like ranges of ports). You can also nest object groups and define new groups as unions of already defined groups.
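
To illustrate what "unions of already defined groups" means, here is a small Python model of nested network object groups. It is a conceptual sketch only, not a description of the IOS implementation; the `Backup_SMTP` and `All_Mail` groups, the 192.0.2.25 address and the `expand_group` helper are hypothetical.

```python
def expand_group(name, groups, _path=()):
    """Flatten a (possibly nested) object group into a set of members."""
    if name in _path:
        raise ValueError(f"object group {name!r} nests itself")
    members = set()
    for item in groups[name]:
        if item in groups:
            # The item is another group: merge in its expanded members.
            members |= expand_group(item, groups, _path + (name,))
        else:
            # The item is a plain host address.
            members.add(item)
    return members


groups = {
    "SMTP_Server": ["192.168.0.2", "172.16.2.3"],  # from the example above
    "Backup_SMTP": ["192.0.2.25"],                 # hypothetical extra group
    "All_Mail": ["SMTP_Server", "Backup_SMTP"],    # hypothetical nested group
}

print(sorted(expand_group("All_Mail", groups)))
```

An ACL line referring to the nested group would match every host in either child group, so adding a server to one child group updates every ACL that uses the parent.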

Control plane protection overview

The control plane (the main CPU that runs the routing protocols and all other application-layer services) is the most vulnerable part of your router. A determined attacker can quickly overload the CPU of any router (or switch) with a targeted denial-of-service attack, either by sending IP packets that are propagated from the switching fabric (or interrupt code on software-only platforms) to the control plane processes or by targeting individual services running on the router (see, for example, the problems one of the readers had with a public DNS server running on the router).

Cisco IOS offers several control plane protection mechanisms. I’ve summarized them in the “Protecting the router’s control plane” article in the CT3 wiki and Sebastian Majewski has provided a sample router configuration.

Becoming a spammer: hands-on experience

Reading the stories of Windows workstations becoming members of a spam botnet becomes way less enjoyable when you’re faced with the same problem (one of my kids managed to install a Trojan). It took me a day to clean the infected computer (it would have been easier to just format it, but the repeated installation of the Windows XP + Office software is so boring), but I’ve learned a few interesting networking lessons in the process that I’ll document in the next few days.

Book review: Cisco Secure Firewall Services Module

I was very anxious to get my copy of Cisco Secure Firewall Services Module (FWSM) from Cisco Press, as I’m a purely router-focused person, and I wanted to understand the capabilities of the Firewall Services Module (PIX/ASA-like blade for the Catalyst 6500 switching system with virtual firewall capability). I have a good background in IOS-based firewalls and network address translation (NAT), so the book was a perfect fit for me. However, if you’re looking for “best practices for securing networks with FWSM,” you’ve been misled by the subtitle.

Off-topic: disappointed by the antivirus industry

One of my kids managed to get infected with a particularly sneaky Facebook Trojan: a link from a friend (probably also infected) pointed to a web page with a video that required installation of a newer version of the Flash player … which was actually the first part of the Trojan. It quickly downloaded a few more components and made itself cozy deep within Windows XP.

Before you start telling me that kids would click anything … we had “a few” not so very pleasant discussions after previous infections and they know not to open anything or click on something that looks strange. Unfortunately the update-happy industry has conditioned them to constant prompts to upgrade one or another component, and the request to upgrade the Flash player was obviously too legitimate-looking.

Of course the workstations have anti-virus software which served me very well in the past. It identified the malware and claimed it had been quarantined. WRONG. Repeated scans with the same software always found the malware and claimed it had been cleaned. WRONG. The on-line scanner from the same vendor identified a different malware and “removed” it. WRONG.

The worst part of the experience was a total lack of in-depth information that I became used to in the past (for example, the names of the infected files) as well as the claim that this is a “low threat” malware (which is why I was not alerted when the infection happened … if the anti-virus software tells you you’ve got low-threat infection and it has been cleaned, you don’t start panicking).

The only anti-virus package that really helped me was coming from an unbelievable source: Microsoft. Its monthly anti-malware program correctly identified four different Trojan components and pointed me to Microsoft’s anti-virus online solution, which contained all the information I needed, including the list of infected files that it could not remove. A safe-mode reboot, manual cleanup and a few more scans solved the problem.

After this experience, I’m left wondering. In the past, people claimed you should use anti-virus software from an independent source, and now it looks like those sources are worse than Microsoft. Should I really give up and go for a one-vendor solution? Or should I reformat all the workstations in the house and move to Fedora :). What are your experiences?

Interesting links | 2008-11-08

As always, Jeremy Stretch posted several interesting articles: how to hijack HSRP, introduction to split horizon in distance vector routing protocols and (long needed) default redistribution metrics.

Petr Lapukhov started playing with HTTP URL regular expressions within NBAR and documented his findings. The most interesting is the last Q/A pair: can I use NBAR as a content filtering engine?

And last but definitely not least, if you’re worried what will happen to WPA2 now that WPA has been cracked, Robert Graham explains the fundamental differences between WPA and WPA2. Also, make sure you read the detailed explanation of the WPA flaw to understand its implications.

Bidirectional Forwarding Detection

BFD is one of those simple ingenious ideas that make you wonder “Why did it take them so long to figure this out?” It’s a UDP-based protocol that replaces dozens of link-level failure-detection mechanisms and routing protocol tweaks with a simple, focused solution: detect hop-by-hop layer-3 failures.

I wanted to write about BFD a year ago when it was first advertised as being available in the low-end routers (BFD support on high-end platforms is much better, but I simply don’t have a GSR and a CRS-1 at home … yet), but it failed to work, so I had to shelve the idea until the IOS release 12.4(15)T matured to a point where BFD on ISR started working in IOS, not just in Powerpoint.

In this month’s IP corner article, “Improve the Convergence of Mission-Critical Networks with Bidirectional Forwarding Detection (BFD)”, I describe BFD principles and its configuration on Cisco IOS, and give you practical examples of how you can use BFD to improve next-hop failure detection.

Using Quagga in BGP tests

Quagga is a terrifically useful tool when you need to build a BGP test lab. Not only can you quickly add an extra BGP router in your network; it also allows you to insert BGP routes with almost any attribute you want. I’ve described some of its features and included a sample Quagga-to-router connectivity scenario in the “Use Quagga to generate BGP routes” article published in the CT3 wiki.

Isn’t Quagga extinct?

Those readers that have been discussing technical issues with me probably know that I rarely write something without testing it first. Somehow I didn’t feel like powering up our spare CRS, so you might wonder how I’ve tested the interoperability between four-byte AS implementations and Cisco IOS. Fortunately, there’s an open-source routing protocol software suite named Quagga (which is an extinct subspecies of zebra in the real world) that has already implemented the new BGP standards and allowed me to do all the tests with just a router and a Linux host.

To help you get started, I wrote an article in the CT3 wiki describing the Quagga installation and configuration process on Fedora Linux.

Quagga is also available as a binary package (RPM) for Red Hat/CentOS/Fedora, Solaris, Debian and Gentoo, but you'll most probably get a version that is at least a year old. Vitaliy Gladkevitch provided RPM installation instructions.

OSPF Challenge #2: Final results

I’ve received almost a dozen responses to the second OSPF challenge, most of them correct. The key to the solution is the way OSPF checks the neighbor’s IP address on point-to-point links (we already know that the subnet mask is ignored):

  • If the interface is unnumbered, the router ignores the source IP address in the OSPF hello packets.
  • If there’s an IP address configured on the interface, the router checks that the neighbor’s IP address (the source IP address in the OSPF hello packets) belongs to the same subnet. If the source IP address is not in the same subnet, the OSPF hello packet is ignored.

R1 and R2 (the router configuration can be found in the challenge) establish an adjacency only if the source IP address of the packets sent by R1 is in the same subnet as the IP address on R2. Since the serial interface on R1 is unnumbered, R1 uses the IP address of the loopback interface in the OSPF hello packets. The IP address of the loopback interface on R1 thus has to be in the 10.1.2.0/29 subnet, giving you five choices (you cannot use 10.1.2.0, 10.1.2.7 or 10.1.2.3).
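
The five choices can be double-checked with a few lines of Python using the standard `ipaddress` module. The sketch assumes that 10.1.2.3 is excluded because it is already in use on R2 (the text above only says it cannot be used); the function names are made up for this example.

```python
import ipaddress

SUBNET = ipaddress.ip_network("10.1.2.0/29")
# Assumption: 10.1.2.3 is unavailable because R2 already uses it.
TAKEN = (ipaddress.ip_address("10.1.2.3"),)


def hello_accepted(source, subnet=SUBNET):
    """Mimic the subnet check a numbered interface applies to OSPF hellos."""
    return ipaddress.ip_address(source) in subnet


def loopback_candidates(subnet=SUBNET, taken=TAKEN):
    """Usable host addresses (no network/broadcast) not already taken."""
    return [str(addr) for addr in subnet.hosts() if addr not in taken]


print(loopback_candidates())
```

The list contains 10.1.2.1, 10.1.2.2, 10.1.2.4, 10.1.2.5 and 10.1.2.6; `hosts()` already leaves out the network (10.1.2.0) and broadcast (10.1.2.7) addresses.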

However, as Yuri pointed out in his response, the routers do establish adjacency (so the challenge is solved) but do not build valid routing tables. The reason is the weird IP address used in the Link Data field of each unnumbered point-to-point link. According to RFC 2328, the router should use the MIB-II ifIndex as the Link Data value of an unnumbered interface. IOS performs subnet checks on the SPF tree as well as on the OSPF hello packets and therefore R2 declares that R1 is not reachable. The following printout shows R1’s router LSA as seen on R2:

R2#show ip ospf data router 10.1.2.4

            OSPF Router with ID (10.1.2.3) (Process ID 1)

                Router Link States (Area 1)

  Adv Router is not-reachable
  LS age: 758
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 10.1.2.4
  Advertising Router: 10.1.2.4
  LS Seq Number: 80000016
  Checksum: 0x296B
  Length: 48
  Number of Links: 2

    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 10.1.2.4
     (Link Data) Network Mask: 255.255.255.255
      Number of TOS metrics: 0
       TOS 0 Metrics: 1

    Link connected to: another Router (point-to-point)
     (Link ID) Neighboring Router ID: 10.1.2.3
     (Link Data) Router Interface address: 0.0.0.6
      Number of TOS metrics: 0
       TOS 0 Metrics: 64
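
The strange-looking Link Data value in the printout is simply the ifIndex written in dotted-decimal notation. The short Python sketch below decodes it and loosely models the subnet check that fails on R2; the check itself is a simplification assumed for illustration, and both function names are hypothetical.

```python
import ipaddress


def link_data_to_ifindex(link_data):
    """Interpret a dotted-decimal Link Data value as an interface index."""
    return int(ipaddress.IPv4Address(link_data))


def passes_subnet_check(link_data, subnet="10.1.2.0/29"):
    """Simplified model: is the Link Data value inside the local subnet?"""
    return ipaddress.IPv4Address(link_data) in ipaddress.ip_network(subnet)


print(link_data_to_ifindex("0.0.0.6"))   # interface index carried in the LSA
print(passes_subnet_check("0.0.0.6"))    # the ifIndex is not a 10.1.2.0/29 address
```

Because 0.0.0.6 is an interface index and not an address in 10.1.2.0/29, a check that expects a same-subnet address cannot succeed.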

All readers that sent me a correct response received a small award from our Remote Labs team: free access to the OSPF default mysteries e-lesson which includes a recorded presentation and three remote lab exercises.

Will we run out of BGP AS numbers?

In the “internet meltdown” post I’ve described the main reason for the routing problems we’re experiencing in the Internet: everyone wants to be truly multihomed. All these end-customers obviously need their own AS number and it’s no wonder the experts predict we’ll run out of AS numbers in two to three years.

There’s no need to panic: the technical solution (four-byte AS numbers) has been ready for several years … but it’s not implemented yet in the majority of Cisco IOS-based platforms. Does that mean we’ll experience Internet-wide problems when the regional registries start allocating AS numbers larger than 65535 in a few months? Luckily, the answer is NO: the new BGP standards are completely backward-compatible … but if you’re a Service Provider, you have to start thinking about the upgrade path.
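
As a rough illustration of the backward compatibility, the following Python sketch shows the two things an engineer runs into first: the "asdot" notation for four-byte AS numbers and the reserved AS_TRANS value (23456) that a two-byte-only BGP speaker sees in place of a four-byte AS number. The function names are made up for this example; the sketch does not model the AS4_PATH attribute that carries the real path.

```python
AS_TRANS = 23456  # reserved two-byte placeholder for four-byte AS numbers


def to_asdot(asn):
    """Format an AS number in asdot notation (high.low for four-byte values)."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("AS number must fit into 32 bits")
    if asn <= 0xFFFF:
        return str(asn)
    return f"{asn >> 16}.{asn & 0xFFFF}"


def seen_by_old_speaker(asn):
    """AS number as it appears to a BGP speaker limited to two-byte AS numbers."""
    return asn if asn <= 0xFFFF else AS_TRANS


print(to_asdot(65546))             # 1.10
print(seen_by_old_speaker(65546))  # 23456
```

Two-byte AS numbers pass through unchanged, which is why existing deployments keep working while the four-byte numbers are introduced.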

You can find more answers on this topic in the article I wrote for SearchTelecom.

The list of all articles I wrote for SearchTelecom is available in the CT3 wiki.

Was it really only a century ago?

This post brought back some ancient memories … and I’m always amazed how far we’ve got in the last 30 years. For me, it all started with an IBM 360, having 48K (forty eight kilobytes) of core memory in which it ran an operating system and three user partitions. Fortran IV was the only programming language and card reader the only input device.

Moving to a VAX 11/780 was a major improvement; it was a multitasking environment with real terminals. VAX was an interesting beast: the first step in the boot process was to start an embedded PDP-11 processor that read an 8” floppy disc and uploaded the microcode to the main CPU. The only drawback was that 30 users had to share 2M (two megabytes) of main memory and so I couldn’t crash the machine whenever I wanted.

A few years later, I managed to get access to a really cute research PDP-11 running RSX-11M. Finally I could start writing device drivers and kernel code without risking the wrath of dozens of users years older than myself. And then the personal computers appeared and I probably made one of the best choices I could – the BBC Micro from Acorn. It was never popular, but it had an amazingly well-designed operating system that you could extend in any way you wish (and even symbolic assembly language built into its BBC BASIC).