Disable Flapping BGP Neighbors

It looks like we’re bound to experience a widespread BGP failure once every few months. They all follow the same pattern:

  • A “somewhat” undertested BGP implementation starts advertising paths with “unexpected” set of attributes.
  • A specific downstream BGP implementation (and it could be a different implementation every time) a few hops down the road hiccups and sends a BGP notification message to its upstream neighbor.
  • BGP session must be reset following a notification message; the routes advertised over it are lost and withdrawn, causing widespread ripples across the Internet.
  • The offending session is reestablished seconds later and the same set of routes is sent again, causing the same failure and a session reset. If the session stays up long enough, some of the newly received routes might get propagated and will flap again when the session is reset.
  • The cyclical behavior continues until a manual intervention.
read more see 8 comments

Do Not EVER Run OSPF or IS-IS With Your Internet Customers

Someone started an interesting discussion on the NANOG mailing list. He inherited a network that extended its internal OSPF to its multihomed customers and wondered whether he should leave the network, change OSPF to IS-IS, or deploy BGP. Here are a few thoughts from my reply.

Please remember that we were discussing running global OSPF with the customer routers. Running OSPF in a VRF is a different story, as the customer cannot impact another customer’s routing (they can only burn your CPU cycles).
read more see 6 comments

What Went Wrong: the Socket API

You might think that the lack of a decent session layer in the TCP/IP protocol suite is the main culprit for our reliance on IP multihoming and related explosion of the IP routing tables. Unfortunately, we have an even bigger problem: the Berkeley Socket API, which is around 40 years old and used in almost all TCP/IP software implementations and clients (including high-level scripting languages like PERL or Python).

read more see 18 comments

Turn a switch into a hub … the Microsoft way

If you’ve ever tried to get advanced Cisco certifications, you’ve probably encountered questions dealing with the mismatch between the end device ARP timeouts and the L2 switch CAM (MAC address cache) timeouts. If you’re still wondering what the underlying problem is (it took me a while to figure it out), read the Unicast Flooding in Switched Campus Networks document from Cisco.

In all scenarios, traffic sent to unknown unicast MAC address causes layer-2 flooding, which can significantly reduce switch performance. Microsoft took this problem to a completely new level with its Network Load Balancing implementation: Windows servers send ARP replies containing MAC address X from MAC address Y, causing all the traffic toward the servers to be flooded – effectively turning an Ethernet switch into a hub.

see 3 comments

Questions on BGP AS-Path Filters in Non-Transit Networks

I’ve sent a link to my Filter excessively prepended AS-paths article as an answer to a BGP route-map question to the NANOG mailing list and got several interesting questions from Dylan a few hours later. As they are pretty common, you might be interested in them as well.

In my environment, we are not doing full routes. We have partial routes from AS X and then fail to AS Y. Is their any advantage for someone like me to do this, as we are not providing any IP transit so we are not passing the route table to anyone else?

read more see 2 comments

What Went Wrong: TCP/IP Lacks a Session Layer

One of the biggest challenges facing the Internet core today is the explosion of the IP routing and forwarding tables, which is caused primarily by traffic engineering and multihoming requirements. Things were supposed to get better when IPv6 introduced strict hierarchical addressing (similar to the phone number addressing, where the first few digits always denote the country code).

Unfortunately, the hierarchical IPv6 addressing idea relied on incredible belief that the world will shape itself according to the wills of the IETF working group members. Not surprisingly, that didn’t happen and the hierarchical IPv6 addressing idea was quietly scrapped, giving us plenty more prefixes to play with when trying to pollute the global IPv6 routing tables.

read more see 10 comments

Broadband traffic management

The discussions following my “All-I-can-eat mentality” post have helped me get a much better understanding of the broadband access business issues. I’ve already shared some of them in a follow-up post. A few weeks later (just before leaving for my summer vacation) I’ve tried to provide as balanced perspective as I could manage in the “Broadband traffic management: Finding rational solutions to ease congestion” article I wrote for SearchTelecom.

Read the SearchTelecom article

add comment

Blackhat 2009 Router Exploitation presentation

My favorite yellow press outlet has decided to propagate hearsay instead of writing “original contributions” (but their mastery of creating sensationalistic titles remains unchallenged). This time, they claim that “New features embedded in Cisco IOS like VoIP and Web services can present an opportunity for hackers”.

The only supporting documentation they provide is a story in SearchSecurity with a sensationalistic title (New Cisco IOS bugs pose tempting targets, says Black Hat researcher) followed by two pages of confusion including gems like “… new deployments of Internet Protocol version 6 (IPv6) and VoIP installations may make router exploitation more vulnerable to remote attackers …” supported by “… IPv6 was considered a security threat due to the many net tunnels used to connect to IPv6 …”, which, as anyone who has some basic clue about IPv6 knows, has nothing to do with router vulnerabilities.

Fortunately, Blackhat is a serious undertaking (unlike conferences with grandiose titles like “Cyber Infrastructure Protection”) and provides its presentations online for anyone to see what the presenters were really discussing. I would strongly recommend that you check the excellent “Router Exploitation” presentation by Felix Lindner, which provides a very reasonable analysis of current situation: the routers are exploitable (no surprise there), but it’s very hard to do. Not surprisingly, SNMP is mentioned only once, IPv6 in passing and VoIP only twice (with a good recommendation: don’t run VoIP on your core routers).

add comment

Aggressive BGP Fall-Over Behavior

Soon after I wrote the Designing Fast Converging BGP Networks article (you’ll find it somewhere in this list, one of my regular readers sent me an interesting problem: BGP sessions would be lost in his (IS-IS based) core network if he would use fall-over on IBGP neighbors and the BGP router would have a primary and a backup path to the IBGP neighbor.

It turned out to be an interesting side effect of aggressive route table purge following a link failure: the route to BGP neighbor was removed from the routing table before IS-IS ran SPF and installed an alternate route, and BGP decided it’s time to give up and terminate the session.

For more details, read the What Exactly Happens after a Link Failure blog post.

see 2 comments

What is a CTunnel interface?

You might have noticed that your IOS release supports a ctunnel interface (hint: your image has to support CLNS) and wondered what it could do. Well, it’s a GRE tunnel between a pair of NSAPs, so you can transport IP traffic across your well-engineered CLNS network without ever exposing the core routers to the dangers of IP.

But wait, it gets better: starting with IOS releases 12.3(7)T and 12.2(33)SRA, you can transport IPv6 across the ctunnel interface. Unfortunately, they haven’t implemented MPLS over GRE over CLNS yet (the mpls ip command is present, but does not work).

It looks like there's at least one potentially very large-scale application that could use this feature.

see 4 comments

Netflix summary

Many thanks to those of you that responded with Netflix details (special thanks to Volcker for sending me the packet capture). Immediately after someone mentioned firewalls, I knew what the most sensible answer should be: to get across almost anything, use HTTP. No surprise, Netflix chose to use it. However, they’ve managed to deploy streaming video over TCP, which is not a trivial task. So, how did they do it?

read more see 2 comments

Summer schedule

I’m switching to the “traditional” summer schedule: until mid-August, I’ll post two to three shorter articles per week. I don’t want to spend too much of my vacation time writing, but I also don’t want to see you bored with dormant blogosphere.

Some of my projects will simply have to wait for the temperatures to drop, including a few selected Service Provider issues I’ve started writing about in the last weeks and the ADSL QoS topics (don’t worry, I haven’t forgotten them).

add comment

Wimax: the next disruptive technology?

Fifteen years ago, the focus of the “true” service provider was on voice traffic and data offerings based on virtual circuits, implemented with a plethora of semi-compatible technologies slowly developed within the ITU organization: X.25, ISDN, Frame Relay and the all-encompassing ATM.
In the meantime, some relatively small companies (including Cisco, Wellfleet and 3Com) were producing so-called “routers” that supported two technologies nobody took seriously: Ethernet and IP.

read more see 2 comments

Followup: VLAN interface status

Thanks to my readers, I often learn something completely new about the intricacies of Cisco IOS. The “VLAN Interface Status post resulted in a comment about the SVI autostate concept, which is (not surprisingly) a somewhat muddy topic:

  • In most cases, the SVI interface tracks the state of access and trunk ports using the VLAN. The details are well explained in the Understanding SVI Autostate section of the Cisco IOS documentation.

The important part of the SVI autostate calculation is the “port is in STP forwarding state for the VLAN” requirement. If a VLAN is not carried in a trunk port (for example, due to switchport trunk allowed configuration command), the trunk port’s status does not influence the autostate.

  • In some IOS releases, you can exclude the individual physical ports from the autostate calculation with the switchport autostate exclude interface configuration command. Most commonly you’d want to exclude uplink ports on access switches.
  • In some unspecified IOS releases (including 12.4T), you can use the (currently undocumented according to Command Lookup Tool) no autostate VLAN interface configuration command, which disables the autostate algorithm and makes the SVI interface permanently active.
see 5 comments
Sidebar