Your browser failed to load CSS style sheets. Your browser or web proxy might not support elliptic-curve TLS

Building network automation solutions

6 week online course

reserve a seat

What went wrong: SCTP

Someone really wants to hear my opinion on SCTP (RFC 4960); he’s added a “what about SCTP” comment to several Internet-related posts I wrote in the last weeks. So, here are my totally unqualified (I have no hands-on experience) thoughts about SCTP.

Let me reiterate: I’m taking a 30,000-foot perspective here and whatever I’m writing could be completely wrong. If that’s the case, please point out my mistakes in your comments.

From the distance, the protocol looks promising. It provides datagram (unreliable messages), reliable message (record) and stream transport. Even more important, each connection can run across multiple IP addresses on each endpoint, providing native support for scalable IP multihoming (where each multihomed host resides in multiple PA address blocks from various Service Providers).

Second-hand evidence points to the viability of SCTP: it’s used in complex real-life signaling applications (SS7-over-IP), it’s implemented in Cisco IOS and IOS uses it for a variety of its transport needs (including high-availability solutions and reliable export of Netflow data).

However, SCTP will not solve the current IP multihoming issues (unless we’ll experience a world-wide Internet crash first). Here are just a few non-technical reasons why (if you have links to more in-depth information, please add them in the comments):

  • It was designed by the wrong working group. SCTP was a byproduct of the SIGTRAN working group which was focused on transport of PSTN signaling over IP networks.
  • It was never properly promoted. The SIGTRAN working group solved their problem and moved on.
  • It’s not shipped with Windows, which is a major showstopper as most clients that would benefit from SCTP’s IP multihoming support run on Windows.
  • Although it’s bundled in recent Linux kernels, the support files are not included in out-of-the box Linux distributions. To get it on my Fedora 11 distribution, I had to install lksctp-tools.
  • You can get libraries, source code, kernel patches and even commercial implementations of SCTP for most operating systems, but in most cases you have to do some installation and integration work. This is a great option if you want to play with the protocol, but not if you want to deploy generic applications over it.
  • Since it’s rarely used, there’s no support for it in the networking equipment. You can’t even match SCTP by name with extended access lists in Cisco IOS (you have to use its numeric protocol number). Obviously you cannot perform matches on SCTP port numbers. Passing SCTP across a firewall is a nightmare, as there’s no stateful inspection of SCTP traffic.

However, by far the biggest showstopper to SCTP adoption is the lack of session layer in TCP/IP and the broken Socket API. If you want to use SCTP with the Socket API, you have to indicate the protocol to use in the socket call, which means that every application that would benefit from SCTP support must be changed, recompiled and tested. There is no way that you could take existing applications, add SCTP support in the operating system and have a better-performing Internet as the result.

see 15 comments

Disable flapping BGP neighbors

It looks like we’re bound to experience a widespread BGP failure once every few months. They all follow the same pattern:

  • A “somewhat” undertested BGP implementation starts advertising paths with “unexpected” set of attributes.
  • A specific downstream BGP implementation (and it could be a different implementation every time) a few hops down the road hiccups and sends a BGP notification message to its upstream neighbor.
  • BGP session must be reset following a notification message; the routes advertised over it are lost and withdrawn, causing widespread ripples across the Internet.
  • The offending session is reestablished seconds later and the same set of routes is sent again, causing the same failure and a session reset. If the session stays up long enough (because the routers have to exchange approximately 300,000 routes), some of the newly received routes might get propagated and will flap again when the session is reset.
  • The cyclical behavior continues until a manual intervention.

I don’t understand why Cisco IOS allows the cyclical BGP session behavior (as it’s pretty obvious things will not get better by themselves once the same session is reset a few times), but there’s at least something we can do with Embedded Event Manager: shut down the offending neighbor after seeing the BGP-3-NOTIFICATION syslog message often enough in a short period of time.

You’ll find the EEM applet that shuts down a flapping BGP neighbor in the CT3 wiki.

see 8 comments

Do not EVER run OSPF or IS-IS with your Internet customers

Clue started an interesting discussion on the NANOG mailing list. He’s inherited a network that extended its internal OSPF to its multihomed customers and wondered whether he should leave the network as it is, change OSPF to IS-IS or deploy BGP. Here are a few thoughts from my reply.

Please remember that we were discussing running global OSPF with the customer routers. Running OSPF in a VRF is a different story, as the customer cannot impact another customer’s routing (they can only burn your CPU cycles).

Do not ever run an SPF routing protocol (OSPF or IS-IS) with your customer. They can insert anything they want into it, be it due to configuration mistake, malicious intent or third-party hijacking, and your whole network (or at least the other customers) will be affected.

Just to give you a few examples:

  • They could hijack the host route to your DNS server and spoof every other customer of yours that uses your DNS (I haven’t seen this one yet, but it’s feasible).
  • They could hijack the host route to your POP3 server and collect the usernames and passwords of your residential users (I’ve seen this in a production network, but the attack vector was not OSPF but another routing protocol).
  • Company A could hijack the host route to the web server of company B.
  • They could insert a better default route than you do and at least some of your routers will listen to them (I’ve seen this done with OSPF).
  • If they ever make a total mess and start flapping their LSAs, your whole network will be affected and all your routers will burn CPU running SPF algorithm.

If you absolutely insist on not using BGP (but then BGP is the only currently available routing protocol designed to handle routing in scenarios where the two parties don't necessarily trust each other), use RIP. It's safer than OSPF; at least you can filter the incoming updates.

I’ve also seen a Service Provider running RIP with their customer … but they were not using any filters when redistributing RIP routes into their IGP.

Numerous other respondents shared my feelings and the best summary was provided by Steve Bertrand: “if in the same sentence you read ‘my network’ and ‘customer network’, use BGP.”

see 6 comments

What went wrong: the Socket API

You might think that the lack of a decent session layer in the TCP/IP protocol suite is the main culprit for our reliance on IP multihoming and related explosion of the IP routing tables. Unfortunately, we have an even bigger problem: the Berkeley Socket API, which is over 25 years old and used in almost all TCP/IP software implementations (including the high-level scripting languages like PERL).

read more see 18 comments

Turn a switch into a hub … the Microsoft way

If you’ve ever tried to get advanced Cisco certifications, you’ve probably encountered questions dealing with the mismatch between the end device ARP timeouts and the L2 switch CAM (MAC address cache) timeouts. If you’re still wondering what the underlying problem is (it took me a while to figure it out), read the Unicast Flooding in Switched Campus Networks document from Cisco.

In all scenarios, traffic sent to unknown unicast MAC address causes layer-2 flooding, which can significantly reduce switch performance. Microsoft took this problem to a completely new level with its Network Load Balancing implementation: Windows servers send ARP replies containing MAC address X from MAC address Y, causing all the traffic toward the servers to be flooded – effectively turning an Ethernet switch into a hub.

see 3 comments

Regular expressions in EEM applets

EEM 3.0 (available in IOS release 12.4(22)T) includes support for variables, programming logic and regular expressions. Not surprisingly, the IOS documentation of these exciting features is a bit sparse, so I’ve decided to write a step-by-step EEM regular expression tutorial.

Read the Regular expressions in Embedded Event Manager applets article in the CT3 wiki

Add comment

Questions on BGP AS-path filters in non-transit networks

I’ve sent a link to my “Filter excessively prepended AS-paths” wiki article as an answer to a BGP route-map question to the NANOG mailing list and got several interesting questions from Dylan a few hours later. As they are pretty common, you might be interested in them as well.

In my environment, we are not doing full routes. We have partial routes from AS X and then fail to AS Y. Is their any advantage for someone like me to do this, as we are not providing any IP transit so we are not passing the route table to anyone else?

You could use inbound AS-path filters to protect your BGP table, but as you're not a transit AS, it does not make much sense. You might also use them to reduce the size of your BGP table, but obviously you’ve already taken care of that.

When I run the "sh ip bgp quote-regexp "_([0-9]+)_\1_\1_\1_\1_ \1_" | begin Network" I am seeing many paths that would be filtered by this access-list.

Obviously a lot of people are prepend-happy.

As I’ve stated in the original article, if prepending your AS number a few times doesn’t help, nothing will.

What happens to those networks, are they unreachable from my AS, or do I just route those networks to my upstream provider and let them deal with it?

If I understood correctly, you're using a default route toward AS Y, which means that anything that is not in your BGP table (and consequently your IP routing table) will be sent out of your AS via the default route. If you're getting the paths you're filtering from AS X that means that more traffic will go to AS Y.

This last question is a little off-topic, but relates to your access list. In the event of some kind of DOS attack coming from one of a few AS numbers (let's assume it's AS Z), what is the feasibility of using

ip as-path access-list 100 deny _([0-9]+)_\1_\1_\1_\1_
ip as-path access-list 100 deny Z
ip as-path access-list 100 permit .*

Would this have any affect at all, or would my pipe from my upstream still be congested with garbage traffic?

No. You cannot influence the inbound traffic apart from not advertising some of your prefixes to some of your neighbors or giving them hints with BGP communities or AS-path prepending. Whatever you do with BGP on your routers influences only the paths the outbound traffic is taking. What you'd actually need is remote-triggered black hole. Frank Bulk provided a number of links in this message.

see 2 comments

What went wrong: TCP/IP lacks a session layer

One of the biggest challenges facing the Internet core today is the explosion of the IP routing and forwarding tables, which is caused primarily by traffic engineering and multihoming requirements. Things were supposed to get better when IPv6 introduced strict hierarchical addressing (similar to the phone number addressing, where the first few digits always denote the country code). Unfortunately, the hierarchical IPv6 addressing idea relied on incredible belief that the world will shape itself according to the wills of the IETF workgroup members. Not surprisingly, that didn’t happen and the hierarchical IPv6 addressing idea was quietly scrapped, giving us plenty more prefixes to play with when trying to pollute the global IPv6 routing tables.

If we take a step back and ask ourselves: “why do we need IP multihoming in the first place?” the answer is simple: because the client application (layer-7 entity) connects to the IP address (layer-3 entity) of the server. If we want to have persistent sessions, the server IP address must remain reachable – the headaches are caused by tight coupling across numerous protocol layers.

The session persistence, session restarts, checkpoints and the ability to connect to multiple L3 addresses to reach the same service are the jobs of the OSI session layer … only there is none in the TCP/IP protocol stack. Twenty years ago the IP zealots were quick to point out that “if you know what you’re doing, four layers are enough, if you don’t, even seven layers won’t help you”. Guess who’s laughing now (although the laugh is somewhat bitter and utterly irrelevant).

The worst part of the story is that the IETF community had the chance to fix their design problems when they realized that IPv4 will have to be replaced by something else. Unfortunately, the only thing they could agree upon was to replace IPv4 addresses with longer IPv6 addresses while retaining the rest of the (then) 15-year old design. Some people realized years later that the whole approach is broken and tried to fix it with SHIM6, but it was too little done way too late. The SHIM6 protocols are still in drafts, there are no commercial implementations and we’re already running out of IPv4 addresses.

see 10 comments

IOS access list numbering scheme

Shane sent me a really interesting question: he was wondering why there's such a huge gap between the (numbered) extended IP ACL (100 – 199) and the extended range of standard IP ACL (1300 – 1999).

Some of you might be old enough to know that Cisco IOS supports (or used to support) around 10 different layer-3 protocols (IP being the most popular these days) and each one of them (if it was added to IOS early enough when the parser was still somewhat immature) required its own range of numbered ACL. I’ve summarized all of them in the “IOS Access List numbering scheme” article in the CT3 wiki.

Continue reading the article …

see 3 comments

Broadband traffic management

The discussions following my “All-I-can-eat mentality” post have helped me get a much better understanding of the broadband access business issues. I’ve already shared some of them in a follow-up post. A few weeks later (just before leaving for my summer vacation) I’ve tried to provide as balanced perspective as I could manage in the “Broadband traffic management: Finding rational solutions to ease congestion” article I wrote for SearchTelecom.

Read the SearchTelecom article

Add comment

IOS Hints comments work with Facebook Connect

Migration to JS-Kit comments had numerous additional positive side effects (the first one was that the commenting system finally started to work, resulting in a significant increase in the number of comments). Among other goodies, JS-Kit supports Facebook Connect: you can post comments on my blog using your Facebook account (the initial setup could fail occasionally, but it looks more like a Facebook failure). The comments made from a Facebook account are automatically inserted in your Facebook newsfeed (if you so desire).

see 1 comments

Blackhat 2009 Router Exploitation presentation

My favorite yellow press outlet has decided to propagate hearsay instead of writing “original contributions” (but their mastery of creating sensationalistic titles remains unchallenged). This time, they claim that “New features embedded in Cisco IOS like VoIP and Web services can present an opportunity for hackers”.

The only supporting documentation they provide is a story in SearchSecurity with a sensationalistic title (New Cisco IOS bugs pose tempting targets, says Black Hat researcher) followed by two pages of confusion including gems like “… new deployments of Internet Protocol version 6 (IPv6) and VoIP installations may make router exploitation more vulnerable to remote attackers …” supported by “… IPv6 was considered a security threat due to the many net tunnels used to connect to IPv6 …”, which, as anyone who has some basic clue about IPv6 knows, has nothing to do with router vulnerabilities.

Fortunately, Blackhat is a serious undertaking (unlike conferences with grandiose titles like “Cyber Infrastructure Protection”) and provides its presentations online for anyone to see what the presenters were really discussing. I would strongly recommend that you check the excellent “Router Exploitation” presentation by Felix Lindner, which provides a very reasonable analysis of current situation: the routers are exploitable (no surprise there), but it’s very hard to do. Not surprisingly, SNMP is mentioned only once, IPv6 in passing and VoIP only twice (with a good recommendation: don’t run VoIP on your core routers).

Add comment

Router LSA quiz: the results

The Friday’s OSPF quiz has generated numerous answers … unfortunately many of them incorrect. Some readers (probably those that recently attended a Cisco certification exam) thought I was asking a trick question, as I’ve forgotten to include the IP addresses in the sample configuration, which only proves how hard it is to write good bulletproof questions.

Those that assumed the IP addresses would have to be configured on the interfaces made two common errors:

  • Some assumed a type-2 LSA would be generated for the LAN interface. Wrong: type-2 LSA is generated only if needed (there is more than one router attached to the LAN interface).
  • Others thought the router would generate a type-1 LSA per interface. Wrong: an OSPF router generates only a single type-1 LSA per area.

To clarify these issues, I wrote an article in the CT3 wiki documenting how the type-1 (router) LSA describes various interface types and inter-router links.

Read the Type-1 (Router) LSA article in the CT3 wiki

see 5 comments