Don’t Run OSPF with Your Customers

Monday, March 14, 2016 10:05 +0100

Salman left an interesting comment on my Running BGP on Servers blog post:

My prior counterparts thought running OSPF on Mainframes was a good idea. Then we had a routing blackhole due to misconfiguration on the server. Twice! The main issue was the Mainframe admins' lack of networking/OSPF knowledge.

Well, there’s a reason OSPF is called an Interior Routing Protocol.

Honestly, mainframe administrators have no other options: IBM, in their infinite wisdom, implemented only RIP and OSPF, and OSPF seems to be the lesser evil.

However, even some networking engineers didn’t get the memo. A long time ago, I encountered a service provider who ran OSPF with their customers, and all customers happily shared area 0 with the provider… until a customer accidentally managed to create an intra-area default route (don’t ask me how), which was preferred over the provider’s external default route. And so, an early attempt at plug-and-pray networking (because it’s oh-so-much-easier to run OSPF with your customers than to configure static routes) failed miserably.
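Why did the customer's route win? OSPF ranks candidate routes by path type before it ever looks at cost: intra-area beats inter-area, which beats type-1 external, which beats type-2 external. The Python sketch below is only an illustration of that ordering. The `Route` class, the router names, and the metrics are made up, and the assumption that the provider's default was a type-2 external is mine (the story only says it was external).

```python
from dataclasses import dataclass
from enum import IntEnum


class PathType(IntEnum):
    # Lower value = more preferred, mirroring the OSPF path-type ordering.
    INTRA_AREA = 1
    INTER_AREA = 2
    EXTERNAL_TYPE1 = 3
    EXTERNAL_TYPE2 = 4


@dataclass(frozen=True)
class Route:
    prefix: str
    path_type: PathType
    cost: int
    origin: str  # hypothetical router name, for illustration only


def best_route(candidates):
    # Path type is compared first; cost only breaks ties within one type.
    # (Simplified: real E2 comparison also considers the cost to the ASBR.)
    return min(candidates, key=lambda r: (r.path_type, r.cost))


# Assumed metrics: the provider's default is cheap, the customer's is not.
provider_default = Route("0.0.0.0/0", PathType.EXTERNAL_TYPE2, 1, "provider-asbr")
customer_default = Route("0.0.0.0/0", PathType.INTRA_AREA, 1000, "customer-router")

winner = best_route([provider_default, customer_default])
print(winner.origin)
```

Even with a cost a thousand times worse, the intra-area route is selected, and no metric tuning on the provider's external route can change that.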

30K Foot View

Ignoring the technicalities, the main difference between OSPF (which I would never run on a host) and BGP (which I’d recommend in some cases) is the intended use:

OSPF is an Interior Routing Protocol designed to exchange information within an autonomous system.
BGP is an Exterior Routing Protocol with enough safeguards to be used between autonomous systems.

You might claim that the mainframe Salman mentioned belongs to the same autonomous system as the data center switches. However, even the early definitions of AS (going all the way back to RFC 1654) don’t talk about physical proximity:

The classic definition of an Autonomous System is a set of routers under a single technical administration…

Obviously, the mainframe team and the networking team weren’t a single technical administration.

Technical Differences

The intended use cases heavily influenced the design and behavior of OSPF (or IS-IS) and BGP:

BGP uses a pretty conservative approach to information propagation: receive → filter → evaluate → filter → propagate best information.
OSPF is focused on speed-of-convergence and uses a radically different approach: receive → flood everything → evaluate.

In other words, anyone who’s part of an OSPF domain can insert any stupidity they wish into the domain, and there’s nothing anyone else can do to stop the propagation of that stupidity within an area, and it stays in the area for at least half an hour. There are (as expected) vendor-specific kludges one can use between areas, but within an area, flooding rules (and external routes get flooded across area boundaries unless you use NSSA areas).
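The two propagation models can be contrasted in a few lines of Python. This is a toy sketch, not how any real routing daemon is written: the dictionary-based updates, the policy callbacks, and best-path selection by shortest `path_len` are all simplifying assumptions made for illustration.

```python
def bgp_propagate(received, import_policy, export_policy):
    """receive -> filter -> evaluate -> filter -> propagate best information."""
    accepted = [r for r in received if import_policy(r)]
    best = {}
    for route in accepted:
        current = best.get(route["prefix"])
        # Toy best-path selection: the shortest path length wins.
        if current is None or route["path_len"] < current["path_len"]:
            best[route["prefix"]] = route
    return [r for r in best.values() if export_policy(r)]


def ospf_flood(received_lsas, lsdb):
    """receive -> flood everything -> evaluate (evaluation happens later)."""
    flooded = []
    for lsa in received_lsas:
        key = (lsa["origin"], lsa["prefix"])
        if key not in lsdb:
            lsdb[key] = lsa
            flooded.append(lsa)  # no policy hook: every new LSA is passed on
    return flooded


# A customer sends its own prefix plus a bogus default route.
updates = [
    {"prefix": "192.0.2.0/24", "origin": "customer", "path_len": 1},
    {"prefix": "0.0.0.0/0", "origin": "customer", "path_len": 1},
]


def only_customer_prefix(route):
    # Hypothetical import policy: accept only the prefix assigned to the customer.
    return route["prefix"] == "192.0.2.0/24"


bgp_out = bgp_propagate(updates, only_customer_prefix, lambda r: True)
ospf_out = ospf_flood(updates, lsdb={})

print([r["prefix"] for r in bgp_out])
print([l["prefix"] for l in ospf_out])
```

The BGP-style pipeline drops the bogus default at the first filter, while the flooding function has no place to hang a filter at all, so both entries are passed on to every neighbor in the area.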

To Summarize

As I wrote 2.5 years ago: Don’t ever run OSPF with a third party, even if that third party happens to be your friendly server administrator. It’s not that you wouldn’t trust him, it’s just that you don’t need so many additional sources of semi-reliable information plugged straight into the heart of your network.

Finally, to learn more about running BGP between servers and ToR switches, watch the Leaf-and-Spine Fabric Designs webinar.

15 comments:

Anonymous 14 March 2016 11:20

Design books state that a partner's network should be connected as an NSSA area or at least some kind of a stub area. So the main issue is not the Mainframe admins' lack of networking/OSPF knowledge but the network team. ;)

Replies

Ivan Pepelnjak 14 March 2016 18:14

I would love to see which design book recommends using OSPF to connect to a partner's network. Thank you!

Anonymous 14 March 2016 19:33

Ivan, every book that describes NSSA talks about importing AS external routes. You wrote about connecting Internet via NSSA: http://wiki.nil.com/External_default_route_in_NSSA_area. C'mon, a partner network is more trustworthy than this.

Let's assume there is a mainframe area, core area (OSPF) and the WAN edge (BGP). Would you recommend BGP in the MF area? I would prefer OSPF stub or NSSA areas + floating static routes. The design would be simpler than the BGP solution, as double BGP-OSPF redistribution doesn't look easy to maintain. Of course someone could say "migrate all areas to BGP!". This is an option in some places. Not everywhere.

Ivan Pepelnjak 15 March 2016 12:05

Dear $Anonymous,

Now I really started wondering what your story (and the design you're trying to justify) is.

I wrote about "generating a default route into NSSA area". You understood it meant "connecting to Internet", which might or might not be accurate. I never ever suggested _running OSPF routing protocol_ with a non-trusted entity.

Also, "importing AS external routes" is definitely not equal to "establishing OSPF adjacency with an external router".

As for the second part of your question, as always the answer is "it depends", and I could easily justify at least three different options. Anyway, I try not to give out generic recipes because they are so often misapplied.

Salman Naqvi 14 March 2016 13:30

Salman here! Lesson learned indeed :) - Either run BGP or just use plain old static routing with 'servers' where even a big system like the Mainframe is essentially just a special type of 'server'. (Unfortunately, NSSA is not an option AFAIK on Mainframe).

Matt 14 March 2016 17:41

Ivan,

This might be in the realm of stupidly hypothetical, but what if the network team was able to control the host side networking and the server/systems team managed the rest with all the relevant permissions and isolation? Let's say you have a Docker host running Calico or maybe Contrail where you have a Vrouter shim controlling all traffic in and out of the host. Obviously both Calico and Contrail wisely use BGP, but as I said, hypothetically: if the networking team can control the host routing, wouldn't this quote then apply?

"The classic definition of an Autonomous System is a set of routers under a single technical administration"

Replies

Ivan Pepelnjak 14 March 2016 18:17

Because of the way OSPF floods and evaluates information, it's pretty hard to stop a multi-homed host from becoming a transit node after a configuration error, unless you put every host in a separate area, which then generates other interesting challenges.

In short, don't do it.

Matt 14 March 2016 19:14

Let's say no manual configuration on the hosts. The OSPF configuration is basic and templated using unnumbered links, with the Vrouter loopback being assigned from IPAM as well as Docker host networks being assigned from IPAM to avoid duplicate addressing. Let's take this further and say you are advertising Docker host routes from the host to the ToR and then doing summarization at the ToR. Then using a centralized controller and fibbing for traffic engineering.

I admit this is getting too far into the weeds and fiddling with nerd knobs just to do something different. Would there be *any* possible benefit to a solution like this over say BGP and exception routing a la Lapukhov?

Ivan Pepelnjak 15 March 2016 12:07

The only benefit I could see is getting slightly cheaper switches (at significantly increased complexity) because your favorite $vendor wants to charge for BGP, in which case switch to a different vendor (and if low price is the primary concern, go with Cumulus + Dell or something along these lines).

R.-Adrian F. 15 March 2016 22:20

Terminology-wise, you could even pretend that BGP is NOT a routing protocol in the proper sense, but a "route announcement/presence protocol".

Concerning "OSPF with customers", that may be acceptable for the provider (but should not be for the client) if each client has its own OSPF instance, strictly disjoint from the provider's IGP. Not that it's something that a customer should ask from their provider, but some of them do this anyways...

Replies

Ivan Pepelnjak 16 March 2016 16:43

"...may be acceptable for the provider..." - until the customer brings down a PE router control plane with OSPF floods or burns its memory when they decide redistributing from full Internet feed into OSPF is cool, or overflows the forwarding table.

Some (but not all) of these things can be controlled in MPLS/VPN scenario. Fewer tools are usually available in the global routing table.

R.-Adrian F. 16 March 2016 22:04

You are correct. However, a service provider should limit the use of the "main table" as much as possible (ideally for MPLS core only). But I know, reality proves otherwise...

Anonymous 16 March 2016 17:17

> Honestly, mainframe administrators have no other options: IBM in their infinite wisdom implemented only RIP and OSPF, and OSPF seems to be the lesser evil.

Once I was talking to a small cloud-whatever company that installed cheap d-link routers at the customers, ran some VPN and ran RIP over it. I naturally asked "why the hell are you using RIP?" Turns out the routers don't know anything else apart from OSPF, and the only goal of dynamic routing was to simplify configuration - the customer gets a default (or a couple of prefixes for some nets?), the hub - a prefix to the customer. RIP is easy to configure. It can be filtered by prefixes at any spot just like BGP. It's supported anywhere. In their case, convergence was not an issue, there was no redundancy, a minute or two of up/down delays didn't matter.

It seems to me like it's a viable option in a point-to-multipoint design if the spokes are unaware of BGP and are controlled by someone else. OSPF would have been more difficult to implement and it would provide no benefits to them.

Replies

R.-Adrian F. 16 March 2016 22:13

RIP is a viable solution as "route advertisement/presence protocol". Even when BGP is supported, learning client routes via RIP may be preferred in some cases: support from low-end vendors, less configuration needed (imagine configuring hundreds of BGP sessions). I occasionally use it for this purpose, however, in those cases the customer does NOT have configuration access to the CPE - so it's not really "with the customer".

Ivan Pepelnjak 17 March 2016 08:31

Agreed. There were cases where I told the customer "just use RIP" (and of course they were totally surprised at such a low-tech solution).
