Don’t Run OSPF with Your Customers
Salman left an interesting comment on my Running BGP on Servers blog post:
My prior counterparts thought running OSPF on Mainframes was a good idea. Then we had a routing blackhole due to misconfiguration on the server. Twice! The main issue was the Mainframe admins' lack of networking/OSPF knowledge.
Well, there’s a reason OSPF is called an Interior Routing Protocol.
However, even some networking engineers didn’t get the memo. A long time ago, I encountered a service provider who ran OSPF with their customers, and all customers happily shared area 0 with the provider… until a customer accidentally managed to create an intra-area default route (don’t ask me how), which was preferred over the provider’s external default route. And so, an early attempt at plug-and-pray networking (because it’s oh-so-much-easier to run OSPF with your customers than to configure static routes) failed miserably.
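Why did the customer's route win? OSPF prefers intra-area routes over inter-area routes, and both over external routes, before it even looks at the cost. The following is a minimal Python sketch of that rule, not a full OSPF route-selection implementation; the `OspfRoute` class, the `best_route` helper, the router labels, and the cost values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# OSPF path-type preference: a lower rank wins regardless of cost.
PATH_TYPE_RANK = {
    "intra-area": 0,
    "inter-area": 1,
    "external-1": 2,
    "external-2": 3,
}


@dataclass(frozen=True)
class OspfRoute:
    prefix: str
    path_type: str
    cost: int
    origin: str  # hypothetical label for the advertising router


def best_route(candidates):
    """Pick the winner for one prefix: path type first, cost only as a tie-breaker.

    Simplified: real OSPF has extra rules for comparing external routes.
    """
    return min(candidates, key=lambda r: (PATH_TYPE_RANK[r.path_type], r.cost))


# Assumed values: the provider's default is a cheap external route,
# the customer's accidental default is an expensive intra-area route.
provider_default = OspfRoute("0.0.0.0/0", "external-2", 1, "provider-edge")
customer_default = OspfRoute("0.0.0.0/0", "intra-area", 500, "customer-router")

winner = best_route([provider_default, customer_default])
print(winner.origin)  # customer-router
```

Even with a much higher cost, the customer's intra-area default beats the provider's external default, and every router in the area reaches the same conclusion.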
30K Foot View
Ignoring the technicalities, the main difference between OSPF (which I would never run on a host) and BGP (which I’d recommend in some cases) is the intended use:
- OSPF is an Interior Routing Protocol designed to exchange information within an autonomous system.
- BGP is an Exterior Routing Protocol with enough safeguards to be used between autonomous systems.
You might claim that the mainframe Salman mentioned belongs to the same autonomous system as the data center switches. However, even the early definitions of AS (going all the way back to RFC 1654) don’t talk about physical proximity:
The classic definition of an Autonomous System is a set of routers under a single technical administration…
Obviously, the mainframe team and the networking team weren’t a single technical administration.
Technical Differences
The intended use cases heavily influenced the design and behavior of OSPF (or IS-IS) and BGP:
- BGP uses a pretty conservative approach to information propagation: receive → filter → evaluate → filter → propagate best information.
- OSPF is focused on speed-of-convergence and uses a radically different approach: receive → flood everything → evaluate.
In other words, anyone who’s part of an OSPF domain can insert any stupidity they wish into the domain, and there’s nothing anyone else can do to stop the propagation of that stupidity within an area, where it stays for at least half an hour. There are (as expected) vendor-specific kludges one can use between areas, but within an area, flooding rules (and external routes get flooded across area boundaries unless you use stub or NSSA areas).
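The contrast between the two approaches can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real routing daemon works: the `bgp_speaker` and `ospf_router` functions, the dictionary-based routes and LSAs, the shortest-path-length tie-breaker, and the 10.1.0.0/16 server range are all hypothetical.

```python
import ipaddress


def bgp_speaker(received, inbound_ok, outbound_ok):
    """receive -> filter -> evaluate -> filter -> propagate best information."""
    accepted = [r for r in received if inbound_ok(r)]      # inbound filter
    best = {}
    for route in accepted:                                 # evaluate: best path per prefix
        current = best.get(route["prefix"])
        if current is None or route["path_len"] < current["path_len"]:
            best[route["prefix"]] = route
    return [r for r in best.values() if outbound_ok(r)]    # outbound filter


def ospf_router(received_lsas, lsdb):
    """receive -> flood everything -> evaluate (evaluation omitted here)."""
    flooded = []
    for lsa in received_lsas:
        # Simplification: real OSPF compares sequence numbers and age.
        if lsa["id"] not in lsdb:
            lsdb[lsa["id"]] = lsa
            flooded.append(lsa)                            # no policy hook before flooding
    return flooded


SERVER_RANGE = ipaddress.ip_network("10.1.0.0/16")  # assumed server address space


def from_server_ok(route):
    """Accept only prefixes that fall inside the server address range."""
    return ipaddress.ip_network(route["prefix"]).subnet_of(SERVER_RANGE)


# A misconfigured server announces a bogus default next to its real host route.
announcements = [
    {"prefix": "10.1.2.3/32", "path_len": 1},
    {"prefix": "0.0.0.0/0", "path_len": 1},
]
lsas = [
    {"id": "lsa-host", "prefix": "10.1.2.3/32"},
    {"id": "lsa-default", "prefix": "0.0.0.0/0"},
]

propagated = bgp_speaker(announcements, from_server_ok, lambda r: True)
flooded = ospf_router(lsas, {})

print([r["prefix"] for r in propagated])  # ['10.1.2.3/32']
print([lsa["prefix"] for lsa in flooded])  # ['10.1.2.3/32', '0.0.0.0/0']
```

In the BGP-like model the bogus default dies at the inbound filter. In the OSPF-like model there is no place to put such a filter: whatever arrives gets stored and flooded to the rest of the area.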
To Summarize
As I wrote 2.5 years ago: Don’t ever run OSPF with a third party, even if that third party happens to be your friendly server administrator. It’s not that you wouldn’t trust him, it’s just that you don’t need so many additional sources of semi-reliable information plugged straight into the heart of your network.
Finally, to learn more about running BGP between servers and ToR switches, watch the Leaf-and-Spine Fabric Designs webinar.
Let's assume there is a mainframe area, a core area (OSPF) and the WAN edge (BGP). Would you recommend BGP in the MF area? I would prefer OSPF stub or NSSA areas + floating static routes. Such a design would be easier than the BGP solution, as double BGP-OSPF redistribution doesn't look easy to maintain. Of course someone could say "migrate all areas to BGP!". This is an option in some places. Not everywhere.
Now I really started wondering what your story (and the design you're trying to justify) is.
I wrote about "generating a default route into NSSA area". You understood it meant "connecting to Internet", which might or might not be accurate. I never ever suggested _running OSPF routing protocol_ with a non-trusted entity.
Also, "importing AS external routes" is definitely not equal to "establishing OSPF adjacency with an external router".
As for the second part of your question, as always the answer is "it depends", and I could easily justify at least three different options. Anyway, I try not to give out generic recipes because they are so often misapplied.
This might be in the realm of the stupidly hypothetical, but what if the network team controlled the host-side networking and the server/systems team managed the rest, with all the relevant permissions and isolation? Let's say you have a Docker host running Calico or maybe Contrail, where a vRouter shim controls all traffic in and out of the host. Obviously both Calico and Contrail wisely use BGP, but, like I said, hypothetically: if the networking team can control the host routing, wouldn't this quote apply?
"The classic definition of an Autonomous System is a set of routers under a single technical administration"
In short, don't do it.
I admit this is getting too far into the weeds and fiddling with nerd knobs just to do something different. Would there be *any* possible benefit to a solution like this over, say, BGP and exception routing a la Lapukhov?
Concerning "OSPF with customers", that may be acceptable for the provider (but should not be for the client) if each client has its own OSPF instance, strictly disjoint from the provider's IGP. Not that it's something that a customer should ask from their provider, but some of them do this anyways...
Some (but not all) of these things can be controlled in MPLS/VPN scenario. Fewer tools are usually available in the global routing table.
Once I was talking to a small cloud-whatever company that installed cheap D-Link routers at customer sites, ran some VPN and ran RIP over it. I naturally asked "why the hell are you using RIP?" Turns out the routers don't know anything else apart from OSPF, and the only goal of dynamic routing was to simplify configuration: the customer gets a default (or a couple of prefixes for some nets?), and the hub gets a prefix to the customer. RIP is easy to configure. It can be filtered by prefixes at any spot just like BGP. It's supported anywhere. In their case, convergence was not an issue, there was no redundancy, and a minute or two of up/down delays didn't matter.
It seems to me like it's a viable option in a point-to-multipoint design if the spokes are unaware of BGP and are controlled by someone else. OSPF would have been more difficult to implement and it would provide no benefits to them.