Redundant BGP Connectivity on a Single ISP Connection

A while ago Johannes Weber tweeted about an interesting challenge:

We want to advertise our AS and PI space over a single ISP connection. How would a setup look like with 2 Cisco routers, using them for hardware redundancy? Is this possible with only 1 neighboring to the ISP?

Hmm, so you have one cable and two router ports that you want to connect to that cable. There’s something wrong with this picture ;)

Joking aside, whenever faced with a challenge like this you have to ask yourself “what problem are we trying to solve?”. If the answer is “increase the availability of our AS” the next question should be “and what will be most likely to fail in this setup?” Hint: unless you’re buying your boxes on eBay, probably not the router’s power supply ;)

Whenever you’re trying to increase the reliability of a system, you’ll get the best results if you (A) increase the reliability of the weakest link or (B) add redundancy to the weakest link. Anything else is marginally better than rearranging deck chairs on the Titanic.
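To put rough numbers on the weakest-link argument, here is a back-of-the-envelope sketch. The availability figures are invented for illustration (they are not measurements), the helper names `series` and `parallel` are mine, and the math assumes independent failures and a perfect failover mechanism, which real networks rarely deliver.

```python
def series(*parts):
    """Availability of a chain: every component has to work."""
    total = 1.0
    for availability in parts:
        total *= availability
    return total


def parallel(*parts):
    """Availability of redundant components: any single one is enough."""
    all_fail = 1.0
    for availability in parts:
        all_fail *= 1.0 - availability
    return 1.0 - all_fail


# Hypothetical figures, chosen only to make the point visible.
LINK = 0.99     # the single ISP connection (assumed to be the weakest link)
ROUTER = 0.999  # one customer router

baseline = series(LINK, ROUTER)                       # one link, one router
two_routers = series(LINK, parallel(ROUTER, ROUTER))  # one link, two routers
two_paths = parallel(series(LINK, ROUTER),            # two links, two routers
                     series(LINK, ROUTER))

for name, value in [("one link, one router", baseline),
                    ("one link, two routers", two_routers),
                    ("two links, two routers", two_paths)]:
    print(f"{name:24s} {value:.6f}")
```

With these made-up inputs the second router barely moves the result, because the whole system can never be more available than the single link in front of it. Adding redundancy where the weakest component sits is what changes the picture.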

Watch the great Reliability Theory webinar if you’re interested in a more formal approach to this topic.

Conclusion: We should get the second ISP connection… and all of a sudden have a very familiar and well-understood scenario of two links connected to two routers.

Assuming the second ISP connection is not available (or there’s no budget), is there another option? Of course, we could build a whole Rube Goldberg machine out of this… but keep in mind that one should never over-complicate redundant architectures, as the added complexity often kills the system before a failure does (see also: redundant supervisors and chassis switches).

In the scenario we’re discussing the simplest solution would be to physically move the cable to a (cold standby) router in case the primary router fails, and have the second router preconfigured with the same WAN IP address and the same BGP session.

If that’s not good enough, find a simple layer-1 device (like this automatic failover switch) with one uplink and active/standby downlinks so you don’t have to drive to the office and replug the cable into another port.

If you’re a networking engineer, it’s easy to get fancier than that:

  • Insert a layer-2 switch in front of the two routers;
  • Persuade your ISP to assign a /29 to the connecting subnet (you need three addresses, and a /30 has only two usable ones);
  • Connect both routers to the layer-2 switch;
  • Establish two BGP sessions with the ISP.

You might get it done if you’re really friendly with your ISP, otherwise I wish you luck persuading their support team to jump through the hoops for you.
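In case it's not obvious why the layer-2 switch design needs a /29: the shared subnet has to hold three addresses (the ISP router plus your two routers). A quick sanity check with Python's standard `ipaddress` module, using the 192.0.2.0/24 documentation prefix as a stand-in for whatever your ISP would actually assign (the prefix and the `usable_hosts` helper are my assumptions, not part of the original scenario):

```python
import ipaddress

NEEDED = 3  # ISP router + our two routers on the shared segment


def usable_hosts(prefix):
    """Return the usable host addresses in an IPv4 prefix."""
    return list(ipaddress.ip_network(prefix).hosts())


# 192.0.2.0/24 is reserved for documentation; a real link would use
# addresses handed out by the ISP.
for prefix in ("192.0.2.0/30", "192.0.2.0/29"):
    hosts = usable_hosts(prefix)
    verdict = "enough" if len(hosts) >= NEEDED else "too small"
    print(f"{prefix}: {len(hosts)} usable addresses, {verdict}")
```

The typical point-to-point /30 runs out after two hosts, so the ISP has to renumber the link before you can bring up the second BGP session.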

Alternatively, you could bond the two routers into a single control plane (assuming your vendor supports something as stupid as that), connect the outside layer-2 switch to both boxes, configure a shared IP address on a VLAN interface… Honestly, don’t.

However, regardless of what you do, regardless of how much complexity you throw at the problem, you cannot get rid of the single point of failure without the second ISP connection… or as my father-in-law used to say: "no matter which way you turn, your a**e is always behind you".

2 comments:

  1. I usually think about these redundancy considerations in terms of *my* flexibility.

    Yes, we should have redundant circuits / ISPs; however, when I want to perform maintenance, patches, reboots, replacements, I want to have dual peering to each ISP. I don't want my downstream customers to churn and reroute to alternate paths just because I choose to maintain my networks.

    Keeping all my ISP paths available - in sickness and in health (even considering the initial pain of convincing your friendly provider to give you a /29) was worth it for me.
  2. Adding one more device introduces one more single point of failure, even worse than the original design.

    Getting a second link is much easier.