Is Layer-3 DCI Safe?

One of my readers sent me a great question:

I agree with you that L2 DCI is like driving without a seat belt. But is L3 DCI safer in case of DCI link failure? Let's say you have your own AS and PI addresses in use. Your AS spans multiple sites and there are external BGP peers on each site. What happens if the L3 DCI breaks? How will that impact your services?

Simple answer: while L3 DCI is orders of magnitude safer than L2 DCI, it will eventually fail, and you have to plan for that.

External IP Routing

If you advertise a single provider-independent (PI) prefix into the Internet from both data centers, and the DCI link fails, you’re no better off than with an L2 DCI design: some of your traffic will arrive at the wrong data center and will be dropped there because it can no longer be forwarded to the other site.

You should always advertise:

  • A data-center-specific prefix from each data center to ensure each data center can operate on its own;
  • An aggregate prefix from both data centers to ensure a data center isn’t totally isolated if its Internet uplink fails while the DCI link is still up.
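
To see why the two advertisements complement each other, here is a minimal Python sketch of longest-prefix matching as the rest of the Internet would apply it. The prefixes come from the IPv6 documentation range, and the site names and helper functions are made up for illustration; this is not a model of a real BGP implementation.

```python
import ipaddress

# Hypothetical addressing plan: one /47 aggregate, one /48 per data center.
AGGREGATE = ipaddress.ip_network("2001:db8::/47")
DC_A, DC_B = AGGREGATE.subnets(new_prefix=48)


def announced_routes(uplink_up):
    """Map each prefix to the set of sites currently announcing it.

    Every site with a working Internet uplink announces its own
    data-center-specific prefix plus the shared aggregate.
    """
    routes = {}
    for site, specific in (("A", DC_A), ("B", DC_B)):
        if uplink_up[site]:
            for prefix in (specific, AGGREGATE):
                routes.setdefault(prefix, set()).add(site)
    return routes


def entry_sites(destination, routes):
    """Return the sites announcing the most specific prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [prefix for prefix in routes if addr in prefix]
    if not matches:
        return set()
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    return routes[best]


# Both uplinks up: traffic for a DC-B host follows the /48 straight to site B.
print(entry_sites("2001:db8:1::10", announced_routes({"A": True, "B": True})))

# Site B loses its uplink: the aggregate pulls the traffic to site A,
# which can deliver it only while the DCI link is still up.
print(entry_sites("2001:db8:1::10", announced_routes({"A": True, "B": False})))
```

With both uplinks up, the more-specific prefix wins, so a DCI failure does not strand any inbound traffic. The aggregate only matters when a site's own uplink is gone.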

If you can’t aggregate the data center prefixes, advertise the prefix belonging to the remote data center as a backup route: with a higher MED, a longer (prepended) AS path, or the ISP-specific BGP communities that mark a backup path.
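
The effect of a longer AS path or a higher MED can be sketched with a heavily simplified route comparison. This is only an illustration: real BGP best-path selection has more steps (local preference, origin, and so on) and by default compares MED only between routes received from the same neighboring AS. The `Route` class, the site names, and AS 64500 (a documentation AS number) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    site: str        # data center that originated the advertisement
    as_path: tuple   # AS numbers as seen by the upstream ISP
    med: int = 0     # multi-exit discriminator, lower is preferred


def best_route(routes):
    """Pick the route with the shortest AS path, then the lowest MED.

    Simplified on purpose: only these two attributes are compared.
    """
    return min(routes, key=lambda r: (len(r.as_path), r.med))


# DC-B's prefix as advertised by DC-B itself (primary) and by DC-A (backup).
primary = Route("DC-B", (64500,))
backup = Route("DC-A", (64500, 64500, 64500))  # prepended twice

print(best_route([primary, backup]).site)  # primary wins while it exists
print(best_route([backup]).site)           # backup takes over after a withdrawal
```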

Too-Specific Prefixes

Sometimes you cannot advertise a more-specific prefix. You might have a single /24 IPv4 prefix or a single /48 IPv6 prefix. The IPv6 case is easily solved: go and ask for another prefix; it’s not like we’re running short on IPv6 prefixes. In the IPv4 case, you’re limited to kludges:

Advertise /25s to your ISP. Split the /24 prefix into two /25s and assign one /25 to each data center. Each data center then advertises the /24 plus its own /25. Your ISP has to be willing to accept the /25s (they are usually filtered), and you should advertise them with the no-export BGP community so they’re not leaked into the wider Internet.

This trick works well if you use the same ISPs in both data centers. You can have connections to multiple ISPs in each data center, but each ISP has to be present in all data centers.
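
A quick sketch of the split, using Python's standard `ipaddress` module. The prefix 203.0.113.0/24 is a documentation prefix, and the site names and the `advertisement_plan` helper are hypothetical; the output is a plain data structure, not router configuration.

```python
import ipaddress


def advertisement_plan(covering_prefix, sites=("DC-A", "DC-B")):
    """Return {site: [(prefix, community), ...]} for a two-site split.

    Every site advertises the covering prefix without a community and
    its own half of it tagged with no-export.
    """
    covering = ipaddress.ip_network(covering_prefix)
    halves = list(covering.subnets(prefixlen_diff=1))
    if len(halves) != len(sites):
        raise ValueError("this sketch handles exactly two sites")
    plan = {}
    for site, half in zip(sites, halves):
        plan[site] = [
            (str(covering), None),       # reachable from the whole Internet
            (str(half), "no-export"),    # stays inside the directly connected ISP
        ]
    return plan


for site, adverts in advertisement_plan("203.0.113.0/24").items():
    print(site, adverts)
```

Because the /25s never leave the directly connected ISP, only an ISP that hears both /25s can steer traffic to the right site, which is why each ISP has to be present in all data centers.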

Build a backup DCI link over the Internet. An IP VPN tunnel across the Internet could provide last-resort connectivity between data centers if the DCI link fails.

If you decide to go down this route, don’t use the backup DCI link for regular inter-DC traffic (for example, storage replication). Use it solely for emergency purposes like shunting the inbound traffic toward the target data center/server.

Don’t Forget the Applications

Perfect external routing design won’t help you if your application developers decided to spread a single service across multiple data centers. They (RFC 2119) MUST understand that each mission-critical service MUST be self-contained with no inter-DC dependencies.

A simple bad practice example: you’re offering IaaS cloud services implemented with vCloud Director, and you’re running a single vCloud Director instance in one of the data centers (yes, it is supported). If the DCI link fails, you’ll experience a control-plane failure like the one Amazon had in July 2012.

More Information

Numerous L2 and L3 DCI designs are described in my Data Center Interconnects webinar (also available as part of the yearly subscription). If you need a quick second opinion or a design review, check out my ExpertExpress service or other consulting options.

3 comments:

  1. Very good post, and it's something we are coming up against as we are trying to implement BGP across two DCs with only a single /24 prefix.

  2. Are there any other sources/references for that statement?
    "They (RFC 2119) MUST understand that each mission-critical service MUST be self-contained with no inter-DC dependencies."

    I completely agree with this, but sometimes you need to convince people harder :-)

    1. It's sometimes hard to get quotable references for common sense ;) Looking at Amazon's July failure might help (link in the blog post).


Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.