IP Renumbering in Disaster Avoidance Data Center Designs

It’s hard for me to admit, but there just might be a corner use case for split subnets and inter-DC bridging: even if you move a cold VM between data centers in a controlled disaster avoidance process (moving live VMs rarely makes sense), you might not be able to change its IP address due to hard-coded IP addresses, whether in application code or in configuration files.

Disaster recovery is a different beast: if you’ve lost the primary DC, it doesn’t hurt if you instantiate the same subnet in the backup DC.

However, before jumping headfirst into a misty pool filled with unicorn tears (actually, a brittle solution with too many moving parts that usually makes no sense), let’s see if there are alternatives. Here are some ideas in exponentially decreasing order of preference:

Ever heard of DNS? If the application uses hardcoded addresses in its clients or between servers, there’s not much you can do, but one would expect truly hardcoded addresses only in home-brewed craplications … and masterpieces created by those “programming gurus” that never realized hostnames should be used in configuration files instead of IP addresses.

If your application is somewhat well-behaved, there are all sorts of dynamic DNS solutions that you can use to automatically associate a server’s new IP address with its DNS FQDN. Windows clusters do that automatically, many DHCP servers automatically create dynamic DNS entries after allocating a client address, and there are numerous dynamic DNS clients for Linux that you can use even with static IP addresses.

Use low TTL values if you’re changing DNS records, or the clients won’t be able to connect to the migrated servers due to stale local DNS caches.
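
If you end up scripting the change yourself, a dynamic DNS update is almost a one-liner. Here’s a rough sketch using BIND’s nsupdate, assuming your DNS servers accept TSIG-authenticated dynamic updates (the name server, zone, key file and addresses are made-up placeholders); note the 30-second TTL on the refreshed record:

    # refresh the migrated server's A record; every value below is a made-up example
    printf '%s\n' \
      'server ns1.example.com' \
      'zone example.com' \
      'update delete app01.example.com. A' \
      'update add app01.example.com. 30 A 192.0.2.10' \
      'send' | nsupdate -k /etc/bind/ddns.key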

Host routes? For whatever reason some people think host routes are worse than long-distance bridging. They’re not – if nothing else, you have all the forwarding information in one place … and modern L3 switches use host routes for directly connected IP hosts anyway.

Automatic network-side configuration of host routes used to be mission impossible: Local Area Mobility (LAM) worked years ago, but nothing similar was supported in data center switches until Cumulus Networks reinvented it with Redistribute ARP.

As of late 2020, most data center switching vendors support some variant of routing on IP host addresses in their EVPN implementations. For more details, watch the EVPN Technical Deep Dive webinar.

… and the only other mechanism I could think of that wouldn’t involve loads of homemade scripting glue is OpenFlow-driven RIB modification supposedly working on Juniper MX routers.
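
Regardless of which mechanism installs them, the end result is always the same: a /32 route for the migrated server redistributed into the data center routing protocol. Here’s a minimal sketch of the manually-configured equivalent in FRR syntax (the interface name, addresses and BGP AS number are invented for the example):

    ! ToR switch in the target data center; swp1 faces the migrated server,
    ! 192.0.2.10 is its unchanged address (all values are made-up examples)
    ip route 192.0.2.10/32 swp1
    !
    ip prefix-list SERVER-HOSTS seq 10 permit 192.0.2.0/24 ge 32
    route-map HOST-ROUTES permit 10
     match ip address prefix-list SERVER-HOSTS
    !
    router bgp 65001
     address-family ipv4 unicast
      redistribute static route-map HOST-ROUTES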

Routing protocols? Routing protocols running on servers were pretty popular years ago (that’s how IBM implemented IP multipathing on mainframes). Instead of configuring a hardcoded IP address on the server’s LAN interface, configure it on a loopback, and run BGP between the servers and the adjacent ToR switches… and whatever you do, please don’t use OSPF. Some IBM mainframes were a single link failure away from becoming the core data center router.
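
To show how little it takes, here’s a hedged sketch of what FRR running on the server could look like (the loopback address, ToR address and AS numbers are invented for the example). The application binds to the loopback address, which stays the same no matter where the VM lands:

    ! FRR on the server; all addresses and AS numbers are made-up examples
    interface lo
     ip address 192.0.2.10/32
    !
    router bgp 65010
     ! 10.1.1.1 is the adjacent ToR switch
     neighbor 10.1.1.1 remote-as 65001
     address-family ipv4 unicast
      ! advertise the loopback, not the (replaceable) LAN address
      network 192.0.2.10/32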

Yeah, I know, a stupid solution like this requires actual changes to server configurations … and it’s so much easier to pretend the problem doesn’t exist and claim that the network should support whatever we throw at it ;)

Route Health Injection on load balancers? Same idea as server-side routing protocols, but implemented in front of the whole application infrastructure.

Assuming your application sits behind a load balancer and you’re doing a cold migration of all application components in one step, you can preconfigure all the required IP subnets in the disaster recovery site (after all, they’re hidden behind a load balancer) and rely on the load balancer to insert the publicly visible route to the application’s public IP address once everything is ready to go.
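
If your load balancer doesn’t do route health injection (or you just want to prototype the idea), a small BGP speaker like ExaBGP can approximate it: announce the application’s public prefix only while the health checks pass. The skeleton below is nothing more than a sketch; the addresses, AS numbers and script name are invented:

    # exabgp.conf sketch: health-driven route origination (all values are made up)
    process vip-health {
        # the script prints "announce route 203.0.113.10/32 next-hop self" once the
        # application answers its health checks, and "withdraw route 203.0.113.10/32"
        # when it stops responding
        run /etc/exabgp/vip-healthcheck.sh;
        encoder text;
    }

    neighbor 10.1.1.1 {
        router-id 10.1.1.2;
        local-address 10.1.1.2;
        local-as 65010;
        peer-as 65001;
        api {
            processes [ vip-health ];
        }
    }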

The universal duct tape – NAT. If the clients use DNS to connect to the servers, but the servers have to use fixed IP addresses, use NAT to hide server subnets behind different public IP addresses (one per site).

Obviously you have to move the whole application infrastructure at once if you want to use this approach, or things will break really badly.
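
On a Linux-based NAT device the whole trick boils down to two rules per server. A sketch using made-up documentation addresses, where 10.0.0.10 is the server’s hard-coded internal address (identical in both sites) and 198.51.100.10 is the DR-site public address published in DNS:

    # static 1:1 NAT on the DR-site NAT device (all addresses are made-up examples)
    iptables -t nat -A PREROUTING  -d 198.51.100.10 -j DNAT --to-destination 10.0.0.10
    iptables -t nat -A POSTROUTING -s 10.0.0.10     -j SNAT --to-source 198.51.100.10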

Apart from the usual NAT-is-bad and NAT-breaks-things mantras, there are a few additional drawbacks:

  • Clients have to rely on changed DNS records as you cannot insert a host route into the outside network like a load balancer can with RHI.
  • NAT devices usually don’t support dynamic DNS registration, so you have to change the DNS entries “manually”.

Virtual appliances? Duncan Epping proposed using vShield Edge as a NAT device.

While that idea didn’t sound so great when I wrote the original blog post, that’s how most overlay virtual networks implement overlay-to-physical gateways, and at least VMware decided to use BGP as the only routing protocol in this scenario.

Anything else? I’m positive I’ve missed an elegant idea or two. Your comments are most welcome … including those telling me why the ideas mentioned above would never be implementable.

Revision History

2020-11-17
Cleaned up the blog post, removed references to LISP VM mobility, and inserted pointers to routing based on host IP addresses and running BGP on servers.

26 comments:

  1. Hey Ivan,

    I'm working on a large scheme like this now.

    Moving the subnet without re-numbering as you suggest seems natural, but makes DR *testing* really complicated, so my customer is on the "change IP numbers" track.

    Here are some complications that DNS can't solve:
    - multiple security zones in the data center? DNS doesn't help the firewalls much.
    - reconfiguring the VM's IP is easy (VMware SRM does this), but reconfiguring application listen directives (think Apache config file) is not easy. Lots of custom per-application scripting work is required.

    Vendors like to claim their offerings are turn-key DR. Customers who believe it will be disappointed :-)
  2. Simply had to check Apache documentation:

    * Virtual hosts can use DNS FQDN
    * Listen directives seem to use only IP addresses, not interface names or DNS FQDN.

    No comment :( ... and I thought Apache web server was a shining example of good programming.
  3. Don't blame Apache for utilizing the OS socket library and having the same limitations. Besides, if you bind to * and run one IP per VM it should be a non-issue.
  4. #1 - Seems we're continuously running around in circles, eventually landing at http://blog.ioshints.info/2009/08/what-went-wrong-socket-api.html

    #2 - Absolutely agree.
  5. "zoning" is the key term to firewalling there, you simply don't create any per host rules. You can have firewalls that are aware of networks on your other sites. State table handling is different box of worms.
  6. Hi Ivan,

    In terms of the listed solutions:
    1. DNS - customers complain about the reaction time. On the other hand disaster recovery does not require fast detection and a DNS refresh.
    2. Host routes - harder to manage. LISP would be a good choice, but it is an emerging technology and needs more time.
    3. Routing protocols on servers are not an option at this moment, but in the virtualized DC they are feasible. We can have not only firewalls, load balancers and switches, but also routing protocols running on the hypervisor.
    4. Load-Balancers seem to be the best option listed. The global director or simply a router with a conditional route advertisement can do a prefix insertion.
    5. NAT - harder to manage and consumes resources unless stateless translation is used. NAT is better avoided, not because it will not work, but to keep it as a last-resort solution in case of problems with IPv6 migration.

    Other options:
    6. Proxy ARP + L3 Edge. There are some security drawbacks, especially if a firewall is in between, but it is worth considering.
    7. Fabric-Path/TRILL for IP. In this option the DC is fully L3 routed. There is no such standard nowadays, but we could create it. ;)
  7. Ivan, this issue with a hard-coded IP is not a big deal in a disaster recovery data center. A bigger issue is the active/active scenario. Even with load balancers there will be problems with asymmetric routing, which increases latency (very undesirable in mobile networks) and is hard to manage on a firewall.
  8. #1 (DNS): use low TTL. The "only" problem is that the web visitors might have to restart the browser session (even though DNS pinning is not as aggressive these days as it used to be).

    #2 - if you want to retain IP addresses without stretching L2 subnets, host routes (or LISP) are unfortunately the only way to go.

    #3 - We had routing protocols on servers 30 years ago. Come on ;)

    #6 - I used to believe in Proxy ARP. No longer. Anyhow, it solves only server-to-GW problem, nothing more.

    #7 - Don't even try to go there. Either you solve the problem on L3, and then you're back to LISP or host routes, or on L2, where you could easily use VXLAN (but we don't want split subnets anyway).
  9. I forgot how many times I've been ranting against stretched subnets citing exactly these reasons.

    BTW, I might get invited to PLNOG, in which case we can continue our discussion in a more pleasant environment.
  10. Excluding some solution flavors, we are on the same page in terms of the DC.

    BTW, 'I might get invited' is not how it works. You must be invited. 8-)
  11. 1. DNS can be an option for active/standby DCs but it is not perceived as a fast solution for active/active DCs.

    2. The only way to go? What about load-balancers? With a disaster recovery DC it is also an option. The secondary LB + edge router start advertising VIP prefixes after a switchover button is pushed.

    3. It is not an option, not because there is no routing on servers, but because the IPs of the GWs are fixed.

    6. The funny thing is that Proxy ARP is shown by the Cisco BU as part of the LISP solution for IP address movement. You're right that Proxy ARP solves only the to-GW problem, but from that point on there is a pure L3 network with conditional advertisements/IP SLA/PfR and so on.

    7. By saying 'Fabric-Path/TRILL for IP' I did not mean an existing feature. Let me rephrase it another way: imagine that a network of switches composes a topology based on Switch-IDs. This network behaves like L3, not STP. After a host/server with an IP is connected, this address is advertised to the other participants that require connectivity. In such a way we could achieve full mobility of IPs, which would have /128 addresses. Hosts do not need to use GWs, as all links could be point-to-point to the nearest switch.
  12. #1 - I was talking about cold VM migration. I have stronger opinions about other migration options.

    #2 - Think about it. If you have to advertise VIP prefixes, you're advertising static routes (or something similar).

    #3 - Server loopback IP = fixed. Server LAN IP = DHCP. OSPF over LAN interface. Default route comes from next-hop router. What's wrong with this picture?

    #7 - You just reinvented IP host routes carried in IS-IS (if you want to stay close to TRILL). Congratulations :-P
  13. #1 - I was talking about cold VM migration. I have stronger opinions about other migration options.

    #2 - Think about it. If you have to advertise VIP prefixes, you're advertising host routes (or something similar).

    #3 - Server loopback IP = fixed. Server LAN IP = DHCP. OSPF over LAN interface. Default route comes from DHCP or OSPF. What's wrong with this picture?

    #7 - You just reinvented IP host routes carried in IS-IS (if you want to stay close to TRILL). Congratulations :-P
  14. 2. Static routes are not necessary. A disaster recovery DC can have the same addressing scheme as the primary one, so connected/dynamic routes would be used instead. Using LBs in the active/standby scenario is easier compared to active/active.

    3. In this scenario it is fine if only loopbacks were used.

    7. There is a major difference between such a future feature and existing IS-IS. The IS-IS topology is Switch-ID based, not IP based, which means that a switch in between does not need to know the IPs. All IPs could be separated using an additional segment ID to allow virtualization. It doesn't have to be IS-IS; it can be BGP or another protocol that creates a cloud for the connected IPs. Another version would be a fully routed DC environment and LISP VM Mobility.
  15. #2 - Wanted to write "host routes", not static (see below). Somehow a deleted comment resurfaced :(

    #7 - Now you've reinvented MPLS/VPN 8-) Although we agree it would be a good solution, I don't see it happening for way too many reasons.

    http://blog.ioshints.info/2011/04/vcloud-architects-ever-heard-of-mpls.html
  16. 7. Not exactly MPLS VPN. In FabricPath there is conversational learning and just one control protocol responsible for routing. :)
  17. 7. Conversational learning doesn't work @ L3 (no ARP broadcast ... but I might be braindead).

    Also, in a totally masochistic design you could use BGP to propagate BGP next hops O:-)
  18. 7. When an IP device connects to the cloud, a register message would be required; then you don't need ARP across the cloud. The nearest edge device could serve as a gateway. The other steps would be similar to FabricPath. But c'mon, don't require the full description from me today. I've just started thinking it up. ;)

    I am now considering another issue with the default route in such a solution – whether to use the same mechanism as in 6rd or something else. I will get back to you within a couple of months.
  19. If you can change the end-device then there are all sorts of other options. Unfortunately most of the time they tell you "NO".

    But I was truly braindead yesterday - conversation-based learning is what LISP/NHRP do.
  20. I honestly think load balancers are the best approach currently. F5 showed a nice functional example during Tech Field Day 3, and it's the only moderately convincing solution I've seen yet. But it's not optimal: sometimes a massive traffic trombone is added, plus the added latency of the device itself.

    As for DNS, I'm really concerned about all the "optimization" that various clients and NAT devices do, which interferes with short TTLs. And how short can the TTL practically be, anyway? You don't want clients hitting your DNS server for every request, after all. And DNS solutions will still break sessions in progress.

    What about some kind of long-distance distributed VXLAN? Have a virtualization-aware gateway in the network that maintains tunnels to the proper location. I guess I just invented a load balancer! :-D

    Finally, where can I learn more about LISP? I keep hearing about it but know very little at this point.

    Thanks, Ivan!
  21. Running routing protocols on the servers is a fantastic solution; I don't understand why it isn't done more often.

    1) the IP moves elegantly with the service
    2) if your application is well written you can now go active active (anycasting)
    3) DR testing is also a breeze -- bring up the test scenario on your live network but with the test node's route "slugged" (huge metric) so the local test systems see the DR test environment but nobody else sees it.

    And if you're deeply afraid of server guys doing bad things to your routing table, you can always put all the servers into an unrouted vlan and put a "router" VM such as one from vyatta or something similar in front of that vlan and have that do the route advertisements.

    I say it's time for datacenter servers to stop having static routes.
  22. DNS: Remember that we're discussing cold migration here, not the "live worldwide mobility powered by unicorn tears". It usually takes a minute or more for the VM to be shut down and restarted in the other DC, and TTL values in tens of seconds are not unusual.

    Long-distance VXLAN: forget it. It's the wrong tool for the problem http://blog.ioshints.info/2011/09/vxlan-otv-and-lisp.html

    LISP and VM mobility? Coming soon ... 8-)
  23. LISP seems like a good solution for VM mobility, very much like LAM in concept for the DC, but obviously a more complicated protocol involving dynamic tunnelling. However, the tunneling aspect surely means that added functionality is needed in the DC firewalls between security zones, as they cannot yet inspect this traffic before it gets to the ETR, which is directly connected to the subnet that the VM is on.

    I'm struggling with this part because I believe LISP would solve a big problem, but we would need the firewall to have LISP intelligence to see inside the LISP packet. Maybe this is coming too.
    Replies
     1. LISP is great for cold VM mobility; you still need L2 connectivity for hot VM mobility.

       LISP can help you get optimal inbound traffic after a VM move, but the stretched L2 subnet is still required. Watch the NFD3 presentation from Victor Moreno.
  24. What do you think about trying to use static routes pointing out of the interface without a next-hop? A static route configured without a next-hop, pointing out of an Ethernet interface, will still cause the device to ARP for hosts. It gives the benefit over a connected route that you can still control the administrative distance of the static route. You can add your host subnet this way with a higher administrative distance, then redistribute the static route into BGP. You can then strip the weight, set the Local Preference, and do all sorts of fun things to it in BGP, so even a received BGP route for the same subnet will be preferred over the route on the local device, while you can still advertise it as a backup. This would allow you to prefer your primary datacenter over the backup even from the backup datacenter's router/firewall itself... so long as you have a better route received via BGP from the primary datacenter.

    This works for the host subnet route, but for the gateway address of that subnet, depending on the hardware, you may be able to:
    1) Use proxy-ARP for the specific gateway address only, if this is supported. Some gear will let you specify the host addresses you want to proxy-ARP for. If you have equipment that can do this, it is nice (some equipment will not even require an IP on the interface to do this; if it does require an address, you could use a throw-away network and proxy-ARP specifically for the gateway you really want).
    2) If you can't do an explicit proxy-ARP limited to the gateway address (since I don't like ARPing for all addresses), perhaps try adding a minimal subnet to the interface so the gateway IP lives there just for ARP purposes. Some equipment can even do a /32 subnet on an Ethernet interface, so you could have only the specific gateway as a connected route with no other addresses. Other equipment may require a /31 or /30 on the interface for the gateway address. It is good to limit the gateway address subnet size as much as possible because it will become a connected route, and the connected route will always be preferred on that local device. This means you lose the ability to ping the gateway address at the opposite datacenter, but you really care about the hosts anyway, right? The static route should still route for the hosts the way you want, so you can ping the hosts.

    What do you think? Definitely needs more real world testing on various gear as I suspect not all vendors handle these things the same. It's also a clunky solution and requires documentation for the poor guy who comes after you trying to understand the config. You know what they say though, "Never point out a solution without providing at least one problem"....or maybe that quote goes the other way around... ;)
  25. Woops, I just realized I proposed a solution for full subnets moving where all other suggestions are focused on individual hosts moving. I think I may have missed the main point when I commented :)