IP renumbering in disaster avoidance Data Center designs

It’s hard for me to admit, but there just might be a corner use case for split subnets and inter-DC bridging: even if you move a cold VM between data centers in a controlled disaster avoidance process (moving live VMs rarely makes sense), you might not be able to change its IP address due to hard-coded IP addresses, be it in application code or configuration files.

Disaster recovery is a different beast: if you’ve lost the primary DC, it doesn’t hurt if you instantiate the same subnet in the backup DC.

However, before jumping headfirst into a misty pool filled with unicorn tears (actually, a brittle solution with too many moving parts that usually makes no sense), let’s see if there are alternatives. Here are some ideas in exponentially decreasing order of preference:

Ever heard of DNS? If the application uses hardcoded addresses in its clients or between servers, there’s not much you can do, but one would expect truly hardcoded addresses only in home-brewed craplications ... and masterpieces created by those “programming gurus” that never realized hostnames should be used in configuration files instead of IP addresses.

If your application is somewhat well-behaved, there are all sorts of dynamic DNS solutions that you can use to automatically associate a server’s new IP address with its DNS FQDN. Windows clusters do that automatically, many DHCP servers automatically create dynamic DNS entries after client address allocation, and there are numerous Linux clients that you can use even with static IP addresses.

Use low TTL values if you’re changing DNS records, or the clients won’t be able to connect to the migrated servers due to stale local DNS caches.
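
To make the dynamic DNS idea a bit more concrete, here is a minimal sketch that builds the command batch an RFC 2136 client such as `nsupdate` could send after a server comes up with a new address. The function name, server and host names, addresses and the 30-second TTL are all illustrative assumptions, not values from any real deployment.

```python
import ipaddress


def build_nsupdate_batch(server, zone, fqdn, new_ip, ttl=30):
    """Return nsupdate commands that replace the A record of fqdn.

    All arguments are hypothetical example values supplied by the caller.
    """
    # Validate the address early; raises ValueError on malformed input.
    addr = ipaddress.IPv4Address(new_ip)
    lines = [
        f"server {server}",
        f"zone {zone}",
        # Remove the A records the name had in the old data center ...
        f"update delete {fqdn} A",
        # ... and add the new one with a low TTL so stale caches expire fast.
        f"update add {fqdn} {ttl} A {addr}",
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_nsupdate_batch("ns1.example.com", "example.com",
                               "app1.example.com.", "192.0.2.10"), end="")
```

The resulting text could be fed to `nsupdate` on its standard input from a post-migration script; authentication (for example a TSIG key) is deliberately left out of the sketch.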

Host routes? For whatever reason some people think host routes are worse than long-distance bridging. They’re not – if nothing else, you have all the forwarding information in one place ... and modern L3 switches use host routes for directly connected IP hosts anyway.
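
Why a host route is enough to follow a moved server comes down to longest-prefix matching. The sketch below illustrates that with Python's standard `ipaddress` module; the prefixes and next-hop names are made-up examples, and a real router obviously uses a far more efficient lookup structure.

```python
import ipaddress


def lookup(routing_table, destination):
    """Return the next hop of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routing_table:
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix seen so far.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


# Hypothetical addressing: the subnet lives in DC-A, one VM was moved to DC-B.
ROUTES = [
    ("10.1.1.0/24", "dc-a-core"),
    ("10.1.1.42/32", "dc-b-core"),  # host route for the migrated VM
]

if __name__ == "__main__":
    print(lookup(ROUTES, "10.1.1.42"))  # the migrated VM
    print(lookup(ROUTES, "10.1.1.7"))   # a VM that stayed behind
```

The /32 entry wins for the migrated VM while every other address in the subnet keeps following the /24 toward the original data center.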

Automatic network-side configuration of host routes is mission impossible. Local Area Mobility (LAM) worked years ago, but is not supported in data center switches ... and the only other mechanism I could think of that wouldn’t involve loads of homemade scripting glue is OpenFlow-driven RIB modification supposedly working on Juniper MX routers.

LISP is another option. LISP VM Mobility detects out-of-subnet IP addresses and uses LISP mappings to steer traffic toward them. LISP VM Mobility is somewhat similar to host routes, but uses tunneling across the IP core instead of forwarding state distribution. Another bonus: host routes (actually LISP mappings for /32 prefixes) are created automatically.

Routing protocols? Routing protocols running on servers were pretty popular years ago (that’s how IBM implemented IP multipathing on mainframes). Instead of configuring a hardcoded IP address on the server’s LAN interface, configure it on a loopback and run OSPF over DHCP-addressed LAN interfaces.

Assuming you still use Cat 6500 in your data center instead of Nexus switches, you could use OSPF database filters to minimize the impact of changes in OSPF topology.

Yeah, I know, a stupid solution like this requires actual changes to server configurations ... and it’s so much easier to pretend the problem doesn’t exist and claim that the network should support whatever we throw at it ;)

Route Health Injection on load balancers? Same idea as server-side routing protocols, but implemented in front of the whole application infrastructure.

Assuming your application sits behind a load balancer and you’re doing a cold migration of all application components in one step, you can preconfigure all the required IP subnets in the disaster recovery site (after all, they’re hidden behind a load balancer) and rely on the load balancer to insert the publicly-visible route to the application’s public IP address once everything is ready to go.
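
The decision a load balancer makes with Route Health Injection can be sketched in a few lines. This is only an illustration of the idea, not any vendor's implementation; the function name, the `min_healthy` threshold and the addresses are assumptions.

```python
def rhi_routes(vip, pool_health, min_healthy=1):
    """Return the host routes a load balancer should inject for one VIP.

    pool_health maps a (hypothetical) server name to True/False as reported
    by the load balancer's health checks.
    """
    healthy = sum(1 for ok in pool_health.values() if ok)
    # Advertise the VIP as a /32 only while enough servers answer probes;
    # an empty list means the route is withdrawn.
    return [f"{vip}/32"] if healthy >= min_healthy else []


if __name__ == "__main__":
    # Before the migration: servers in the DR site are still powered off.
    print(rhi_routes("203.0.113.10", {"web1": False, "web2": False}))
    # After the cold migration: servers answer health checks again.
    print(rhi_routes("203.0.113.10", {"web1": True, "web2": True}))
```

Because the route appears only after the servers behind the load balancer respond, the preconfigured subnets in the disaster recovery site stay invisible until the application is actually ready.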

The universal duct tape – NAT. If the clients use DNS to connect to the servers, but the servers have to use fixed IP addresses, use NAT to hide server subnets behind different public IP addresses (one per site).

Obviously you have to move the whole application infrastructure at once if you want to use this approach or things will break really badly.

Apart from the usual NAT-is-bad and NAT-breaks-things mantras, there are a few additional drawbacks:

  • Clients have to rely on changed DNS records as you cannot insert a host route into the outside network like a load balancer can with RHI.
  • NAT devices usually don’t support dynamic DNS registration, so you have to change the DNS entries “manually”.
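
The one-public-prefix-per-site NAT idea described above can be sketched as a static 1:1 mapping. The inside subnet, the per-site outside prefixes and the site names below are hypothetical examples; the point is only that the server keeps its fixed address while the address clients see (and the DNS record you have to change) depends on the site.

```python
import ipaddress

# Hypothetical addressing: servers keep 10.1.1.0/24 in both data centers,
# each site translates it to a different public prefix of the same size.
INSIDE = ipaddress.ip_network("10.1.1.0/24")
OUTSIDE = {
    "dc-a": ipaddress.ip_network("192.0.2.0/24"),
    "dc-b": ipaddress.ip_network("198.51.100.0/24"),
}


def public_address(inside_ip, site):
    """Map a fixed inside address to the public address used at a site."""
    addr = ipaddress.ip_address(inside_ip)
    if addr not in INSIDE:
        raise ValueError(f"{inside_ip} is not in {INSIDE}")
    # Static 1:1 NAT: keep the host offset, swap the network part.
    offset = int(addr) - int(INSIDE.network_address)
    return str(OUTSIDE[site].network_address + offset)


if __name__ == "__main__":
    for site in ("dc-a", "dc-b"):
        print(site, public_address("10.1.1.42", site))
```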

Virtual appliances? Duncan Epping proposed using vShield Edge as a NAT device. I wouldn’t – on top of all the other drawbacks mentioned above, vShield Edge becomes a single point of failure that uses VMware HA to provide somewhat better availability.

Anything else? I’m positive I’ve missed an elegant idea or two. Your comments are most welcome ... including those telling me why the ideas mentioned above would never be implementable.

24 comments:

  1. Hey Ivan,

    I'm working on a large scheme like this now.

    Moving the subnet without re-numbering as you suggest seems natural, but makes DR *testing* really complicated, so my customer is on the "change IP numbers" track.

    Here are some complications that DNS can't solve:
    - multiple security zones in the data center? DNS doesn't help the firewalls much.
    - reconfiguring the VM's IP is easy (VMware SRM does this), but reconfiguring application listen directives (think apache config file) is not easy. lots of custom per-application scripting work is required.

    Vendors like to claim their offerings are turn-key DR. Customers who believe it will be disappointed :-)

  2. Simply had to check Apache documentation:

    * Virtual hosts can use DNS FQDN
    * Listen directives seem to use only IP addresses, not interface names or DNS FQDN.

    No comment :( ... and I thought Apache web server was a shining example of good programming.

  3. Don't blame Apache for utilizing the OS socket library and having the same limitations. Besides, if you bind to * and run one IP per VM it should be a non-issue.

  4. #1 - Seems we're continuously running around in circles, eventually landing at http://blog.ioshints.info/2009/08/what-went-wrong-socket-api.html

    #2 - Absolutely agree.

  5. "Zoning" is the key term for firewalling there: you simply don't create any per-host rules. You can have firewalls that are aware of networks on your other sites. State table handling is a different box of worms.

  6. Piotr Jablonski (24 January, 2012 12:37)

    Hi Ivan,

    In terms of the listed solutions:
    1. DNS - customers complain about the reaction time. On the other hand disaster recovery does not require fast detection and a DNS refresh.
    2. Host routes - harder to manage. LISP will be a good choice, but it is an emerging technology and needs more time.
    3. Routing protocols on servers are not an option at this moment, but in a virtualized DC they are feasible. We can have not only firewalls, load balancers and switches but also routing protocols running on the hypervisor.
    4. Load-Balancers seem to be the best option listed. The global director or simply a router with a conditional route advertisement can do a prefix insertion.
    5. NAT is harder to manage and consumes resources unless stateless translation is used. It is better to avoid NAT, not because it will not work, but to keep it as a last-resort solution in case of problems with IPv6 migration.

    Other options:
    6. Proxy ARP + L3 Edge. There are some security drawbacks, especially if a firewall is in between, but it is worth considering.
    7. Fabric-Path/TRILL for IP. In this option the DC is fully L3 routed. There is no such standard today, but we could create it. ;)

  7. Piotr Jablonski (24 January, 2012 12:44)

    Ivan, this issue with a hard-coded IP is not a big deal in a disaster recovery data center. A bigger issue is the active/active scenario. Even with load balancers there will be problems with asymmetric routing, which increases latency (very undesirable in mobile networks) and is hard to manage on a firewall.

  8. #1 (DNS): use low TTL. The "only" problem is that the web visitors might have to restart the browser session (even though DNS pinning is not as aggressive these days as it used to be).

    #2 - if you want to retain IP addresses without stretching L2 subnets, host routes (or LISP) are unfortunately the only way to go.

    #3 - We had routing protocols on servers 30 years ago. Come on ;)

    #6 - I used to believe in Proxy ARP. No longer. Anyhow, it solves only server-to-GW problem, nothing more.

    #7 - Don't even try to go there. Either you solve the problem on L3, and then you're back to LISP or host routes, or on L2, where you could easily use VXLAN (but we don't want split subnets anyway).

  9. I forgot how many times I've been ranting against stretched subnets citing exactly these reasons.

    BTW, I might get invited to PLNOG, in which case we can continue our discussion in a more pleasant environment.

  10. Piotr Jablonski (24 January, 2012 18:17)

    Apart from some solution flavors, we are on the same page in terms of DC.

    BTW, 'I might get invited' is not a case. You must be invited. 8-)

  11. Piotr Jablonski (24 January, 2012 18:19)

    1. DNS can be an option for active/standby DCs but it is not perceived as a fast solution for active/active DCs.

    2. The only way to go? What about load-balancers? With a disaster recovery DC it is also an option. The secondary LB + edge router start advertising VIP prefixes after a switchover button is pushed.

    3. It is not an option not because there is no routing on servers but because IPs of GWs are fixed.

    6. The funny thing is that Proxy ARP is shown by the Cisco BU as a part of the LISP solution in the case of IP address movement. It's true that Proxy ARP solves only the to-GW problem, but from that point on there is a pure L3 network with conditional advertisements/IP SLA/PfR and so on.

    7. By saying 'Fabric-Path/TRILL for IP' I did not mean the existing feature. I will rephrase it in another way. Imagine that a network of switches composes a topology based on Switch-IDs. This network behaves like L3, not STP. After a host/server with an IP address is connected, this address will be advertised to other participants which require connectivity. In such a way we could achieve full mobility of IPs, which will have /128 addresses. Hosts do not need to use GWs as all links could be point-to-point to the nearest SW.

  12. #1 - I was talking about cold VM migration. I have stronger opinions about other migration options.

    #2 - Think about it. If you have to advertise VIP prefixes, you're advertising static routes (or something similar).

    #3 - Server loopback IP = fixed. Server LAN IP = DHCP. OSPF over LAN interface. Default route comes from next-hop router. What's wrong with this picture?

    #7 - You just reinvented IP host routes carried in IS-IS (if you want to stay close to TRILL). Congratulations :-P

  13. #1 - I was talking about cold VM migration. I have stronger opinions about other migration options.

    #2 - Think about it. If you have to advertise VIP prefixes, you're advertising host routes (or something similar).

    #3 - Server loopback IP = fixed. Server LAN IP = DHCP. OSPF over LAN interface. Default route comes from DHCP or OSPF. What's wrong with this picture?

    #7 - You just reinvented IP host routes carried in IS-IS (if you want to stay close to TRILL). Congratulations :-P

  14. Piotr Jablonski (24 January, 2012 20:16)

    2. Static routes are not necessary. A disaster recovery DC can have the same addressing scheme as the primary one, so these would rather be connected/dynamic routes. Using LBs in the active/standby scenario is easier compared to active/active.

    3. In this scenario it is fine if only loopbacks were used.

    7. There is a major difference between such a future feature and existing IS-IS. The IS-IS topology is Switch-ID based not IP. It means that a switch in-between does not need to know IPs. All IPs could be divided using additional segment ID to allow virtualization. It doesn't have to be IS-IS. It can be BGP or another protocol which will create a cloud for connected IPs. Another version would be fully routed DC environment and LISP VM Mobility.

  15. #2 - Wanted to write "host routes", not static (see below). Somehow a deleted comment resurfaced :(

    #7 - Now you've reinvented MPLS/VPN 8-) Although we agree it would be a good solution, I don't see it happening for way too many reasons.

    http://blog.ioshints.info/2011/04/vcloud-architects-ever-heard-of-mpls.html

  16. Piotr Jablonski (26 January, 2012 21:37)

    7. Not exactly MPLS VPN. In Fabric Path there is conversational learning and just one control protocol responsible for routing. :)

  17. 7. Conversational learning doesn't work @ L3 (no ARP broadcast ... but I might be braindead).

    Also, in a totally masochistic design you could use BGP to propagate BGP next hops O:-)

  18. Piotr Jablonski (26 January, 2012 23:29)

    7. Upon the connectivity of an IP device to a cloud a register message would be required. Then you don't need ARP over the cloud. The nearest edge device could serve as a gateway. Other steps will be similar to FabPath. But c'mon, don't require the full description from me today. I've just started thinking it up. ;)

    I am now considering another issue with a default route in such a solution: whether to use the same mechanism as in 6rd or something else. I will get back to you within a couple of months.

  19. If you can change the end-device then there are all sorts of other options. Unfortunately most of the time they tell you "NO".

    But I was truly braindead yesterday - conversation-based learning is what LISP/NHRP do.

  20. I honestly think load balancers are the best approach currently. F5 showed a nice functional example during Tech Field Day 3, and it's the only moderately-convincing solution I've seen yet. But it's not optimal, with a massive trombone sometimes added and added latency of the device itself.

    As for DNS, I'm really concerned about all the "optimization" that various clients and NAT devices do interfering with a short TTL. And how short practically can the TTL be, anyway? You don't want clients hitting your DNS server for every request, after all. And DNS solutions will still break sessions in progress.

    What about some kind of long-distance distributed VXLAN? Have a virtualization-aware gateway in the network that maintains tunnels to the proper location. I guess I just invented a load balancer! :-D

    Finally, where can I learn more about LISP? I keep hearing about it but know very little at this point.

    Thanks, Ivan!

  21. Running routing protocols on the servers is a fantastic solution; I don't understand why it isn't done more often.

    1) the IP moves elegantly with the service
    2) if your application is well written you can now go active active (anycasting)
    3) DR testing is also a breeze -- bring up the test scenario on your live network but with the test node's route "slugged" (huge metric) so the local test systems see the DR test environment but nobody else sees it.

    And if you're deeply afraid of server guys doing bad things to your routing table, you can always put all the servers into an unrouted vlan and put a "router" VM such as one from vyatta or something similar in front of that vlan and have that do the route advertisements.

    I say it's time for datacenter servers to stop having static routes.

  22. DNS: Remember that we're discussing cold migration here, not the "live worldwide mobility powered by unicorn tears". It usually takes a minute or more for the VM to be shut down and restarted in the other DC, and TTL values in tens of seconds are not unusual.

    Long-distance VXLAN: forget it. It's the wrong tool for the problem http://blog.ioshints.info/2011/09/vxlan-otv-and-lisp.html

    LISP and VM mobility? Coming soon ... 8-)

  23. LISP seems like a good solution for VM mobility, very much like LAM in concept for the DC but obviously a more complicated protocol involving dynamic tunneling. However, the tunneling aspect surely means that added functionality is needed in the DC firewalls between security zones, as they cannot yet inspect this traffic before it gets to the ETR that is directly connected to the subnet the VM is on.

    I'm struggling with this part because I believe LISP would solve a big problem, but we would need the firewall to have LISP intelligence to see inside the LISP packet. Maybe this is coming too.

    Replies
    1. LISP is great for cold VM mobility, but you still need L2 connectivity for hot VM mobility.

      LISP can help you get optimal inbound traffic after a VM move, but the stretched L2 subnet is still required. Watch the NFD3 presentation from Victor Moreno.



Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.