… updated on Tuesday, November 17, 2020 16:49 UTC
IP Renumbering in Disaster Avoidance Data Center Designs
It’s hard for me to admit, but there just might be a corner use case for split subnets and inter-DC bridging: even if you move a cold VM between data centers in a controlled disaster avoidance process (moving live VMs rarely makes sense), you might not be able to change its IP address due to hard-coded IP addresses, be it in application code or configuration files.
Disaster recovery is a different beast: if you’ve lost the primary DC, it doesn’t hurt if you instantiate the same subnet in the backup DC.
However, before jumping headfirst into a misty pool filled with unicorn tears (actually, a brittle solution with too many moving parts that usually makes no sense), let’s see if there are alternatives. Here are some ideas in exponentially decreasing order of preference:
Ever heard of DNS? If the application uses hardcoded addresses in its clients or between servers, there’s not much you can do, but one would expect truly hardcoded addresses only in home-brewed craplications … and masterpieces created by those “programming gurus” that never realized hostnames should be used in configuration files instead of IP addresses.
If your application is somewhat well-behaved, there are all sorts of dynamic DNS solutions you can use to automatically associate a server’s new IP address with its DNS FQDN: Windows clusters do that automatically, many DHCP servers create dynamic DNS entries after allocating a client address, and there are numerous dynamic DNS clients for Linux that work even with static IP addresses.
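If you have to glue this together yourself, a standard RFC 2136 dynamic DNS update is all it takes. Here’s a minimal sketch using the dnspython library; the zone, TSIG key, DNS server and addresses are made-up examples, and a production version would obviously live in your migration workflow rather than in a ten-line script:

```python
import dns.query
import dns.tsigkeyring
import dns.update

# Made-up TSIG key; in real life it comes from your DNS server configuration
keyring = dns.tsigkeyring.from_text({"ddns-key.": "c2VjcmV0LXNlY3JldC1zZWNyZXQ="})

# Replace the A record for app01.example.com with the server's new address
update = dns.update.Update("example.com", keyring=keyring, keyname="ddns-key.")
update.replace("app01", 60, "A", "192.0.2.45")   # 60-second TTL

response = dns.query.tcp(update, "198.51.100.53", timeout=5)
print(response.rcode())   # 0 (NOERROR) means the record was updated
```

The same update can just as easily be triggered by the DR orchestration tool instead of the server itself.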
Host routes? For whatever reason some people think host routes are worse than long-distance bridging. They’re not – if nothing else, you have all the forwarding information in one place … and modern L3 switches use host routes for directly connected IP hosts anyway.
Automatic network-side configuration of host routes is the hard part: Local Area Mobility (LAM) solved it years ago, but it wasn’t supported in data center switches until Cumulus Networks reinvented it with Redistribute ARP.
… and the only other mechanism I could think of that wouldn’t involve loads of homemade scripting glue is OpenFlow-driven RIB modification supposedly working on Juniper MX routers.
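If you do end up writing that homemade scripting glue, its core is smaller than you might expect. Here’s a rough sketch of a redistribute-ARP-like hack for a Linux-based L3 switch: it copies reachable ARP entries into /32 kernel routes, which a routing daemon (for example FRR with redistribute kernel) could then advertise into the fabric. The interface name is made up, and route ageing/withdrawal is conveniently ignored:

```python
import subprocess

VLAN_IFACE = "vlan10"   # hypothetical server-facing SVI on a Linux-based switch

def reachable_hosts(iface):
    """Return the IPv4 addresses the switch has ARP-resolved on this interface."""
    out = subprocess.run(
        ["ip", "-4", "neigh", "show", "dev", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.split()[0] for line in out.splitlines() if "REACHABLE" in line]

def install_host_route(address, iface):
    """Install a /32 kernel route pointing at the directly connected interface.
    A routing daemon can then redistribute it toward the rest of the fabric."""
    subprocess.run(
        ["ip", "route", "replace", f"{address}/32", "dev", iface],
        check=True,
    )

for host in reachable_hosts(VLAN_IFACE):
    install_host_route(host, VLAN_IFACE)
```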
Routing protocols? Routing protocols running on servers were pretty popular years ago (that’s how IBM implemented IP multipathing on mainframes). Instead of configuring a hardcoded IP address on the server’s LAN interface, configure it on a loopback and run BGP between the servers and the adjacent ToR switches … and whatever you do, please don’t use OSPF: some IBM mainframes were a single link failure away from becoming the core data center router.
Yeah, I know, a stupid solution like this requires actual changes to server configurations … and it’s so much easier to pretend the problem doesn’t exist and claim that the network should support whatever we throw at it ;)
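To make the “actual changes to server configurations” a bit more concrete, here’s a hedged sketch of the idea using ExaBGP’s process API: a small health-check script announces the loopback /32 while the application answers locally and withdraws it when it stops. The prefix, the port, and the assumption that ExaBGP peers with the ToR switch are illustrative, not a recipe:

```python
#!/usr/bin/env python3
# Health-check process for ExaBGP: announce the server's loopback /32 while
# the application answers, withdraw it otherwise. Address and port are made up.
import socket
import sys
import time

LOOPBACK_PREFIX = "192.0.2.11/32"   # fixed service address configured on lo
APP_PORT = 8080                     # hypothetical local health-check port

def app_is_healthy() -> bool:
    try:
        with socket.create_connection(("127.0.0.1", APP_PORT), timeout=1):
            return True
    except OSError:
        return False

announced = False
while True:
    healthy = app_is_healthy()
    if healthy and not announced:
        sys.stdout.write(f"announce route {LOOPBACK_PREFIX} next-hop self\n")
        announced = True
    elif not healthy and announced:
        sys.stdout.write(f"withdraw route {LOOPBACK_PREFIX} next-hop self\n")
        announced = False
    sys.stdout.flush()
    time.sleep(5)
```

FRR or BIRD on the server would work just as well; the point is that the service address lives on a loopback and follows the routing protocol instead of a stretched LAN subnet.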
Route Health Injection on load balancers? Same idea as server-side routing protocols, but implemented in front of the whole application infrastructure.
Assuming your application sits behind a load balancer and you’re doing a cold migration of all application components in one step, you can preconfigure all the required IP subnets in the disaster recovery site (after all, they’re hidden behind a load balancer) and rely on the load balancer to insert the publicly visible route to the application’s public IP address once everything is ready to go.
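A decent load balancer does all of this for you with RHI; purely to illustrate the moving parts, here’s the same ExaBGP-style pattern gated on the whole application stack being reachable in the recovery site (all addresses, ports and tiers are made-up examples):

```python
#!/usr/bin/env python3
# "Announce the route once everything is ready" in its simplest possible form.
import socket
import sys
import time

VIP_PREFIX = "203.0.113.80/32"        # publicly visible application address
COMPONENTS = [
    ("10.20.20.10", 443),             # web tier
    ("10.20.20.20", 8080),            # app tier
    ("10.20.20.30", 5432),            # database
]

def reachable(host, port):
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False

# Wait until every migrated component answers, then inject the host route
while not all(reachable(h, p) for h, p in COMPONENTS):
    time.sleep(10)

sys.stdout.write(f"announce route {VIP_PREFIX} next-hop self\n")
sys.stdout.flush()

# A real health check would keep running and withdraw the route if the
# application stack stops responding
while True:
    time.sleep(60)
```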
The universal duct tape – NAT. If the clients use DNS to connect to the servers, but the servers have to use fixed IP addresses, use NAT to hide server subnets behind different public IP addresses (one per site).
Obviously you have to move the whole application infrastructure at once if you want to use this approach or things will break really badly.
Apart from the usual NAT-is-bad and NAT-breaks-things mantras, there are a few additional drawbacks:
- Clients have to rely on changed DNS records as you cannot insert a host route into the outside network like a load balancer can with RHI.
- NAT devices usually don’t support dynamic DNS registration, so you have to change the DNS entries “manually”.
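On a Linux-based site edge, the core of this approach is a single DNAT rule per published service. Here’s a hedged sketch with made-up addresses; a commercial firewall or NAT appliance would express the same thing in its own configuration language:

```python
import subprocess

# Made-up addresses: the servers keep 10.10.20.0/24 in both sites, but each
# site exposes the application behind a different, site-specific public address.
SITE_PUBLIC_IP = "203.0.113.10"
SERVER_IP = "10.10.20.15"

def add_dnat_rule(public_ip, private_ip):
    """Destination NAT on a Linux-based site edge: traffic sent to the
    site-specific public address is translated to the unchanged server address."""
    subprocess.run(
        ["iptables", "-t", "nat", "-A", "PREROUTING",
         "-d", public_ip, "-j", "DNAT", "--to-destination", private_ip],
        check=True,
    )

add_dnat_rule(SITE_PUBLIC_IP, SERVER_IP)
```

Each site uses a different public address in front of the same unchanged server subnet, and DNS (updated manually, as noted above) points the clients at whichever site is currently active.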
Virtual appliances? Duncan Epping proposed using vShield Edge as a NAT device.
While that idea didn’t sound so great when I wrote the original blog post, that’s how most overlay virtual networks implement overlay-to-physical gateways, and at least VMware decided to use BGP as the only routing protocol in this scenario.
Anything else? I’m positive I’ve missed an elegant idea or two. Your comments are most welcome … including those telling me why the ideas mentioned above would never be implementable.
Revision History
- 2020-11-17
- Cleaned up the blog post, removed the references to LISP VM mobility, and added pointers to routing based on host IP addresses and running BGP on servers.
I'm working on a large scheme like this now.
Moving the subnet without re-numbering as you suggest seems natural, but makes DR *testing* really complicated, so my customer is on the "change IP numbers" track.
Here are some complications that DNS can't solve:
- multiple security zones in the data center? DNS doesn't help the firewalls much.
- Reconfiguring the VM’s IP is easy (VMware SRM does this), but reconfiguring application Listen directives (think Apache config file) is not; lots of custom per-application scripting work is required (a sketch of that kind of glue is below).
Vendors like to claim their offerings are turn-key DR. Customers who believe that will be disappointed :-)
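For what it’s worth, the per-application glue is often little more than a search-and-replace run at the right point in the recovery workflow; here’s a made-up sketch for an Apache-style config file (paths and addresses are examples only):

```python
import re
from pathlib import Path

# After SRM (or whatever tool) renumbers the VM, rewrite the hard-coded
# address in the application config before starting the service.
OLD_IP = "10.10.20.15"
NEW_IP = "10.20.20.15"
CONFIG_FILE = Path("/etc/httpd/conf/httpd.conf")

text = CONFIG_FILE.read_text()
# Word-boundary match so 10.10.20.150 is not accidentally rewritten as well
text = re.sub(rf"\b{re.escape(OLD_IP)}\b", NEW_IP, text)
CONFIG_FILE.write_text(text)
```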
* Virtual hosts can use DNS FQDN
* Listen directives seem to use only IP addresses, not interface names or DNS FQDN.
No comment :( ... and I thought Apache web server was a shining example of good programming.
#2 - Absolutely agree.
In terms of the listed solutions:
1. DNS - customers complain about the reaction time. On the other hand, disaster recovery does not require fast detection or an instant DNS refresh.
2. Host routes - harder to manage. LISP would be a good choice, but it’s an emerging technology and needs more time.
3. Routing protocols on servers are not an option at the moment, but in a virtualized DC they’re feasible: we can have not only firewalls, load balancers and switches, but also routing protocols running on the hypervisor.
4. Load balancers seem to be the best option listed. A global director or simply a router with conditional route advertisement can handle the prefix insertion.
5. NAT is harder to manage and consumes resources unless stateless translation is used. It’s better to avoid NAT, not because it wouldn’t work, but to keep it as the last-resort solution in case of problems with IPv6 migration.
Other options:
6. Proxy ARP + L3 edge. There are some security drawbacks, especially if a firewall is in between, but it’s worth considering.
7. FabricPath/TRILL for IP. In this option the DC is fully L3 routed. No such standard exists today, but we could create one. ;)
#2 - if you want to retain IP addresses without stretching L2 subnets, host routes (or LISP) are unfortunately the only way to go.
#3 - We had routing protocols on servers 30 years ago. Come on ;)
#6 - I used to believe in Proxy ARP. No longer. Anyhow, it solves only the server-to-GW problem, nothing more.
#7 - Don't even try to go there. Either you solve the problem on L3, and then you're back to LISP or host routes, or on L2, where you could easily use VXLAN (but we don't want split subnets anyway).
BTW, I might get invited to PLNOG, in which case we can continue our discussion in a more pleasant environment.
BTW, there’s no “might get invited” about it. You simply must be invited. 8-)
2. The only way to go? What about load balancers? With a disaster recovery DC they’re also an option: the secondary LB + edge router start advertising the VIP prefixes after the switchover button is pushed.
3. It’s not an option - not because there’s no routing on servers, but because the GW IP addresses are fixed.
6. The funny thing is that the Cisco BU shows Proxy ARP as part of the LISP solution for IP address moves. It’s true that Proxy ARP solves only the host-to-GW problem, but from that point on it’s a pure L3 network with conditional advertisements/IP SLA/PfR and so on.
7. By saying “FabricPath/TRILL for IP” I did not mean an existing feature - let me rephrase it. Imagine a network of switches that builds its topology from Switch IDs and behaves like L3, not STP. When a host/server with an IP address is connected, that address is advertised to the other participants that need connectivity to it. That way we could achieve full mobility of IP addresses, each carried as a /128 host address. Hosts would not even need GWs, as all links could be point-to-point to the nearest switch.
#2 - Think about it. If you have to advertise VIP prefixes, you're advertising host routes (or something similar).
#3 - Server loopback IP = fixed. Server LAN IP = DHCP. OSPF over LAN interface. Default route comes from DHCP or OSPF. What's wrong with this picture?
#7 - You just reinvented IP host routes carried in IS-IS (if you want to stay close to TRILL). Congratulations :-P
3. In this scenario it’s fine, as long as only loopbacks are used.
7. There is a major difference between such a future feature and existing IS-IS: the topology would be Switch-ID based, not IP based, which means a switch in the middle does not need to know the IPs. All the IPs could also be partitioned with an additional segment ID to allow virtualization. It does not have to be IS-IS - it could be BGP or another protocol that creates a cloud for the connected IPs. Another variant would be a fully routed DC environment with LISP VM Mobility.
#7 - Now you've reinvented MPLS/VPN 8-) Although we agree it would be a good solution, I don't see it happening for way too many reasons.
http://blog.ioshints.info/2011/04/vcloud-architects-ever-heard-of-mpls.html
Also, in a totally masochistic design you could use BGP to propagate BGP next hops O:-)
I am now considering another issue with such a solution: the default route. Whether to use the same mechanism as in 6rd or something else - I will get back to you within a couple of months.
But I was truly braindead yesterday - conversation-based learning is what LISP/NHRP do.
As for DNS, I'm really concerned about all the "optimization" that various clients and NAT devices do, interfering with a short TTL. And how short can the TTL practically be, anyway? You don't want clients hitting your DNS server for every request, after all. And DNS solutions will still break sessions in progress.
What about some kind of long-distance distributed VXLAN? Have a virtualization-aware gateway in the network that maintains tunnels to the proper location. I guess I just invented a load balancer! :-D
Finally, where can I learn more about LISP? I keep hearing about it but know very little at this point.
Thanks, Ivan!
1) the IP moves elegantly with the service
2) if your application is well written you can now go active active (anycasting)
3) DR testing is also a breeze -- bring up the test scenario on your live network but with the test node's route "slugged" (huge metric) so the local test systems see the DR test environment but nobody else sees it.
And if you're deeply afraid of server guys doing bad things to your routing table, you can always put all the servers into an unrouted VLAN, put a "router" VM (Vyatta or something similar) in front of that VLAN, and have it do the route advertisements.
I say it's time for datacenter servers to stop having static routes.
Long-distance VXLAN: forget it. It's the wrong tool for the problem http://blog.ioshints.info/2011/09/vxlan-otv-and-lisp.html
LISP and VM mobility? Coming soon ... 8-)
I'm struggling with this part because I believe LISP would solve a big problem, but we would need the firewall to have LISP intelligence to see inside the LISP packet. Maybe this is coming too.
LISP can help you get optimal inbound traffic after a VM move, but the stretched L2 subnet is still required. Watch the NFD3 presentation by Victor Moreno.
This works for the host subnet route, but for the gateway address of that subnet, depending on the hardware, you may be able to:
1) Use proxy ARP for the specific gateway address only, if this is supported. Some gear will let you specify the exact host addresses you want to proxy-ARP for. If you have equipment that can do this, it's nice (some equipment won't even require an IP on the interface to do this; if it does require an address, you could use a throw-away network and proxy-ARP specifically for the gateway you really want).
2) If you can't do explicit proxy ARP limited to the gateway address (since I don't like ARPing for all addresses), try adding a minimal-size subnet to the interface just so the gateway IP can live there for ARP purposes. Some equipment can even put a /32 subnet on an Ethernet interface, so the specific gateway becomes the only connected route with no other addresses; other equipment may require a /31 or /30 on the interface for the gateway address. It's good to limit the gateway-address subnet size as much as possible, because it becomes a connected route and the connected route will always be preferred on that local device. This means you lose the ability to ping the gateway address at the opposite data center, but you really care about the hosts anyway, right? The static route should still route toward the hosts the way you want, so you can ping the hosts.
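On Linux-based gear the two options above boil down to a couple of ip commands; a rough sketch with made-up names (other platforms have their own knobs, and some may not support per-address proxy ARP at all):

```python
import subprocess

IFACE = "eth1"                # hypothetical inside interface on the L3 edge
GATEWAY_IP = "10.10.20.1"     # gateway address the moved hosts keep using

def proxy_arp_for_gateway():
    # Option 1: answer ARP for this one address only (a per-address proxy
    # entry) instead of enabling blanket proxy ARP on the interface.
    subprocess.run(["ip", "neigh", "add", "proxy", GATEWAY_IP, "dev", IFACE],
                   check=True)

def gateway_as_host_address():
    # Option 2: configure the gateway address as a /32 on the interface, so it
    # exists just for ARP without dragging a connected route for the whole
    # subnet along with it.
    subprocess.run(["ip", "addr", "add", f"{GATEWAY_IP}/32", "dev", IFACE],
                   check=True)

# Pick one of the two approaches; running both would be redundant.
proxy_arp_for_gateway()
```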
What do you think? It definitely needs more real-world testing on various gear, as I suspect not all vendors handle these things the same way. It's also a clunky solution and requires documentation for the poor guy who comes after you trying to understand the config. You know what they say, though: "Never point out a solution without providing at least one problem" … or maybe that quote goes the other way around ;)