It’s hard for me to admit, but there just might be a corner use case for split subnets and inter-DC bridging: even if you move a cold VM between data centers in a controlled disaster avoidance process (moving live VMs rarely makes sense), you might not be able to change its IP address due to hard-coded IP addresses, be it in application code or configuration files.
Disaster recovery is a different beast: if you’ve lost the primary DC, it doesn’t hurt if you instantiate the same subnet in the backup DC.
However, before jumping headfirst into a misty pool filled with unicorn tears (actually, a brittle solution with too many moving parts that usually makes no sense), let’s see if there are alternatives. Here are some ideas in exponentially decreasing order of preference:
Ever heard of DNS? If the application uses hardcoded addresses in its clients or between servers, there’s not much you can do, but one would expect truly hardcoded addresses only in home-brewed craplications ... and masterpieces created by those “programming gurus” who never realized hostnames should be used in configuration files instead of IP addresses.
If your application is somewhat well-behaved, there are all sorts of dynamic DNS solutions you can use to automatically associate a server’s new IP address with its DNS FQDN. Windows clusters do that automatically, many DHCP servers create dynamic DNS entries after client address allocation, and there are numerous dynamic DNS clients for Linux that you can use even with static IP addresses.
Use low TTL values if you’re changing DNS records, or the clients won’t be able to connect to the migrated servers due to stale local DNS caches.
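To see why low TTLs matter, here’s a minimal sketch of a client-side resolver cache (all names, addresses, and the `ResolverCache` class are made up for illustration): the client keeps serving the cached answer until the TTL expires, no matter what the authoritative server says in the meantime.

```python
import time

class ResolverCache:
    """Toy client-side DNS cache: an answer stays valid until its TTL expires."""
    def __init__(self):
        self._cache = {}  # fqdn -> (address, expires_at)

    def resolve(self, fqdn, lookup, ttl):
        """Return the cached address, or call lookup() and cache the answer."""
        entry = self._cache.get(fqdn)
        if entry and entry[1] > time.monotonic():
            return entry[0]                       # cached answer, possibly stale
        address = lookup(fqdn)
        self._cache[fqdn] = (address, time.monotonic() + ttl)
        return address

cache = ResolverCache()
# Before the migration the authoritative server returns the old address ...
old = cache.resolve("app.example.com", lambda n: "192.0.2.10", ttl=3600)
# ... after the migration the record changes, but clients keep connecting to
# the cached (now stale) address until the one-hour TTL runs out.
new = cache.resolve("app.example.com", lambda n: "198.51.100.10", ttl=3600)
assert old == new == "192.0.2.10"                 # stale cache wins
```

With a 3600-second TTL, clients could hammer the old address for up to an hour after the move; dropping the TTL to a few seconds before the migration shrinks that window accordingly.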
Host routes? For whatever reason some people think host routes are worse than long-distance bridging. They’re not – if nothing else, you have all the forwarding information in one place ... and modern L3 switches use host routes for directly connected IP hosts anyway.
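The mechanism behind host routes is plain longest-prefix matching: a /32 entry for the migrated host beats the subnet prefix, so only that one host’s traffic is steered to the other data center. A minimal sketch using Python’s stdlib `ipaddress` module (the table entries and next-hop names are made-up example values):

```python
import ipaddress

def best_route(destination, routing_table):
    """Longest-prefix match: the most specific matching prefix wins."""
    dest = ipaddress.ip_address(destination)
    matches = [(prefix, next_hop) for prefix, next_hop in routing_table
               if dest in ipaddress.ip_network(prefix)]
    return max(matches, key=lambda m: ipaddress.ip_network(m[0]).prefixlen)

table = [
    ("10.1.0.0/16", "core-DC1"),    # the subnet still lives in DC1
    ("10.1.0.25/32", "core-DC2"),   # host route for the migrated VM
]

# The migrated VM keeps its address, yet its traffic follows the /32 to DC2;
# every other host in 10.1.0.0/16 is still reached through DC1.
assert best_route("10.1.0.25", table) == ("10.1.0.25/32", "core-DC2")
assert best_route("10.1.0.99", table) == ("10.1.0.0/16", "core-DC1")
```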
Automatic network-side configuration of host routes is mission impossible. Local Area Mobility (LAM) worked years ago, but is not supported in data center switches ... and the only other mechanism I could think of that wouldn’t involve loads of homemade scripting glue is OpenFlow-driven RIB modification supposedly working on Juniper MX routers.
LISP is another option. LISP VM Mobility detects out-of-subnet IP addresses and uses LISP mappings to steer traffic toward them. LISP VM mobility is somewhat similar to host routes, but uses tunneling across the IP core instead of forwarding-state distribution. Another bonus: host routes (actually LISP mappings for /32 prefixes) are created automatically.
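Conceptually, the ingress tunnel router (ITR) looks up the destination EID in the mapping database and encapsulates the packet toward the corresponding RLOC. A heavily simplified sketch, glossing over map-requests and map-caching; all addresses and the dict-based “database” are made-up illustrations:

```python
# Simplified LISP encapsulation at the ingress tunnel router (ITR).
# EID-to-RLOC mappings and all addresses are made-up example values.
eid_to_rloc = {
    "10.1.0.25": "203.0.113.2",   # /32 mapping created when the VM moved
}
local_rloc = "203.0.113.1"        # RLOC of this ITR

def encapsulate(inner_packet):
    """Wrap the original packet in an outer header toward the remote RLOC."""
    rloc = eid_to_rloc[inner_packet["dst"]]
    return {"outer_src": local_rloc, "outer_dst": rloc, "payload": inner_packet}

pkt = encapsulate({"src": "10.2.0.7", "dst": "10.1.0.25"})
# The IP core sees only RLOC addresses; the moved VM's address travels inside
# the tunnel, so no host routes leak into the core routing tables.
assert pkt["outer_dst"] == "203.0.113.2"
assert pkt["payload"]["dst"] == "10.1.0.25"
```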
Routing protocols? Routing protocols running on servers were pretty popular years ago (that’s how IBM implemented IP multipathing on mainframes). Instead of configuring a hardcoded IP address on the server’s LAN interface, configure it on a loopback interface and run OSPF over DHCP-addressed LAN interfaces.
Assuming you still use Cat 6500 in your data center instead of Nexus switches, you could use OSPF database filters to minimize the impact of changes in OSPF topology.
Yeah, I know, a stupid solution like this requires actual changes to server configurations ... and it’s so much easier to pretend the problem doesn’t exist and claim that the network should support whatever we throw at it ;)
Route Health Injection on load balancers? Same idea as server-side routing protocols, but implemented in front of the whole application infrastructure.
Assuming your application sits behind a load balancer and you’re doing a cold migration of all application components in one step, you can preconfigure all the required IP subnets in the disaster recovery site (after all, they’re hidden behind a load balancer) and rely on the load balancer to insert the publicly-visible route to the application’s public IP address once everything is ready to go.
The universal duct tape – NAT. If the clients use DNS to connect to the servers, but the servers have to use fixed IP addresses, use NAT to hide server subnets behind different public IP addresses (one per site).
Obviously you have to move the whole application infrastructure at once if you want to use this approach, or things will break really badly.
Apart from the usual NAT-is-bad and NAT-breaks-things mantras, there are a few additional drawbacks:
- Clients have to rely on changed DNS records as you cannot insert a host route into the outside network like a load balancer can with RHI.
- NAT devices usually don’t support dynamic DNS registration, so you have to change the DNS entries “manually”.
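The idea boils down to static one-to-one NAT: the servers keep their fixed (possibly hardcoded) private addresses, and each site translates them into its own public range. A trivial sketch – site names and all addresses below are made-up example values:

```python
# Static one-to-one NAT (simplified): one translation table per site.
# The server keeps its fixed private address everywhere; only the
# publicly visible address differs between the two data centers.
nat_tables = {
    "primary-DC": {"10.1.0.25": "198.51.100.25"},
    "backup-DC":  {"10.1.0.25": "203.0.113.25"},
}

def public_address(site, private_ip):
    """Translate a server's fixed private address into the site's public one."""
    return nat_tables[site][private_ip]

# After a cold migration you repoint the DNS record at the backup site's
# public address; the server itself never changes its internal 10.1.0.25.
assert public_address("primary-DC", "10.1.0.25") == "198.51.100.25"
assert public_address("backup-DC", "10.1.0.25") == "203.0.113.25"
```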
Virtual appliances? Duncan Epping proposed using vShield Edge as a NAT device. I wouldn’t – on top of all the other drawbacks mentioned above, vShield Edge becomes a single point of failure that uses VMware HA to provide somewhat better availability.
Anything else? I’m positive I’ve missed an elegant idea or two. Your comments are most welcome ... including those telling me why the ideas mentioned above would never be implementable.