Layer-2 Extension (OTV) Use Cases
I was listening to the fantastic OTV Deep Dive PQ Packet Pushers podcast while biking around the wonderful Slovenian forests. They started the podcast by discussing OTV use cases, Ethan throwing in long-distance vMotion (the usual long-distance L2 extension selling point), but refreshingly some of the engineers said “well, that’s not really the use case we see in real life.”
So what were the use cases they were mentioning?
I loved one of them – someone using OTV to get away from L2 interconnect. They had a traditional L2 interconnect (and all the associated “goodies”), decided to convert it to L3 interconnect, but still needed some stretched VLANs in the migration period.
And here are the other use cases I gleaned from the podcast:
External BGP subnets – you have a single /24 IPv4 prefix that you have to announce from more than one data center. Because you can’t advertise two /25s to the Internet (prefixes longer than /24 are commonly filtered), most people would immediately think about stretching that subnet across more than one location, hoping that everything works.
Not surprisingly, if the inter-DC WAN link fails, you’ll face a nice split-brain scenario with both data centers advertising the subnet, effectively preventing some users from reaching the correct data center … unless you do some fancy routing, which brings me to the point: you don’t need a stretched layer-2 subnet to implement this scenario, you just need proper design and some more intelligent routing.
Now, I totally understand that some customers love to sprinkle another layer of pixie dust over their network instead of investing in proper BGP design and deployment. As a system integrator you usually have to go with what your customers want (and are willing to pay for), but the L2 extension still carries a hefty price tag (particularly if you have to buy the M2 linecards and OTV license for Nexus 7000), which might be a bit higher than the cost of attending a BGP course and paying someone to design your DC WAN edge (or review your design).
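To make the /24 constraint and the "more intelligent routing" idea a bit more tangible, here is a minimal Python sketch. It is a toy model, not a BGP implementation: the site names, the documentation prefix 192.0.2.0/24, the AS number 64500, and the primary/backup design with AS-path prepending are all assumptions chosen for illustration, and real best-path selection considers far more than AS-path length.

```python
import ipaddress
from dataclasses import dataclass

# Assumption: IPv4 prefixes longer than /24 are commonly filtered on the Internet.
MAX_ACCEPTED_PREFIXLEN = 24


@dataclass(frozen=True)
class Advertisement:
    site: str                       # hypothetical data center name
    prefix: ipaddress.IPv4Network   # prefix announced from that site
    as_path: tuple                  # simplified AS path seen by a remote network


def accepted(prefix):
    """Return True if a remote network would plausibly accept the prefix."""
    return prefix.prefixlen <= MAX_ACCEPTED_PREFIXLEN


def best_site(advertisements):
    """Pick the preferred site using AS-path length only (a deliberate simplification)."""
    usable = [a for a in advertisements if accepted(a.prefix)]
    if not usable:
        return None
    return min(usable, key=lambda a: len(a.as_path)).site


block = ipaddress.ip_network("192.0.2.0/24")
halves = list(block.subnets(prefixlen_diff=1))   # the two /25s you cannot announce

# Primary/backup design: the backup site prepends its own AS number.
primary = Advertisement("DC-A", block, (64500,))
backup = Advertisement("DC-B", block, (64500, 64500, 64500))

if __name__ == "__main__":
    print([str(h) for h in halves], [accepted(h) for h in halves])
    print("both sites up:", best_site([primary, backup]))
    print("DC-A down:", best_site([backup]))
```

In this simplified model every remote network picks the same site while both announcements exist, so a failed inter-DC link does not scatter users across two data centers, and the backup announcement takes over only when the primary one disappears. Whether that fits a particular network depends on the upstream providers and the applications involved.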
Data Center migration, which is a perfect use case that even I would support. Do keep in mind that you have to sync a lot of things (including storage), which could make the migration project a bit more complex than a simple shutdown-move-powerup procedure, but if you have to move the data center and cannot agree on a reasonably long maintenance window within the next 6 months, you just might have to use long-distance vMotion hoping nothing crashes in the process.
Also, keep in mind that your migration might not be as fast as you expect it to be – some people managed to move 30 VMs in a weekend, which was such a phenomenal achievement that EMC simply had to document it in a press release.
Finally, don’t forget to turn off layer-2 extension when you’re done – you wouldn’t want to turn two data centers into a single failure domain, would you?
Disaster recovery with SRM – yet another use case supporting laziness at the cost of network complexity. I totally understand that you have to use the same subnet in both data centers because some craplications simply cannot survive a changed IP address, but I can’t grasp why you wouldn’t use SRM external hooks and reconfigure the switches with NETCONF (or XMPP or Puppet) during the SRM recovery process to recreate the subnet in the other data center.
BTW, if you’re running anything more complex than an SMB web hosting environment, you probably have to migrate firewall and load balancer configurations as well, in which case recreating the lost subnet is the least of your worries … unless you already deployed virtual appliances.
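As a rough idea of what such a recovery hook might send to a switch, the sketch below builds a NETCONF `<edit-config>` request that recreates a VLAN interface with its gateway address. Only the NETCONF base namespace and the `rpc`/`edit-config`/`target`/`config` envelope are standard; the `urn:example:recovery-subnet` namespace, the `vlan-interface` element names, the VLAN number, and the address are hypothetical placeholders for whatever data model the actual switch exposes. Opening the NETCONF session and wiring the script into SRM are left out.

```python
import xml.etree.ElementTree as ET

# Standard NETCONF base namespace (RFC 6241).
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
# Hypothetical namespace: a real switch would use its own vendor or YANG model here.
EXAMPLE_NS = "urn:example:recovery-subnet"


def build_edit_config(vlan_id, gateway, message_id="1"):
    """Return an <edit-config> RPC (as a string) recreating a subnet's gateway.

    vlan_id and gateway are illustrative parameters; the payload layout under
    <config> is an assumption, not any vendor's schema.
    """
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    edit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}edit-config")

    # Assumption: the device accepts direct changes to the running datastore.
    target = ET.SubElement(edit, f"{{{NETCONF_NS}}}target")
    ET.SubElement(target, f"{{{NETCONF_NS}}}running")

    config = ET.SubElement(edit, f"{{{NETCONF_NS}}}config")
    interface = ET.SubElement(config, f"{{{EXAMPLE_NS}}}vlan-interface")
    ET.SubElement(interface, f"{{{EXAMPLE_NS}}}vlan-id").text = str(vlan_id)
    ET.SubElement(interface, f"{{{EXAMPLE_NS}}}ip-address").text = gateway

    return ET.tostring(rpc, encoding="unicode")


if __name__ == "__main__":
    # Example: recreate VLAN 100 with its first-hop gateway in the recovery site.
    print(build_edit_config(100, "192.0.2.1/24"))
```

A recovery script would generate one such request per lost subnet and push it to the switches in the recovery site before the VMs are powered on.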
Summary – I’m still looking for a good layer-2 extension use case (apart from the migration ones).
You’ll find all you never wanted to know about Data Center interconnects (layer-2 and layer-3, including MPLS/VPN) in the DCI webinar.
1) Support for partial failover as most failures are not complete site failures. (Yes, we considered the issue of applications being split between data centers while running in this state).
2) Virtual appliances where SRM IP customization does not work. (These same VMs often have no APIs and cannot be scripted).
3) Broken VMs where IP customization failed (yes, fixing those VMs would have been better).