Revisited: Layer-2 DCI over VXLAN

I’m still getting questions about layer-2 data center interconnect; it seems this particular bad idea isn’t going away any time soon. In the face of that sad reality, let’s revisit what I wrote about layer-2 DCI over VXLAN.

VXLAN hasn’t changed much since the time I explained why it’s not the right technology for long-distance VLANs.

  • I haven’t seen integration with OTV or LISP that was promised years ago (or maybe I missed something – please write a comment);
  • VXLAN-to-VLAN gateways are still limited to a single gateway (or MLAG cluster) per VXLAN segment, generating traffic trombones with long-distance VLANs;
  • Traffic trombones generated by stateful appliances (inter-subnet firewalls or load balancers) are impossible to solve.
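To put rough numbers on the last bullet: when two workloads end up in one data center while their stateful firewall stays in the other, every inter-subnet packet hairpins across the DCI. A back-of-the-envelope sketch (all latency values are made-up assumptions, not measurements):

```python
# Back-of-the-envelope cost of a traffic trombone on a stretched VLAN.
# Latency figures are illustrative assumptions, not measurements.

INTRA_DC_RTT_MS = 0.5  # round-trip time within one data center
DCI_RTT_MS = 10.0      # round-trip time across the DCI link

def flow_rtt_ms(dci_traversals: int) -> float:
    """Approximate RTT of one request/response exchange that crosses
    the DCI link `dci_traversals` times in total."""
    return INTRA_DC_RTT_MS + dci_traversals * (DCI_RTT_MS / 2)

local = flow_rtt_ms(0)      # both servers and the firewall in one DC
tromboned = flow_rtt_ms(4)  # servers moved to the other DC: request and
                            # response each cross the DCI twice to reach
                            # the firewall and come back
print(local, tromboned)     # → 0.5 20.5
```

A forty-fold RTT increase from a single misplaced appliance, and no amount of VXLAN tuning makes it go away; only moving the stateful device (or the workloads) does.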

Then there’s the obvious problem of data having gravity (or applications being used to being close to data) – if you move a VM away from the data, the performance quickly drops way below acceptable levels.

However, if you’re forced to implement a stretched VLAN that nobody will ever use (because the application team cannot possibly deploy their latest gizmo without it, or because the server team claims they need it for a disaster recovery scheme that has no chance of working), VXLAN is the least horrible technology. After all, you’ve totally decoupled the physical infrastructure from the follies of virtual networking, and even if someone manages to generate a forwarding loop between two VXLAN segments, the network infrastructure won’t be affected, assuming you implemented some basic traffic policing rules.
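The decoupling comes from MAC-over-UDP encapsulation: the underlay only ever sees UDP packets between VTEP addresses. A minimal sketch of the 8-byte VXLAN header from RFC 7348 (illustration only, not a full data-plane implementation):

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP port for VXLAN (RFC 7348)

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags byte with the I-bit set,
    24 reserved bits, the 24-bit VNI, and a final reserved byte."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # I flag: the VNI field is valid
    return struct.pack("!B3xI", flags, vni << 8)

# The original Ethernet frame is appended after this header, then the
# whole thing is wrapped in UDP/IP between the two VTEPs.
print(vxlan_header(5001).hex())  # → 0800000000138900
```

Whatever loop or broadcast storm rages inside a segment, the transport network just forwards (and can police) one more UDP flow.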


  1. Interesting to read the comments in the 'VXLAN is not a Data Center Interconnect' post from 2012 :)
    Still no control-plane, still basically the same landscape with maybe just a bit broader support.

    I'd be interested to hear your thoughts on unicast-mode VXLAN vs 'standard' (draft) VXLAN when it comes to layer-2 extension (while I'm 100% with you that the 'requirement' for layer-2 extension is a mostly/entirely BS requirement). I'm not fully versed in how all the magic happens, but I would assume that the VSM acts as some sort of control plane and helps to create the VTEP-to-MAC mappings... I would suspect that this would help to alleviate some of the MAC flooding issues that aren't (adequately) addressed in multicast mode? As a further band-aid (definitely not a 'real' solution) I wonder about the effectiveness of implementing storm-control on port-profiles in the 1000v as an extra method to limit broadcast-storm type traffic within the bridge domain.

    Of course none of this addresses the traffic trombone issues, or lack of active/active gateway type functionality (outside of NSX I guess? Also to an extent ACI I suppose). Of course there are also some scale limitations to consider with unicast-mode VXLAN -- particularly surrounding the gateway functionality.

    Lastly, although this would *never* fly in a production environment, for kicks you could deploy a CSR with an interface in a VXLAN segment and use it to extend the VXLAN to a VLAN in another data center... it's filthy, but it works... in a lab... :)

    1. Unicast-mode VXLAN has a centralized control plane (the VSM) and thus represents an interesting failure scenario: what happens if the DCI link fails?

      There are only two alternatives:
      * You eventually lose half of the VXLAN subnet (because the VEMs lose connectivity to the VSM) - in the best case, its topology would be frozen;
      * You have redundant VSMs and they go into split-brain mode.

      Not fun.
    2. I think that's only partly true though. In the event the VSM goes away, the VEMs still keep their mappings (for at least some time). I suspect BUM traffic would be broken, but normal unicast would live on, assuming you don't need to change any port-profiles or bring up new hosts etc.! I agree, it's certainly not good for DCI, just playing devil's advocate for fun!

      One last question -- assuming you did deploy redundant VSMs in each of two data centers, if the DCI link fails, is it really that big a deal for the VSMs to go split-brained? If they can maintain operations within each DC, that would be better than being totally broken, I would think.
    3. "normal unicast would live on, assuming you don't need to change any port-profiles or bring up new hosts etc." ... or move a VM or use DRS or HA or ...

      " is it really that big of a deal for the VSMs to go split-brained" ... and what happens when the DCI link comes back?
    4. Hah yeah, again, not saying it would be a good idea :)

      I have no idea how a split-brained VSM would react (failure of a single VSM and re-joining is pretty smooth though); perhaps I'll have an excuse to lab it up now!
  2. "VXLAN is the least horrible technology"

    Could you please elaborate on how VXLAN is a better option than OTV? As far as I can see, OTV doesn't suffer from the traffic tromboning you get from VXLAN. Sure you have to stretch your VLANs, but you're protected from bridging failures going over your DCI. OTV is also able to have multiple edge devices per site, so there's no single failure domain. It's even integrated with LISP to mitigate any sub-optimal traffic flows.

    If I simply misinterpreted your post, I apologize.
    1. I'd hazard a guess that OTV may well be better. However, IIRC it's only supported on the Nexus 7000 and requires licenses for both the VDC option and OTV (last time I looked - it might have changed), so if you're not a Cisco shop, or you don't have Nexus 7000s, then VXLAN may well be the least horrible option. There are plenty of things that support VXLAN (if you're using VMware, then the Nexus 1000v, VCNS or NSX would all do the job).
    2. ASR 1Ks and the virtual CSRs also support OTV, and are way cheaper than the M cards on the 7Ks. If you went the VXLAN route and have bare-metal servers (or other VMs that don't live in ESX) that need access to the servers on the VXLAN segment, they have to go through the VXLAN gateway, which might be on the opposite side of the DCI, resulting in tromboning.

      I wonder what the cost of the added latency and bandwidth usage on the DCI would be, and whether it would be offset by just purchasing something that supports OTV.

      Semi-related fun fact: OTV has an RFC draft (currently expired though) out there so it looks like the intention is to let anyone use OTV.
  3. Why wouldn't first-hop filtering work as well on VXLAN? Make the MAC address the same.
  4. Will traffic trombones generated by stateful appliances be resolved with ASA clustering in version 9.x? I understand that the ASA clustering feature will soon be supported over OTV LAN extensions. Maybe VXLAN in the near future? Any thoughts?

    1. See also RFC 1925, sections 2.5, 2.6 and 2.11 ;)
  5. mm, much clearer now...
    so let's wait for this to become a Cisco Validated Design before we recommend it to clients :)

  6. How does the VLAN-to-VNI mapping work when a VM migrates to another DC while maintaining flat layer-2 connectivity? I assume the VLAN the VM uses will not change, so the VLAN-to-VNI mapping should be the same in the new location. This can practically limit the number of VNIs to 4K (the number of VLAN IDs).
    1. Short answer: it depends.

      Assuming you're using vSphere 5.5 or earlier, you have to have a vDS that spans both data centers, which means that hypervisors in the second data center automatically have the same port groups (and VNIs) as those in the first data center.
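As an aside, the commenter's 4K figure follows directly from the header field widths: with a strict 1:1 VLAN-to-VNI mapping per virtual switch, you can never use more than about 4K of the roughly 16 million available VNIs (a trivial sketch):

```python
VLAN_ID_BITS = 12  # 802.1Q VLAN ID field width
VNI_BITS = 24      # VXLAN Network Identifier width (RFC 7348)

usable_vlans = 2**VLAN_ID_BITS - 2  # VLAN IDs 0 and 4095 are reserved
total_vnis = 2**VNI_BITS

print(usable_vlans)  # → 4094
print(total_vnis)    # → 16777216
```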
  7. This works if the VTEP is in the hypervisor. What happens if the VTEP is in the ToR? Is the port group visible to the ToR switch?
    1. There is no standard solution for that problem. Talk to whoever is trying to sell you ToR VTEP ;)
  8. Ivan,

    To Randell Greer's point above, regarding your statement "VXLAN is the least horrible technology": is that because of the specialized hardware requirement (Nexus 7K or ASR 1K), or the reliance on Cisco for the Nexus 1000v? If you ignore OTV's hardware requirements for a moment, would VXLAN still be the better bet over OTV for L2 DCI? We're not a Cisco shop, at least not in the data center; we're an Arista shop, so VXLAN makes perfect sense for us (NSX would be awesome, and we're even considering NSX), but if OTV is the better point solution to satisfy the L2 dependencies of a few legacy apps, we wouldn't mind spending money on a couple of pairs of ASR 1Ks.
  9. Ivan,

    Please ignore my last post above, dated 20 October 2014 18:13; I already found your blog post specifically addressing Randell's question. Awesome. Thank you, sir.
  10. Dear Ivan, I am planning to design a private virtual cloud that includes multiple data centers, with on-demand workload mobility from any data center to any other. I want to use Cisco OTV, LISP, VMware NSX and VXLAN: VXLAN for east-west traffic within each DC, and OTV and LISP for north-south traffic. Could you please provide a use case and detailed technical information on how OTV and LISP integrate with NSX?
    1. Simple answer: No.

      In my opinion it makes no sense to agglutinate so many complex technologies into a single Rube Goldberg construction, regardless of what the vendors tell you, and I have no plans to waste my time trying to figure out how to make them work.

      Workload mobility is a myth and works best in vendor PPTs. Get over it and build something that has a chance of being operated and supported by average ops people.

      I apologize if I depressed you, but I'm sick and tired of the vendor posturing.
  11. Ivan, regarding what you said : "Traffic trombones generated by stateful appliances (inter-subnet firewalls or load balancers) are impossible to solve."

    > Is this still your view? Some firewalls are capable of synchronizing their state (sessions) with other members while remaining standalone (not in a cluster); would you say this can resolve the "stretched cluster" problems?

    IMHO, this may resolve some problems:
    > The cluster mode (especially A/P), which can be seen as a single failure domain (software problem/bug...)
    > The asymmetric flow/routing (i.e. no need for LISP if "traffic trombones" are not a problem)

    So now we may add as many firewalls as we want to the topology, which becomes something like a "Firewall Fabric" :)
    1. No, I haven't changed my mind. IMHO it's really hard to change the laws of physics (or networking), and whatever glitzy miracle comes out is usually just a reiteration of old stuff.
    2. Thanks, that's clear. For the sake of all of us understanding better: why wouldn't "state synchronization" on firewalls/load balancers solve the traffic trombones generated by stateful appliances?