Revisited: Layer-2 DCI over VXLAN
I’m still getting questions about layer-2 data center interconnect; it seems this particular bad idea isn’t going away any time soon. In the face of that sad reality, let’s revisit what I wrote about layer-2 DCI over VXLAN.
VXLAN hasn’t changed much since the time I explained why it’s not the right technology for long-distance VLANs.
- I haven’t seen the integration with OTV or LISP that was promised years ago (or maybe I missed something – please write a comment);
- VXLAN-to-VLAN gateways are still limited to a single gateway (or MLAG cluster) per VXLAN segment, which generates traffic trombones with long-distance VLANs;
- Traffic trombones generated by stateful appliances (inter-subnet firewalls or load balancers) are impossible to solve.
Then there’s the obvious problem of data having gravity (or applications expecting to be close to their data) – if you move a VM away from the data, performance quickly drops way below acceptable levels.
However, if you’re forced to implement a stretched VLAN that nobody will ever use (because the application team cannot possibly deploy their latest gizmo without it, or because the server team claims they need it for disaster recovery … which has no chance of working), VXLAN is the least horrible technology. After all, you’ve totally decoupled the physical infrastructure from the follies of virtual networking, and even if someone manages to generate a forwarding loop between two VXLAN segments, the network infrastructure won’t be affected, assuming you implemented some basic traffic policing rules.
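If you want to toy with the “basic traffic policing rules” idea, here’s a minimal token-bucket sketch in Python (the class name, rates, and numbers are mine, purely for illustration, and have nothing to do with any particular switch feature): BUM traffic entering a VXLAN segment gets rate-limited, so a bridging loop inside the overlay can saturate that segment but not take down the transport network.

```python
import time

class BumPolicer:
    """Toy token-bucket policer for BUM traffic entering a VXLAN segment.

    Rates and burst sizes below are illustrative, not recommendations.
    """

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0          # token refill rate in bytes per second
        self.burst = burst_bytes            # bucket depth in bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, frame_len):
        """Return True if the frame conforms, False if it should be dropped."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if frame_len <= self.tokens:
            self.tokens -= frame_len
            return True
        return False

# Example: police BUM traffic on one segment to ~10 Mbps with a 64 KB burst.
policer = BumPolicer(rate_bps=10_000_000, burst_bytes=64_000)
if not policer.allow(frame_len=1514):
    pass  # drop the flooded frame instead of forwarding it across the DCI
```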
Still no control plane; still basically the same landscape, with maybe slightly broader support.
I'd be interested to hear your thoughts on unicast-mode VXLAN vs 'standard' (draft) VXLAN when it comes to layer-2 extension (while I'm 100% with you that the 'requirement' for layer-2 extension is a mostly/entirely BS requirement). I'm not fully versed in how all the magic happens, but I would assume that the VSM acts as some sort of control plane and helps to create the VTEP-to-MAC mappings... I would suspect that this helps to alleviate some of the MAC flooding issues that aren't (adequately) addressed in multicast mode? As a further bandaid (definitely not a 'real' solution), I wonder about the effectiveness of implementing storm control on port profiles in the 1000v as an extra method to limit broadcast-storm-type traffic within the bridge domain.
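To make the VTEP-to-MAC mapping idea a bit more tangible, here's a toy Python sketch (the addresses and data structures are invented for illustration and say nothing about how the VSM actually programs the VEMs): with a control plane populating MAC-to-VTEP mappings, a frame for a known MAC is encapsulated toward exactly one VTEP, and only unknown destinations fall back to head-end replication toward every VTEP in the segment (or to the multicast group in multicast mode).

```python
# Toy forwarding decision for a single VXLAN segment -- all addresses are invented.
mac_to_vtep = {
    # populated by a control plane (for example, mappings pushed down to the edge switches)
    "00:50:56:aa:00:01": "10.0.1.11",
    "00:50:56:aa:00:02": "10.0.2.22",
}
segment_vteps = ["10.0.1.11", "10.0.2.22", "10.0.3.33"]   # head-end replication list

def egress_vteps(dst_mac):
    """Return the VTEP(s) a frame for dst_mac should be encapsulated toward."""
    vtep = mac_to_vtep.get(dst_mac)
    if vtep:
        return [vtep]            # known MAC: unicast to a single VTEP, no flooding
    return segment_vteps         # unknown MAC: replicate to every VTEP in the segment

print(egress_vteps("00:50:56:aa:00:02"))   # ['10.0.2.22']
print(egress_vteps("00:50:56:bb:ff:ff"))   # unknown MAC falls back to flooding the segment
```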
Of course none of this addresses the traffic trombone issues, or the lack of active/active gateway functionality (outside of NSX, I guess? Also ACI to an extent, I suppose). Of course there are also some scale limitations to consider with unicast-mode VXLAN -- particularly around the gateway functionality.
Lastly, although this would *never* fly in a production environment, for kicks you could deploy a CSR with an interface in a VXLAN and use it to extend the VXLAN to a VLAN in another data center... it's filthy, but it works... in a lab... :)
Carl
There are only two alternatives:
* You eventually lose half of the VXLAN subnet (because the VEMs lose connectivity to the VSM) - in the best case, its topology would be frozen;
* You have redundant VSMs and they go into split-brain mode.
Not fun.
One last question -- assuming you did deploy redundant VSMs in each of two data centers, if the DCI link fails, is it really that big of a deal for the VSMs to go split-brained? If they can maintain operations within each DC, that would be better than being totally broken, I would think.
" is it really that big of a deal for the VSMs to go split-brained" ... and what happens when the DCI link comes back?
I have no idea how a split-brained VSM would react (failure of a single VSM and re-joining is pretty smooth though); perhaps I'll have an excuse to lab it up now!
Could you please elaborate on how VXLAN is a better option than OTV? As far as I can see, OTV doesn't suffer from the traffic tromboning you get from VXLAN. Sure you have to stretch your VLANs, but you're protected from bridging failures going over your DCI. OTV is also able to have multiple edge devices per site, so there's no single failure domain. It's even integrated with LISP to mitigate any sub-optimal traffic flows.
If I simply misinterpreted your post, I apologize.
I wonder what the cost of the added latency and bandwidth usage over the DCI is, and whether it would be offset by just purchasing something that supports OTV.
Semi-related fun fact: OTV has an IETF draft (currently expired, though) out there, so it looks like the intention is to let anyone use OTV.
http://www.ietf.org/archive/id/draft-hasmit-otv-04.txt
Make the MAC address the same.
See also RFC 1925, sections 2.5, 2.6 and 2.11 ;)
So let's wait for this to become a Cisco Validated Design before we recommend it to clients :)
Assuming you're using vSphere 5.5 or earlier, you have to have a vDS that spans both data centers, which means that hypervisors in the second data center automatically have the same port groups (and VNIs) as those in the first data center.
To Randell Greer's point above: does your statement "VXLAN is the least horrible technology" have to do with the requirement for specialized hardware (Nexus 7k or ASR 1k) or the reliance on Cisco for the N1v? If you ignore the hardware requirements for OTV for a moment, would VXLAN still be a better bet than OTV for L2 DCI? We are not a Cisco shop, at least not in the data center; we are an Arista shop, so VXLAN makes perfect sense for us (NSX would be awesome, and we are even considering NSX), but if OTV is a better point solution to satisfy the L2 dependencies of a few legacy apps, we won't mind spending money on a couple of pairs of ASR 1ks.
Please ignore my last post above, dated 20 October 2014 18:13; I already found your blog post specifically addressing Randell's question. Awesome. Thank you, sir.
In my opinion it makes no sense to agglutinate so many complex technologies into a single Rube Goldberg construction, regardless of what the vendors tell you, and I have no plans to waste my time trying to figure out how to make them work.
Workload mobility is a myth and works best in vendor PPTs. Get over it and build something that has a chance of being operated and supported by average ops people.
I apologize if I depressed you, but I'm sick-and-tired of the vendor posturing.
> Is this still your view? Some firewalls are capable of synchronizing their state (sessions) with other members even though they are standalone (not in a cluster). Would you say this can resolve the "stretched cluster" problems?
IMHO, this may resolve some problems:
> The cluster mode (especially A/P), which can be seen as a single failure domain (software problem/bug...)
> The asymmetric flow/routing (i.e., no need for LISP if "traffic trombones" are not a problem)
So now we may add as many firewalls as we like to the topology, which becomes something like a "firewall fabric" :) (see the sketch below)
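For illustration only, here's a toy Python sketch of that state-synchronization idea (the session format and replication mechanism are invented and don't describe any real firewall): each standalone firewall that creates a session pushes it to its peers, so asymmetric return traffic arriving at another firewall matches a known session instead of being dropped.

```python
# Toy session-state synchronization between standalone firewalls -- illustrative only.
class Firewall:
    def __init__(self, name):
        self.name = name
        self.sessions = set()   # simplified session key: (client_ip, server_ip, server_port)
        self.peers = []         # other standalone firewalls we sync state with

    def permit_outbound(self, client_ip, server_ip, server_port):
        """Create a session locally and replicate it to every peer firewall."""
        session = (client_ip, server_ip, server_port)
        self.sessions.add(session)
        for peer in self.peers:
            peer.sessions.add(session)      # state sync without forming a cluster

    def allow_return(self, src_ip, dst_ip, src_port):
        """Permit return traffic only if some firewall already created the session."""
        return (dst_ip, src_ip, src_port) in self.sessions

# Asymmetric flow: outbound through fw_a, return traffic lands on fw_b.
fw_a, fw_b = Firewall("dc1"), Firewall("dc2")
fw_a.peers, fw_b.peers = [fw_b], [fw_a]
fw_a.permit_outbound("192.0.2.10", "198.51.100.20", 443)
print(fw_b.allow_return("198.51.100.20", "192.0.2.10", 443))   # True -- synced session state
```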
http://blog.ipspace.net/2011/04/distributed-firewalls-how-badly-do-you.html
http://blog.ipspace.net/2015/11/stretched-firewalls-across-layer-3-dci.html