Revisited: The Need for Stretched VLANs
Regardless of how much I write about (the ridiculousness of using) stretched VLANs, I keep getting questions along the same lines. This time it’s:
What type of applications require L2 Extension and L3 extension?
I don’t think I’ve seen anyone use L3 extension (after all, isn’t that what the Internet is all about?), so let’s focus on the first one.
Stretched VLANs (or L2 extensions) are used to solve a number of unrelated problems, because once a vendor sold you a hammer everything starts looking like a nail, and once you get used to replacing everything with nails, you want to use them in all possible environments, including public and hybrid clouds.
Some of the challenges that I’ve seen solved with stretched VLANs include:
Subnet mobility. You must move a subnet from one site to another during the disaster recovery process because the subnet (or IP addresses within it) is hard-coded in applications, configuration files, or firewall/load-balancer rules.
This one is the easiest to solve: start using automation and make network infrastructure recovery part of your overall recovery process.
Alternatively, configure the same subnet on VLAN interfaces or firewall contexts that are shut down during regular operation and enabled only when needed.
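Here’s a minimal automation sketch of that approach, assuming a Cisco-like device reachable via netmiko; the device details, credentials, and interface name are all made up:

```python
# Hypothetical DR failover step: enable the pre-configured standby SVI
# at the recovery site (device details and interface name are made up).
from netmiko import ConnectHandler

dr_switch = {
    "device_type": "cisco_ios",
    "host": "10.255.0.2",      # assumed management address
    "username": "admin",
    "password": "secret",
}

def activate_standby_subnet(interface="Vlan100"):
    """Un-shut the standby VLAN interface so the recovered subnet goes live."""
    with ConnectHandler(**dr_switch) as conn:
        conn.send_config_set([f"interface {interface}", "no shutdown"])
        conn.save_config()

if __name__ == "__main__":
    activate_standby_subnet()
```

Run that as one step of the overall recovery playbook and you get subnet mobility with zero stretched VLANs.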
For whatever reason, most everyone solves this one by stretching a VLAN between data centers (because VMware consultants told them to do so) and then experiences a dual-data-center meltdown before ever needing an actual disaster recovery.
IP address mobility. Similar to the one above, but caused by a cold or hot VM move, resulting in an IP subnet stretched across multiple sites.
You can implement this requirement without stretching a VLAN by using host-route-based IP forwarding as implemented in Local Area Mobility (requires at least Cisco IOS version 10.0), Cisco DFA, Cisco ACI, Cumulus Linux redistribute ARP, or any decent EVPN implementation.
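To see why host routes do the trick, here’s a quick longest-prefix-match demo in plain Python (all addresses are made up): the /32 announced from the VM’s new site always beats the covering subnet prefix.

```python
# Longest-prefix-match demo: a /32 host route injected after a VM move
# overrides the covering subnet route (addresses are hypothetical).
import ipaddress

routes = {
    ipaddress.ip_network("192.0.2.0/24"): "site-A",   # original subnet
    ipaddress.ip_network("192.0.2.42/32"): "site-B",  # moved VM's host route
}

def lookup(dst):
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]

print(lookup("192.0.2.42"))  # site-B -- the host route wins
print(lookup("192.0.2.7"))   # site-A -- falls back to the subnet route
```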
However, as you usually cannot announce host routes to the WAN or the public Internet, this design effectively creates a multi-site summarization boundary. Anyone with production-grade multi-area OSPF experience probably knows how bad that can get; everyone else should figure it out as a nice homework assignment.
IP multicast. The application needs IP multicast because whatever (the only valid reasons I found: stock exchange feeds and video streaming) and it’s easier to stretch a VLAN than to figure out how to spell PIM (btw, I agree with this conclusion).
Some vendors “solve” this problem by requiring layer-2 connectivity between cluster members (see also: let’s offload our support costs below).
Simplistic routing on multihomed hosts. I strongly suspect that most of the “we need layer-2 for iSCSI” sentiment comes from an inability to properly configure IP routing on multihomed iSCSI clients. VMware fixed this in recent vSphere versions; I’m not sure what other vendors are doing.
This one is way harder than one would expect. I started writing a lengthy article on the topic and still haven’t finished it because new worms keep crawling out of the can every time I turn around.
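To illustrate just the application-side half of it: pinning a TCP session to the storage-facing interface of a multihomed host means binding the source address before connecting, as in the sketch below (addresses are made up). Making the return traffic use the same interface is where the real worms live.

```python
# Pin a TCP session to one NIC of a multihomed host by binding the
# source address before connecting (all addresses are hypothetical).
import socket

LOCAL_STORAGE_IP = "10.0.1.5"    # address on the storage-facing NIC
TARGET = ("10.0.1.100", 3260)    # hypothetical iSCSI target portal

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind((LOCAL_STORAGE_IP, 0))  # 0 = pick an ephemeral source port
sock.connect(TARGET)
# ... the kernel still needs a matching route (or policy rule) so the
# replies leave through the same NIC -- that's the hard part.
sock.close()
```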
Let’s offload our support costs to the customer’s networking team. A typical trick used by software vendors is to write requirements like “must have layer-2 connectivity between hosts in a cluster” for no good reason, and then reject support requests from environments that violate this ridiculous CYA approach.
Famous vendors in this category:
- VMware requiring layer-2 connectivity between kernel interfaces to run a TCP-based vMotion session between them (they have since come to their senses);
- Oracle database clusters.
Honorable mentions:
- Nutanix distributed storage.
- Stretched Cisco HyperFlex clusters.
Anything else? Write a comment!
Let’s offload error detection to the customer’s networking team. It’s easier to rely on Ethernet checksums than to implement application-level checksums.
Bad news: with the MAC-over-IP solutions like VXLAN-with-EVPN that almost every data center switching vendor is pushing, you lose end-to-end Ethernet checksums even though the whole thing still looks like a thick yellow cable.
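Application-level checksums aren’t rocket science either; here’s a minimal sketch (CRC32 for brevity; use a cryptographic hash for anything that matters):

```python
# Application-level integrity check: append a CRC32 to every message so
# corruption is caught end-to-end instead of per-Ethernet-hop.
import struct
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC32 to the payload."""
    return payload + struct.pack("!I", zlib.crc32(payload))

def unframe(message: bytes) -> bytes:
    """Verify and strip the trailing CRC32."""
    payload, crc = message[:-4], struct.unpack("!I", message[-4:])[0]
    if zlib.crc32(payload) != crc:
        raise ValueError("payload corrupted in transit")
    return payload

assert unframe(frame(b"hello")) == b"hello"
```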
Non-IP protocols. Hey, it’s 2018. Let’s move on.
We want a thick yellow cable. Solutions implementing stupidities that skirt the edges of valid Ethernet behavior and work well only on a thick yellow cable. I’m looking at you, Network Load Balancing.
Wrap-up: It makes me sad that after all these years we still have to deal with ignorance and decade-old stupidities that refuse to die.
Fortunately, the big cloud providers don’t want to budge on this one because they’re focused on making money from running services instead of supporting old crap based on which VP yells louder, so most of the things I mentioned above might die in the next few decades.
For an even more cynical view, join the Building Next-Generation Data Centers online course and listen to what Michele Chubirka has to say about infrastructure, security and DevOps.
The problem is that stretched VLANs work as long as the scenario is small or simple, and that makes it difficult to explain to the people using them that the performance problems and network hiccups they experience happen because L2 doesn't scale (I recall another great blog post from Ivan on this topic). It gets even harder when the vendor is telling them that's the way to do it.
I am trying to push VRF functionality into modeling and bare-metal provisioning software based on the new Linux kernel VRF-lite code (thanks a lot to those who upstreamed it) to avoid L2 stretching or static-route workarounds (see pad.lv/1737428).
Currently the modeling software helps with endpoint discovery based on the fact that hosts are multi-homed, but it does not help with routing.
The VRF functionality needs to be there to make sure hosts are not turned into routers (for security and operational reasons). At the same time, with VRFs there is no reliance on static routes on a given host.
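In case it helps anyone, this is roughly what the provisioning step boils down to, sketched with pyroute2 (the VRF name, routing table, and interface name are placeholders):

```python
# Create a Linux VRF and enslave an interface to it (needs CAP_NET_ADMIN;
# VRF name, table number, and interface name are hypothetical).
from pyroute2 import IPRoute

with IPRoute() as ipr:
    ipr.link("add", ifname="vrf-mgmt", kind="vrf", vrf_table=100)
    vrf = ipr.link_lookup(ifname="vrf-mgmt")[0]
    eth = ipr.link_lookup(ifname="eth1")[0]
    ipr.link("set", index=vrf, state="up")
    ipr.link("set", index=eth, master=vrf)  # enslave eth1 to the VRF
```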
In general, I would say that VIP/FIP-oriented failover mechanisms constrain software deployments to a single L2.
* Keystone (OpenStack) catalog is a good example of that problem: a single hostname can be used as a catalog entry and clients only expect a single hostname;
* Highly-available load-balancers suffer from the same problem.
Doing L3 properly certainly requires more effort: either clients have to be clever enough to select from multiple endpoints and fail over correctly, or the L3 infrastructure has to be good enough to support ECMP with hosts using loopback interfaces with the correct addresses. Multiple highly-available load balancers plus multiple A records per load-balancer hostname is another option, but it merely hides the L3 part and has certain limitations with regard to user session management (not to mention that you have to resolve the name first, so think about DNS high availability).
Or, you can do VIPs on a single L2 which is why software vendors require that (who knows if you meet a qualified enough network team? Will their security team allow peering? Does a software vendor have the right network expertise?).
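For completeness, a rough sketch of what a "clever enough" client looks like: it iterates over multiple endpoints instead of trusting a single VIP (the endpoint list is made up):

```python
# Client-side failover across multiple L3 endpoints instead of one VIP
# (endpoint addresses are hypothetical).
import socket

ENDPOINTS = [("192.0.2.10", 443), ("198.51.100.10", 443)]

def connect_any(endpoints, timeout=2.0):
    """Return a socket to the first endpoint that answers."""
    last_err = None
    for host, port in endpoints:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as err:
            last_err = err  # endpoint down, try the next one
    raise ConnectionError(f"all endpoints failed: {last_err}")

sock = connect_any(ENDPOINTS)
sock.close()
```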
Thanks for the post!
Basically, as a cloud provider you can use your existing 10G or 100G port at DE-CIX to offer VLANs from your cloud to an ISP connected at DE-CIX, who then extends the VLAN to their customer. I absolutely could not see any sense in it beyond creating a big L2 nightmare. One of the reasons I was given was to build redundancy concepts (!?!) and to guarantee round-trip times to the cloud services.
So basically an Internet exchange with plenty of clouds connected and plenty of bandwidth available offers us a tool to build gigantic L2 loops between different clouds and customers!
The next step will be to dig out spanning tree to solve the upcoming issues ;-)
My colleague and I started having fun with cloud loops: does a cloud start raining when it loops? What comes out when it starts raining, water or personally sensitive data? And if it is personally sensitive data, will an umbrella protect your data security officer? Hope you feel my irony on this whole L2 topic...
The last paragraph is a keeper - it not only made my day but also brightened up the whole Tech Field Day crew at CLEUR18. Thanks a million!
http://blog.ipspace.net/2013/09/layer-2-extension-otv-use-cases.html
It remains a constant battle for me to guide application designers away from their reliance on stretched VLANs, so thanks for refreshing your warnings about them (I remember your May 2012 article on the subject). I continue to look for similar guidance from other sources, such as the 2015 Gartner post below. If you know of similar guidance elsewhere, it would help further support my case.
https://blogs.gartner.com/andrew-lerner/2015/04/23/stretchdontbreak
Kind Regards,
Peter
Also, most everyone probably decided to leave this corpse to rot while they chase the beautiful intent-based unicorns ;))