Revisited: The Need for Stretched VLANs

Tuesday, January 30, 2018 09:31 +0100


Regardless of how much I write about (the ridiculousness of using) stretched VLANs, I keep getting questions along the same lines. This time it’s:

What type of applications require L2 Extension and L3 extension?

I don’t think I’ve seen anyone use L3 extension (after all, isn’t that what the Internet is all about?), so let’s focus on the first one.

Stretched VLANs (or L2 extensions) are used to solve a number of unrelated problems, because once a vendor sells you a hammer, everything starts looking like a nail, and once you get used to replacing everything with nails, you want to use them in every possible environment, including public and hybrid clouds.

Some of the challenges that I’ve seen solved with stretched VLANs include:

Subnet mobility. You must move a subnet from one site to another during the disaster recovery process because the subnet (or IP addresses within the subnet) is hard-coded in applications, configuration files, or firewall/load balancer rules.

This one is the easiest to solve: start using automation and make network infrastructure recovery part of your overall recovery process.

Alternatively, configure the same subnet on VLAN interfaces or firewall contexts that are shut down during regular operation and enabled when needed.

For whatever reason, most everyone solves this one by stretching a VLAN between data centers (because VMware consultants told them to do so) and then experiencing a dual-data-center meltdown before ever having the need to do a disaster recovery.
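A minimal sketch of the “pre-provisioned but shut down” approach, assuming the recovery site already has the VLAN interfaces configured with the same gateway addresses and only needs them enabled. The `StandbySubnet` record, the `recovery_config()` helper, and the VLAN IDs and addresses are hypothetical; the output uses Cisco IOS-style `interface VlanN` / `shutdown` syntax, and pushing it to the devices is left to whatever automation you already use.

```python
from dataclasses import dataclass
from ipaddress import IPv4Interface


@dataclass(frozen=True)
class StandbySubnet:
    """A subnet pre-provisioned at the recovery site (hypothetical record)."""
    vlan_id: int
    gateway: IPv4Interface  # same first-hop address and prefix as at the primary site


def recovery_config(subnets: list[StandbySubnet], enable: bool) -> str:
    """Render IOS-style config that keeps standby VLAN interfaces shut
    (enable=False, regular operation) or brings them up (enable=True, recovery)."""
    lines: list[str] = []
    for sub in sorted(subnets, key=lambda s: s.vlan_id):
        lines.append(f"interface Vlan{sub.vlan_id}")
        lines.append(f" ip address {sub.gateway.ip} {sub.gateway.netmask}")
        lines.append(" no shutdown" if enable else " shutdown")
    return "\n".join(lines)


if __name__ == "__main__":
    dr_plan = [
        StandbySubnet(20, IPv4Interface("10.1.20.1/24")),
        StandbySubnet(10, IPv4Interface("10.1.10.1/24")),
    ]
    # Recovery-time change, to be fed into the normal change/automation pipeline.
    print(recovery_config(dr_plan, enable=True))
```

Generating both states from the same data keeps the network part of the recovery runbook next to the rest of it, instead of living in a stretched VLAN.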

IP address mobility. Similar to the one above, but caused by cold or hot VM moves, resulting in an IP subnet stretched across multiple sites.

You can implement this requirement without stretching a VLAN by using host-route-based IP forwarding as implemented in Local Area Mobility (requires at least Cisco IOS version 10.0), Cisco DFA, Cisco ACI, Cumulus Linux redistribute ARP, or any decent EVPN implementation.

However, as you usually cannot announce host routes to the WAN or the public Internet, this design effectively creates a multi-site summarization boundary. Anyone with production-grade multi-area OSPF experience probably knows how bad this could be; everyone else should figure this one out as a nice homework assignment.
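A toy model of that summarization boundary, assuming both data centers advertise the same /24 summary to the WAN while the /32 host routes stay inside the fabric. Site names, prefixes, and helper functions are hypothetical. The WAN’s longest-prefix match sees two equally good entry points and has no idea which site currently hosts the VM.

```python
from ipaddress import IPv4Address, IPv4Network

# Hypothetical: both sites advertise the same summary; the WAN never sees /32s.
WAN_ADVERTISEMENTS: dict[str, list[IPv4Network]] = {
    "site-A": [IPv4Network("10.1.1.0/24")],
    "site-B": [IPv4Network("10.1.1.0/24")],
}

# Inside the fabric, host routes track where each VM currently lives.
HOST_ROUTES: dict[IPv4Address, str] = {
    IPv4Address("10.1.1.10"): "site-B",  # VM that moved from site A to site B
}


def wan_entry_sites(dst: IPv4Address) -> list[str]:
    """Sites the WAN's longest-prefix match considers equally good entry points."""
    best_len, best_sites = -1, set()
    for site, prefixes in WAN_ADVERTISEMENTS.items():
        for prefix in prefixes:
            if dst not in prefix:
                continue
            if prefix.prefixlen > best_len:
                best_len, best_sites = prefix.prefixlen, {site}
            elif prefix.prefixlen == best_len:
                best_sites.add(site)
    return sorted(best_sites)


def crosses_dci(dst: IPv4Address, entry_site: str) -> bool:
    """True if traffic entering at entry_site must be forwarded to the other site."""
    actual = HOST_ROUTES.get(dst)
    return actual is not None and actual != entry_site


if __name__ == "__main__":
    vm = IPv4Address("10.1.1.10")
    print(wan_entry_sites(vm))        # ['site-A', 'site-B']
    print(crosses_dci(vm, "site-A"))  # True
```

In this model, traffic that enters at site A has to cross the inter-site link to reach the moved VM, and site A keeps attracting that traffic with its summary even when the inter-site link is down.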

IP multicast. The application needs IP multicast because whatever (the only valid reasons I’ve found: stock exchange feeds and video streaming), and it’s easier to stretch a VLAN than to figure out how to spell PIM (btw, I agree with this conclusion).

Some vendors “solve” this problem by requiring layer-2 connectivity between cluster members (see also: let’s offload our support costs below).

Simplistic routing on multihomed hosts. I strongly suspect that most of the “we need layer-2 for iSCSI” sentiment comes from the inability to properly configure IP routing on multihomed iSCSI clients. VMware fixed this in recent vSphere versions; I’m not sure what other vendors are doing.

This one is way harder than one would expect. I started writing a lengthy article on the topic and still haven’t finished it because new worms keep turning up every time you turn around.
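One of the simpler ways this goes wrong, sketched as a longest-prefix-match lookup over a hypothetical multihomed host’s routing table: with only connected routes and a default route via the front-end NIC, traffic to an iSCSI target in a different (routed) storage subnet leaves through the wrong interface. Interface names and prefixes are illustrative assumptions, and the model ignores next hops and source-address selection.

```python
from ipaddress import IPv4Address, IPv4Network
from typing import Optional

Route = tuple[IPv4Network, str]  # (destination prefix, outgoing interface)


def lookup(table: list[Route], dst: IPv4Address) -> Optional[str]:
    """Return the outgoing interface selected by longest-prefix match, or None."""
    matches = [(net, ifname) for net, ifname in table if dst in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical host: eth0 = front-end NIC (default route), eth1 = storage NIC.
CONNECTED_ONLY: list[Route] = [
    (IPv4Network("0.0.0.0/0"), "eth0"),
    (IPv4Network("192.168.1.0/24"), "eth0"),
    (IPv4Network("10.20.1.0/24"), "eth1"),
]

# The fix: a route for the whole (routed) storage range via the storage NIC.
# The next hop on 10.20.1.0/24 is omitted in this model.
WITH_STORAGE_ROUTE: list[Route] = CONNECTED_ONLY + [(IPv4Network("10.20.0.0/16"), "eth1")]

if __name__ == "__main__":
    target = IPv4Address("10.20.2.5")          # iSCSI target in another storage subnet
    print(lookup(CONNECTED_ONLY, target))      # eth0: storage traffic leaks out the front end
    print(lookup(WITH_STORAGE_ROUTE, target))  # eth1
```

Putting the target into the same subnet as the storage NIC makes the connected route win, which is why “just give us layer-2” looks like a fix.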

Let’s offload our support costs to the customer’s networking team. A typical trick used by software vendors is to write requirements that include “must have layer-2 connectivity between hosts in a cluster” for no reason, and reject support requests from environments that violate this ridiculous CYA approach.

Famous vendors in this category:

VMware requiring layer-2 connectivity between kernel interfaces to run a TCP-based vMotion session between them (they came to their senses in the meantime);
Oracle database clusters.

Honorable mentions:

Nutanix distributed storage.
Stretched Cisco Hyperflex clusters.

The response I got from the Hyperflex presenter @ Tech Field Day Extra CLEUR 2018: "yes, we're using pure IP transport without IP multicast, and no, we haven't tested and validated our solution for use over routed networks." Now you know.

Anything else? Write a comment!

Let’s offload error detection to the customer’s networking team. It’s easier to rely on Ethernet checksums than to do application-level checksums.

Bad news: with MAC-over-IP solutions like VXLAN-with-EVPN, which almost every data center switching vendor is pushing, you lose the end-to-end Ethernet checksum even if the whole thing still looks like a thick yellow cable.
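A simplified sketch of why the inner checksum doesn’t survive, assuming the 8-byte VXLAN header layout from RFC 7348 and ignoring the outer Ethernet/IP/UDP headers: the ingress VTEP drops the original FCS, and the egress VTEP computes a fresh one over whatever bytes it received. RFC 7348 also says the outer UDP checksum SHOULD be transmitted as zero, so a frame corrupted somewhere along the overlay path (for example in a device’s memory, where per-link FCS checks don’t help) can come out with a perfectly valid FCS. The helper names are hypothetical.

```python
import struct
import zlib

VXLAN_FLAG_I = 0x08  # "valid VNI" flag (RFC 7348)


def ethernet_fcs(frame: bytes) -> bytes:
    """CRC-32 (Ethernet polynomial, as computed by zlib), packed little-endian
    the way it is commonly appended to a frame."""
    return struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)


def vxlan_encap(frame_with_fcs: bytes, vni: int) -> bytes:
    """Ingress VTEP: drop the inner FCS and prepend the 8-byte VXLAN header.
    Outer Ethernet/IP/UDP headers are omitted in this sketch."""
    inner = frame_with_fcs[:-4]
    header = struct.pack("!II", VXLAN_FLAG_I << 24, (vni & 0xFFFFFF) << 8)
    return header + inner


def vxlan_decap(vxlan_payload: bytes) -> bytes:
    """Egress VTEP: strip the VXLAN header and compute a brand-new FCS."""
    inner = vxlan_payload[8:]
    return inner + ethernet_fcs(inner)


if __name__ == "__main__":
    original = bytes(range(60))                # dummy 60-byte frame
    on_wire = original + ethernet_fcs(original)
    in_tunnel = bytearray(vxlan_encap(on_wire, vni=5001))
    in_tunnel[20] ^= 0xFF                      # bit flips somewhere in the overlay path
    delivered = vxlan_decap(bytes(in_tunnel))
    fcs_ok = delivered[-4:] == ethernet_fcs(delivered[:-4])
    print("FCS valid:", fcs_ok, "| payload intact:", delivered[:-4] == original)
```

Only a check computed end-to-end by the application itself (a CRC or hash over its own data) would catch that corruption.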

Non-IP protocols. Hey, it’s 2018. Let’s move on.

We want thick yellow cable. Solutions implementing stupidities that skirt the edges of valid Ethernet behavior and work well only on a thick yellow cable. I’m looking at you, Network Load Balancing.
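A toy model of why Network Load Balancing in unicast mode “works” best on a thick yellow cable, assuming its default behavior of sharing one cluster MAC address across all members while masking the source MAC address of frames the members send. A learning switch therefore never learns where the cluster MAC lives and floods every frame sent to it, which on a stretched VLAN includes the inter-site link. The MAC addresses and port numbers are hypothetical.

```python
class LearningSwitch:
    """Minimal transparent bridge: learn source MACs, flood unknown unicast."""

    def __init__(self, ports: list[int]) -> None:
        self.ports = set(ports)
        self.mac_table: dict[str, int] = {}

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> set[int]:
        """Process one frame; return the set of egress ports."""
        self.mac_table[src_mac] = in_port        # learn the sender's location
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:                     # unknown destination: flood
            return self.ports - {in_port}
        return set() if out_port == in_port else {out_port}


# Hypothetical addresses: shared cluster MAC vs. per-host masked source MACs.
CLUSTER_MAC = "02:bf:0a:01:01:64"
HOST1_MASKED = "02:01:0a:01:01:64"
HOST2_MASKED = "02:02:0a:01:01:64"
CLIENT = "00:50:56:aa:bb:cc"

if __name__ == "__main__":
    sw = LearningSwitch([1, 2, 3, 4])            # hosts on 1-2, client on 3, DCI on 4
    print(sw.receive(3, CLIENT, CLUSTER_MAC))    # flooded to {1, 2, 4}
    print(sw.receive(1, HOST1_MASKED, CLIENT))   # reply goes to {3}
    print(sw.receive(3, CLIENT, CLUSTER_MAC))    # still flooded: cluster MAC never learned
```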

Wrap-up: It makes me sad that after all these years we still have to deal with ignorance and decade-old stupidities that refuse to die.

Fortunately, the big cloud providers don’t want to budge on this one because they’re focused on making money from running services instead of supporting old crap based on which VP yells louder, so most of the things I mentioned above might die in the next few decades.

For an even more cynical view, join the Building Next-Generation Data Centers online course and listen to what Michele Chubirka has to say about infrastructure, security and DevOps.


14 comments:

Anonymous 30 January 2018 10:50

I have another "good reason" why you need L2 connectivity. A couple of years ago, an engineer wanted to implement L2 between remote locations and the data center. The reason was to simplify the network, so that workers and devices at the remote locations would have the same connectivity as at the central location. You could prepare a printer with its IP address, simply plug it in at the remote location, and everything would work... :)

Replies

Anonymous 30 January 2018 10:56

I had to read it twice to confirm it was meant as sarcasm...

Anonymous 30 January 2018 11:56

Me too... :) Except this actually happened. After redundant L3 connectivity from a remote location to a DC had been implemented with a neat, repeatable design, it was all torn down and replaced with an L2 extension because "it made it easier to image workstations"...

Antonio Ojea 31 January 2018 09:15

Well, for me this is the main reason people use them: "to simplify the network".
The problem is that stretched VLANs work if the scenario is small or simple, which makes it difficult to explain to the people using them that the performance problems and network hiccups they're experiencing happen because L2 doesn't scale (I recall another great blog post from Ivan on this topic). It's even more difficult if the vendor is telling them that this is the way to do it.

Jeff Behrns 30 January 2018 12:39

Great stuff as usual, sir. Even if you stretch using a method that is somewhat well thought out, like oh-tee-vee, when you throw a poorly designed app on top you will ultimately start hacking the DCI to make the app work, thus putting both DCs at an even higher risk of meltdown. Example: disabling ARP suppression to make NLB work. And people wonder why cloud providers are taking over. Enterprise vendors think they are solving customer problems with overlay monstrosities when they are actually just gifting the clouds more business. Stop repackaging old-world tech if you want to survive.

Dmitrii S. 30 January 2018 15:01

+1 on simplistic routing

I am trying to push VRF functionality into modeling and bare-metal provisioning software, based on the new Linux kernel VRF-lite code (thanks a lot to those who upstreamed it), to avoid L2 stretching or static routes (see pad.lv/1737428).

Currently the modeling software helps with endpoint discovery based on the fact that hosts are multi-homed but does not help with routing.

The VRF functionality needs to be there to make sure hosts are not turned into routers (for security and operational reasons). At the same time, with VRFs there is no reliance on static routes on a given host.

In general, I would say that VIP/FIP-oriented failover mechanisms constrain software deployments to a single L2.

* Keystone (OpenStack) catalog is a good example of that problem: a single hostname can be used as a catalog entry and clients only expect a single hostname;
* Highly-available load-balancers suffer from the same problem.

Doing L3 properly certainly requires more effort: either clients should be clever enough to select from multiple endpoints and do failover correctly, or the L3 infrastructure should be good enough to support ECMP with hosts using loopback interfaces with correct addresses. Multiple highly available load balancers + multiple A records per load-balancer hostname is another option, but it just tries to hide the L3 part and has certain limitations with regard to user session management (not to mention that you have to resolve the name first => think about DNS high availability).
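The “clever client” option can start as simply as trying each endpoint in turn. A minimal sketch, assuming the candidate addresses come from multiple A records (or a service catalog) and that a failed connection attempt raises `OSError`; `connect_with_failover()` and `tcp_connect()` are hypothetical helpers, not part of any OpenStack client library.

```python
import socket
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(endpoints: Sequence[str], connect: Callable[[str], T]) -> T:
    """Try endpoints in order and return the first connection that succeeds."""
    errors: list[str] = []
    for endpoint in endpoints:
        try:
            return connect(endpoint)
        except OSError as exc:               # refused, timed out, unreachable...
            errors.append(f"{endpoint}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))


def tcp_connect(port: int, timeout: float = 2.0) -> Callable[[str], socket.socket]:
    """Build a connector that opens a TCP session to the given port."""
    return lambda addr: socket.create_connection((addr, port), timeout=timeout)
```

Typical use would be `connect_with_failover(addresses, tcp_connect(443))`, with `addresses` taken from `socket.getaddrinfo()`. Python’s own `socket.create_connection()` already walks through every address a hostname resolves to; the explicit version only makes the retry policy visible and reusable for other transports. It does nothing about session state, which is the limitation mentioned above.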

Or, you can do VIPs on a single L2 which is why software vendors require that (who knows if you meet a qualified enough network team? Will their security team allow peering? Does a software vendor have the right network expertise?).

Thanks for the post!

Unknown 30 January 2018 23:04

I met a couple of DE-CIX (big German Internet exchange) guys in Hamburg last week. They are trying to offer a product named "direct cloud access". I took one of them with me because we had basically the same 2h drive to get to our beds, and we had a lengthy discussion about this topic: how they are trying to position it in the market and to whom they are trying to sell what.

Basically, as a cloud provider you can use your existing 10G or 100G port at DE-CIX to offer VLANs from your cloud to an ISP connected at DE-CIX, which then extends the VLAN to their customer. I absolutely did not see any sense in it other than creating a big L2 nightmare. One of the reasons I was given was to build redundancy concepts (!?!) and to guarantee round-trip times to the cloud services.

So basically an Internet exchange with plenty of clouds connected and plenty of bandwidth available offers us a tool to build gigantic L2 loops between different clouds and customers!

Next step will be to dig out spanning tree to solve the upcoming issues ;-)

My colleague and I started joking about cloud loops. Does a cloud start raining when it loops? What will come out when it starts raining: water or sensitive personal data? If it's sensitive personal data, will an umbrella protect your data security officer? Hope you can feel my irony on this whole L2 topic...

Replies

Ivan Pepelnjak 31 January 2018 08:47

As weird as it sounds, P2P VLAN between your DC and cloud is the best you can do - you don't want to deal with someone in the middle messing up your routing tables. See http://blog.ipspace.net/2012/07/the-difference-between-metro-ethernet.html for details.

The last paragraph is a keeper - it not only made my day but also brightened up the whole Tech Field Day crew at CLEUR18. Thanks a million!

Unknown 31 January 2018 23:01

OK, if you take it that way, use it as an Ethernet link to run L3 over, and use an IPsec tunnel over the Internet as a backup, it might make sense. But give people the ability to get a layer-2 circuit and they will start doing stupid stuff.

Unknown 05 February 2018 09:26

Recently I feel like it's really vendors pushing layer-2 solutions rather than us (enterprise customers) demanding them. For example, Cisco has been really aggressively pushing us to buy into their SDA solution. It currently seems to rely mostly on VXLAN overlays that let you stretch layer-2 domains. The selling point is that you can click buttons in a GUI and just pretend it's magic and not a layer-2 overlay. I was told by someone from Cisco "you don't need to worry about what it's doing"... which sort of translated to "shut up and buy it". Large layer-2 domains are something we've been slowly moving away from for years; we no longer have the same reliance on layer-2 connectivity in the majority of the campus. Most of our legacy applications are long gone, and recently we've been trying to ensure no "lazy" applications are purchased.

Anonymous 16 February 2018 19:33

Biggest need for stretched vlans I've seen is DC migrations when the entity cannot or will not attempt to Re-IP.

Replies

Ivan Pepelnjak 18 February 2018 10:42

I would say this is a special case of IP address mobility. See also

http://blog.ipspace.net/2013/09/layer-2-extension-otv-use-cases.html

Anonymous 28 February 2018 08:54

Hi Ivan,
It remains a constant battle for me to guide application designers away from their reliance on stretched VLANs. So thanks for refreshing your warnings about stretched VLANs (I remember your May 2012 article on the subject). I continue to look around for similar guidance from other sources such as the one from Gartner below in 2015. If you know of similar guidance from other sources then it would help further support my case.
https://blogs.gartner.com/andrew-lerner/2015/04/23/stretchdontbreak
Kind Regards,
Peter

Replies

Ivan Pepelnjak 28 February 2018 13:18

Haven't found anything similarly useful. You need some operational experience (and related scars) to figure out why stretched VLANs are so bad, and most so-called thought leaders lack both.

Also, most everyone probably decided to leave this corpse to rot while they chase the beautiful intent-based unicorns ;))
