Category: vMotion
Vendor Marketectures in Real Life
Remember my rants about VMware and firewall vendors promoting crazy solutions that work best in PowerPoint and cause more headaches than anything else (excluding increased vendor margins and sales team bonuses, of course)?
Here’s another we-don’t-need-all-that-complexity real-life story coming from one of my long-term subscribers:
Are Business Needs Just Excuses for Vendor Shenanigans?
Every now and then I call someone’s baby ugly (or maybe it was their third cousin’s baby, and they nonetheless felt offended). In such cases the common fallback is to cite business or market needs to prove how ignorant and clueless I am. Here’s a sample LinkedIn comment pointing out my ignorance of the need for smart NICs:
The rise of custom silicon by Presando [sic], Mellanox, Amazon, Intel and others confirms there is a real market need.
Now let’s get something straight: while there are good reasons to use tons of different things that might look inappropriate, irrelevant, or plain stupid to an outsider, I don’t accept the real market need argument as justification for anything without supporting technical facts (tell me why you need that stuff, and prove to me that using it is the best way of solving the problem).
Disaster Recovery: a Vendor Marketing Tale
Several engineers formerly working for a large virtualization vendor were pretty upset with me when I claimed that virtualization consultants promote “disaster recovery using stretched VLANs” designs instead of alternatives that would implement proper separation of failure domains.
Guess what… it’s even worse than I thought.
Here’s a sequence of comments I received after reposting one of my “disaster recovery doesn’t need stretched VLANs” blog posts on LinkedIn sometime in late 2019:
Worth Reading: The Burning Bag of Dung
Loved the article from Philip Laplante about environmental antipatterns. I’ve seen plenty of founderitis and shoeless children in my life, but it was worshipping the golden calf that made me LOL:
In any environment where there is poor vision or leadership, it is often convenient to lay one’s hopes on a technology or a methodology about which little is known, thereby providing a hope for some miracle. Since no one really understands the technology, methodology, or practice, it is difficult to dismiss. This is an environmental antipattern because it is based on a collective suspension of disbelief and greed, which couldn’t be sustained by one or a few individuals embracing the ridiculous.
That paragraph totally describes the belief in the magical powers of long-distance vMotion, SDN (I published a whole book debunking its magical powers), building networks like Google does it, intent-based whatever, machine learning…
When All You Have Are Stretched VLANs...
Let’s agree for a millisecond that you can’t find any other way to migrate your workload into a public cloud than to move the existing VMs one-by-one without renumbering them. Doing a clumsy cloud migration like this will get you the headaches and the cloud bill you deserve, but that’s a different story. Today we’ll talk about being clumsy the right and the wrong way.
There are two ways of solving today’s challenge:
Vendor Marketing: Theory and Reality
Every time I complain about the stupidities $vendors are trying to sell us, someone from vendorland mounts a high horse and starts telling me how I got it all wrong, for example:
It is a duty of a pre-sales, consultant, vendor representative to inform the customer about the risk.
When you stop laughing (maybe it was just an April Fools’ joke ;), here’s what the reality of that process looks like (straight from one of my readers):
I remember when the VM guys and their managers were telling me (like they had discovered the solutions to all of ours problems) about “with VXLAN we can move a machine from one country to another, and keep having service with the same IP” … while looking at me with the “I’m so smart” face… and me thinking shit… I’m doomed :) … I don’t even want to start explaining … but in the long run I had to anyway.
The Myth of Lossless vMotion
In response to my Live vMotion into VMware-on-AWS Cloud blog post, Nico Vilbert pointed me to his blog post explaining the details of a cross-Atlantic vMotion into AWS.
Today I will not go into yet another rant pointing out all the things that can go wrong, but focus on a minor detail: “no ping was dropped in the process.”
The “vMotion is instantaneous and lossless” myth has been propagated since the early days of vMotion, when sysadmins proudly demonstrated what seemed to be pure magic to amazed audiences… including the now-traditional terminal window running ping and not losing a single packet.
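A back-of-the-envelope check shows why a happy ping window proves very little: with the default one-second probe interval, a sub-second traffic blackout during the switchover can easily fall between two probes. Here’s a quick sketch (the 400 ms outage and the probe timing are made-up assumptions, not measurements):

```python
# Sketch: can a once-per-second ping detect a brief traffic blackout
# during the vMotion switchover? All numbers are illustrative assumptions.

def probes_lost(outage_start: float, outage_len: float,
                probe_interval: float = 1.0, duration: float = 30.0) -> int:
    """Count probes sent while the outage window [start, start+len) is active."""
    lost, t = 0, 0.0
    while t < duration:
        if outage_start <= t < outage_start + outage_len:
            lost += 1
        t += probe_interval
    return lost

# Assume a 400 ms blackout while the destination host takes over the VM.
for start in (4.2, 4.5, 4.9):
    print(f"outage at t={start}s -> probes lost: {probes_lost(start, 0.4)}")
# Outages starting at t=4.2 or t=4.5 lose no probes at all; only the one
# that happens to straddle the t=5s probe (starting at 4.9) drops a packet.
```

In other words, “no ping was dropped” only tells you the outage was shorter than your probe interval… or that you got lucky.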
Live vMotion into VMware-on-AWS Cloud
Considering VMware’s infatuation with vMotion, the following news (reported by Salman Naqvi in a comment to my blog post) was clearly inevitable:
I was surprised to learn that LIVE vMotion is supported between on-premise and Vmware on AWS Cloud
What’s more interesting is how they managed to do it.
How Many vMotion Events Can You Expect in a Data Center?
One of my friends sent me this question:
How many VM moves do you see in a medium and how many in a large data center environment per second and per minute? What would be a reasonable maximum?
Obviously the answer to the first part is it depends (please share your experience in the comments), so we’ll focus on the second one. It’s time for another Fermi estimate.
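Here’s one way to structure such an estimate. Every input in the sketch below is a guess I made up for illustration; replace it with numbers from your own environment:

```python
# Fermi estimate of vMotion events in a data center.
# Every input below is an assumption; plug in your own numbers.

hosts = 500                      # hypervisor hosts in a largish data center
vms_per_host = 30                # average consolidation ratio
drs_moves_per_vm_per_day = 2     # load-balancing moves (a generous guess)
maint_fraction_per_day = 0.02    # share of hosts evacuated daily for patching

vms = hosts * vms_per_host
drs_moves = vms * drs_moves_per_vm_per_day
maintenance_moves = hosts * maint_fraction_per_day * vms_per_host
moves_per_day = drs_moves + maintenance_moves

print(f"VMs:                       {vms:,}")
print(f"vMotion events per day:    {moves_per_day:,.0f}")
print(f"vMotion events per minute: {moves_per_day / (24 * 60):.1f}")
print(f"vMotion events per second: {moves_per_day / 86400:.2f}")
# With these guesses: ~30,000 moves per day, roughly 21 per minute,
# or about 0.35 per second.
```

Even with generous assumptions you end up with a few dozen vMotion events per minute, not the massive continuous workload mobility some marketing slides imply.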
The Grumpy Old Network Architects and Facebook
Nuno wrote an interesting comment to my Stretched Firewalls across L3 DCI blog post:
You're an old school, disciplined networking leader that architects networks based on rock-solid, time-tested designs. But it seems that the prevailing fashion in network design and availability go against your traditional design principles: inter-site firewall clustering, inter-site vMotion, DCI, etc.
Not so fast, my young padawan.
Let’s define prevailing fashion first. You might define it as Kool-Aid peddled by snake oil salesmen, or as cool network designs by people who know what they’re doing. If we stick with the first definition, you’re absolutely right.
Now let’s look at the second camp: how the people who know what they’re doing build their networks (Amazon VPC, Microsoft Azure or Bing, Google, Facebook, and a number of other large-scale networks). You’ll find L3 down to the ToR switch (or even the virtual switch), and absolutely no inter-site vMotion or clustering – because they don’t want to bet their service, ads, or likes on the whims of a technology that was designed to emulate a thick yellow cable.
This isn't the first time that readers have asked you about these technologies, and it won't be the last. Vendors will continue to market them despite their shortcomings, and customers will continue to eat them up.
As long as there are people willing to believe in fairy tales and Santa Claus, there will be someone dressed in a red coat and a fake beard yelling “Ho, Ho, Ho!”
Enterprise IT managers sometimes act like small kids. They don’t want to hear that they have people and process problems, and love to believe that the next magical bit of technology will solve whatever it is that bothers them. Vendors obviously love to exploit these cravings and sell them ever-more-complex solutions.
I'd like to think that vendors will also continue to work out the kinks and over time the technology will become rock solid and time-tested.
I am positive you can make any technology almost-rock-solid. You can also make pigs fly (see RFC 1925 sect. 2.3). However, have you included the fuel costs in your TCO?
Also, the more complex a technology is, the likelier it is to crash down like a house of cards, and you’ll be left with an incomprehensible mix of bits and pieces that will be impossible to put back together (see also: You can’t reformat your data center).
Nuno concluded his comment with a question:
Are you too stuck on past, traditional designs and not being open to new ways of building IT? I get that IT is very cyclical, and these new trends may die in the future...or thrive, and the customers may either fail...or succeed.
I am very open to new ways of building IT. I preach the need for meaningful SDN (not the centralized control plane crap), network automation, and proper application architecture. I just refuse to believe in fairy tales, and solving non-technical problems with technology.
Finally…
Looking for more red pills? Explore my SDN webinars, Designing Active/Active Data Centers webinar, and vMotion-related blog posts.
Is Anyone Using Long-Distance VM Mobility in Production?
I had fun participating in a discussion focused on whether it makes sense to deploy OTV+LISP in a new data center deployment. Someone quickly pointed out the elephant in the room:
How many LISP VM mobility installs has anyone on this list been involved with or heard of being successfully deployed? How many VM mobility installs in general, where the VMs go at least 1,000 miles? I'm curious as to what the success rate for that stuff is.
I think we got one semi-qualifying response, so I made it even simpler ;)
Blessed by Gartner: Stretched VLANs Make Little Sense
One of my readers recently pointed me to a blog post written by Andrew Lerner from Gartner describing the drawbacks of stretched VLANs.
TL&DR: He’s saying more-or-less the same things I’ve been preaching for years. Now I can put a Blessed by Gartner logo on my blog posts ;), and you can use the report to sway your CIO.
Before Talking about vMotion across Continents, Read This
I expect to hear a lot about the “wonderful” idea of moving running VMs 100 msec away (across the continent) in the upcoming weeks. I would recommend you read a few of my older blog posts before considering it… and don’t waste time trying to persuade the true believers with technical arguments – talk with whoever will foot the bill or walk away.
Latency: the Killer of Spread-Out Application Stack Ideas
A few months ago I described how bandwidth limitations shatter the dreams of spread-out application stacks with elements residing in (or being dynamically migrated between) multiple data centers. Today let’s focus on bandwidth’s ugly cousin: latency.
TL&DR Summary: Spreading the server components of an application across multiple locations (multiple data centers or hybrid cloud deployments) can easily result in dismal performance even when there’s plenty of bandwidth available.
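To put a rough number on it, consider a chatty application making a few hundred sequential database queries per page view. The sketch below uses made-up call counts and round-trip times purely for illustration:

```python
# Sketch: how inter-DC latency inflates response time for a chatty
# application. All numbers are illustrative assumptions.

def page_render_time(db_calls: int, rtt_ms: float, db_work_ms: float = 1.0) -> float:
    """Sequential DB round trips: each call pays the full RTT plus query time."""
    return db_calls * (rtt_ms + db_work_ms)

db_calls = 200            # not an unusual number of queries per page view
local_rtt_ms = 0.2        # app server and database in the same data center
stretched_rtt_ms = 10.0   # app server in one data center, database in the other

print(f"Same data center: {page_render_time(db_calls, local_rtt_ms):6.0f} ms")
print(f"Stretched stack:  {page_render_time(db_calls, stretched_rtt_ms):6.0f} ms")
# 200 sequential queries go from ~240 ms to ~2.2 seconds the moment the
# app server and its database end up in different data centers.
```

No amount of extra bandwidth fixes that; the only cures are fewer round trips or keeping the chatty components in the same location.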
Hotel California Effects of Public Clouds
In his The Case for Hybrids blog post Mat Mathews described the Hotel California effect of public clouds as: “One of the most oft mentioned issues with public cloud is the difficulty in getting out.” Once you start relying on cloud provider APIs to provide DNS, load balancing, CDN, content hosting, security groups, and a plethora of other services, it’s impossible to get out.
Interestingly, the side effects of public cloud deployments extend into the realm of application programming, as I was surprised to find out during one of my Expert Express engagements.