Q&A: Migrating to Modern Data Center Infrastructure

One of my readers sent me a list of questions after watching some of my videos, starting with a generic one:

Having worked within large corporations for a long time, I keep asking myself how we could move from the messy infrastructure we grew over the years to a modern architecture.

Usually by building a parallel infrastructure and eventually retiring the old one; otherwise you'll end up with layers of kludges. Obviously, the old infrastructure will lurk around for years (I know people who use this approach and currently run three generations of infrastructure).

We’re discussing most of the challenges this reader raised in the Building Next-Generation Infrastructure online course. You might want to join that conversation – register here.

However, as I'm always telling anyone who's willing to listen (not too many people, I'm afraid): changing the infrastructure won't help unless you also change your mindset and processes, starting with application development and deployment.

Should development, test, and production be separated by physical infrastructure? (For example, how do we test the impact of a VM kernel upgrade?)

As always, it depends. Many customers run all three environments on the same physical (compute) infrastructure; others have dedicated clusters for dev, test, and prod. I don't see many customers with dedicated network infrastructure for each environment.

As for testing the impact of a VM kernel upgrade: people running efficient operations commonly build new VMs from scratch instead of upgrading or patching existing ones, replacing old VM images with new ones. That makes everyone's life easier, as you never have to deal with broken VMs or downgrades.
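To make the "rebuild, don't patch" idea more concrete, here's a minimal in-memory sketch. It assumes a hypothetical build_vm() standing in for whatever your virtualization platform uses to instantiate a VM from a golden image, and a placeholder health check; none of these names come from a real product API.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable

# Hypothetical in-memory model: build_vm() stands in for whatever your
# virtualization or cloud platform uses to instantiate a VM from an image.
_serial = count(1)


@dataclass(frozen=True)
class VM:
    name: str
    image: str  # golden image the VM was built from, e.g. "web-v2"


def build_vm(role: str, image: str) -> VM:
    """Create a brand-new VM from a golden image; nothing is patched in place."""
    return VM(name=f"{role}-{next(_serial)}", image=image)


def always_healthy(vm: VM) -> bool:
    """Placeholder health check; a real one would probe the application."""
    return True


def replace_fleet(
    fleet: list[VM],
    role: str,
    new_image: str,
    check: Callable[[VM], bool] = always_healthy,
) -> list[VM]:
    """Replace every VM in the fleet with a fresh one built from new_image.

    The old VMs are retired only after all new ones pass the health check.
    If any check fails, the new VMs are discarded and the old fleet is
    returned unchanged, so there is never a half-upgraded VM to repair.
    """
    candidates = [build_vm(role, new_image) for _ in fleet]
    if all(check(vm) for vm in candidates):
        return candidates
    return fleet


old = [build_vm("web", "web-v1") for _ in range(3)]
new = replace_fleet(old, "web", "web-v2")
print([vm.name for vm in new])  # ['web-4', 'web-5', 'web-6']
```

In this model a "downgrade" is just another rollout: replace_fleet(new, "web", "web-v1") builds fresh VMs from the previous image instead of trying to undo a patch.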

You can use the same approach with network services appliances if you virtualize them.

Should a modern data center use IPv6 addressing, or would it be smarter, for security reasons, to use private IPv4 addresses and NAT?

NAT is not a security mechanism, and having network-level firewalls in front of servers makes little sense. On the other hand, load balancing usually involves destination NAT, even in the IPv6 world.
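The claim that load balancing relies on destination NAT even in IPv6 can be illustrated with a toy model. The sketch below assumes round-robin backend selection and a per-flow table. The class and method names are hypothetical, and the addresses come from the 2001:db8::/32 documentation prefix. The NAT here exists only to spread flows across servers, not to hide them; real load balancers do this in the kernel or in hardware.

```python
import ipaddress
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Packet:
    src: ipaddress.IPv6Address
    sport: int
    dst: ipaddress.IPv6Address
    dport: int
    proto: str = "tcp"


class DnatLoadBalancer:
    """Toy destination-NAT load balancer (illustrative only).

    Clients connect to a virtual IP (VIP); the balancer rewrites the
    destination address to one of the real servers and remembers the
    mapping so that every packet of the same flow hits the same server.
    """

    def __init__(self, vip: str, backends: list[str]):
        self.vip = ipaddress.IPv6Address(vip)
        self._next = cycle([ipaddress.IPv6Address(b) for b in backends])
        self._flows: dict[tuple, ipaddress.IPv6Address] = {}

    def inbound(self, pkt: Packet) -> Packet:
        """Client -> VIP: rewrite the destination to the chosen backend."""
        if pkt.dst != self.vip:
            return pkt  # not addressed to the VIP; forward unchanged
        key = (pkt.src, pkt.sport, pkt.dport, pkt.proto)
        backend = self._flows.get(key)
        if backend is None:  # new flow: pick the next backend round-robin
            backend = next(self._next)
            self._flows[key] = backend
        return replace(pkt, dst=backend)

    def outbound(self, pkt: Packet) -> Packet:
        """Backend -> client: rewrite the source back to the VIP."""
        key = (pkt.dst, pkt.dport, pkt.sport, pkt.proto)
        if self._flows.get(key) == pkt.src:
            return replace(pkt, src=self.vip)
        return pkt


lb = DnatLoadBalancer("2001:db8::80", ["2001:db8:1::10", "2001:db8:1::11"])
client = ipaddress.IPv6Address("2001:db8:ff::1")
p1 = lb.inbound(Packet(client, 50000, lb.vip, 443))
p2 = lb.inbound(Packet(client, 50001, lb.vip, 443))
print(p1.dst, p2.dst)  # 2001:db8:1::10 2001:db8:1::11
```

Every server has a globally unique IPv6 address, yet the clients still see only the VIP. That's why destination NAT doesn't go away when you move to IPv6.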

It's time we start treating networking like an engineering discipline where you have to understand what problem you're solving, what tools are available to solve the problem, and which tool is the best tool for the current job given requirements and constraints, instead of trying to find a magic answer to all questions (BTW, it's 42).

What does a transition look like from an existing infrastructure where app managers have no clue about the port, protocol, source, and destination requirements of their applications? The process you describe in your videos (simplify, standardize, automate...) is a bit too high-level from my point of view.

That process is high-level for a reason: to give people an understanding of what needs to be done. Sometimes it helps to have an overview map before starting a journey, particularly if you insist (like most people do) on modifying every single step in the step-by-step directions "just because we're special".

This is only possible if you have very skilled people who understand the current apps and can transition them into a simpler, standardized environment. However, in most cases the app managers do not even understand what they currently have running...

There's your problem. Go fix it. You don't need skilled people, you need people who actually care. BTW, those same developers and app managers think they can bypass the rest of IT and ops teams and deploy their stuff on Amazon. Go figure.

In the end, someone needs to know what's currently running. It's better if the developers know what they're doing than if the ops team has to reverse-engineer the app structure and requirements through IT forensics (see also: The Phoenix Project).

Are you facing similar challenges? The Building Next-Generation Infrastructure online course probably has at least some of the answers you're looking for. Register here.

1 comment:

  1. Hi Ivan

    We have three different types of workloads in our environment: compute, caching, and storage.

    We have three different infrastructures for our setup. In earlier posts you mentioned a caveat: sharing infrastructure between compute and caching workloads brings QoS mapping issues. Burstiness also seems to be an issue, because the workloads require different buffer sizes yet share the same fate. Could you shed some more light on this, or do you have a whitepaper about it?