Private and Public Clouds, and the Mistakes You Can Make
A few days ago I had a nice chat with Christoph Jaggi about private and public clouds, and the mistakes you can make when building a private cloud – the topics we’ll be discussing in the Designing Infrastructure for Private Clouds workshop @ Data Center Day in Berne in mid-September.
The German version of our talk has been published on Inside-IT; those of you not fluent in German will find the English version below.
What is a private cloud?
A private cloud is very similar to a public cloud, with a single significant difference – the infrastructure is dedicated to the organization using it, resulting in tighter controls and more visible security mechanisms.
Apart from that difference, the private cloud should be no different from the public one – it should offer all the important benefits of a cloud service, including on-demand service provisioning, rapid elasticity, and self-service capabilities.
What are the benefits of a private cloud?
Most organizations running a private cloud decide to do that because they cannot deploy their workloads in a public cloud for a variety of reasons – be it restrictions on data location, security challenges, privacy concerns, or simply lack of trust in outsourced infrastructure.
Is a private cloud more expensive than using a public cloud?
As always, the correct answer is it depends.
If your workload is small, or its future is uncertain, building a private infrastructure to host that workload makes absolutely no sense, as it will always be more expensive than public infrastructure no matter how optimally you’re trying to build it. That’s why you’ll find most (technically competent) small organizations and startups using public clouds.
It’s hard to build and expand your own infrastructure if your workload expands exponentially. Very fast growing companies are thus using public clouds more often than private ones.
Large enterprises with static, well-known workloads typically benefit from using a private cloud infrastructure. Most analyses show that it’s cheaper to run a dedicated infrastructure than to pay for the use of someone else’s infrastructure… assuming you achieve very high utilization of the dedicated infrastructure.
However, even in those organizations you might find workloads that are temporary or highly dynamic – marketing campaigns are typical examples. Using the right combination of private and public clouds is thus the recipe to reduce overall costs.
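The utilization caveat above can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration: the function names and all the numbers in the usage note are hypothetical, and it assumes a simple model in which the private infrastructure has a fixed monthly cost while the public cloud charges a flat price per VM-hour.

```python
def private_cost_per_used_vm_hour(monthly_fixed_cost, capacity_vm_hours, utilization):
    """Effective cost of one *used* VM-hour on dedicated infrastructure.

    The fixed cost is paid regardless of usage, so low utilization
    inflates the price of every hour that is actually consumed.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return monthly_fixed_cost / (capacity_vm_hours * utilization)


def break_even_utilization(monthly_fixed_cost, capacity_vm_hours, public_price_per_vm_hour):
    """Utilization at which private and public cost per used VM-hour are equal.

    A result above 1.0 means the private infrastructure cannot be cheaper
    under this simplified model, even when fully utilized.
    """
    return monthly_fixed_cost / (capacity_vm_hours * public_price_per_vm_hour)
```

With made-up inputs of 36,000 per month in fixed costs, a capacity of 720,000 VM-hours, and a public price of 0.10 per VM-hour, `break_even_utilization` returns 0.5: under these assumptions the private cloud is cheaper only if more than half of its capacity is in use.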
SDN and NFV: What do they offer in terms of private cloud?
It all depends on two fundamental decisions:
- Will you treat each business unit, application team or even each application as an independent tenant?
- Will you allow tenants to create their own logical networks and network services?
I would strongly recommend that you treat each application as an independent tenant – deploying each application within its own set of networks (and security zones) significantly reduces the inter-application security challenges. In a traditional environment an intruder breaking into one application often succeeds in migrating laterally to other applications. Doing the same across tenant boundaries is usually much harder.
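The effect of per-application tenants on lateral movement can be sketched as a default-deny rule between tenants. This is a toy model, not the behavior of any particular SDN product: the `Workload` type, the `is_flow_allowed` function, and the exception format are all hypothetical, and security zones within a tenant are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    tenant: str  # one tenant per application, as recommended above


def is_flow_allowed(src, dst, port, exceptions):
    """Allow traffic inside a tenant; deny cross-tenant traffic by default.

    `exceptions` is a set of (source tenant, destination tenant, port)
    tuples describing the explicitly published inter-application flows.
    """
    if src.tenant == dst.tenant:
        return True
    return (src.tenant, dst.tenant, port) in exceptions
```

In this model an intruder who compromises a workload in one tenant can reach another application only through the few flows listed in `exceptions`, not through every port on a shared network segment.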
Furthermore, you should encourage the application teams to deploy their own network services (load balancing and firewalling) on demand. This drastic change in perspective usually results in faster deployments and a more flexible environment, but also in a total shift of responsibilities.
Assuming you want to deploy a highly dynamic private cloud infrastructure using the above-mentioned approaches, SDN and NFV are the only way to go – SDN to provision the dynamic tenant networks (it’s impossible to do manual device provisioning in a self-service environment) and NFV to run per-tenant network services in virtual machines.
What are the common pitfalls and how can they be avoided?
I could talk about this one for hours (actually I did in one of my presentations). However, if I had to choose the worst possible pitfall, it would be the lack of self-service functionality. Many organizations build something that they call a private cloud, but it still requires manual intervention from the networking or security department to deploy new logical networks or services. That’s not cloud – that’s just polished server virtualization.
What effect does a private cloud have on application design and delivery?
It depends on the goals of the private cloud. If you’re forced to build an infrastructure that will replicate all the idiosyncrasies of your existing environment, then there won’t be any impact on application design or delivery – but you’ll be left with a complex, expensive, and hard-to-manage infrastructure that will be no different from what you had before.
If you want your private cloud to be cost-competitive with the public clouds, you should take a page out of their playbook – they offer minimal redundancy at the infrastructure level and expect the applications to take care of all potential failures. Such an approach obviously requires a totally different way of designing applications (software developers call it “design for failure”) which is rarely seen within enterprise IT.
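As a rough illustration of what “design for failure” means on the application side, the sketch below retries a request with exponential backoff and then fails over to the next replica instead of relying on the infrastructure to hide the failure. It is one simple pattern among many, and everything in it is an assumption made for the example: the `call_with_failover` name, its parameters, and the idea that each endpoint is a plain callable that raises `ConnectionError` when it is unreachable.

```python
import time


def call_with_failover(endpoints, request, attempts_per_endpoint=2,
                       base_delay=0.1, sleep=time.sleep):
    """Try each endpoint in turn, retrying with exponential backoff.

    `endpoints` is a list of callables (hypothetical stand-ins for service
    replicas). The first successful response is returned; if every attempt
    on every endpoint fails, a RuntimeError is raised.
    """
    last_error = None
    for endpoint in endpoints:
        for attempt in range(attempts_per_endpoint):
            try:
                return endpoint(request)
            except ConnectionError as exc:
                last_error = exc
                # Back off before the next attempt: base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all endpoints failed") from last_error
```

The `sleep` parameter exists only so that the delay can be replaced in tests; a real application would also need timeouts and idempotent requests, which are left out here.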
It’s often impossible to enforce this radically new application development paradigm when deploying a new private cloud infrastructure. You can, however, try to get rid of the big mistakes made in the past, for example large layer-2 domains or hacks like Microsoft Network Load Balancing.
Hybrid clouds: What design considerations come into play?
The questions you should ask before thinking about hybrid clouds are “where’s the data”, “how will we access that data” and “what will be the impact of increased latency”. It’s surprising how few people consider these questions, and those that don’t usually learn the answers the hard way.
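The latency question is easy to estimate before moving anything. The sketch below assumes the simplest possible model: an application that issues its data requests strictly one after another, so every request pays the full round-trip time. The function name and the numbers in the usage note are hypothetical.

```python
def added_response_time_ms(sequential_round_trips, local_rtt_ms, remote_rtt_ms):
    """Extra time per transaction when the data moves farther away.

    Assumes the round trips are sequential (no pipelining or batching),
    so the added delay grows linearly with the number of requests.
    """
    return sequential_round_trips * (remote_rtt_ms - local_rtt_ms)
```

For example, a transaction that makes 200 sequential database queries, moved from a 0.5 ms round-trip time within the data center to 20 ms across a WAN link, gets `added_response_time_ms(200, 0.5, 20)` = 3900 ms slower, which is almost four seconds of extra delay without any change in the application code.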
Reality check: What is actually interoperable?
Short answer: not much, particularly on the network and security front.
Moving data-at-rest between private and public clouds is a solved problem (ignoring bandwidth, latency and transfer time issues). Migrating virtual machines between private and public clouds is relatively easy to do. Recreating logical networks and security controls from an existing environment in a public cloud is still Mission Impossible – most hybrid cloud orchestration solutions expect the networking team to do manual synchronization of the logical networks used within the private and public clouds.
VMware Site Recovery Manager 6.1 announced at VMworld seems to be a large step in the right direction… but let’s wait till its NSX integration ships.
Cloud orchestration is another sore spot. It’s possible to use the same orchestration system for private and public clouds… assuming they use the same API (which is mostly the case when using OpenStack across all environments). All other combinations of private and public clouds raise pretty hard orchestration issues, which are commonly solved with plugins that expose functionality of public clouds to private cloud orchestration systems, but don’t provide seamless end-to-end functionality or workload migration capabilities.
Have you actually seen enterprises doing this? I ask because, as you said, it is both easier and a total shift in responsibilities, and I agree wholeheartedly. However, I have never (and I emphasize never) seen an app owner understand their network requirements. They have been unable to fill out simple forms, and it generally takes much hand-holding to get said simple form filled out.
Almost every security form comes back with "access is required bidirectionally."
Q. Really? It's TCP. Who sends the TCP SYN?
A. What??
Q. Who initiates the session?
A. Hmm, let me ask the vendor/programmer, etc..
Q. How do you know you need this access?
A. It's running slow.
Q. OK, let's start from the beginning.
Maybe we should make the process as simple as Amazon does? ;)
People have a finite amount of time in their work week. If they are judged on delivering features and learning new toolkits, they aren't going to spend their time trying to understand something they are not judged on. If you push the load balancing and networking responsibility on to their teams, they will spend time to understand it and will get better at it. I don't know RIP because I never had to use it. I know way more than I want about SNMP because I *had* to learn it to get something done.
Another side benefit of team-supported infrastructure in the cloud is that, unlike a traditional datacenter, where there is tons of lateral network access to jump through, most teams are isolated. If they screw up their security posture, that's a problem, but it only affects them, not the rest of the company. You touched upon it in the article, but in a large company it's a huge advantage over a datacenter where you have to be a fascist about any change that affects security postures.