Q&A: What Is a Hyperconverged Infrastructure?
I’m running a hyperconverged infrastructure event with Mitja Robas on April 6th, and so my friend Christoph Jaggi sent me a list of interesting questions, starting with:
What are hyperconverged infrastructures?
The German version of the interview is published on inside-it.ch.
Hyperconverged infrastructure is a marketing term, so it’s not clearly defined. However, usually one talks about hyperconverged infrastructure when a product or solution integrates data center storage and compute resources. This integration is usually achieved by using storage devices in servers to implement a distributed storage solution.
How do they differ from converged infrastructures and from non-converged infrastructures?
Like with hyperconverged infrastructure, there’s no good definition of converged infrastructure. For example, FCoE, which combines Fibre Channel and Ethernet, is sometimes called converged infrastructure. Converged infrastructure usually translates into some kind of external shared storage (FC, iSCSI, NFS …) that is more or less integrated (depending on the vendor) with compute resources (e.g., through orchestrators or virtual machine managers like vCenter Server).
Hyperconverged infrastructure is usually implemented with server-based storage, reducing the number of hardware components in the data center, and raising a whole spectrum of storage-related dilemmas and challenges.
What are the benefits of hyperconverged infrastructures?
Reducing the number of hardware components and replacing storage arrays with less expensive server-based storage are the clear benefits. Some hyperconverged products also simplify storage replication, both within a data center and across multiple data centers. However, the devil is in the details that we will discuss in the DIGS session on April 6th.
Many hyperconverged products come with a pre-installed hypervisor or include a simplified installation process, resulting in quicker deployment.
What are the downsides of hyperconverged infrastructures?
There are several downsides that the hyperconverged vendors don’t like to talk about:
- Increased network utilization – hyperconverged infrastructure uses the network to replicate data across redundant storage nodes, resulting in a significant increase in network traffic.
- Increased storage requirements – storage arrays usually use some variant of RAID, resulting in moderate overhead, while hyperconverged infrastructure usually creates multiple copies of data, sometimes resulting in 200% overhead (see the back-of-the-envelope sketch after this list).
- Increased software complexity – a distributed storage solution is inherently more complex than a traditional storage array.
- Relative immaturity – storage arrays are very mature technologies. Many hyperconverged solutions are only a few years old.
- Hardware incompatibilities and issues – even though hyperconverged infrastructure is preached as a software-defined solution, in reality selecting the proper hardware matters big time.
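To make the first two points more tangible, here’s a minimal back-of-the-envelope sketch in Python with made-up example figures. It compares the protection overhead of a RAID-6 array with 3-way replication and shows how replicated writes turn into extra east-west network traffic; the helper functions are purely illustrative, not part of any vendor’s tooling.

```python
# Back-of-the-envelope comparison of protection overhead and write
# amplification; all numbers below are illustrative, not vendor data.

def capacity_overhead(raw_tb, usable_tb):
    """Protection overhead as a percentage of usable capacity."""
    return (raw_tb - usable_tb) / usable_tb * 100

# RAID-6 across 10 disks: 8 data + 2 parity disks.
print(capacity_overhead(raw_tb=10, usable_tb=8))      # 25% overhead

# 3-way replication (three copies of every block, a common HCI setting):
# 3 TB of raw capacity for every 1 TB of usable data.
print(capacity_overhead(raw_tb=3, usable_tb=1))       # 200% overhead

def replication_traffic(write_mbps, copies):
    """Extra network traffic caused by writing (copies - 1) remote
    replicas, assuming one copy stays on the local node."""
    return write_mbps * (copies - 1)

# A workload writing 500 MB/s with 3-way replication pushes roughly
# 1000 MB/s of additional replication traffic onto the cluster network.
print(replication_traffic(write_mbps=500, copies=3))
```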
What are the main use cases?
Hyperconverged solutions are ideal in environments with non-critical data that can tolerate significant RPO. Use cases include VDI servers, private and public clouds, VM disk images, and secondary storage or backup/archiving appliances and systems.
Can they integrate or co-exist with existing infrastructures?
Absolutely. Most hyperconverged solutions use a standard hypervisor (vSphere, Hyper-V or KVM) and present the distributed storage as an iSCSI or NFS target. Hypervisor software can still use other storage access methods and (for example) store VM disk images on hyperconverged storage while storing mission-critical databases on a traditional or all-flash storage array.
In any case, iSCSI and NFS are the most commonly used access methods. It’s rare to see hyperconverged solutions using Fibre Channel.
What kind of cost structure comes with a hyperconverged infrastructure (initial cost, maintenance cost, build-out, integration of existing infrastructure)?
While vendors claim that hyperconverged infrastructure is cheaper than a traditional solution, be careful to include all cost elements in your comparison (a simple back-of-the-envelope model follows the list):
- Hyperconverged compute/storage hardware will be cheaper, but will require more storage devices (disks or SSDs) and faster network infrastructure;
- Software licenses will be more expensive (commercial distributed storage software is not cheap). Also keep in mind that the hyperconverged software running on a hypervisor needs dedicated CPU resources that you’re also paying for through the hypervisor license;
- Hardware build-out of a hyperconverged infrastructure will be faster, as you only have to rack-and-stack two types of components: servers and Ethernet switches. Software setup times vary by vendor.
- You will need support for both hardware and software. Hyperconverged software support costs might be higher, but the hardware support will be way cheaper than what you’re paying for your storage arrays. Also, as you’re using a highly redundant unified compute/storage architecture, you don’t need the expensive fast-response maintenance anymore.
- The traditional storage landscape has changed significantly in the past few years with the rise of flash-based storage, not only from a technical but also from a cost-of-ownership perspective. Non-incumbent vendors can meet or even surpass the cost effectiveness of hyperconverged infrastructure while keeping the “enterprise” storage features, an option that should not be neglected.
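To keep such a comparison honest, it helps to put all of these elements into a single model instead of looking at hardware prices alone. The sketch below is a deliberately simplified total-cost-of-ownership calculation; the categories mirror the list above, but the tco helper and every figure in it are hypothetical placeholders you would replace with your own quotes.

```python
# Simplified 5-year TCO model; every figure below is a placeholder,
# not real pricing data.

def tco(hardware, software_licenses, setup, annual_support, years=5):
    """Sum up-front costs and recurring support over the planning horizon."""
    return hardware + software_licenses + setup + annual_support * years

hyperconverged = tco(
    hardware=400_000,          # cheaper servers, but more disks/SSDs and faster switches
    software_licenses=250_000, # distributed storage software is not cheap
    setup=20_000,              # rack-and-stack only servers and Ethernet switches
    annual_support=60_000,     # software support higher, hardware support cheaper
)

traditional = tco(
    hardware=550_000,          # servers plus a storage array and SAN
    software_licenses=150_000,
    setup=40_000,
    annual_support=80_000,     # includes fast-response maintenance for the array
)

print(f"hyperconverged: {hyperconverged:,}")
print(f"traditional:    {traditional:,}")
```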
For more details and an in-depth discussion of benefits and drawbacks of hyperconverged infrastructure (including lessons learned while running it in production) visit the 5th DIGS Special Interest Group Next-Gen Infrastructure event on April 6th.
So instead of sizing your centralized storage up front, you can start small and expand over time.
Edwin de Graaf
Storage and compute now must scale together. Yes, you can have storage-heavy and compute-heavy nodes, but that starts making things not only more confusing but also potentially less performant (depending on the solution). Storage and compute now must share a life cycle, and that sucks: I replace my storage maybe every 5 years, my compute every 2 to 4. Compute and storage also become a single tier, meaning that if you want spinning disk and all-flash, you need different clusters. That’s terribly inefficient from a compute cost perspective. And the number of nodes you need for a reasonable level of resiliency is at least three (to survive a single node failure).
For me, I’m gonna stick with Nimble Storage, Pure Storage, or any other shared storage solution that lets me keep my various infrastructure layers separate and scale them independently as needed. I’d bet a good steak that my infrastructure performs better and costs less, too.
That said, VDI and web scale are the two areas where I WOULD run HCI, but I would need a pretty massive environment for it to be worth it. SMBs or smaller sites may also be a good fit. For a full-size enterprise, though, I think SAN/NAS still makes more sense.
And last but not least, disk-based storage is like buying a steam train in the age of electric cars ;)