Interview: Reduce Costs and Gain Efficiencies with SDDC

A few days ago I had an interesting interview with Christoph Jaggi discussing the challenges, changes in mindsets and processes, and other “minor details” one must undertake to gain something from the SDDC concepts. The German version of the interview is published on Inside-IT.ch; you’ll find the English version below.

See you in Bern on September 9th

If you find the topic interesting, and live in Switzerland or reasonably close to Bern, register for the SIGS & Carrier Lunch’s DC Day on September 9th – I’ll have a keynote presentation covering SDDC concepts, and be available the whole day for follow-up discussions.

And now for the interview

In the last years we’ve heard and read a lot about the rigidity of IT operations. Is there some substance behind those claims?

Those claims are absolutely valid. In a typical enterprise data center it might take a few days to implement a new firewall or load balancing rule, and a few weeks to deploy a new application. That’s clearly unacceptable.

Compare that to any properly orchestrated public cloud. For example, the application owners can change security rules or load balancing behavior on Amazon Web Services in seconds using a simple graphical user interface or API calls.
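
Just to illustrate how simple those API calls are, here’s a minimal sketch using Python and the boto3 AWS SDK – not an official procedure, and the region, security group ID and source prefix are placeholder values:

    # Minimal sketch: open inbound HTTPS on an existing AWS security group.
    # The region, security group ID and source prefix are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-central-1")

    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",        # placeholder security group ID
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{
                "CidrIp": "198.51.100.0/24",   # placeholder source prefix
                "Description": "Allow HTTPS from branch office",
            }],
        }],
    )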

How did we get to that point? What went wrong?

The original sin was probably our misplaced belief in the all-encompassing power of infrastructure technology. We allowed the infrastructure vendors to lead us down the path of trying to fix every application shortcoming with more and more complex infrastructure technology instead of fixing the application development process. It was easier to believe the whitepapers than to push back and enforce pretty simple application development rules that would result in scalable applications similar to those developed by large web portals.

The growing complexity of the IT infrastructure (including compute, storage and networking), coupled with manual configuration of critical components, has resulted in an environment that is hard to control. Some vendors try to reduce the apparent complexity with simple-looking user interfaces, which help the operators perform their daily jobs faster, but fail miserably when the underlying infrastructure develops unexpected behavior.

Could you give us a few examples?

Let’s start with a product we all know: VMware High Availability is supposed to increase the availability of mission-critical applications, and many people use it without realizing that it protects us against hardware failures, but not against operating system crashes, software bugs, or application configuration errors.

How often have your servers failed compared to any other outage in the application stack? Wouldn’t it make more sense to focus on all other sources of application failures and fix them for good by using a scale-out architecture with redundant application servers using a shared database?

Regardless of how it’s used, VMware’s HA remains a great product that removes a significant operational burden – we know that the virtual machines will be restarted following a hardware failure. However, it also introduces additional complexity: it requires IP address mobility – in most cases a VM restarted on another physical server needs to retain its previous IP address – which is traditionally implemented with virtual LANs (VLANs) in the networking part of the infrastructure.

Now that you’ve mentioned VLANs – are they really as bad as some vendors would like us to believe?

VLANs are not good or evil, but we tend to forget that each VLAN forms a single failure domain. When we stretch VLANs between data centers (to implement VMware High Availability clusters across geographical boundaries) we inadvertently link the fate of two or more data centers – a software or hardware error in one data center could trigger a total meltdown of multiple data centers. This is not just a theoretical threat – it has happened to more than one of my customers.

The networking vendors are working hard to bypass the limitations of VLANs by introducing data center fabric technologies and new protocols like TRILL, SPB or OTV. These technologies make VLANs more stable – but you pay for that stability with even more technologies and protocols in your infrastructure.

Compare that to the way most large web providers run their networks – their internal networks are scalable IP networks using the same architectural principles we use to build the global Internet, and they adapt their applications to work over this stable and scalable infrastructure.

What about the new batch of software-defined networking and software-defined data center products? Will they solve our problems?

Some of the emerging technologies address the fundamental flaws of existing products. For example, overlay virtual networks allow us to build application segments without touching the underlying networking infrastructure, removing some of the need for change control and maintenance windows. Network function virtualization (NFV) products enable us to deploy network services like firewalls and load balancers in virtual format, significantly increasing their flexibility.

Unfortunately there’s no silver bullet – the new technologies or products will not solve our fundamental problems. We’ll still have to go through the painful process of re-architecting the whole application development and deployment process. We won’t get the true benefits of these emerging technologies until we make the application development teams responsible for the proper deployment and operation of their applications.

Is that what the DevOps movement is all about?

DevOps (like cloud) is a nice umbrella term for a whole range of principles, methodologies and tools. What really matters is that you get everyone, from application developers to operations engineers, to work together to ensure the deployed applications meet the business needs of the organization.

Assuming I agree with you – how can I start this process?

As with any major paradigm shift, start with baby steps and pilot projects that will gradually prove the viability of the new approach to the skeptics in your IT organization. I’ll outline some of them in my presentation at the SIGS & Carrier Lunch’s DC Day in September, and if you’re interested in the underlying technologies and applications, you’ll find literally days of presentations and recordings on my web site – start with ipSpace.net/SDN and ipSpace.net/Cloud.

2 comments:

  1. "In a typical enterprise data center it might take a few days to implement a new firewall or load balancing rule, and a few weeks to deploy a new application."

    ...which usually comes from the change process, enforced by the customer(s) and/or business lines. In this case, an orchestrator with a web-based GUI would shorten the few days by a few minutes, not reduce the few days to a few minutes. :)
    Replies
    1. True, I agree it's mostly a process constraint: getting approvals, sign-offs, keying POs and work breakdown structures into SAP, etc. Then there are the ITIL-like change management hurdles and hoops to jump through, and then the customer coordination. I don't think SDN will reduce timelines significantly, but where it might bring value is in eliminating errors, self/auto-documentation, and determinism over traffic.