VSAN: As Always, Latency Is the Real Killer

When I wrote my stretched VSAN post, I thought VSAN used asynchronous replication across the WAN. Duncan Epping quickly pointed out that it uses synchronous replication, and I fixed the blog post.

The “What about latency?” question immediately arose somewhere in my subconscious, but before I could add that thought to the blog post, Anders Henke wrote a lengthy comment that totally captured what I was thinking, so I’m including it in its entirety:

Note that any kind of synchronous replication also suffers from the extra network latency. Having said this, VMware's VSAN must be designed for a local network only.

They do have dedicated products for asynchronous remote replication, and one can probably combine VSAN with them. But please don't ignore the added latency and physics :-)

As a worst-case example: your HDD has an average rotational latency of around 2-3ms (the time until a sector can be read or written). Even if the sector itself is written instantly, an average write operation still takes about 2ms.

If you're doing replication in the metro area with a millisecond of round-trip network latency, that latency is added to every write request: your remote HDD probably won't have its data committed in 2ms, but in 2+1=3ms.
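
Here's the same arithmetic as a back-of-the-envelope sketch (the numbers are illustrative assumptions, not measurements from any specific setup):

```python
# Synchronous replication: the write is acknowledged only after the
# remote copy confirms it, so the inter-site RTT is added to every write.
# Illustrative assumptions, not measured values.
local_commit_ms = 2.0   # assumed average HDD write (rotational latency)
metro_rtt_ms = 1.0      # assumed metro-area round-trip time

stretched_commit_ms = local_commit_ms + metro_rtt_ms
print(f"local write:     {local_commit_ms:.0f} ms")
print(f"stretched write: {stretched_commit_ms:.0f} ms")   # 2 + 1 = 3 ms
```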

You could use SSDs in the hypervisor hosts for local caching, reducing the overall latency to (almost) pure WAN latency, but then the relative difference between local VSAN and stretched VSAN would be even worse. See also below.

Also note that VMware claims the maximum latency supported for stretched VSAN is 5 msec. By now you should be able to figure out what that does to your write performance.

Depending on what your application actually does and how often synchronous data is forced onto disk, sync replication in this setup may functionally decrease the overall hard disk performance by up to 50%.

Of course, in real life the various writeback caches in operating systems, hypervisors, RAID controllers and hard disks lie about having something "really" written onto disk, so the 50% figure is the worst case for "every single sector/block is forced to disk". Even if the write is not forced to disk, the network latency still accrues before the remote system can promise "having it written". So overall, the network latency adds to the access time.
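
To put a rough number on that worst case, here's a small sketch assuming every write is forced to disk and that write throughput scales with the inverse of the write latency (again, purely illustrative figures):

```python
# Worst case: every single write is forced to disk and replicated
# synchronously; throughput scales roughly with 1 / write latency.
hdd_ms = 2.0                            # assumed local HDD write latency
for rtt_ms in (1.0, 2.0, 5.0):          # metro RTT ... maximum supported RTT
    stretched_ms = hdd_ms + rtt_ms
    remaining = hdd_ms / stretched_ms   # fraction of original write throughput
    print(f"RTT {rtt_ms:.0f} ms -> writes take {stretched_ms:.0f} ms, "
          f"~{(1 - remaining) * 100:.0f}% fewer synchronous writes per second")
```

With an RTT roughly equal to the local disk latency you land at the 50% figure mentioned above; at the 5 msec maximum it gets considerably worse.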

As I said above: when you add writeback caches to the picture to improve performance, the WAN latency becomes an even bigger problem.

Even in a standard OLTP mix (70% read, 30% write), the impact of high-latency writes is obvious: the read performance doesn't change, the write performance gets noticeably worse.
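
A quick sketch of that mix (assuming, for illustration only, 2ms local latency for both reads and writes and a 1ms inter-site RTT):

```python
# 70% read / 30% write OLTP mix: reads stay local, synchronous writes
# pay the inter-site RTT penalty. Illustrative numbers only.
read_ms, write_ms, rtt_ms = 2.0, 2.0, 1.0

before = 0.7 * read_ms + 0.3 * write_ms
after  = 0.7 * read_ms + 0.3 * (write_ms + rtt_ms)

print(f"average I/O latency, local VSAN:     {before:.2f} ms")  # 2.00 ms
print(f"average I/O latency, stretched VSAN: {after:.2f} ms")   # 2.30 ms
print(f"writes alone: {write_ms:.0f} ms -> {write_ms + rtt_ms:.0f} ms (+50%)")
```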

If your application doesn't cope with extra latency on writes and you still require synchronous writes, you may need to switch from HDD to SSD (reducing the local access time from around 2ms to close to zero, leaving you with pure network latency).

With more remote locations, the problem becomes worse: 3ms is negligible in the world of WAN, but if your 2ms hard disk suddenly takes 5ms before some data can be written, that's a considerable slowdown.

And when your top-notch high performance database's average write latency suddenly jumps from 0.1ms (SSD) to 3.1ms (remote SSD), someone will probably notice (+3000%).
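
The faster the local media, the bigger the relative hit from the same network latency; here's a tiny sketch using the (illustrative) numbers from the previous two paragraphs:

```python
# Same 3ms inter-site RTT, very different relative impact.
rtt_ms = 3.0                                  # assumed round-trip time
for media, local_ms in (("HDD", 2.0), ("SSD", 0.1)):
    stretched_ms = local_ms + rtt_ms
    increase_pct = (stretched_ms - local_ms) / local_ms * 100
    print(f"{media}: {local_ms} ms -> {stretched_ms} ms (+{increase_pct:.0f}%)")
```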

Summary: As always, think before you jump, don't believe in the bandwidth fairy, and consider all the implications of stretched technologies… but if you're a regular reader of my blog, you probably know that by now ;)

7 comments:

  1. I think what you guys forgot is that Virtual SAN always writes to SSD. That is how the architecture has been designed from the start. We take advantage of the SSD, buffer the writes and then destage them when needed. The write is acknowledged to the Guest OS / Application as soon as it hits the SSD buffer. So the latency for a write to a device like this will not be 2ms but much lower than that.

    I understand what he is trying to say, but we are forgetting that we are trying to solve a business problem here. Any stretched storage platform has the same challenge when it comes to latency, yet NetApp Metro, EMC VPLEX, 3Par etc etc are still relatively popular solutions. Why? Well simply because in many cases it is 10x easier to provide this level of resiliency through an infrastructure level solution rather than to rely on 3rd party application providers to change their full architecture to provide you the resiliency you need. As you know getting large vendors to change their application architecture isn't easy, and can take years... if at all.

    These types of solutions are developed for relatively short distances and still relatively low latency. Sure, it has been validated to handle a 5ms hit, but that doesn't mean this would be acceptable from a customer's point of view. That decision is up to the customer. The same applies to bandwidth: what can you afford, what is available in your region / between sites, etc.

    Stretched infrastructures are not easy to architect, or deploy for that matter, but I truly believe with Virtual SAN we made the storage aspects 10x easier to manage and deploy than they have ever been before.
    Replies
    1. Hi Duncan. How is it possible to buffer/later destage writes if the replication is synchronous? To my understanding, synchronous implies buffering is not possible? Unless you mean buffering on both sides of the stretched VSAN, in which case the issue of latency still stands...
    2. It is buffered on the SSD, which is persistent on both sides. And yes, the network latency stands, but you remove the drive latency.

      That was not the point I was trying to get across. I do think that customers are concerned about latency, at the same time they are concerned about availability. It is for them to figure out what is more important.
  2. And the latency of an SSD is *not* 0.1 ms under a non-trivial workload. Think more like 1 ms for a 4K write. So, add the network latency between the two sites (VMware's recommendation is < 5 ms RTT) to that 1 ms. That's the price to pay if you want RPO=0 and protection from a site failure. There is no free lunch with the physics of this universe.
    Replies
    1. Of course there isn't a free lunch... (latency will vary depending on the type of drive you have. NVMe and Diablo, for instance, can provide low latency even under load)

      What is the alternative you have? I am not sure what the point is people are trying to make with discussions like these. It is not like it is easy to get a whole application architecture changed.
    2. I can't say what points other people are trying to make, mine was very simple: know how things work, and carefully consider the consequences.

      I'm positive most people reading about stretched VSAN never considered the impact of additional latency (even I thought it didn't matter).
  3. Just FYI: VMware documented the stretched VSAN bandwidth requirements:

    http://www.vmware.com/files/pdf/products/vsan/vmware-virtual-san-6.1-stretched-cluster-bandwidth-sizing.pdf