The Impact of Data Gravity: a Campfire Story

Here’s an interesting story illustrating the potential pitfalls of multi-DC deployments and the impact of data gravity on application performance.

A long, long time ago, on a cloudy planet far, far away, a multinational organization decided to centralize their IT operations and move all workloads into a central private cloud.

They knew what they were doing (after all, it wasn’t their first workload migration): they carefully prepared the infrastructure, analyzed the dependencies in the application stack, and moved the whole stack (including the underlying database) to the central private cloud.

The results were dismal: transactions that took milliseconds before the migration now took several seconds.

The enlightened gurus quickly identified the only possible culprit: it must be The Network. After all, the users were in the same campus network as the application prior to the migration, and had to traverse the Internet (or private WAN) to get to the new private cloud after the migration. Nothing else changed but the underlying network infrastructure. Case closed.

It’s almost impossible to prove the poor Network innocent once the judgment in absentia has been passed. No amount of traceroutes, latency measurements or other probes will ever persuade non-networking people that the network is not the problem (but try using Thousand Eyes – the jury might be swayed by nice-looking diagrams and graphs).

The situation on the cloudy planet was no different. The networking engineers did their measurements, which clearly showed there was no network problem: latencies and round-trip times were in the expected range, there was plenty of bandwidth, and packet drops were negligible. Still, nobody believed those measurements until the whole problem exploded and forced the application and database teams to start troubleshooting their respective silos.

As it turned out, the application itself used just one database, which was moved together with the application. However, there was another small database holding user credentials and other user details, and that one wasn’t moved to the central private cloud due to local privacy protection laws. The database server running in the central private cloud thus continuously queried the remote database, adding tens of milliseconds to the transaction processing time with each query.
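The mechanism behind the slowdown is worth spelling out: it’s not bandwidth but the accumulation of sequential round trips. Here’s a minimal back-of-the-envelope sketch (all numbers are illustrative assumptions, not figures from the story) showing how per-query WAN latency turns millisecond transactions into multi-second ones:

```python
# Back-of-the-envelope estimate of how per-query latency piles up.
# All numbers below are made-up assumptions for illustration.

LAN_RTT_MS = 0.5              # round trip to a database in the same data center
WAN_RTT_MS = 40.0             # round trip to the remote user-credentials database
QUERIES_PER_TRANSACTION = 50  # sequential lookups made while processing one transaction

def db_wait_ms(rtt_ms: float, queries: int) -> float:
    """Time spent waiting on sequential database round trips."""
    return rtt_ms * queries

before = db_wait_ms(LAN_RTT_MS, QUERIES_PER_TRANSACTION)
after = db_wait_ms(WAN_RTT_MS, QUERIES_PER_TRANSACTION)

print(f"before migration: ~{before:.0f} ms of DB wait per transaction")
print(f"after migration:  ~{after:.0f} ms ({after / 1000:.1f} s) per transaction")
```

Once the per-transaction query count is non-trivial, the WAN round-trip time (not bandwidth) dominates transaction latency, which is exactly why applications tend to gravitate toward the data they depend on.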

6 comments:

  1. So, it still was the network. The added latency between the authentication server and the application is delaying processing. :P
    Replies
    1. It's not the network team's job to identify every dependency that every application requires. In this example the network conditions were found to be fine, but someone on the application team neglected to test the effects of putting some distance between the main DB and the user DB.
    2. It is the network engineers' responsibility to understand applications' packet flow.
    3. Completely impractical in a large enterprise environment where there are literally thousands of applications. There is always a need to balance responsibilities between the network engineer and the application's technical owner.
  2. This is gonna be my new bedtime story. I'm raising a "network" guy.
  3. Blaming the network is the simplest way out, but it is not a solution: network guys will not break the laws of physics. In a good organization everybody (net, admins and devs) should work together toward one common target. And everybody should understand the laws of physics.