The Impact of Data Gravity: a Campfire Story
Here’s an interesting story illustrating the potential pitfalls of multi-DC deployments and the impact of data gravity on application performance.
A long, long time ago, on a cloudy planet far, far away, a multinational organization decided to centralize their IT operations and move all workloads into a central private cloud.
They knew what they were doing (after all, it wasn’t their first workload migration). They carefully prepared the infrastructure, analyzed the dependencies in the application stack, and moved the whole stack, including the underlying database, to the central private cloud.
The results were dismal: transactions that took milliseconds before the migration now took several seconds.
The enlightened gurus quickly identified the only possible culprit: it had to be The Network. After all, before the migration the users were on the same campus network as the application, and after the migration they had to traverse the Internet (or private WAN) to reach the new private cloud. Nothing else had changed except the underlying network infrastructure. Case closed.
It’s almost impossible to prove the poor Network innocent once the judgment in absentia has been passed. No amount of traceroutes, latency measurements, or other probes will ever persuade non-networking people that the network is not the problem (but try using ThousandEyes; the jury might be swayed by nice-looking diagrams and graphs).
The situation on the cloudy planet was no different. The networking engineers ran measurements that clearly showed there was no network problem: latencies and round-trip times were in the expected range, there was plenty of bandwidth, and packet drops were negligible. Still, nobody believed those measurements until the whole problem exploded and forced the application and database teams to start troubleshooting their respective silos.
As it turned out, the application itself used just one database, which was moved together with the application. However, there was another small database holding user credentials and other user details, and that one wasn’t moved to the central private cloud due to local privacy protection laws. The database server running in the central private cloud thus continuously queried the remote database, adding tens of milliseconds to the transaction processing time with each query.
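To see how tens of milliseconds per query can turn a millisecond-scale transaction into a multi-second one, here’s a back-of-the-envelope sketch. It assumes (hypothetically) that each transaction issues its remote queries one after another, with each query waiting a full round trip before the next one starts. The function name, query count, round-trip times, and local processing time are illustrative assumptions, not measurements from the story.

```python
def transaction_time_ms(local_work_ms: float, remote_queries: int, rtt_ms: float) -> float:
    """Hypothetical model: total latency = local processing time plus one
    full round trip for every sequential query to the remote database."""
    return local_work_ms + remote_queries * rtt_ms


# Illustrative (made-up) numbers, not taken from the original story:
# 100 sequential credential lookups per transaction.
before = transaction_time_ms(local_work_ms=5, remote_queries=100, rtt_ms=0.2)  # credentials DB on the same campus LAN
after = transaction_time_ms(local_work_ms=5, remote_queries=100, rtt_ms=30)    # credentials DB across the WAN

print(f"before: {before:.0f} ms, after: {after:.0f} ms")
```

In this model the network is performing exactly as measured; the per-query round trip simply gets multiplied by the number of queries, so a perfectly healthy WAN latency still adds up to seconds.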