There Is No Paradigm Shift – Good Applications Were Always Network-Aware
Someone left the following comment on one of my blog posts:
There is a paradigm shift that I don’t think most application developers understand. In a traditional enterprise model, the network is built around the application requirements, now we are saying the application has to build around the network.
I would say there’s no paradigm shift – developers of well-performing applications were always aware of the laws of physics.
Honestly, I don’t know when the shift from “we have to use the resources that are available” to “we can do whatever we find most expedient and blame someone else if things don’t work out” happened – I was off the grid for a few years – but I was definitely surprised to come back and encounter the servers-following-the-sun madness. After all, if a car manufacturer claimed the roads had to be resurfaced every time it launched a new car, to adapt them to the flashy new tires, it wouldn’t be in business for long.
However, that doesn’t mean the application developers have to know the innards of networking technologies. The anonymous commenter continued:
I have so far seen a set of developers who can’t understand the current dozen choices we give for Load Balancer options and can’t correctly communicate their security needs.
Ahem. Why do you need a dozen choices for load balancers? That unique-snowflake mentality makes the network more complex than necessary and permanently increases your costs. How about sitting down with the developers, documenting the development environment they plan to use, developing a working load balancer (or firewall) configuration while the developers are still working on the application, and using it as a standardized template from that point onwards?
Finally, continuing with the anonymous comment:
In my experience developers don’t understand the difference between latency and bandwidth, and are amazed that their application works better on their workstation with a smaller CPU, but with local web, DB and app services, than it does when they split these services across the pond. I just don’t understand how we can bridge that gap anytime soon.
This one is easy to solve – all you need is fifteen minutes analyzing browser waterfall diagrams with them, and there’s probably something in Wireshark that could do the same thing for TCP sessions so they can see how individual components of their application interact.
Add artificial latency and traffic shaping (WANem is a pretty good tool) and show them how that affects their application’s traffic flows. I’m positive most good developers wouldn’t mind knowing how TCP and HTTP really work (after all, over 45,000 people have already registered for my Udemy course).
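If you want a quick back-of-the-envelope illustration before firing up WANem, a few lines of Python are enough. The request count, payload size, bandwidth, and RTT figures below are made-up assumptions, not measurements from a real application:

```python
# Back-of-the-envelope: why a chatty application falls apart across the pond.
# All numbers (request count, payload size, bandwidth, RTTs) are illustrative
# assumptions, not measurements.

RTTS = {
    "same host": 0.0005,           # seconds
    "same data center": 0.002,
    "across the pond": 0.090,
}

SEQUENTIAL_REQUESTS = 200          # e.g. one DB query per item rendered on a page
PAYLOAD_BYTES = 20_000             # data returned per request
BANDWIDTH_BPS = 100_000_000        # 100 Mbps -- bandwidth is not the problem

for label, rtt in RTTS.items():
    transfer_s = SEQUENTIAL_REQUESTS * PAYLOAD_BYTES * 8 / BANDWIDTH_BPS
    waiting_s = SEQUENTIAL_REQUESTS * rtt      # one round trip per sequential request
    print(f"{label:>16}: {transfer_s + waiting_s:6.2f} s total "
          f"({waiting_s:5.2f} s of it spent waiting on round trips)")
```

The transfer time is identical in all three cases; only the round-trip component changes – which is exactly the latency-versus-bandwidth distinction the commenter was complaining about.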
Many of the application owners struggle with the details. Latency is one of those details that gets misunderstood when an application that was hosted locally is moved even 15-20 ms away. The next conversation goes like this...
"It must be placed in the high priority queue"
"But your circuit is only 10-20% utilized. QoS doesn't kick in"
"Put it there anyway and we'll see because it will be better."
"It won't do anything and it will make our policy more complex"
"Well, I have to escalate this to management then..."
I've wondered why some app owners respond the way they do in that conversation. I think it comes down to this: they usually don't break a problem down to the basics, and usually don't have a strong engineering, scientific, or mathematics background. So they just see the surface of the application without asking things like: how does a client connect? Is there a dependency on DNS, an AD forest, or a domain? Where does authentication and authorization take place? What kind of data transfer takes place? Is it a bulk or a serial transfer? Does there even need to be a transfer?
Although testing with added delay before a migration would illustrate some of these issues, it is often not done, for various reasons. What I've done instead is show them a graph of the effects of latency on throughput for simple TCP, i.e. no windowing. This has been my most effective method of education without getting too deep. It may not change the design or the project, but it may help the application owners understand the performance differences after a migration.
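A few lines of Python generate the numbers behind that graph, assuming one full-size segment in flight per round trip (the RTT values are arbitrary examples):

```python
# Throughput of "simple TCP" (stop-and-wait: one segment in flight per RTT)
# as a function of latency. Segment size and RTT values are illustrative.

MSS_BYTES = 1460   # one full-size TCP segment

for rtt_ms in (1, 5, 10, 20, 50, 100):
    throughput_mbps = MSS_BYTES * 8 / (rtt_ms / 1000) / 1_000_000
    print(f"RTT {rtt_ms:3d} ms -> max throughput {throughput_mbps:7.2f} Mbps")
```

With a sliding window the ceiling becomes window size divided by RTT rather than one segment per RTT, but either way the limit is set by round-trip time, not by link bandwidth.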
The more scientifically minded application owners will break down their application and understand most of the pieces before a centralization project, and they tend to have a good success rate. All their firewall and load balancing needs are handled before a migration, are well documented, and tend to be simple configurations. If I show them the graph, their response is usually, "I understand, but I think we've always used sliding windows... BTW - we need a VIP on the load-balancer, round-robin... we'll take care of persistence on the app side, and we'll check the health and publish a page if the app is up..."