Congestion Control Algorithms Are Not Fair
Creating a mathematical model of queuing in a distributed system is hard (Queuing Theory was one of the most challenging ipSpace.net webinars so far), and so instead of solutions based on control theory and mathematical models we often get what seems to be promising stuff.
Things that look intuitively promising aren’t always what we expect them to be, at least according to an MIT group that analyzed delay-bounding TCP congestion control algorithms (CCA) and found that most of them result in unfair distribution of bandwidth across parallel flows in scenarios that diverge from spherical cow in vacuum. Even worse, they claim that:
[…] Our paper provides a detailed model and rigorous proof that shows how all delay-bounding, delay-convergent CCAs must suffer from such problems.
It seems QoS will remain spaghetti-throwing black magic for a bit longer…
I am no CCA guy, but i think the fact that no CCAs (delay-bounded or not) is completely fair in all cases has been established a long time ago. As i understand it, the goal is to approach fairness as much as possible while avoiding extreme cases of unfairness (starvation). In that sense, what the MIT guys proved strictly speaking also is that starvation in some cases cannot be avoided with delay-bounded CCAs.
Personally, i found this interesting: "Rather than worry about perfect flow-level fairness, which might not be an interesting goal in practice [7 ], we seek to be 𝑠-fair; i.e., bound unfairness to a maximum specified throughput ratio of 𝑠 > 1." with the cited paper being https://www.researchgate.net/publication/220194938_Flow_Rate_Fairness_Dismantling_a_Religion
I always tried to evaluate on something like Jain's fairness, but maybe i need to reconsider.
Concerning queuing theory: I often find that the problem is not that queuing theory (or network calculus) is too difficult to grasp, but rather that either (1) the assumptions that are made in the models about a priori knowledge of network traffic characteristics, about no delay being needed to share network state between distributed nodes and about that there are no performance constraints for an algorithm to be implemented in the end are unrealistic, or (2) the resulting models have too many variables so that hard statements can only be given on very weak upper or lower bounds, rendering their usefulness limited (but definitely still worthwhile).