Stop the Network-Based Application Recognition Nonsense
One of my readers sent me an interesting update on the post-QUIC round of NBAR whack-a-mole (TL&DR: everything is better with Bluetooth AI):
So far, so good. However, whenever there’s a change, there’s an opportunity for marketing FUD, coming from the usual direction.
At the same time, we don’t live in a world with infinite bandwidth, and Cisco is looking for ways for ISPs to regain some of that control with the Cisco Ultra Traffic Optimization AI. It is quite well documented in a Cisco Live talk.
I can see both sides of the challenge here, but I would love to hear your opinion about it in one of your blogs.
I always believed that a Service Provider network should be as simple as possible (see also: plumbing)¹. It should provide each client with its fair share of resources and ignore the rest. Obviously, that’s not too hard to implement (apart from the “What exactly is a client?” bit and a few other details).
The “as simple as possible” idea doesn’t work well with “premium” vendors, who try to keep their fat margins by persuading everyone how special their networks should be. Service Provider business development folks who dream of increasing ARPU² love those fairy tales. The next thing you know, everyone keeps repeating the “OMG, we need traffic engineering or bandwidth management³ based on application recognition to fix broken apps” mantra.
Ignoring the marketing gimmicks, why might we care about recognizing applications? Back to my reader…
Nobody (sane) ever promised that we’d be fair to apps. We have to be fair to everyone paying for our service in the sense that everyone paying the same amount should get an equal share of a congested resource.
With that in mind, how about an alternative idea: instead of deploying Ultra Traffic Optimization AI, yell at your vendor to implement a congestion management mechanism that monitors link utilization by individual users. For example, it could increase the drop probability of a packet if the same user⁴ already has multiple packets in the same output queue⁵, or it could keep some sort of longer-term statistics. I’m positive someone already worked on something along these lines and got ignored because the solution is not complex enough.
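To make that a bit more tangible, here is a back-of-the-napkin Python sketch of the drop logic described above. The class name, thresholds, and drop-probability curve are purely illustrative assumptions; nothing here claims to be what any vendor (or any published AQM) actually implements.

```python
import random
from collections import deque, defaultdict

class PerUserFairDropper:
    """Toy per-user congestion management: the more packets a user already has
    sitting in the shared output queue, the more likely its next packet is to
    be dropped. Thresholds and the drop-probability curve are made up for
    illustration only."""

    def __init__(self, queue_limit=1000, free_share=8):
        self.queue_limit = queue_limit    # total packets the output queue may hold
        self.free_share = free_share      # packets a user may queue before any drop risk
        self.queue = deque()              # FIFO of (user_id, packet) tuples
        self.per_user = defaultdict(int)  # packets currently queued per user

    def enqueue(self, user_id, packet):
        """Return True if the packet was queued, False if it was dropped."""
        if len(self.queue) >= self.queue_limit:
            return False                  # queue completely full: tail drop
        backlog = self.per_user[user_id]
        if backlog > self.free_share:
            # Drop probability grows with the user's backlog, so heavy users
            # get squeezed first while light users sail through untouched.
            drop_probability = min(1.0, (backlog - self.free_share) / self.free_share)
            if random.random() < drop_probability:
                return False
        self.queue.append((user_id, packet))
        self.per_user[user_id] += 1
        return True

    def dequeue(self):
        """Hand the oldest queued packet to the wire."""
        if not self.queue:
            return None
        user_id, packet = self.queue.popleft()
        self.per_user[user_id] -= 1
        return packet
```

Swap the naive per-packet counter for longer-term per-user byte counts and you get something closer to the “longer-term statistics” variant mentioned above.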
As for “QUIC is a firehose,” that could be true, but it would be nothing new; we’ve experienced the same drama at least half a dozen times, starting with “UDP will kill TCP!” Remember the days of “BitTorrent will bring down our networks”? How about the days of “Video will kill our networks”? Either the QUIC-based applications behave politely enough not to be noticed, or we’ll experience another round of countermeasures along the lines of RFC 6057⁶. In the meantime, can we please keep monitoring and running our networks without the unnecessary drama?
1. Enterprise networks are a completely different story, as the various enterprise policies dictate how the network should behave. Hint: enforcing application visibility is easier if you manage to make it a compliance issue ;) ↩︎
2. Average Revenue Per User ↩︎
3. A fancier way of saying “QoS”, potentially safe to use even in environments that have realized how much hard work goes into deploying and operating QoS policies. ↩︎
4. Where a user could be identified by an IPv4 address, an IPv6 prefix, or any other relevant sequence of bits in the packet header (flow label comes to mind). ↩︎
5. Good congestion management mechanisms could be surprisingly simple. See CoDel for more details. ↩︎
6. RFC 6057 is worth reading. It has “protocol agnostic” in the title for a good reason and needs no application recognition to work. ↩︎
There is a Free Software project called LibreQoS that embraces the "fair share per user" idea. It is intended for use by Internet service providers.
I was about to comment on bufferbloat. I've interacted with Dave Täht plenty of times on this topic, and he's IMO the go-to expert on queuing discipline and packet buffering.
LibreQoS is a good middle-box solution for poorly designed networks and/or poor network equipment vendors (aka all the current vendors; none of them supports FQ_Codel in hardware). We've even joked about this a few times: https://x.com/DaryllSwer/status/1753146680659308708
In the case of the latter, if every network vendor listened to Dave Täht, implemented FQ_Codel directly on the ASICs, and from there allowed the operator to configure bandwidth caps (queues/policers) using FQ_Codel, most of the bufferbloat bs would go away.
It's even worse for Wi-Fi and LTE/5G equipment; the vendors refuse to adopt something like cake-autorate or even plain fq_codel with BQL. A few months ago, Starlink did share that their “new” router has FQ_Codel enabled and offloaded, and I believe they published some data showing impressively LOW latency with FQ_Codel enabled.
Dave introduced me to FQ_Codel a few years ago. Since then, I've deployed FQ_Codel for many ISPs that use MikroTik (which supports FQ_Codel, but not BQL), and bufferbloat/latency at peak times has dropped significantly.
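For readers who haven't looked under the hood, here is a very rough Python sketch of what FQ_CoDel does conceptually: hash flows into separate queues, head-drop when a queue's standing delay stays above a small target for a whole interval, and serve the queues round-robin. The names and the simplified drop logic are illustrative only; the real algorithm (RFC 8290, and the Linux fq_codel/CAKE implementations mentioned above) is considerably more sophisticated.

```python
import time
from collections import deque

TARGET = 0.005     # 5 ms: acceptable standing (minimum) queue delay
INTERVAL = 0.100   # 100 ms: how long the delay may stay above TARGET before dropping
NUM_QUEUES = 1024  # number of flow queues in this sketch

class FlowQueue:
    def __init__(self):
        self.packets = deque()          # entries are (enqueue_timestamp, packet)
        self.above_target_since = None  # when the head's sojourn time first exceeded TARGET

class FqCodelSketch:
    """Toy flow-queueing plus CoDel-style AQM, for illustration only."""

    def __init__(self):
        self.queues = [FlowQueue() for _ in range(NUM_QUEUES)]
        self.rr_index = 0

    def enqueue(self, flow_tuple, packet):
        # Hash the flow identifier (e.g. the 5-tuple) into one of the queues,
        # so a single fat flow cannot starve everyone else.
        q = self.queues[hash(flow_tuple) % NUM_QUEUES]
        q.packets.append((time.monotonic(), packet))

    def _head_ok(self, q, now):
        """CoDel-style check: drop the head only if the standing delay has
        exceeded TARGET continuously for at least INTERVAL."""
        sojourn = now - q.packets[0][0]
        if sojourn < TARGET:
            q.above_target_since = None
            return True
        if q.above_target_since is None:
            q.above_target_since = now
            return True
        return (now - q.above_target_since) < INTERVAL

    def dequeue(self):
        """Serve non-empty flow queues round-robin, head-dropping stale packets."""
        now = time.monotonic()
        for _ in range(NUM_QUEUES):
            q = self.queues[self.rr_index]
            self.rr_index = (self.rr_index + 1) % NUM_QUEUES
            while q.packets:
                if self._head_ok(q, now):
                    return q.packets.popleft()[1]
                q.packets.popleft()          # head drop: this flow kept the queue too fat for too long
                q.above_target_since = now   # real CoDel ramps up the drop rate; the sketch just restarts the clock
        return None                          # nothing queued anywhere
```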
I agree with this blog post's general opinion as well. I don't understand this bs about “app-based” anything. No, it should be port-based or customer-VLAN-based at most.
100% in favour of keeping SP networks as clean transit paths instead of molesting and mangling packets along the way. We're already doing that due to governmental pressure for internet censorship/blocking, etc.