Nelson's log

Complex QoS rules considered harmful

tl;dr: some router firmware ships with a catch-all rule that puts all unidentified UDP traffic in a class labelled “Crawl”, which is capped at 5% of bandwidth. This is a stupid rule; disable it.

I just fixed a bug in my router’s configuration that explains why Google QUIC was not working well for me. It may also explain bugs I’ve been seeing in League of Legends, OpenVPN, and other UDP protocols. I’m not entirely certain.

I’ve been running the Tomato v1.28 (Toastman) firmware for a year+ now. It’s an old build. It has 40+ default QoS rules that identify all sorts of protocols, from important ones (DNS) to silly ones (RealAudio streaming), and assign each one a service class. Unfortunately some of the rules are harmful.

The problem rule in this case was the very last one: “UDP Dst Port: 1-65535, classify Crawl”. And Crawl by default is limited to a maximum of 5% of total bandwidth! There are a few higher priority rules that classify specific kinds of UDP traffic: DNS, for instance. But any new or unanticipated use of UDP is severely throttled. Such as QUIC, Google’s fancy new web protocol. And Cisco VPN. And maybe OpenVPN.
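To make the fall-through concrete, here is a minimal Python sketch of first-match classification. It is an illustration only: the `Rule` type, the `classify` helper, the class names other than Crawl, and every rule except the last are made up, and it assumes rules are checked top to bottom with the first match winning. It is not Tomato's actual rule table or matching code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    proto: str       # "tcp" or "udp"
    ports: range     # destination ports this rule matches
    qos_class: str   # class assigned on a match


# Hypothetical, heavily trimmed rule list. Only the final catch-all
# mirrors the rule quoted in the post; the rest are placeholders.
RULES = [
    Rule("udp", range(53, 54), "Service"),    # DNS (class name assumed)
    Rule("tcp", range(80, 81), "WWW"),        # HTTP over TCP (assumed)
    Rule("tcp", range(443, 444), "WWW"),      # HTTPS over TCP (assumed)
    Rule("udp", range(1, 65536), "Crawl"),    # catch-all: UDP dst port 1-65535
]


def classify(proto, dst_port, rules=RULES, default="Default"):
    """Return the class of the first rule that matches (assumed semantics)."""
    for rule in rules:
        if rule.proto == proto and dst_port in rule.ports:
            return rule.qos_class
    return default


for proto, port in [("udp", 53), ("tcp", 443), ("udp", 443), ("udp", 1194)]:
    print(proto, port, classify(proto, port))
```

DNS gets caught by its own rule, but QUIC on UDP 443 and OpenVPN on its default UDP port 1194 match nothing specific and land in Crawl.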

And maybe League of Legends; it’s a UDP protocol too, and hasn’t performed as well on my slow network as I expected. Just playing a game feels about the same, maybe a little less laggy, but there’s still the same unexpectedly high packet loss. But I think one reproducible bug is gone now. Jayce gates cause a brief surge of UDP packets; that surge used to cause significant lag even when playing alone. Now it doesn’t.

The simple fix is to adjust the Crawl class to also get up to 100% of bandwidth (both inbound and outbound). That may still have lower queue priority though. You can also try adding more rules for UDP protocols you care about; QUIC is on ports 80 and 443, for instance. But trying to label all known UDP protocols is a Sisyphean task.
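For a sense of scale, here is a rough sketch of what a percentage cap works out to in absolute terms. The link speeds are made-up examples and `class_cap_kbps` is a hypothetical helper, not anything from the firmware.

```python
def class_cap_kbps(link_kbps, cap_percent):
    """Bandwidth ceiling for a class capped at cap_percent of the link."""
    return link_kbps * cap_percent / 100


# Hypothetical link speeds in kbit/s; substitute your own.
for link_kbps in (1_000, 6_000, 20_000):
    before = class_cap_kbps(link_kbps, 5)     # default Crawl cap
    after = class_cap_kbps(link_kbps, 100)    # after the fix
    print(f"{link_kbps} kbit/s link: Crawl cap {before:.0f} -> {after:.0f} kbit/s")
```

On the slower links in particular, 5% leaves very little room for anything interactive.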

I can’t imagine why anyone ever thought a 5% cap for default traffic was a good idea. Particularly for a UDP protocol which may not even be able to interpret those dropped packets as a signal to rate limit itself. Judging by the comment they were trying to catch unidentified BitTorrent traffic, which must have its own rate limiting. But still, what a dumb rule.

After several years of using QoS on home routers I’m of the opinion that QoS rules cause as much trouble as they fix. They’ve certainly caused me a lot of problems. In a home network there’s no meaningful way to shape incoming traffic at all. You can shape the outgoing traffic a bit, and I think prioritizing ACKs is probably a good idea (although weirdly this behavior is not the default). But in general the QoS implementations out there complicate things a lot and don’t provide a lot of value.

It’s time to go back and look at what the Bufferbloat guys have accomplished recently, and whether fq_codel or something similar has gotten traction. Their approach seems much simpler. Last I checked no Tomato variant supported it.