Complex QOS rules considered harmful

tl;dr: some router firmware has a catch-all rule that throttles all unidentified UDP traffic to 5% of bandwidth (labelled “Crawl”). This is a stupid rule; disable it.

I just fixed a bug in my router’s configuration that explained why Google QUIC was not working well for me. It may also explain bugs I’ve been seeing in League of Legends, OpenVPN, and other UDP protocols. I’m not entirely certain.

I’ve been running the Tomato v1.28 (Toastman) firmware for over a year now. It’s an old build. It has 40+ default QoS rules that identify all sorts of protocols, from important ones (DNS) to silly ones (RealAudio streaming), and assign each kind of traffic a service level. Unfortunately some of the rules are harmful.

The problem rule in this case was the very last one: “UDP Dst Port: 1-65535, classify Crawl”. And Crawl by default is limited to a maximum of 5% of total bandwidth! There are a few higher priority rules that classify specific kinds of UDP traffic: DNS, for instance. But any new or unanticipated use of UDP is severely throttled. Such as QUIC, Google’s fancy new web protocol. And Cisco VPN. And maybe OpenVPN.
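To make the failure mode concrete, here is a minimal Python sketch of first-match classification with a catch-all at the end. The rule table and the class names other than Crawl are made up for illustration and are not the actual Toastman defaults; only the final UDP catch-all and Crawl's 5% ceiling come from the configuration described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    proto: str        # "tcp" or "udp"
    ports: range      # destination ports this rule matches
    qos_class: str    # class assigned on a match


# Hypothetical, heavily abbreviated rule table (assumption, not the real defaults).
RULES = [
    Rule("udp", range(53, 54), "Service"),   # DNS gets its own rule
    Rule("tcp", range(80, 81), "WWW"),
    Rule("tcp", range(443, 444), "WWW"),
    Rule("udp", range(1, 65536), "Crawl"),   # the catch-all, always last
]

# Ceiling per class as a percentage of the link. Only Crawl's 5% is from the post.
CLASS_CAP_PERCENT = {"Service": 100, "WWW": 100, "Crawl": 5, "Default": 100}


def classify(proto: str, dst_port: int) -> str:
    """Return the class of the first rule that matches, like a first-match rule list."""
    for rule in RULES:
        if rule.proto == proto and dst_port in rule.ports:
            return rule.qos_class
    return "Default"


def ceiling_kbps(proto: str, dst_port: int, link_kbps: int) -> int:
    """Maximum rate this traffic can ever get on a link of link_kbps."""
    return link_kbps * CLASS_CAP_PERCENT[classify(proto, dst_port)] // 100


print(classify("udp", 53))                # Service: DNS matched before the catch-all
print(classify("udp", 443))               # Crawl: QUIC falls through to the last rule
print(ceiling_kbps("udp", 443, 10_000))   # 500 kbit/s on a hypothetical 10 Mbit/s link
```

Any UDP protocol without its own rule higher up lands in Crawl, which is why a new protocol like QUIC gets throttled without anyone having decided it should be.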

And maybe League of Legends; it’s a UDP protocol too, and hasn’t performed as well on my slow network as I expected. Just playing a game feels about the same, maybe a little less laggy, but there’s still the same unexpectedly high packet loss. But I think one reproducible bug is gone now. Jayce gates cause a brief surge of UDP packets; that used to cause significant lag even when playing alone. Now they don’t cause lag.

The simple fix is to adjust the Crawl class to also get up to 100% of bandwidth (both inbound and outbound). That may still have lower queue priority though. You can also try adding more rules for UDP protocols you care about; QUIC is on ports 80 and 443, for instance. But trying to label all known UDP protocols is a Sisyphean task.

I can’t imagine why anyone ever thought a 5% cap for default traffic was a good idea. Particularly for a UDP protocol which may not even be able to interpret those dropped packets as a signal to rate limit itself. Judging by the comment they were trying to catch unidentified BitTorrent traffic, which must have its own rate limiting. But still, what a dumb rule.
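A back-of-the-envelope sketch of why the cap hurts a sender that never backs off: once the offered rate exceeds the class ceiling, everything above the ceiling is eventually dropped. The link speed and sending rates below are hypothetical numbers, and the model assumes a constant-rate sender in steady state.

```python
def loss_fraction(offered_kbps: int, cap_kbps: int) -> float:
    """Steady-state share of traffic dropped for a sender that ignores loss."""
    if offered_kbps <= cap_kbps:
        return 0.0
    return 1 - cap_kbps / offered_kbps


link_kbps = 10_000                       # hypothetical 10 Mbit/s link
crawl_cap_kbps = link_kbps * 5 // 100    # the 5% Crawl ceiling: 500 kbit/s

print(loss_fraction(400, crawl_cap_kbps))    # 0.0: under the cap, nothing is lost
print(loss_fraction(800, crawl_cap_kbps))    # 0.375
print(loss_fraction(2_000, crawl_cap_kbps))  # 0.75
```

A protocol with congestion control would slow down until it fit under the cap; one without it just keeps losing that share of its packets.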

After several years of using QoS on home routers I’m of the opinion that QoS rules cause as much trouble as they fix. They’ve certainly caused me a lot of problems. In a home network there’s no meaningful way to shape incoming traffic at all. You can shape the outgoing traffic a bit, and I think prioritizing ACK is probably a good idea. (Although weirdly this behavior is not the default.) But in general the QoS implementations out there complicate things a lot and don’t provide a lot of value.

It’s time to go back and look at what the Bufferbloat guys have accomplished recently, and whether fq_codel or something similar has gotten traction. Their approach seems much simpler. Last I checked no Tomato variant supported it.

3 thoughts on “Complex QOS rules considered harmful”

  1. I agree with you that using extensive classification rules can lead to a world of hurt. And that world is pretty common.

    The fq_codel-using sqm-scripts now in openwrt do nearly no specialized prioritization. Neither do the qos-scripts. The only two things under fq_codel that tend to misbehave a bit are torrent and vpns, which I will get to in a second.

    I do believe that tcp ack prioritization has a place at very low bandwidths when SFQ (per packet fairness) is in use, but do not believe it is useful elsewhere (and of increasingly less benefit in a world with new tcp-like protocols in it, regardless, see: http://www.bufferbloat.net/projects/cerowrt/wiki/Wondershaper_Must_Die ). It is far saner to hand the DRR-based fq_codel a quantum 300 to use, thus giving a slight boost to all smaller packets (be they voip/dns/acks/etc). dd-wrt allows you to use fq_codel as a qdisc and I have recommended to the authors that they disable the ack prioritization checkbox when fq_codel is used, but do not know if they followed up on that suggestion.

    Once there is some fair (flow) queuing in place, and queue length is controlled, there is some room to do some classification that makes sense.

    sqm-scripts and now “cake” ( http://www.bufferbloat.net/projects/codel/wiki/Cake ) have always had a 3 tier shaper, which has as priority support for locally generated dns and ntp packets from the router, only. It can be used, if you so desire, to recognise incoming vpn packets and give those a slight boost, and/or deprioritize something tightly recognised as torrent or other background traffic (such as rsync). On the deprioritization front I found that there was so much mis-classified traffic as CS1, that I set the background class to 30% of traffic (after, like you, discovering that 5% was quite harmful).

    But generally I have found not a lot of need to improve the performance of vpns over the best effort class – merely having reliably low latency was enough to make most vpn usage transparent. There is a need on userspace vpns, however, to debloat and fq those – as they frequently bottleneck on the crypto stage on weak cpus – accumulating latency in their read buffers, and serializing the output.

    As for torrent, well, I find that web traffic knocks it out of the way just fine under fq_codel, and normal tcp outcompetes utp somewhat, even with multiple flows going. This paper goes into great detail on utp behaviors with RED: http://perso.telecom-paristech.fr/~drossi/paper/rossi14comnet-b.pdf

    … but it has not been updated to talk about how torrent behaves under fq_codel.

  2. Dave, I appreciate you checking in here. And all the work you’ve done! It sounds like y’all have made good progress. It’s great news that fq_codel is now in OpenWrt. I see it’s in dd-wrt too and lots of other places. Next time I can tinker with router firmware I’m definitely going to try out something that supports your queuing.

    The docs have gotten good too! Some quick references:
    http://www.bufferbloat.net/projects/codel/wiki
    https://en.wikipedia.org/wiki/CoDel
    http://wiki.openwrt.org/doc/uci/qos

  3. Oh and I meant to say BitTorrent is a particularly perverse problem for QoS. BitTorrent is designed to actively resist detection and traffic shaping, to stop ISPs that have tampered with customer traffic in inappropriate ways. Turns out in an arms race everyone loses; now our own router has a harder time shaping the traffic. Fortunately BitTorrent clients all come with throttling options.

    Truthfully the best solution for all this stuff is a fast network. At my home 100Mbps Internet link I don’t do any QoS at all. Between all packets simply arriving faster and the link being faster than many connections to servers anyway, there’s little need to try to shape traffic.
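
A footnote on the “quantum 300” suggestion in the first comment: the effect can be seen in a toy model of classic deficit round robin (DRR). This is a sketch only, not fq_codel's actual scheduler, which also treats new flows specially and runs CoDel on each queue. The flow names, packet sizes, and the 1514-byte comparison value (a full-size Ethernet frame) are assumptions chosen for illustration.

```python
from collections import deque


def drr_order(flows, quantum, rounds):
    """Classic DRR: return (flow, size) tuples in the order they are sent."""
    queues = {name: deque(sizes) for name, sizes in flows.items()}
    deficit = {name: 0 for name in flows}
    sent = []
    for _ in range(rounds):
        for name, queue in queues.items():
            if not queue:
                deficit[name] = 0        # idle flows do not bank credit
                continue
            deficit[name] += quantum     # one quantum of byte credit per visit
            while queue and queue[0] <= deficit[name]:
                size = queue.popleft()
                deficit[name] -= size
                sent.append((name, size))
    return sent


def bytes_ahead_of(sent, name):
    """Bytes from other flows sent before the last packet of flow `name`."""
    last = max(i for i, (flow, _) in enumerate(sent) if flow == name)
    return sum(size for flow, size in sent[:last] if flow != name)


# One bulk flow of full-size packets and one short burst of small packets.
flows = {"bulk": [1500] * 10, "voip": [100] * 3}

print(bytes_ahead_of(drr_order(flows, 1514, 6), "voip"))  # 1500
print(bytes_ahead_of(drr_order(flows, 300, 6), "voip"))   # 0
```

With a full-frame quantum the bulk flow sends a 1500-byte packet on every visit, so the small packets wait behind it. With a quantum of 300 the bulk flow has to save up credit for five visits before it can send, and the small packets slip through in the meantime. Over many rounds both flows still get the same share of bytes.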
