Grafana and InfluxDB: rollups

Now that I have a few months of data, some of my Grafana graphs were getting slow. I realized I’d made no provision for data rollups. I’m recording data every few seconds. That's great for a detailed graph of the last 24 hours, but far too fine-grained for a report covering a full year. Clearly I need some data aggregation / rollups, just like rrdtool did 22 years ago.

Joke’s on me: there’s no easy way to do aggregates or rollups in Grafana + InfluxDB. It’s a frequently requested feature; this discussion from 2016 is the most complete I’ve found. There's no clear answer, but the fact that it’s still not present (even in InfluxDB 2.0) speaks volumes. To be fair, doing a general purpose rollup is kind of hard, with lots of weird corner cases.

InfluxDB does have reasonable support for creating a rollup table yourself; see the Downsampling and data retention docs. You set up a continuous query to populate records and then a retention policy to delete old data. (Hysterically, the docs suggest you create the retention policy first; those only run every 30 minutes so you’ll probably get lucky!) But that amounts to creating a second table for the downsampled data. Whatever queries that data is then responsible for selecting the right table (or worse, merging data). There’s a variety of hacks for working with this in Grafana; the usual suggestion is to make the Grafana query select from the value of a variable rather than a static data source, then populate the variable appropriately. I imagine this can be made to work but what a PITA!
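
For reference, here is a rough sketch of what that setup looks like in InfluxQL 1.x, wrapped in a small Python helper that just builds the statement strings. The database, measurement, field and policy names are all made up for illustration, and I haven't run this against my own server. It also only creates the rollup side; expiring the raw data would still need the default retention policy shortened separately.

```python
def downsampling_statements(db, measurement, field,
                            rollup_rp="rollup_1h", interval="1h", keep="52w"):
    """Build InfluxQL (1.x) statements for a simple mean-based rollup.

    All names passed in are hypothetical; nothing here talks to a server.
    """
    # Retention policy that holds the downsampled data for `keep`.
    create_rp = (
        f'CREATE RETENTION POLICY "{rollup_rp}" ON "{db}" '
        f'DURATION {keep} REPLICATION 1'
    )
    # Continuous query that writes one averaged point per interval
    # into the rollup policy, preserving all tags via "GROUP BY ..., *".
    create_cq = (
        f'CREATE CONTINUOUS QUERY "cq_{measurement}_{interval}" ON "{db}" BEGIN '
        f'SELECT mean("{field}") AS "{field}" '
        f'INTO "{rollup_rp}"."{measurement}" '
        f'FROM "{measurement}" '
        f'GROUP BY time({interval}), * END'
    )
    return [create_rp, create_cq]


for stmt in downsampling_statements("telemetry", "temperature", "value"):
    print(stmt)
```

Anything reading the rollup then has to ask for "rollup_1h"."temperature" explicitly, which is exactly the table-selection problem described above.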

Looking at my own problem again I realized that I only had a million points at worst; that’s not awesome but why is the system choking on that? InfluxDB is super fast! Turns out my problem was that the weird dashboard I’d imported didn’t have any sort of GROUP BY clause defined. The ordinary Grafana boilerplate includes a GROUP BY time($__interval) as part of the query. InfluxDB is still having to read all the records but at least it’s only shipping a rolled up version to Grafana. Once I added that back my existing dashboards got speedy again. Problem avoided. That won’t work for a system with a lot of data, but for a small data system it’s fine.
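
To make concrete what that time-bucketed grouping buys you, here is a toy Python sketch of the idea: average raw points into fixed-width time buckets so only one value per bucket gets shipped to the graph. This is my own illustration of the concept, not how InfluxDB implements it, and the function name and sample data are invented.

```python
from collections import defaultdict


def group_by_time(points, interval_s):
    """Average (timestamp_seconds, value) points into fixed-width buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp down to the start of its bucket.
        bucket_start = ts - (ts % interval_s)
        buckets[bucket_start].append(value)
    # One (bucket_start, mean) pair per bucket, in time order.
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


raw = [(0, 1.0), (5, 3.0), (10, 5.0), (65, 7.0)]
print(group_by_time(raw, 60))  # [(0, 3.0), (60, 7.0)]
```

Every raw point still gets read, but four points go in and two come out; with a reading every few seconds and a year-wide graph, the reduction is what keeps the dashboard responsive.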