Better Jupyter charts: Seaborn

Took a quick tour through Seaborn, the enhancement library for matplotlib. It’s very good! It does two basic things. It makes the default matplotlib charts prettier, and it gives you an easy API for some fancier types of statistical visualization. I’d say it’s a no-brainer to use Seaborn if you’re doing exploratory data visualization. It still just renders raster images in Jupyter, so it doesn’t fulfill my goal of having nice interactive SVG charts in the browser, but Seaborn is solving a different problem.

Prettier

“Drawing attractive figures is important”, the docs say, and I couldn’t agree more. Seaborn reconfigures matplotlib so the default charts look better. I don’t just mean nice anti-aliasing, but also reasonable grid ticks and color choices. Seaborn has good perceptual palettes, which are really important. I believe stock matplotlib’s own defaults have recently improved, in part with input from Seaborn.

The other part of “attractive figures” is that the Seaborn API is DataFrame-aware and will label your plots using the column names in your DataFrame. Getting nicely labelled axes, titles and so on takes several lines of manual code with matplotlib; with Seaborn it’s a single line of code.
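Here’s a minimal sketch of that labelling behaviour. The DataFrame is synthetic and the column names (height_cm, weight_kg) are made up for illustration; it also assumes a Seaborn version recent enough to have scatterplot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set()  # switch matplotlib over to Seaborn's default styling

# Synthetic data; the column names are hypothetical.
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)
weight = 0.9 * height - 85 + rng.normal(0, 8, size=200)
df = pd.DataFrame({"height_cm": height, "weight_kg": weight})

# One plotting call: the axis labels come from the DataFrame column names.
fig, ax = plt.subplots()
sns.scatterplot(x="height_cm", y="weight_kg", data=df, ax=ax)
```

In a notebook the figure shows up inline, with both axes already labelled.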

Fancier statistics

The meatier part of Seaborn is that it has more complex chart types built in.

Distributions are mostly what I’ve used. In detail:

  • distplot: 1-dimensional distributions, an enhancement of matplotlib.pyplot.hist. It adds a continuous kernel density estimate to the bars, and also has a rug-plot option.
  • jointplot: 2-dimensional distributions, an enhancement of matplotlib.pyplot.scatter. Adds a correlation coefficient and histograms on the side, a sort of quick ggplot. Can also do continuous contour plots. One drawback is there doesn’t seem to be any easy way to set the hue of the individual dots, so no sneaking in a third dimension of data. Pass in kind="reg" to have it try to fit a regression to the scatter.
  • hexbins: 2-dimensional histograms (jointplot with kind="hex")
  • pairplot: pairwise jointplots for your N-dimensional dataset. So nice to have this in a single command, although it’s unwieldy for N > 5 or so. (Note this does have a way to set the hue of individual dots.)

Regressions are for fitting various kinds of statistical models to your dataset.
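As one small example of the regression plots, here’s a sketch using regplot with a second-order polynomial fit. The data and the column names (dose, response) are made up; other model types are available but not shown here.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic, gently curved relationship; column names are hypothetical.
rng = np.random.default_rng(2)
dose = rng.uniform(0, 10, size=120)
response = 1.5 + 0.8 * dose - 0.06 * dose**2 + rng.normal(scale=0.4, size=120)
df = pd.DataFrame({"dose": dose, "response": response})

# order=2 asks for a quadratic fit instead of the default straight line.
fig, ax = plt.subplots()
sns.regplot(x="dose", y="response", data=df, order=2, ax=ax)
```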

Categorical graphs are for looking at statistics for qualitative categories, with plots like swarm plots and violin plots. They’re really quite beautiful. Here’s a sample visualization of the distribution of a variable (y axis) for each of 5 categories (the x axis). I’ve used a violin plot, which shows a continuous KDE approximation, along with a swarm plot to show the actual dots on top of it. (The data set is the number of games players have for each of 5 roles in League of Legends. That one crazy outlier is Fiddlesticks Jungle; players tend to have 220+ games on him!)

[Figure: violin plot with a swarm plot overlaid, number of games per player for each of the 5 roles]
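A chart like this one can be sketched roughly as follows. This is not the code or data behind the figure: the numbers are random, and the role names and column names are my own placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in data: 40 fake players per role, skewed game counts.
# Role and column names are placeholders, not the real dataset.
rng = np.random.default_rng(3)
roles = ["Top", "Jungle", "Mid", "ADC", "Support"]
df = pd.DataFrame({
    "role": np.repeat(roles, 40),
    "games": rng.gamma(shape=2.0, scale=20.0, size=200),
})

fig, ax = plt.subplots()
# Violin: the smoothed (KDE) shape of each role's distribution.
sns.violinplot(x="role", y="games", data=df, inner=None, color="lightgray", ax=ax)
# Swarm: the individual data points, drawn on the same axes.
sns.swarmplot(x="role", y="games", data=df, color="black", size=3, ax=ax)
```

Passing inner=None drops the violin’s own interior markings so they don’t fight with the swarm dots.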

Finally, data-aware grids are used for making small-multiples plots of the same dataset. I already mentioned pairplot, for doing NxN pairwise visualizations of N variables. FacetGrid is a tool for quickly generating a bunch of graphs comparing across multiple categorical variables.

Conclusion

I like Seaborn. I like that it looks good out of the box. I also like that it lets me make more sophisticated graphs with little effort. It may be a little too easy; I’m not sure a violin plot is really the right treatment of that data above, for instance. But it sure looks good!