Pandas DataFrames: MultiIndex and slicing

Diving further into Pandas DataFrames, I spent some time learning about MultiIndex. Long story short, it’s a way to have a composite key for your data, to say “these two columns of my CSV file are the name for the row”. It can do more complex things too.
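To make the "two columns are the name for the row" idea concrete, here is a minimal sketch. The CSV contents and the column names (state, year, population) are invented for illustration; they are not from my notebook.

```python
import io

import pandas as pd

# Hypothetical CSV data standing in for a real file on disk.
csv_text = """state,year,population
CA,2010,37.3
CA,2011,37.6
NY,2010,19.4
NY,2011,19.5
"""

# index_col with a list of columns builds a MultiIndex from them.
df = pd.read_csv(io.StringIO(csv_text), index_col=["state", "year"])

print(type(df.index).__name__)  # the index is a MultiIndex
print(df.index.names)           # the two key columns became index levels

# A full composite key is a tuple with one label per level.
ca_2011 = df.loc[("CA", 2011), "population"]
print(ca_2011)
```

If the DataFrame is already loaded, calling set_index() with the same list of columns should give the same kind of index.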

Pandas has some fairly powerful mechanisms to subset your DataFrame based on aspects of its MultiIndex composite key. They’re a bit confusing though; the slicing syntax is abstruse. Also there’s a hidden gotcha: you really have to sort your DataFrame before you can slice it if it has a MultiIndex.
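Here is a rough sketch of the sort-then-slice pattern. The data and level names are made up for the example, and it sorts with sort_index(), which is the method newer pandas releases provide for this. Treat it as one way to do it, not the only one.

```python
import pandas as pd

# Hypothetical data, deliberately built in unsorted order.
index = pd.MultiIndex.from_tuples(
    [("NY", 2011), ("CA", 2010), ("TX", 2010),
     ("CA", 2011), ("NY", 2010), ("TX", 2011)],
    names=["state", "year"],
)
df = pd.DataFrame(
    {"population": [19.5, 37.3, 25.2, 37.6, 19.4, 25.6]}, index=index
)

# Sort first: label slicing on a MultiIndex wants a lexsorted index.
df_sorted = df.sort_index()

# Slice on the first level only: every row for states CA through NY.
ca_to_ny = df_sorted.loc["CA":"NY"]

# pd.IndexSlice lets you say "anything" for one level and pick on another.
idx = pd.IndexSlice
year_2011 = df_sorted.loc[idx[:, 2011], :]

# Combine a range on the first level with a label on the second.
ca_ny_2010 = df_sorted.loc[idx["CA":"NY", 2010], :]

print(ca_to_ny)
print(year_2011)
print(ca_ny_2010)
```

The trailing ", :" in the IndexSlice calls selects all columns; spelling it out keeps pandas from guessing whether the tuple refers to rows or to rows and columns.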

Anyway, there’s a demo notebook program here. I also uploaded the ipynb file as a gist but GitHub’s viewer is buggy.

For the search engines: if you see any of the following error messages, this sample notebook will help you figure them out. Summary: call DataFrame.sortlevel() (deprecated in later pandas releases in favor of DataFrame.sort_index()).


PerformanceWarning: indexing past lexsort depth may impact performance.

KeyError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'
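For reference, here is a small sketch that should trigger an error of this family and then fix it by sorting. The DataFrame is invented for the example. The exact message and exception class vary by pandas version; as far as I know newer releases raise UnsortedIndexError, which is a subclass of KeyError, so catching KeyError covers both.

```python
import pandas as pd

# Hypothetical DataFrame whose MultiIndex is not sorted.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("c", 1), ("a", 2), ("b", 1), ("c", 2)],
    names=["letter", "number"],
)
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=index)

error = None
try:
    # Label slicing on the unsorted index is what goes wrong.
    df.loc["a":"b"]
except KeyError as exc:  # UnsortedIndexError subclasses KeyError
    error = exc
    print(type(exc).__name__, exc)

# Sorting the index first makes the same slice work.
fixed = df.sort_index().loc["a":"b"]
print(fixed)
```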

PS: if you want to get the index labels for one level of a DataFrame with a MultiIndex, call get_level_values() on the DataFrame’s index. I keep forgetting how to do this.
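A quick sketch of that, so I stop forgetting. The DataFrame and level names here are made up; get_level_values() is a method on the index and accepts either a level name or a level position.

```python
import pandas as pd

# Hypothetical two-level index: every (letter, number) combination.
index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2]], names=["letter", "number"]
)
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

# One label per row, selected by level name...
letters = df.index.get_level_values("letter")
# ...or by level position.
numbers = df.index.get_level_values(1)

print(list(letters))
print(list(numbers))
```

The result is an Index with one entry per row, so labels repeat; call unique() on it if you only want the distinct values.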