Nelson's log

unicodecsv 0.11.0 speed improvement

My little Python csv parsing benchmark had one really nice effect: someone contributed a patch to unicodecsv that removes some unnecessary calls to isinstance in certain circumstances. He claims a 2x speedup. That patch was just released in 0.11.0, so I tested it. These are the same benchmarks and data file I ran and reported before as “Python 2 results”.

 61.65s unicodeCsvReader 0.9
118.64s unicodeCsvDictReader 0.9
 41.12s unicodeCsvReader 0.11
 96.34s unicodeCsvDictReader 0.11

Not quite a 2x speedup, but it runs in about 0.67x (reader) and 0.81x (DictReader) the time it used to. That’s quite good for a simple change, and a nice example of open source working well.

I should really stop using the csv DictReader. Or maybe make a new one that’s smarter. The current Python module seems to actually create a real dict object for every row. I think you could make something faster that used a class wrapper for the tuple that emulated dict-style retrieval by looking up column offsets in the row header. At least there’d be less data copied, but maybe the wrapper overhead would negate that. Have to try it to see.
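Here is a rough sketch of what that wrapper idea could look like. It is untested for speed, so it says nothing about whether the wrapper overhead wins or loses. The names RowView and lazy_dict_reader are made up for illustration, and it uses the standard csv module on Python 3 rather than unicodecsv.

```python
import csv
import io


class RowView(object):
    """Dict-style read access to one CSV row without building a dict.

    Hypothetical helper: it keeps a reference to the row list produced by
    csv.reader plus a column-name -> offset mapping shared by every row.
    """

    __slots__ = ("_index", "_row")

    def __init__(self, index, row):
        self._index = index  # shared {column name: offset}, built once
        self._row = row      # the row itself, not copied

    def __getitem__(self, name):
        # KeyError for an unknown column, IndexError if the row is short.
        return self._row[self._index[name]]

    def get(self, name, default=None):
        offset = self._index.get(name)
        if offset is None or offset >= len(self._row):
            return default
        return self._row[offset]


def lazy_dict_reader(lines, **kwargs):
    """Yield a RowView per data row, using the first row as the header."""
    reader = csv.reader(lines, **kwargs)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    index = {name: offset for offset, name in enumerate(header)}
    for row in reader:
        yield RowView(index, row)


# Small usage example with in-memory data.
sample = io.StringIO("name,qty\napple,3\npear,5\n")
for record in lazy_dict_reader(sample):
    print(record["name"], record["qty"])
```

The only per-row work is allocating the small wrapper; the header lookup table is built once. The cost moves to access time, since every record["name"] goes through a Python-level __getitem__, which is exactly the overhead that would need benchmarking.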