Python3: no len() for iterators

2016/04/062016/04/07 ~ nelsonminar

Python 3 has an odd oversight; there’s no function for finding the length of an iterable. You can do len(aList) of course, and len(aTuple). But there’s no len(anIterator). What’s really surprising is there’s no itertools.len(anIterator), some function implementation.

The naive solution is len(tuple(anIterator)). But that’s inefficient because it constructs a tuple in memory. This discussion suggests sum(1 for e in anIterator), which if I understand generators correctly should be pretty efficient and involve no allocation of new collections. Surprised this function isn’t in itertools.

Update: see also cardinality.count() for a clever faster implementation.

Update 2: I had some yak-shaving time, so I made an iPython notebook timing three different solutions for counting the length of an iterable.

1x: len of tuple of iterable
7x: deque an enumerate of iterable
11x: sum 1 over the iterable

The results are roughly the same for iterables of size 1000 to 1,000,000 to 100,000,000 and hold whether the iterable is a list or an ephemeral itertools iterable.

What’s astonishing is how much faster the len(tuple(iterable)) method is than the others. Naively it should be slower, it’s doing a lot more work building that tuple in memory. But perhaps this is a highly optimized codepath in Python. The tradeoff is it temporarily allocates memory to consume the whole iterable, so it’s using O(n) memory compared to the O(1) memory for the other solutions.

12 thoughts on “Python3: no len() for iterators”

ryan says:

2016/04/06 at 22:12

this actually kind of makes sense to me. len(iterator) is kind of dangerous, since iterating may have side effects, and you may not be able to rewind if you need to actually use the values afterward. (alluded to in the SO question you link.) disabling it is opinionated, but maybe the right choice.

agreed though, it’s odd that it’s not even in itertools. disabling builtin len but supporting it in itertools seems like the best tradeoff. iterables also seem relatively safe, as opposed to iterators, since iterables usually don’t have side effects and can rewind.
nelsonminar says:

2016/04/06 at 22:13

Yeah the fact that calculating the length of an iterator may be destructive requires care. The sum() method is non-obvious enough though I’d like to see it in itertools.

(I’m still confused about the difference between generators, iterators, and iterables.)
pvg says:

2016/04/07 at 07:57

You’re probably not going to see this in itertools as length is not a meaningful property of an iterator. Your code has a slight but critical inaccuracy that may be contributing to the confusion – sum doesn’t take an iterator, it takes an iterable. It just so happens that every iterator is also an iterable where __iter__ acts as a copy constructor of the iterable. That’s why this particular construct is non-destructive.
pvg says:

2016/04/07 at 08:53

Glorp, part of my comment was a horrible lie as well, __iter__ on typical (say, of a list) iterator just returns self, not a copy so the sum thing is very much destructive when called on a raw iterator. Which is probably another reason to avoid it.
nelsonminar says:

2016/04/07 at 15:56

Reflecting on pvg’s comments make me realize that “length” really is not a property of an iterator. Nor an iterable. The only way to know the length of an iterable is to count the items in it, which is O(n) and could be destructive. So maybe I’d make the method I want called “count” or something, to indicate it’s a slow operation. (only itertools.count() exists and is something different.)

Also itertools isn’t really the right place for this method. That library is largely about constructing new iterables, not consuming existing ones. There’s a bunch of built-in functions that are basically “reduce the iterable to a single result”: any(), sum(), etc. I wonder if anyone’s considered just extending len() to iterables? I guess it’d be surprising that sometimes it’s O(1) and sometimes O(n).

Also worth noting; reduce() got removed in Python 3 because it was too confusing. Ironically when Guido first proposed the removal in 2005 he said dropping filter and map was noncontroversial. Those stayed, only reduce got removed. http://www.artima.com/weblogs/viewpost.jsp?thread=98196
nelsonminar says:

2016/04/07 at 15:59

(For that matter, len(iterable) could very likely never terminate. Perhaps that’s a bad idea in the standard library.)
nelsonminar says:

2016/04/07 at 16:12

OK maybe this is the final word: the cardinality package has an implementation of “length of iterable” (named count()). The implementation has a clever use of enumerate and deque which moves the actual iteration into C code and therefore should be faster than a for loop. https://github.com/wbolster/cardinality/blob/master/cardinality.py#L48-L52
nelsonminar says:

2016/04/07 at 19:20

Here’s a final final word for anyone following along: read the update to the blog post with a Python notebook timing various solutions.
pvg says:

2016/04/08 at 00:32

itertools still lets CPython cheat so those iterators are a bit magical rather than ordinary – they have a method __length_hint__ that is supposed to help optimize memory allocation, see

http://legacy.python.org/dev/peps/pep-0424/

I noticed this yesterday while dir-ing various built-in iterators. If you time a completely non-magical iterator everything runs at about the same, crushingly slow speed –

Sorry, something went wrong. Reload?

Sorry, we cannot display this file.

Sorry, this file is invalid so it cannot be displayed.

<br /> Viewer requires iframe.<br />

view raw

Iterable timing.ipynb

hosted with ❤ by GitHub
nelsonminar says:

2016/04/08 at 01:04

Oh nice work, thank you. I wonder if that explains why the len1() method with creating a tuple is so much faster; the magic iterator may it it do work on N elements at once, not just 1.

FWIW I ran your notebook and consistently got different timings. len1() is 405ms, len2() is 485ms, and len3() is 447ms. len1 is much much slower (but still fastest) and len3 is a little faster than len2.

It’s a bit tricky to measure because so much overhead is now in the Python code for the iterator itself. I don’t really care enough to figure out how to factor that out though.
pvg says:

2016/04/08 at 09:42

That sounds plausible; I tried to test it in a couple obvious ways – deleting __length_hint__ on the ‘native’ iterator (doesn’t work, is read-only) and adding it to the python iterator (doesn’t change execution time) and then my curiosity crashed against my laziness as well.

An interesting thing about Python is that, despite the batteries, readability and ease-of-use, it’s an old, long-evolved language and it doesn’t take much to find yourself staring right into the sausage factory. A few undirected iterator/iterable thoughts:

* If you want length, these are the wrong abstractions. On the other hand, they’re omnipresent in the standard library and language idiom. Even the runtime writers found themselves in your predicament and implemented an ugly magic hack. As a rhetorical wanky design astronomy aside, iterables are not lengthable but if you were starting from scratch, would you make the protocol demand the converse – that lengthables are also iterable? After all, things that are integer-indexable are already treated as iterables.

* An iterator is a representation of the traversal state of some sequence. An iterable is a thing that gives you a newly initialized iterator from the beginning of the sequence. Iterators are also iterables, though, and they violate this contract.

* Iterators use exceptions for flow control. Believers will tell you it’s for brevity/succinctness but as these benchmarks suggest, this design decision was very likely influenced by the huge expense of function/method calls.

I also had no idea iPython had such a convenient shorthand to timeit. I feel like I’ve just caught up with The Future.
nelsonminar says:

2016/04/08 at 13:40

For posterity, __length_hint__ is sort of documented here: https://www.python.org/dev/peps/pep-0424/

Comments are closed.

Nelson's log

A personal work journal

Python3: no len() for iterators

Related

12 thoughts on “Python3: no len() for iterators”