Iterables, Iterators, and Generators
Notes based on Fluent Python, summarized and rephrased in my own words.
Why iteration matters
Iteration is the foundation of data processing. When dealing with datasets that don’t fit in memory, we need a lazy way to retrieve items one at a time, on demand. That is exactly what the Iterator pattern provides.
In Python:
- Iterables are objects you can loop over (e.g. `for x in obj:`).
- Iterators are the objects that actually produce items one by one.
- Generators are a convenient way to write iterators using `yield`.
All generators are iterators, because they fully implement the iterator protocol. Conceptually, iterators often pull items from an underlying collection, whereas generators can "produce" items out of thin air.
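A tiny sketch of that distinction (the `countdown` name here is illustrative, not from the book): the list iterator pulls from stored data, while the generator computes each value on demand.

```python
def countdown(n):
    """A generator function: values are produced on demand, not stored anywhere."""
    while n > 0:
        yield n
        n -= 1

it = iter([3, 2, 1])   # an iterator pulling items from an existing collection
gen = countdown(3)     # a generator "producing" the same items out of thin air

print(list(it))   # [3, 2, 1]
print(list(gen))  # [3, 2, 1]
```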
Sentence, version 1: a simple word sequence
First, we implement Sentence as a sequence of words extracted from a text using a regex:
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __getitem__(self, index):
        return self.words[index]

    def __len__(self):
        return len(self.words)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
```

Usage:
```python
s = Sentence('"The time has come," the Walrus said,')
print(s)  # Sentence('"The time ha... Walrus said,')
for word in s:
    print(word)
list(s)
# ['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']
```

Why sequences are iterable
When Python needs to iterate over x, it effectively does:
- Call `iter(x)`.
- If `x.__iter__()` exists, call it to get an iterator.
- Else, if `x.__getitem__()` exists, Python builds an iterator that fetches items by index, starting at 0, until `IndexError` is raised.
- If neither works, `TypeError: object is not iterable` is raised.
Any sequence (like this Sentence class) is iterable primarily because it implements `__getitem__`. Standard sequence types also implement `__iter__`, and you usually should, too; the special handling for `__getitem__` exists largely for backward compatibility.

In terms of the ABCs, `abc.Iterable` considers only `__iter__`, so an old-style sequence with just `__getitem__` may be iterable to the runtime but not recognized by `isinstance(obj, abc.Iterable)`.
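This mismatch is easy to demonstrate with a minimal class (the `OldStyle` name is made up for illustration): iteration works through the `__getitem__` fallback, but the ABC check fails.

```python
import collections.abc as abc

class OldStyle:
    """Iterable only via the __getitem__ fallback; no __iter__ defined."""
    def __getitem__(self, index):
        if index > 2:
            raise IndexError(index)   # stops the fallback iterator
        return index * 10

obj = OldStyle()
print(list(obj))                      # [0, 10, 20] -- iteration works fine
print(isinstance(obj, abc.Iterable))  # False -- the ABC only checks __iter__
```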
Iterables vs iterators
Iterable: an object from which Python can obtain an iterator using iter(obj). Typically:
- Defines `__iter__` returning an iterator, or
- Implements `__getitem__` with integer indices starting at 0.
Iterator: an object that:
- Implements `__next__` (no arguments), returning the next item or raising `StopIteration` when exhausted.
- Implements `__iter__` returning `self`, so it can be used anywhere an iterable is expected (like inside `for`).
Example of using an iterator explicitly:
```python
s = 'ABC'
for ch in s:
    print(ch)  # implicit iterator

it = iter(s)   # get an iterator explicitly
while True:
    try:
        print(next(it))
    except StopIteration:
        break
```

The standard library defines collections.abc.Iterator roughly as:
```python
class Iterator(Iterable):
    @abstractmethod
    def __next__(self):
        raise StopIteration

    def __iter__(self):
        return self
```

So: iterables produce iterators; iterators produce items.
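The two roles can be checked directly with the ABCs; a quick sketch using a plain string:

```python
import collections.abc as abc

s = 'ABC'
it = iter(s)

print(isinstance(s, abc.Iterable))   # True  -- a string can produce iterators
print(isinstance(s, abc.Iterator))   # False -- a string is not its own iterator
print(isinstance(it, abc.Iterator))  # True  -- the str iterator implements __next__
print(iter(it) is it)                # True  -- an iterator's __iter__ returns self
```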
Sentence, version 2: explicit iterator class
We can rewrite Sentence to use an explicit iterator class instead of relying on `__getitem__`:
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        return SentenceIterator(self.words)

class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self
```

Notes:
- `Sentence` is an iterable; each call to `iter(sentence)` returns a new `SentenceIterator`, so you can iterate multiple times.
- `SentenceIterator` is an iterator; it holds iteration state (`index`) and returns itself from `__iter__`.
A common mistake is to merge these roles and make the container itself an iterator (maintaining internal index state). That usually leads to bugs where repeated iteration doesn’t work properly. Keep iterables and iterators conceptually separate.
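A minimal sketch of that bug (the `BadSentence` name is invented for illustration): because the container is its own iterator, the exhausted state is shared, and a second loop sees nothing.

```python
class BadSentence:
    """Anti-pattern: the container is its own iterator, so state is shared."""
    def __init__(self, text):
        self.words = text.split()
        self.index = 0   # iteration state lives on the container itself

    def __iter__(self):
        return self      # every loop gets the SAME iterator

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

b = BadSentence('a b c')
print(list(b))  # ['a', 'b', 'c']
print(list(b))  # [] -- already exhausted; repeated iteration is broken
```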
Sentence, version 3: generator function
Instead of defining a separate iterator class, we can use a generator function inside `__iter__`:
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for word in self.words:
            yield word
        # explicit return not needed; the function just ends
```

Any function that uses yield becomes a generator function: calling it returns a generator object that implements the iterator protocol.
This version:
- Is shorter and clearer than an explicit iterator class.
- Still returns a fresh iterator on each call to `__iter__`.
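That freshness is worth seeing concretely; a sketch repeating a minimal version of the class above:

```python
import re

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):  # generator function: each call builds a NEW generator
        for word in self.words:
            yield word

s = Sentence('Pig and Pepper')
it1, it2 = iter(s), iter(s)   # two independent generator objects
print(next(it1), next(it1))   # Pig and
print(next(it2))              # Pig -- it2 starts fresh, unaffected by it1
```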
Sentence, version 4: fully lazy with re.finditer
We can make the sentence even more memory-efficient by avoiding a precomputed words list and scanning the text lazily:
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            yield match.group()
```

re.finditer is a lazy alternative to re.findall: it returns an iterator of match objects instead of a full list. For long texts or many matches, this saves a lot of memory. match.group() extracts the matched word.
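The laziness of re.finditer can be observed directly: the scan only advances as matches are requested.

```python
import re

RE_WORD = re.compile(r'\w+')

matches = RE_WORD.finditer('one two three')  # lazy: nothing scanned yet
m = next(matches)                            # scanning happens on demand
print(m.group())                             # one
print([m.group() for m in matches])          # ['two', 'three'] -- the rest
```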
Sentence, version 5: generator expression
We can make `__iter__` even more compact using a generator expression:
```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))
```

This is functionally equivalent to version 4 but uses a concise expression instead of a for/yield generator function.
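A generator expression behaves just like a generator function: it is lazy and single-use. A standalone sketch:

```python
gen = (n * 2 for n in range(3))  # generator expression: nothing computed yet
print(next(gen))   # 0 -- items are computed one at a time
print(list(gen))   # [2, 4] -- only the remaining items
print(list(gen))   # [] -- a generator is exhausted after one pass
```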
Generator tools in itertools and the stdlib
Python’s stdlib offers many tools that produce or consume iterators lazily.
Examples:
- `itertools.count(start, step)` – infinite arithmetic progression:

```python
import itertools
gen = itertools.count(1, 0.5)
next(gen)  # 1
next(gen)  # 1.5
next(gen)  # 2.0
```

Common built-ins that work lazily or return iterators (or are closely related):
- `enumerate` – pairs index with items.
- `iter` – creates an iterator.
- `next` – advances an iterator.
- `filter`, `map` – lazy transforms.
- `zip` – iterates in parallel.
- `reversed` – reversed iteration (for suitable objects).
- `range` in Python 3 – lazy numeric sequence.
There are many more in itertools that can be combined to build sophisticated pipelines.
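As a sketch of such a pipeline (the specific combination here is my own example): filter, map, and itertools.count compose into an infinite lazy stream, and itertools.islice takes a finite slice of it.

```python
import itertools

# squares of odd numbers, built entirely lazily -- no intermediate lists
odds = filter(lambda n: n % 2, itertools.count(1))   # 1, 3, 5, ...
squares = map(lambda n: n * n, odds)                 # 1, 9, 25, ...
print(list(itertools.islice(squares, 4)))            # [1, 9, 25, 49]
```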
yield from (Python 3.3+)
When a generator needs to yield all values from another iterable or generator, the traditional pattern is nested loops:
```python
def chain(*iterables):
    for it in iterables:
        for i in it:
            yield i
```

Since Python 3.3, we can write this more directly:
```python
def chain(*iterables):
    for it in iterables:
        yield from it
```

`yield from sub_iterable` delegates to another iterator or generator, yielding all its values in turn. It is syntactic sugar that often simplifies nested generator code.
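Using the chain generator above (a minimal, self-contained copy):

```python
def chain(*iterables):
    for it in iterables:
        yield from it   # delegate: yield every item of each sub-iterable

print(list(chain('ABC', range(3))))  # ['A', 'B', 'C', 0, 1, 2]
```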
Reduction functions that consume iterables
Several built-ins take an iterable and reduce it down to a single value:
- `all(iterable)` – `True` if all elements are truthy (or the iterable is empty).
- `any(iterable)` – `True` if any element is truthy.
- `sum(iterable, start=0)` – adds numbers.
- `max(iterable, ...)`, `min(iterable, ...)` – extreme values.
- `functools.reduce(func, iterable[, init])` – general left fold.
These functions are natural consumers of iterators and fit well with lazy pipelines.
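A few quick sketches of these reducers consuming lazy inputs (the numbers are my own examples):

```python
import functools

print(sum(n * n for n in range(1, 6)))   # 55 -- consumes a generator expression
print(any(c.isdigit() for c in 'abc1'))  # True -- stops at the first truthy item
print(all([]))                           # True -- vacuously true on an empty iterable
print(functools.reduce(lambda a, b: a * b, range(1, 5)))  # 24 -- left fold: 1*2*3*4
```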
The advanced iter(callable, sentinel) form
The built-in iter has a lesser-known two-argument form:
`iter(callable, sentinel)`
- The first argument is a callable with no arguments.
- The second is a sentinel value.
- The resulting iterator repeatedly calls the callable, yielding each result until the callable returns the sentinel value, at which point iteration stops (by raising `StopIteration`).
Example: simulate repeated dice rolls until a 1 appears:
```python
from random import randint

def d6():
    return randint(1, 6)

rolls = iter(d6, 1)
for roll in rolls:
    print(roll)  # prints values 2–6 until a 1 is rolled (the 1 itself is not printed)
```

This pattern lets you turn any repeated, sentinel-terminated function call into a clean iterator.
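Another classic use of this form is reading a stream in fixed-size chunks until the read returns an empty string (the sentinel); sketched here with an in-memory buffer so it is self-contained:

```python
import io
from functools import partial

buf = io.StringIO('abcdefgh')
# partial(buf.read, 3) is a no-argument callable; '' is the sentinel at end of stream
chunks = list(iter(partial(buf.read, 3), ''))
print(chunks)  # ['abc', 'def', 'gh']
```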