1.8. Use zip to Process Iterators in Parallel

Often in Python you find yourself with many lists of related objects. List comprehensions make it easy to take a source list and get a derived list by applying an expression (see Item 27: “Use Comprehensions Instead of map and filter”):

>>> names = ['Cecilia', 'Lise', 'Marie']
>>> counts = [len(n) for n in names]
>>> print(counts)
[7, 4, 5]

The items in the derived list are related to the items in the source list by their indexes. To iterate over both lists in parallel, I can iterate over the length of the names source list:

>>> longest_name = None
>>> max_count = 0
>>>
>>> for i in range(len(names)):
...     count = counts[i]
...     if count > max_count:
...         longest_name = names[i]
...         max_count = count
...
>>> print(longest_name)
Cecilia

The problem is that this whole loop statement is visually noisy. The indexes into names and counts make the code hard to read, and indexing into the lists by the loop index i happens twice. Using enumerate (see Item 7: “Prefer enumerate Over range”) improves this slightly, but it’s still not ideal:

>>> for i, name in enumerate(names):
...     count = counts[i]
...     if count > max_count:
...         longest_name = name
...         max_count = count

To make this code clearer, Python provides the zip built-in function. zip wraps two or more iterables in a lazy iterator that yields tuples containing the next value from each input. These tuples can be unpacked directly within a for statement (see Item 6: “Prefer Multiple Assignment Unpacking Over Indexing”). The resulting code is much cleaner than the code for indexing into multiple lists:

>>> for name, count in zip(names, counts):
...     if count > max_count:
...         longest_name = name
...         max_count = count
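
To make the tuples concrete, here is a minimal sketch that pulls values out of zip by hand instead of through a for loop. The variable names pairs, first, and rest are illustrative only and are not part of the example above:

```python
names = ['Cecilia', 'Lise', 'Marie']
counts = [len(n) for n in names]

# zip returns a lazy iterator; nothing is paired up until a value is requested.
pairs = zip(names, counts)

# Ask for a single tuple, then collect whatever is left.
first = next(pairs)
rest = list(pairs)

print(first)  # ('Cecilia', 7)
print(rest)   # [('Lise', 4), ('Marie', 5)]
```

Each tuple holds one value from each input, in the order the inputs were passed to zip, which is why `for name, count in zip(names, counts)` can unpack them directly.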

zip consumes the iterators it wraps one item at a time, which means it can be used with infinitely long inputs without risk of a program using too much memory and crashing.
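
As a rough illustration of this laziness, the sketch below pairs a finite list with itertools.count, which never ends. The choice of itertools.count and the name numbered are assumptions made for this example; the point is only that zip stops once the finite input runs out:

```python
import itertools

names = ['Cecilia', 'Lise', 'Marie']

# itertools.count(1) yields 1, 2, 3, ... forever.
# zip never materializes it; it takes one value per loop iteration
# and stops as soon as names is exhausted.
numbered = []
for position, name in zip(itertools.count(1), names):
    numbered.append(f'{position}: {name}')

print(numbered)  # ['1: Cecilia', '2: Lise', '3: Marie']
```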

However, beware of zip’s behavior when the input iterators are of different lengths. For example, say that I add another item to names above but forget to update counts. Running zip on the two input lists will have an unexpected result:

>>> names.append('Rosalind')
>>> for name, count in zip(names, counts):
...     print(name)
...
Cecilia
Lise
Marie

The new item for ‘Rosalind’ isn’t there. Why not? This is just how zip works. It keeps yielding tuples until any one of the wrapped iterators is exhausted. Its output is as long as its shortest input. This approach works fine when you know that the iterators are of the same length, which is often the case for derived lists created by list comprehensions.

But in many other cases, the truncating behavior of zip is surprising and bad. If you don’t expect the lengths of the lists passed to zip to be equal, consider using the zip_longest function from the itertools built-in module instead:

>>> import itertools
>>> for name, count in itertools.zip_longest(names, counts):
...     print(f'{name}: {count}')
...
Cecilia: 7
Lise: 4
Marie: 5
Rosalind: None

zip_longest replaces missing values—the length of the string ‘Rosalind’ in this case—with whatever fillvalue is passed to it, which defaults to None.
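
If None is not a convenient placeholder, a different fillvalue can be supplied. The sketch below uses 0, which is an arbitrary choice for illustration; whether a placeholder count makes sense depends on how the values are used afterward:

```python
import itertools

names = ['Cecilia', 'Lise', 'Marie', 'Rosalind']
counts = [7, 4, 5]  # One entry short, as in the example above

# fillvalue stands in for every value missing from a shorter input.
pairs = list(itertools.zip_longest(names, counts, fillvalue=0))

print(pairs)
# [('Cecilia', 7), ('Lise', 4), ('Marie', 5), ('Rosalind', 0)]
```

Note that a filler such as 0 can hide the mismatch just as truncation does, so a sentinel that downstream code checks for explicitly may be the safer choice.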

1.8.1. Things to Remember

✦ The zip built-in function can be used to iterate over multiple iterators in parallel.

✦ zip returns a lazy iterator that produces tuples, so it can be used on infinitely long inputs.

✦ zip truncates its output silently to the shortest iterator if you supply it with iterators of different lengths.

✦ Use the zip_longest function from the itertools built-in module if you want to use zip on iterators of unequal lengths without truncation.