7.6. Avoid Creating New Thread Instances for On-demand Fan-out

Threads are the natural first tool to reach for in order to do parallel I/O in Python (see Item 53: “Use Threads for Blocking I/O, Avoid for Parallelism”). However, they have significant downsides when you try to use them for fanning out to many concurrent lines of execution.

To demonstrate this, I’ll continue with the Game of Life example from before (see Item 56: “Know How to Recognize When Concurrency Is Necessary” for background and the implementations of various functions and classes below). I’ll use threads to solve the latency problem caused by doing I/O in the game_logic function. To begin, threads require coordination using locks to ensure that assumptions within data structures are maintained properly. I can create a subclass of the Grid class that adds locking behavior so an instance can be used by multiple threads simultaneously:


from threading import Lock

ALIVE = '*'
EMPTY = '-'

class Grid:
    ...

class LockingGrid(Grid):
    def __init__(self, height, width):
        super().__init__(height, width)
        self.lock = Lock()

    def __str__(self):
        with self.lock:
            return super().__str__()

    def get(self, y, x):
        with self.lock:
            return super().get(y, x)

    def set(self, y, x, state):
        with self.lock:
            return super().set(y, x, state)
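The `Grid` base class is elided above because it comes from the earlier item. For readers who want a self-contained example, here is a minimal sketch of a compatible implementation; it is an assumption consistent with the surrounding code (wrapping coordinates at the edges, cells stored as rows of characters), not necessarily the exact version from the earlier item:

```python
ALIVE = '*'
EMPTY = '-'

class Grid:
    def __init__(self, height, width):
        self.height = height
        self.width = width
        # Every cell starts out empty
        self.rows = [[EMPTY] * self.width for _ in range(self.height)]

    def get(self, y, x):
        # Coordinates wrap around the edges of the grid
        return self.rows[y % self.height][x % self.width]

    def set(self, y, x, state):
        self.rows[y % self.height][x % self.width] = state

    def __str__(self):
        return '\n'.join(''.join(row) for row in self.rows)
```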

Then, I can reimplement the simulate function to fan out by creating a thread for each call to step_cell. The threads will run in parallel and won’t have to wait on each other’s I/O. I can then fan in by waiting for all of the threads to complete before moving on to the next generation:


from threading import Thread

def count_neighbors(y, x, get):
    ...

def game_logic(state, neighbors):
    ...
    # Do some blocking input/output in here:
    data = my_socket.recv(100)
    ...

def step_cell(y, x, get, set):
    state = get(y, x)
    neighbors = count_neighbors(y, x, get)
    next_state = game_logic(state, neighbors)
    set(y, x, next_state)

def simulate_threaded(grid):
    next_grid = LockingGrid(grid.height, grid.width)

    threads = []
    for y in range(grid.height):
        for x in range(grid.width):
            args = (y, x, grid.get, next_grid.set)
            thread = Thread(target=step_cell, args=args)
            thread.start()  # Fan out
            threads.append(thread)

    for thread in threads:
        thread.join()       # Fan in

    return next_grid

I can run this code using the same implementation of step_cell and the same driving code as before with only two lines changed to use the LockingGrid and simulate_threaded implementations:


class ColumnPrinter:
    ...

grid = LockingGrid(5, 9)            # Changed
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)

columns = ColumnPrinter()
for i in range(5):
    columns.append(str(grid))
    grid = simulate_threaded(grid)  # Changed

print(columns)

>>>
    0     |     1     |     2     |     3     |     4
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------

This works as expected, and the I/O is now parallelized between the threads. However, this code has three big problems:

The Thread instances require special tools to coordinate with each other safely (see Item 54: “Use Lock to Prevent Data Races in Threads”). This makes the code that uses threads harder to reason about than the procedural, single-threaded code from before. This complexity makes threaded code more difficult to extend and maintain over time.

Threads require a lot of memory—about 8 MB per executing thread. On many computers, that amount of memory doesn’t matter for the 45 threads I’d need in this example. But if the game grid had to grow to 10,000 cells, I would need to create that many threads, which couldn’t even fit in the memory of my machine. Running a thread per concurrent activity just won’t work.

Starting a thread is costly, and threads have a negative performance impact when they run due to context switching between them. In this case, all of the threads are started and stopped each generation of the game, which has high overhead and will increase latency beyond the expected I/O time of 100 milliseconds.
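The startup and memory costs described above can be probed directly. This is a small measurement sketch of my own (the thread count and any timings it prints are illustrative, not figures from the text):

```python
import threading
import time

def probe():
    pass  # Trivial target; a real workload would do blocking I/O

# Time how long it takes just to start and join many short-lived threads
start = time.perf_counter()
threads = [threading.Thread(target=probe) for _ in range(100)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.perf_counter() - start
print(f'Started and joined 100 threads in {elapsed:.4f} seconds')

# threading.stack_size() reports the stack reserved for each new thread;
# 0 means "use the platform default" (commonly about 8 MiB of virtual
# memory on Linux), which is where the per-thread memory cost comes from.
print(threading.stack_size())
```

Multiplying that per-generation startup cost and per-thread stack reservation by thousands of cells makes it clear why one thread per concurrent activity doesn't scale.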

This code would also be very difficult to debug if something went wrong. For example, imagine that the game_logic function raises an exception, which is highly likely due to the generally flaky nature of I/O:


def game_logic(state, neighbors):
    ...
    raise OSError('Problem with I/O')
    ...

I can test what this would do by running a Thread instance pointed at this function and redirecting the sys.stderr output from the program to an in-memory StringIO buffer:


import contextlib
import io

fake_stderr = io.StringIO()
with contextlib.redirect_stderr(fake_stderr):
    thread = Thread(target=game_logic, args=(ALIVE, 3))
    thread.start()
    thread.join()

print(fake_stderr.getvalue())

>>>
Exception in thread Thread-226:
Traceback (most recent call last):
  File "threading.py", line 917, in _bootstrap_inner
    self.run()
  File "threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "example.py", line 193, in game_logic
    raise OSError('Problem with I/O')
OSError: Problem with I/O

An OSError exception is raised as expected, but somehow the code that created the Thread and called join on it is unaffected. How can this be? The reason is that the Thread class will independently catch any exceptions that are raised by the target function and then write their traceback to sys.stderr. Such exceptions are never re-raised to the caller that started the thread in the first place.
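One common workaround, not part of this example, is a hypothetical `Thread` subclass that records the exception in `run` and re-raises it when the caller invokes `join`. (Since Python 3.8 you can also install a `threading.excepthook` to observe otherwise-swallowed thread exceptions, though that still doesn't deliver them to the joining caller.) A minimal sketch:

```python
from threading import Thread

class PropagatingThread(Thread):
    """Thread that re-raises its target's exception in join()."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.exc = None

    def run(self):
        try:
            super().run()  # Calls the target function as usual
        except BaseException as e:
            self.exc = e   # Capture instead of printing to sys.stderr

    def join(self, timeout=None):
        super().join(timeout)
        if self.exc is not None:
            raise self.exc  # Surface the failure to the caller

def fails():
    raise OSError('Problem with I/O')

thread = PropagatingThread(target=fails)
thread.start()
try:
    thread.join()
except OSError as e:
    print(f'Caught in caller: {e}')
```

This keeps the usual `Thread` interface, but it still doesn't address the coordination, memory, and startup costs above.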

Given all of these issues, it’s clear that threads are not the solution if you need to constantly create and finish new concurrent functions. Python provides other solutions that are a better fit (see Item 58: “Understand How Using Queue for Concurrency Requires Refactoring”, Item 59: “Consider ThreadPoolExecutor When Threads Are Necessary for Concurrency”, and Item 60: “Achieve Highly Concurrent I/O with Coroutines”).
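As a brief preview of why those alternatives help (the `task` function here is a toy stand-in, not code from this example), `ThreadPoolExecutor` both caps the number of threads and re-raises worker exceptions through `Future.result`, addressing two of the problems above at once:

```python
from concurrent.futures import ThreadPoolExecutor

def task(i):
    if i == 3:
        raise OSError('Problem with I/O')
    return i * 2

with ThreadPoolExecutor(max_workers=4) as pool:       # Bounded thread count
    futures = [pool.submit(task, i) for i in range(5)]  # Fan out
    results = []
    for future in futures:
        try:
            results.append(future.result())  # Fan in; re-raises exceptions
        except OSError as e:
            results.append(e)                # The caller sees the failure
```

Unlike raw `Thread` instances, the pool's workers are reused across submissions, and exceptions propagate back to the code that is waiting on the results.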

7.6.1. Things to Remember

✦ Threads have many downsides: They’re costly to start and run if you need a lot of them, they each require a significant amount of memory, and they require special tools like Lock instances for coordination.

✦ Threads do not provide a built-in way to raise exceptions back in the code that started a thread or that is waiting for one to finish, which makes them difficult to debug.