8.7. Prefer deque for Producer–Consumer Queues

A common need in writing programs is a first-in, first-out (FIFO) queue, which is also known as a producer–consumer queue. A FIFO queue is used when one function gathers values to process and another function handles them in the order in which they were received. Often, programmers use Python’s built-in list type as a FIFO queue.

For example, say that I have a program that’s processing incoming emails for long-term archival, and it’s using a list for a producer–consumer queue. Here, I define a class to represent the messages:

>>> class Email:
...     def __init__(self, sender, receiver, message):
...         self.sender = sender
...         self.receiver = receiver
...         self.message = message
...     ...

I also define a placeholder function for receiving a single email, presumably from a socket, the file system, or some other type of I/O system. The implementation of this function doesn’t matter; what’s important is its interface: It will either return an Email instance or raise a NoEmailError exception:

>>> class NoEmailError(Exception):
...     pass
...
>>> def try_receive_email():
...     # Returns an Email instance or raises NoEmailError
...     ...

The producing function receives emails and enqueues them to be consumed at a later time. This function uses the append method on the list to add new messages to the end of the queue so they are processed after all messages that were previously received:

>>> def produce_emails(queue):
...     while True:
...         try:
...             email = try_receive_email()
...         except NoEmailError:
...             return
...         else:
...             queue.append(email)  # Producer

The consuming function does something useful with the emails. This function calls pop(0) on the queue, which removes the very first item from the list and returns it to the caller. By always processing items from the beginning of the queue, the consumer ensures that the items are processed in the order in which they were received:

>>> def consume_one_email(queue):
...     if not queue:
...         return
...     email = queue.pop(0)  # Consumer
...     # Index the message for long-term archival
...     ...

Finally, I need a looping function that connects the pieces together. This function alternates between producing and consuming until the keep_running function returns False (see Item 60: “Achieve Highly Concurrent I/O with Coroutines” on how to do this concurrently):

>>> def loop(queue, keep_running):
...     while keep_running():
...         produce_emails(queue)
...         consume_one_email(queue)
...
>>> def my_end_func():
...     ...
...
>>> loop([], my_end_func)

Why not process each Email message in produce_emails as it's returned by try_receive_email? It comes down to the trade-off between latency and throughput. When using producer–consumer queues, you often want to minimize the latency of accepting new items so they can be collected as fast as possible. The consumer can then work through the backlog of items at a consistent pace (one item per loop in this case), which provides a stable performance profile and consistent throughput at the cost of end-to-end latency (see Item 55: “Use Queue to Coordinate Work Between Threads” for related best practices).
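
To make the other side of this trade-off concrete, here is a hypothetical variant of the consumer (not part of the program above) that drains the entire backlog on each pass. It minimizes how long any single item waits in the queue, but each call now takes time proportional to the backlog, so the loop's pace becomes unpredictable:

>>> def consume_all_emails(queue):  # Hypothetical variant
...     while queue:  # Drain the whole backlog in one pass
...         email = queue.pop(0)
...         # Index the message for long-term archival
...         ...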

Using a list for a producer–consumer queue like this works fine up to a point, but as the cardinality—the number of items in the list—increases, the list type’s performance can degrade superlinearly. To analyze the performance of using list as a FIFO queue, I can run some micro-benchmarks using the timeit built-in module. Here, I define a benchmark for the performance of adding new items to the queue using the append method of list (matching the producer function’s usage):

>>> import timeit
>>> def print_results(count, tests):
...     avg_iteration = sum(tests) / len(tests)
...     print(f'Count {count:>5,} takes {avg_iteration:.6f}s')
...     return count, avg_iteration
...
>>> def list_append_benchmark(count):
...     def run(queue):
...         for i in range(count):
...             queue.append(i)
...
...     tests = timeit.repeat(
...         setup='queue = []',
...         stmt='run(queue)',
...         globals=locals(),
...         repeat=1000,
...         number=1)
...
...     return print_results(count, tests)

Running this benchmark function with different levels of cardinality lets me compare its performance in relation to data size:

>>> def print_delta(before, after):
...     before_count, before_time = before
...     after_count, after_time = after
...     growth = 1 + (after_count - before_count) / before_count
...     slowdown = 1 + (after_time - before_time) / before_time
...     print(f'{growth:>4.1f}x data size, {slowdown:>4.1f}x time')
...
>>> baseline = list_append_benchmark(500)
Count   500 takes 0.000021s
>>> for count in (1_000, 2_000, 3_000, 4_000, 5_000):
...     comparison = list_append_benchmark(count)
...     print_delta(baseline, comparison)
...
Count 1,000 takes 0.000041s
 2.0x data size,  2.0x time
Count 2,000 takes 0.000080s
 4.0x data size,  3.8x time
Count 3,000 takes 0.000120s
 6.0x data size,  5.8x time
Count 4,000 takes 0.000161s
 8.0x data size,  7.7x time
Count 5,000 takes 0.000197s
10.0x data size,  9.5x time

This shows that the append method takes roughly constant time for the list type, and the total time for enqueueing scales linearly as the data size increases. There is overhead for the list type to increase its capacity under the covers as new items are added, but it’s reasonably low and is amortized across repeated calls to append.
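
You can observe this over-allocation directly with the sys.getsizeof built-in function; here's a quick sketch (the exact byte counts and resize points vary across CPython versions). Running it prints a new line each time the list's underlying capacity grows, showing that resizes happen in chunks rather than on every append:

>>> import sys
>>> items = []
>>> last_size = sys.getsizeof(items)
>>> for i in range(20):
...     items.append(i)
...     size = sys.getsizeof(items)
...     if size != last_size:  # Capacity grew in a chunk
...         print(f'Length {len(items):>2} resized to {size} bytes')
...         last_size = size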

Here, I define a similar benchmark for the pop(0) call that removes items from the beginning of the queue (matching the consumer function’s usage):

>>> def list_pop_benchmark(count):
...     def prepare():
...         return list(range(count))
...
...     def run(queue):
...         while queue:
...             queue.pop(0)
...
...     tests = timeit.repeat(
...         setup='queue = prepare()',
...         stmt='run(queue)',
...         globals=locals(),
...         repeat=1000,
...         number=1)
...
...     return print_results(count, tests)

I can similarly run this benchmark for queues of different sizes to see how performance is affected by cardinality:

>>> baseline = list_pop_benchmark(500)
Count   500 takes 0.000041s
>>> for count in (1_000, 2_000, 3_000, 4_000, 5_000):
...     comparison = list_pop_benchmark(count)
...     print_delta(baseline, comparison)
...
Count 1,000 takes 0.000097s
 2.0x data size,  2.4x time
Count 2,000 takes 0.000253s
 4.0x data size,  6.1x time
Count 3,000 takes 0.000465s
 6.0x data size, 11.3x time
Count 4,000 takes 0.000829s
 8.0x data size, 20.2x time
Count 5,000 takes 0.001540s
10.0x data size, 37.5x time

Surprisingly, this shows that the total time for dequeuing items from a list with pop(0) scales quadratically as the length of the queue increases. The cause is that pop(0) must shift every remaining item in the list down by one index, effectively copying the list's entire contents on every call. Since I call pop(0) once for every item in the list, I end up doing roughly len(queue) * len(queue) operations to consume the queue. This doesn't scale.
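
To see why each call is linear, here's a rough pure-Python model of what pop(0) has to do internally (CPython does the equivalent in C, but the amount of work per call is the same):

>>> def pop_front(items):  # Rough model of list.pop(0)
...     result = items[0]
...     for i in range(1, len(items)):  # Shift everything down one slot
...         items[i - 1] = items[i]
...     del items[-1]
...     return result
...
>>> queue = ['a', 'b', 'c']
>>> pop_front(queue)
'a'
>>> queue
['b', 'c']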

Python provides the deque class from the collections built-in module to solve this problem. deque is a double-ended queue implementation. It provides constant time operations for inserting or removing items from its beginning or end. This makes it ideal for FIFO queues.
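
For example, the two operations this item relies on look like this in isolation:

>>> import collections
>>> queue = collections.deque()
>>> queue.append('a')   # Producer: add to the end
>>> queue.append('b')
>>> queue.popleft()     # Consumer: remove from the beginning
'a'
>>> queue.popleft()
'b'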

To use the deque class, the call to append in produce_emails can stay the same as it was when using a list for the queue. The list.pop method call in consume_one_email must change to call the deque.popleft method with no arguments instead. And the loop function must be called with a deque instance instead of a list. Everything else stays the same. Here, I redefine the one function affected to use the new method and run loop again:

>>> import collections
>>> def consume_one_email(queue):
...     if not queue:
...         return
...     email = queue.popleft()  # Consumer
...     # Index the message for long-term archival
...     ...
...
>>> def my_end_func():
...     ...
...
>>> loop(collections.deque(), my_end_func)

I can run another version of the benchmark to verify that append performance (matching the producer function’s usage) has stayed roughly the same (modulo a constant factor):

>>> def deque_append_benchmark(count):
...     def prepare():
...         return collections.deque()
...
...     def run(queue):
...         for i in range(count):
...             queue.append(i)
...
...     tests = timeit.repeat(
...         setup='queue = prepare()',
...         stmt='run(queue)',
...         globals=locals(),
...         repeat=1000,
...         number=1)
...
...     return print_results(count, tests)
...
>>> baseline = deque_append_benchmark(500)
Count   500 takes 0.000020s
>>> for count in (1_000, 2_000, 3_000, 4_000, 5_000):
...     comparison = deque_append_benchmark(count)
...     print_delta(baseline, comparison)
...
Count 1,000 takes 0.000041s
 2.0x data size,  2.1x time
Count 2,000 takes 0.000081s
 4.0x data size,  4.1x time
Count 3,000 takes 0.000121s
 6.0x data size,  6.1x time
Count 4,000 takes 0.000169s
 8.0x data size,  8.5x time
Count 5,000 takes 0.000207s
10.0x data size, 10.4x time

And I can benchmark the performance of calling popleft to mimic the consumer function’s usage of deque:

>>> def deque_popleft_benchmark(count):
...     def prepare():
...         return collections.deque(range(count))
...
...     def run(queue):
...         while queue:
...             queue.popleft()
...
...     tests = timeit.repeat(
...         setup='queue = prepare()',
...         stmt='run(queue)',
...         globals=locals(),
...         repeat=1000,
...         number=1)
...
...     return print_results(count, tests)
...
>>> baseline = deque_popleft_benchmark(500)
Count   500 takes 0.000018s
>>> for count in (1_000, 2_000, 3_000, 4_000, 5_000):
...     comparison = deque_popleft_benchmark(count)
...     print_delta(baseline, comparison)
...
Count 1,000 takes 0.000036s
 2.0x data size,  2.0x time
Count 2,000 takes 0.000073s
 4.0x data size,  4.1x time
Count 3,000 takes 0.000110s
 6.0x data size,  6.1x time
Count 4,000 takes 0.000147s
 8.0x data size,  8.2x time
Count 5,000 takes 0.000183s
10.0x data size, 10.2x time

The popleft usage scales linearly instead of displaying the superlinear behavior of pop(0) that I measured before—hooray! If you know that the performance of a program critically depends on the speed of producer–consumer queues, then deque is a great choice. If you’re not sure, then you should instrument your program to find out (see Item 70: “Profile Before Optimizing”).
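
If you do want a quick measurement, the cProfile built-in module is one way to instrument a run; for example, profiling one of the benchmark functions defined above (its printed report is elided here):

>>> import cProfile
>>> cProfile.run('list_pop_benchmark(1_000)', sort='cumulative')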

8.7.1. Things to Remember

✦ The list type can be used as a FIFO queue by having the producer call append to add items and the consumer call pop(0) to receive items. However, this may cause problems because the performance of pop(0) degrades superlinearly as the queue length increases.

✦ The deque class from the collections built-in module takes constant time—regardless of length—for append and popleft, making it ideal for FIFO queues.