2.5. Be Cautious When Relying on dict Insertion Ordering¶
In Python 3.5 and before, iterating over a dict would return keys in arbitrary order. The order of iteration would not match the order in which the items were inserted. For example, here I create a dictionary mapping animal names to their corresponding baby names and then print it out (see Item 75: “Use repr Strings for Debugging Output” for how this works):
>>> # Python 3.5
>>> baby_names = {
>>> 'cat': 'kitten',
>>> 'dog': 'puppy',
>>> }
>>> print(baby_names)
{'cat': 'kitten', 'dog': 'puppy'}
When I created the dictionary the keys were in the order ‘cat’, ‘dog’, but when I printed it the keys were in the reverse order ‘dog’, ‘cat’. This behavior is surprising, makes it harder to reproduce test cases, increases the difficulty of debugging, and is especially confusing to newcomers to Python.
This happened because the dictionary type previously implemented its hash table algorithm with a combination of the hash built-in function and a random seed that was assigned when the Python interpreter started. Together, these behaviors caused dictionary orderings to not match insertion order and to randomly shuffle between program executions.
Starting with Python 3.6, and officially part of the Python specification in version 3.7, dictionaries will preserve insertion order. Now, this code will always print the dictionary in the same way it was originally created by the programmer:
>>> baby_names = {
>>> 'cat': 'kitten',
>>> 'dog': 'puppy',
>>> }
>>> print(baby_names)
{'cat': 'kitten', 'dog': 'puppy'}
With Python 3.5 and earlier, all methods provided by dict that relied on iteration order, including keys, values, items, and popitem, would similarly demonstrate this random-looking behavior:
>>> # Python 3.5
>>> print(list(baby_names.keys()))
>>> print(list(baby_names.values()))
>>> print(list(baby_names.items()))
>>> print(baby_names.popitem()) # Randomly chooses an item
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipykernel_87/2223537970.py in <module>
1 # Python 3.5
----> 2 print(list(baby_names.keys()))
3 print(list(baby_names.values()))
4 print(list(baby_names.items()))
5 print(baby_names.popitem()) # Randomly chooses an item
NameError: name 'baby_names' is not defined
These methods now provide consistent insertion ordering that you can rely on when you write your programs:
>>> print(list(baby_names.keys()))
>>> print(list(baby_names.values()))
>>> print(list(baby_names.items()))
>>> print(baby_names.popitem()) # Last item inserted
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipykernel_87/2887977299.py in <module>
----> 1 print(list(baby_names.keys()))
2 print(list(baby_names.values()))
3 print(list(baby_names.items()))
4 print(baby_names.popitem()) # Last item inserted
NameError: name 'baby_names' is not defined
There are many repercussions of this change on other Python features that are dependent on the dict type and its specific implementation.
Keyword arguments to functions—including the **kwargs catch-all parameter; see Item 23: “Provide Optional Behavior with Keyword Arguments”—previously would come through in seemingly random order, which can make it harder to debug function calls:
>>> # Python 3.5
>>> def my_func(**kwargs):
>>> for key, value in kwargs.items():
>>> print('%s = %s' % (key, value))
>>>
>>> my_func(goose='gosling', kangaroo='joey')
goose = gosling
kangaroo = joey
Now, the order of keyword arguments is always preserved to match how the programmer originally called the function:
>>> def my_func(**kwargs):
>>> for key, value in kwargs.items():
>>> print(f'{key} = {value}')
>>>
>>> my_func(goose='gosling', kangaroo='joey')
goose = gosling
kangaroo = joey
Classes also use the dict type for their instance dictionaries. In previous versions of Python, object fields would show the randomizing behavior:
>>> # Python 3.5
>>> class MyClass:
>>> def __init__(self):
>>> self.alligator = 'hatchling'
>>> self.elephant = 'calf'
>>> a = MyClass()
>>> for key, value in a.__dict__.items():
>>> print('%s = %s' % (key, value))
alligator = hatchling
elephant = calf
Again, you can now assume that the order of assignment for these instance fields will be reflected in __dict__:
>>> class MyClass:
>>> def __init__(self):
>>> self.alligator = 'hatchling'
>>> self.elephant = 'calf'
>>> a = MyClass()
>>> for key, value in a.__dict__.items():
>>> print(f'{key} = {value}')
alligator = hatchling
elephant = calf
The way that dictionaries preserve insertion ordering is now part of the Python language specification. For the language features above, you can rely on this behavior and even make it part of the APIs you design for your classes and functions.
Note
For a long time the collections built-in module has had an OrderedDict class that preserves insertion ordering. Although this class’s behavior is similar to that of the standard dict type (since Python 3.7), the performance characteristics of OrderedDict are quite different. If you need to handle a high rate of key insertions and popitem calls (e.g., to implement a least-recently-used cache), OrderedDict may be a better fit than the standard Python dict type (see Item 70: “Profile Before Optimizing” on how to make sure you need this).
However, you shouldn’t always assume that insertion ordering behavior will be present when you’re handling dictionaries. Python makes it easy for programmers to define their own custom container types that emulate the standard protocols matching list, dict, and other types (see Item 43: “Inherit from collections.abc for Custom Container Types”). Python is not statically typed, so most code relies on duck typing—where an object’s behavior is its de facto type—instead of rigid class hierarchies. This can result in surprising gotchas.
For example, say that I’m writing a program to show the results of a contest for the cutest baby animal. Here, I start with a dictionary containing the total vote count for each one:
>>> votes = {
>>> 'otter': 1281,
>>> 'polar bear': 587,
>>> 'fox': 863,
>>> }
I define a function to process this voting data and save the rank of each animal name into a provided empty dictionary. In this case, the dictionary could be the data model that powers a UI element:
>>> def populate_ranks(votes, ranks):
>>> names = list(votes.keys())
>>> names.sort(key=votes.get, reverse=True)
>>> for i, name in enumerate(names, 1):
>>> ranks[name] = i
I also need a function that will tell me which animal won the contest. This function works by assuming that populate_ranks will assign the contents of the ranks dictionary in ascending order, meaning that the first key must be the winner:
>>> def get_winner(ranks):
>>> return next(iter(ranks))
Here, I can confirm that these functions work as designed and deliver the result that I expected:
>>> ranks = {}
>>> populate_ranks(votes, ranks)
>>> print(ranks)
>>> winner = get_winner(ranks)
>>> print(winner)
{'otter': 1, 'fox': 2, 'polar bear': 3}
otter
Now, imagine that the requirements of this program have changed. The UI element that shows the results should be in alphabetical order instead of rank order. To accomplish this, I can use the collections.abc built-in module to define a new dictionary-like class that iterates its contents in alphabetical order:
>>> from collections.abc import MutableMapping
>>>
>>> class SortedDict(MutableMapping):
>>> def __init__(self):
>>> self.data = {}
>>>
>>> def __getitem__(self, key):
>>> return self.data[key]
>>> def __setitem__(self, key, value):
>>> self.data[key] = value
>>>
>>> def __delitem__(self, key):
>>> del self.data[key]
>>> def __iter__(self):
>>> keys = list(self.data.keys())
>>> keys.sort()
>>> for key in keys:
>>> yield key
>>> def __len__(self):
>>> return len(self.data)
I can use a SortedDict instance in place of a standard dict with the functions from before and no errors will be raised since this class conforms to the protocol of a standard dictionary. However, the result is incorrect:
>>> sorted_ranks = SortedDict()
>>> populate_ranks(votes, sorted_ranks)
>>> print(sorted_ranks.data)
>>> winner = get_winner(sorted_ranks)
>>> print(winner)
{'otter': 1, 'fox': 2, 'polar bear': 3}
fox
The problem here is that the implementation of get_winner assumes that the dictionary’s iteration is in insertion order to match populate_ranks. This code is using SortedDict instead of dict, so that assumption is no longer true. Thus, the value returned for the winner is ‘fox’, which is alphabetically first.
There are three ways to mitigate this problem. First, I can reimplement the get_winner function to no longer assume that the ranks dictionary has a specific iteration order. This is the most conservative and robust solution:
>>> def get_winner(ranks):
>>> for name, rank in ranks.items():
>>> if rank == 1:
>>> return name
>>>
>>> winner = get_winner(sorted_ranks)
>>> print(winner)
otter
The second approach is to add an explicit check to the top of the function to ensure that the type of ranks matches my expectations, and to raise an exception if not. This solution likely has better runtime performance than the more conservative approach:
- def get_winner(ranks):
- if not isinstance(ranks, dict):
raise TypeError('must provide a dict instance')
return next(iter(ranks))
get_winner(sorted_ranks)
>>>
Traceback ...
TypeError: must provide a dict instance
The third alternative is to use type annotations to enforce that the value passed to get_winner is a dict instance and not a MutableMapping with dictionary-like behavior (see Item 90: “Consider Static Analysis via typing to Obviate Bugs”). Here, I run the mypy tool in strict mode on an annotated version of the code above:
from typing import Dict, MutableMapping def populate_ranks(votes: Dict[str, int],
ranks: Dict[str, int]) -> None: names = list(votes.keys()) names.sort(key=votes.get, reverse=True) for i, name in enumerate(names, 1):
ranks[name] = i
- def get_winner(ranks: Dict[str, int]) -> str:
return next(iter(ranks))
- class SortedDict(MutableMapping[str, int]):
...
votes = { 'otter': 1281, 'polar bear': 587, 'fox': 863, } sorted_ranks = SortedDict() populate_ranks(votes, sorted_ranks) print(sorted_ranks.data) winner = get_winner(sorted_ranks) print(winner)
$ python3 -m mypy --strict example.py .../example.py:48: error: Argument 2 to "populate_ranks" has ➥incompatible type "SortedDict"; expected "Dict[str, int]" .../example.py:50: error: Argument 1 to "get_winner" has ➥incompatible type "SortedDict"; expected "Dict[str, int]"
This correctly detects the mismatch between the dict and MutableMapping types and flags the incorrect usage as an error. This solution provides the best mix of static type safety and runtime performance.
2.5.1. Things to Remember¶
✦ Since Python 3.7, you can rely on the fact that iterating a dict instance’s contents will occur in the same order in which the keys were initially added.
✦ Python makes it easy to define objects that act like dictionaries but that aren’t dict instances. For these types, you can’t assume that insertion ordering will be preserved.
✦ There are three ways to be careful about dictionary-like classes: Write code that doesn’t rely on insertion ordering, explicitly check for the dict type at runtime, or require dict values using type annotations and static analysis.