2.4. Sort by Complex Criteria Using the key Parameter¶
The list built-in type provides a sort method for ordering the items in a list instance based on a variety of criteria. By default, sort will order a list’s contents by the natural ascending order of the items. For example, here I sort a list of integers from smallest to largest:
>>> numbers = [93, 86, 11, 68, 70]
>>> numbers.sort()
>>> print(numbers)
[11, 68, 70, 86, 93]
The sort method works for nearly all built-in types (strings, floats, etc.) that have a natural ordering to them. What does sort do with objects? For example, here I define a class—including a repr method so instances are printable; see Item 75: “Use repr Strings for Debugging Output”—to represent various tools you may need to use on a construction site:
>>> class Tool:
>>> def __init__(self, name, weight):
>>> self.name = name
>>> self.weight = weight
>>>
>>> def __repr__(self):
>>> return f'Tool({self.name!r}, {self.weight})'
>>>
>>> tools = [
>>> Tool('level', 3.5),
>>> Tool('hammer', 1.25),
>>> Tool('screwdriver', 0.5),
>>> Tool('chisel', 0.25),
>>> ]
Sorting objects of this type doesn’t work because the sort method tries to call comparison special methods that aren’t defined by the class:
tools.sort()
>>>
Traceback ...
TypeError: '<' not supported between instances of 'Tool' and
'Tool'
If your class should have a natural ordering like integers do, then you can define the necessary special methods (see Item 73: “Know How to Use heapq for Priority Queues” for an example) to make sort work without extra parameters. But the more common case is that your objects may need to support multiple orderings, in which case defining a natural ordering really doesn’t make sense.
Often there’s an attribute on the object that you’d like to use for sorting. To support this use case, the sort method accepts a key parameter that’s expected to be a function. The key function is passed a single argument, which is an item from the list that is being sorted. The return value of the key function should be a comparable value (i.e., with a natural ordering) to use in place of an item for sorting purposes.
Here, I use the lambda keyword to define a function for the key parameter that enables me to sort the list of Tool objects alphabetically by their name:
>>> print('Unsorted:', repr(tools))
>>> tools.sort(key=lambda x: x.name)
>>> print('\nSorted: ', tools)
Unsorted: [Tool('level', 3.5), Tool('hammer', 1.25), Tool('screwdriver', 0.5), Tool('chisel', 0.25)]
Sorted: [Tool('chisel', 0.25), Tool('hammer', 1.25), Tool('level', 3.5), Tool('screwdriver', 0.5)]
I can just as easily define another lambda function to sort by weight and pass it as the key parameter to the sort method:
>>> tools.sort(key=lambda x: x.weight)
>>> print('By weight:', tools)
By weight: [Tool('chisel', 0.25), Tool('screwdriver', 0.5), Tool('hammer', 1.25), Tool('level', 3.5)]
Within the lambda function passed as the key parameter you can access attributes of items as I’ve done here, index into items (for sequences, tuples, and dictionaries), or use any other valid expression.
For basic types like strings, you may even want to use the key function to do transformations on the values before sorting. For example, here I apply the lower method to each item in a list of place names to ensure that they’re in alphabetical order, ignoring any capitalization (since in the natural lexical ordering of strings, capital letters come before lowercase letters):
>>> places = ['home', 'work', 'New York', 'Paris']
>>> places.sort()
>>> print('Case sensitive: ', places)
>>> places.sort(key=lambda x: x.lower())
>>> print('Case insensitive:', places)
Case sensitive: ['New York', 'Paris', 'home', 'work']
Case insensitive: ['home', 'New York', 'Paris', 'work']
Sometimes you may need to use multiple criteria for sorting. For example, say that I have a list of power tools and I want to sort them first by weight and then by name. How can I accomplish this?
>>> power_tools = [
>>> Tool('drill', 4),
>>> Tool('circular saw', 5),
>>> Tool('jackhammer', 40),
>>> Tool('sander', 4),
>>> ]
The simplest solution in Python is to use the tuple type. Tuples are immutable sequences of arbitrary Python values. Tuples are comparable by default and have a natural ordering, meaning that they implement all of the special methods, such as __lt__, that are required by the sort method. Tuples implement these special method comparators by iterating over each position in the tuple and comparing the corresponding values one index at a time. Here, I show how this works when one tool is heavier than another:
>>> saw = (5, 'circular saw')
>>> jackhammer = (40, 'jackhammer')
>>> assert not (jackhammer < saw) # Matches expectations
If the first position in the tuples being compared are equal—weight in this case—then the tuple comparison will move on to the second position, and so on:
>>> drill = (4, 'drill')
>>> sander = (4, 'sander')
>>> assert drill[0] == sander[0] # Same weight
>>> assert drill[1] < sander[1] # Alphabetically less
>>> assert drill < sander # Thus, drill comes first
You can take advantage of this tuple comparison behavior in order to sort the list of power tools first by weight and then by name. Here, I define a key function that returns a tuple containing the two attributes that I want to sort on in order of priority:
>>> power_tools.sort(key=lambda x: (x.weight, x.name))
>>> print(power_tools)
[Tool('drill', 4), Tool('sander', 4), Tool('circular saw', 5), Tool('jackhammer', 40)]
One limitation of having the key function return a tuple is that the direction of sorting for all criteria must be the same (either all in ascending order, or all in descending order). If I provide the reverse parameter to the sort method, it will affect both criteria in the tuple the same way (note how ‘sander’ now comes before ‘drill’ instead of after):
>>> power_tools.sort(key=lambda x: (x.weight, x.name),
>>> reverse=True) # Makes all criteria descending
>>> print(power_tools)
[Tool('jackhammer', 40), Tool('circular saw', 5), Tool('sander', 4), Tool('drill', 4)]
For numerical values it’s possible to mix sorting directions by using the unary minus operator in the key function. This negates one of the values in the returned tuple, effectively reversing its sort order while leaving the others intact. Here, I use this approach to sort by weight descending, and then by name ascending (note how ‘sander’ now comes after ‘drill’ instead of before):
>>> power_tools.sort(key=lambda x: (-x.weight, x.name))
>>> print(power_tools)
[Tool('jackhammer', 40), Tool('circular saw', 5), Tool('drill', 4), Tool('sander', 4)]
Unfortunately, unary negation isn’t possible for all types. Here, I try to achieve the same outcome by using the reverse argument to sort by weight descending and then negating name to put it in ascending order:
- power_tools.sort(key=lambda x: (x.weight, -x.name),
reverse=True)
>>>
Traceback ...
TypeError: bad operand type for unary -: 'str'
For situations like this, Python provides a stable sorting algorithm. The sort method of the list type will preserve the order of the input list when the key function returns values that are equal to each other. This means that I can call sort multiple times on the same list to combine different criteria together. Here, I produce the same sort ordering of weight descending and name ascending as I did above but by using two separate calls to sort:
>>> power_tools.sort(key=lambda x: x.name) # Name ascending
>>> power_tools.sort(key=lambda x: x.weight, # Weight descending
>>> reverse=True)
>>> print(power_tools)
[Tool('jackhammer', 40), Tool('circular saw', 5), Tool('drill', 4), Tool('sander', 4)]
To understand why this works, note how the first call to sort puts the names in alphabetical order:
>>> power_tools.sort(key=lambda x: x.name)
>>> print(power_tools)
[Tool('circular saw', 5), Tool('drill', 4), Tool('jackhammer', 40), Tool('sander', 4)]
When the second sort call by weight descending is made, it sees that both ‘sander’ and ‘drill’ have a weight of 4. This causes the sort method to put both items into the final result list in the same order that they appeared in the original list, thus preserving their relative ordering by name ascending:
>>> power_tools.sort(key=lambda x: x.weight,
>>> reverse=True)
>>> print(power_tools)
[Tool('jackhammer', 40), Tool('circular saw', 5), Tool('drill', 4), Tool('sander', 4)]
This same approach can be used to combine as many different types of sorting criteria as you’d like in any direction, respectively. You just need to make sure that you execute the sorts in the opposite sequence of what you want the final list to contain. In this example, I wanted the sort order to be by weight descending and then by name ascending, so I had to do the name sort first, followed by the weight sort.
That said, the approach of having the key function return a tuple, and using unary negation to mix sort orders, is simpler to read and requires less code. I recommend only using multiple calls to sort if it’s absolutely necessary.
2.4.1. Things to Remember¶
✦ The sort method of the list type can be used to rearrange a list’s contents by the natural ordering of built-in types like strings, integers, tuples, and so on.
✦ The sort method doesn’t work for objects unless they define a natural ordering using special methods, which is uncommon.
✦ The key parameter of the sort method can be used to supply a helper function that returns the value to use for sorting in place of each item from the list.
✦ Returning a tuple from the key function allows you to combine multiple sorting criteria together. The unary minus operator can be used to reverse individual sort orders for types that allow it.
✦ For types that can’t be negated, you can combine many sorting criteria together by calling the sort method multiple times using different key functions and reverse values, in the order of lowest rank sort call to highest rank sort call.