5.1. Compose Classes Instead of Nesting Many Levels of Built-in Types

Python’s built-in dictionary type is wonderful for maintaining dynamic internal state over the lifetime of an object. By dynamic, I mean situations in which you need to do bookkeeping for an unexpected set of identifiers. For example, say that I want to record the grades of a set of students whose names aren’t known in advance. I can define a class to store the names in a dictionary instead of using a predefined attribute for each student:

>>> class SimpleGradebook:
>>>     def __init__(self):
>>>         self._grades = {}
>>>
>>>     def add_student(self, name):
>>>         self._grades[name] = []
>>>
>>>     def report_grade(self, name, score):
>>>         self._grades[name].append(score)
>>>
>>>     def average_grade(self, name):
>>>         grades = self._grades[name]
>>>         return sum(grades) / len(grades)

Using the class is simple:

>>> book = SimpleGradebook()
>>> book.add_student('Isaac Newton')
>>> book.report_grade('Isaac Newton', 90)
>>> book.report_grade('Isaac Newton', 95)
>>> book.report_grade('Isaac Newton', 85)
>>>
>>> print(book.average_grade('Isaac Newton'))
90.0

Dictionaries and their related built-in types are so easy to use that there’s a danger of overextending them to write brittle code. For example, say that I want to extend the SimpleGradebook class to keep a list of grades by subject, not just overall. I can do this by changing the _grades dictionary to map student names (its keys) to yet another dictionary (its values). The innermost dictionary will map subjects (its keys) to a list of grades (its values). Here, I do this by using a defaultdict instance for the inner dictionary to handle missing subjects (see Item 17: “Prefer defaultdict Over setdefault to Handle Missing Items in Internal State” for background):

>>> from collections import defaultdict
>>>
>>> class BySubjectGradebook:
>>>     def __init__(self):
>>>         self._grades = {}                     # Outer dict
>>>
>>>     def add_student(self, name):
>>>         self._grades[name] = defaultdict(list) # Inner dict

This seems straightforward enough. The report_grade and average_grade methods gain quite a bit of complexity to deal with the multilevel dictionary, but it’s seemingly manageable:

>>> from collections import defaultdict
>>>
>>> class BySubjectGradebook:
>>>     def __init__(self):
>>>         self._grades = {}                     # Outer dict
>>>
>>>     def add_student(self, name):
>>>         self._grades[name] = defaultdict(list) # Inner dict
>>>
>>>     def report_grade(self, name, subject, grade):
>>>         by_subject = self._grades[name]
>>>         grade_list = by_subject[subject]
>>>         grade_list.append(grade)
>>>
>>>     def average_grade(self, name):
>>>         by_subject = self._grades[name]
>>>         total, count = 0, 0
>>>         for grades in by_subject.values():
>>>             total += sum(grades)
>>>             count += len(grades)
>>>         return total / count

Using the class remains simple:

>>> book = BySubjectGradebook()
>>> book.add_student('Albert Einstein')
>>> book.report_grade('Albert Einstein', 'Math', 75)
>>> book.report_grade('Albert Einstein', 'Math', 65)
>>> book.report_grade('Albert Einstein', 'Gym', 90)
>>> book.report_grade('Albert Einstein', 'Gym', 95)
>>> print(book.average_grade('Albert Einstein'))
81.25
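The inner defaultdict(list) is what lets report_grade append to a subject that has never been seen before. The following is a minimal sketch of that behavior, using standalone dictionaries rather than the gradebook class; the variable names are illustrative only:

```python
from collections import defaultdict

# A plain dict raises KeyError when a missing subject is accessed.
plain = {}
missing_key_failed = False
try:
    plain['Math'].append(75)
except KeyError:
    missing_key_failed = True

# A defaultdict calls list() to create the value for a missing key.
by_subject = defaultdict(list)
by_subject['Math'].append(75)
by_subject['Math'].append(65)

print(missing_key_failed)
print(dict(by_subject))
```

Note that merely reading a missing key from a defaultdict also inserts it, so lookups can add empty subjects as a side effect.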

Now, imagine that the requirements change again. I also want to track the weight of each score toward the overall grade in the class so that midterm and final exams are more important than pop quizzes. One way to implement this feature is to change the innermost dictionary; instead of mapping subjects (its keys) to a list of grades (its values), I can use the tuple of (score, weight) in the values list:

>>> class WeightedGradebook:
>>>     def __init__(self):
>>>         self._grades = {}
>>>
>>>     def add_student(self, name):
>>>         self._grades[name] = defaultdict(list)
>>>
>>>     def report_grade(self, name, subject, score, weight):
>>>         by_subject = self._grades[name]
>>>         grade_list = by_subject[subject]
>>>         grade_list.append((score, weight))

Although the changes to report_grade seem simple—just make the grade list store tuple instances—the average_grade method now has a loop within a loop and is difficult to read:

>>> class WeightedGradebook:
>>>     def __init__(self):
>>>         self._grades = {}
>>>
>>>     def add_student(self, name):
>>>         self._grades[name] = defaultdict(list)
>>>
>>>     def report_grade(self, name, subject, score, weight):
>>>         by_subject = self._grades[name]
>>>         grade_list = by_subject[subject]
>>>         grade_list.append((score, weight))
>>>
>>>     def average_grade(self, name):
>>>         by_subject = self._grades[name]
>>>
>>>         score_sum, score_count = 0, 0
>>>         for subject, scores in by_subject.items():
>>>             subject_avg, total_weight = 0, 0
>>>             for score, weight in scores:
>>>                 subject_avg += score * weight
>>>                 total_weight += weight
>>>
>>>             score_sum += subject_avg / total_weight
>>>             score_count += 1
>>>
>>>         return score_sum / score_count

Using the class has also gotten more difficult. It’s unclear what all of the numbers in the positional arguments mean:

>>> book = WeightedGradebook()
>>> book.add_student('Albert Einstein')
>>> book.report_grade('Albert Einstein', 'Math', 75, 0.05)
>>> book.report_grade('Albert Einstein', 'Math', 65, 0.15)
>>> book.report_grade('Albert Einstein', 'Math', 70, 0.80)
>>> book.report_grade('Albert Einstein', 'Gym', 100, 0.40)
>>> book.report_grade('Albert Einstein', 'Gym', 85, 0.60)
>>> print(book.average_grade('Albert Einstein'))
80.25

When you see complexity like this, it’s time to make the leap from built-in types like dictionaries, tuples, sets, and lists to a hierarchy of classes.

In the grades example, at first I didn’t know I’d need to support weighted grades, so the complexity of creating classes seemed unwarranted. Python’s built-in dictionary and tuple types made it easy to keep going, adding layer after layer to the internal bookkeeping. But you should avoid doing this for more than one level of nesting; using dictionaries that contain dictionaries makes your code hard for other programmers to read and sets you up for a maintenance nightmare.

As soon as you realize that your bookkeeping is getting complicated, break it all out into classes. You can then provide well-defined interfaces that better encapsulate your data. This approach also enables you to create a layer of abstraction between your interfaces and your concrete implementations.

Refactoring to Classes

There are many approaches to refactoring (see Item 89: “Consider warnings to Refactor and Migrate Usage” for another). In this case, I can start moving to classes at the bottom of the dependency tree: a single grade. A class seems too heavyweight for such simple information. A tuple, though, seems appropriate because grades are immutable. Here, I use the tuple of (score, weight) to track grades in a list:

>>> grades = []
>>> grades.append((95, 0.45))
>>> grades.append((85, 0.55))
>>> total = sum(score * weight for score, weight in grades)
>>> total_weight = sum(weight for _, weight in grades)
>>> average_grade = total / total_weight

I used _ (the underscore variable name, a Python convention for unused variables) to capture the first entry in each grade’s tuple and ignore it when calculating the total_weight.

The problem with this code is that tuple instances are positional. For example, if I want to associate more information with a grade, such as a set of notes from the teacher, I need to rewrite every usage of the two-tuple to be aware that there are now three items present instead of two, which means I need to use _ further to ignore certain indexes:

>>> grades = []
>>> grades.append((95, 0.45, 'Great job'))
>>> grades.append((85, 0.55, 'Better next time'))
>>> total = sum(score * weight for score, weight, _ in grades)
>>> total_weight = sum(weight for _, weight, _ in grades)
>>> average_grade = total / total_weight

This pattern of extending tuples longer and longer is similar to deepening layers of dictionaries. As soon as you find yourself going longer than a two-tuple, it’s time to consider another approach.
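To see how positional tuples break callers, here is a small sketch in which code written for two-tuples meets a list of three-tuples. The data is made up for illustration:

```python
grades = [(95, 0.45, 'Great job'), (85, 0.55, 'Better next time')]

error_message = None
try:
    # Written when each grade was still a (score, weight) two-tuple.
    total = sum(score * weight for score, weight in grades)
except ValueError as e:
    # Unpacking three items into two names fails at runtime.
    error_message = str(e)

print(error_message)
```

The failure only surfaces when the line runs, so every unpacking site has to be found and updated by hand.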

The namedtuple type in the collections built-in module does exactly what I need in this case: It lets me easily define tiny, immutable data classes:

>>> from collections import namedtuple
>>>
>>> Grade = namedtuple('Grade', ('score', 'weight'))

These classes can be constructed with positional or keyword arguments. The fields are accessible with named attributes. Having named attributes makes it easy to move from a namedtuple to a class later if the requirements change again and I need to, say, support mutability or behaviors in the simple data containers.
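A brief sketch of those properties, reusing the Grade definition from above; the variable names are illustrative:

```python
from collections import namedtuple

Grade = namedtuple('Grade', ('score', 'weight'))

# Positional and keyword construction produce equal instances.
by_position = Grade(95, 0.45)
by_keyword = Grade(score=95, weight=0.45)
same = by_position == by_keyword

# Fields are read through named attributes.
weighted = by_position.score * by_position.weight

# Fields are read-only; assignment raises AttributeError.
immutable = False
try:
    by_position.score = 100
except AttributeError:
    immutable = True

# _replace returns a modified copy and leaves the original untouched.
updated = by_position._replace(score=100)

print(same, immutable, updated.score, by_position.score)
```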

5.1.1. Limitations of namedtuple

Although namedtuple is useful in many circumstances, it’s important to understand when it can do more harm than good:

  • Default argument values are limited for namedtuple classes: the defaults parameter (added in Python 3.7) applies only to the rightmost fields, and every field remains a positional slot. This makes them unwieldy when your data may have many optional properties. If you find yourself using more than a handful of attributes, using the built-in dataclasses module may be a better choice.

  • The attribute values of namedtuple instances are still accessible using numerical indexes and iteration. Especially in externalized APIs, this can lead to unintentional usage that makes it harder to move to a real class later. If you’re not in control of all of the usage of your namedtuple instances, it’s better to explicitly define a new class.
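The sketch below illustrates both points. GradeRecord is a hypothetical dataclass introduced only for comparison; it is not part of the gradebook design in this item:

```python
from collections import namedtuple
from dataclasses import dataclass

Grade = namedtuple('Grade', ('score', 'weight'))
grade = Grade(95, 0.45)

# Callers can still treat a namedtuple as a plain tuple.
score, weight = grade              # Iteration / unpacking
first = grade[0]                   # Numerical indexing
equals_tuple = grade == (95, 0.45) # Compares equal to a bare tuple

# Hypothetical alternative: a frozen dataclass with an optional field.
@dataclass(frozen=True)
class GradeRecord:
    score: int
    weight: float
    notes: str = ''

record = GradeRecord(95, 0.45)

# Dataclass instances do not support indexing.
indexable = True
try:
    record[0]
except TypeError:
    indexable = False

print(record)
print(first, equals_tuple, indexable)
```

Because the dataclass exposes only named attributes, callers cannot come to depend on field order the way they can with a namedtuple.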

Next, I can write a class to represent a single subject that contains a set of grades:

>>> class Subject:
>>>     def __init__(self):
>>>         self._grades = []
>>>
>>>     def report_grade(self, score, weight):
>>>         self._grades.append(Grade(score, weight))
>>>
>>>     def average_grade(self):
>>>         total, total_weight = 0, 0
>>>         for grade in self._grades:
>>>             total += grade.score * grade.weight
>>>             total_weight += grade.weight
>>>         return total / total_weight

Then, I write a class to represent a set of subjects that are being studied by a single student:

>>> class Student:
>>>     def __init__(self):
>>>         self._subjects = defaultdict(Subject)
>>>
>>>     def get_subject(self, name):
>>>         return self._subjects[name]
>>>
>>>     def average_grade(self):
>>>         total, count = 0, 0
>>>         for subject in self._subjects.values():
>>>             total += subject.average_grade()
>>>             count += 1
>>>         return total / count

Finally, I write a container for all of the students, keyed dynamically by their names:

>>> class Gradebook:
>>>     def __init__(self):
>>>         self._students = defaultdict(Student)
>>>
>>>     def get_student(self, name):
>>>         return self._students[name]

The line count of these classes is almost double the previous implementation’s size. But this code is much easier to read. The example driving the classes is also clearer and more extensible:

>>> book = Gradebook()
>>> albert = book.get_student('Albert Einstein')
>>> math = albert.get_subject('Math')
>>> math.report_grade(75, 0.05)
>>> math.report_grade(65, 0.15)
>>> math.report_grade(70, 0.80)
>>> gym = albert.get_subject('Gym')
>>> gym.report_grade(100, 0.40)
>>> gym.report_grade(85, 0.60)
>>> print(albert.average_grade())
80.25

It would also be possible to write backward-compatible methods to help migrate usage of the old API style to the new hierarchy of objects.
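One possible shape for such a migration layer is sketched below. The report_grade and average_grade methods on Gradebook are hypothetical additions, not part of the listing above; they accept the old flat-style arguments and delegate to the new objects. The supporting classes are condensed restatements so that the sketch runs on its own:

```python
from collections import defaultdict, namedtuple

Grade = namedtuple('Grade', ('score', 'weight'))

class Subject:
    def __init__(self):
        self._grades = []

    def report_grade(self, score, weight):
        self._grades.append(Grade(score, weight))

    def average_grade(self):
        total = sum(g.score * g.weight for g in self._grades)
        total_weight = sum(g.weight for g in self._grades)
        return total / total_weight

class Student:
    def __init__(self):
        self._subjects = defaultdict(Subject)

    def get_subject(self, name):
        return self._subjects[name]

    def average_grade(self):
        averages = [s.average_grade() for s in self._subjects.values()]
        return sum(averages) / len(averages)

class Gradebook:
    def __init__(self):
        self._students = defaultdict(Student)

    def get_student(self, name):
        return self._students[name]

    # Hypothetical compatibility shims for the old flat API.
    def report_grade(self, name, subject, score, weight):
        student = self.get_student(name)
        student.get_subject(subject).report_grade(score, weight)

    def average_grade(self, name):
        return self.get_student(name).average_grade()

# Old-style calls keep working against the new class hierarchy.
book = Gradebook()
book.report_grade('Albert Einstein', 'Math', 75, 0.05)
book.report_grade('Albert Einstein', 'Math', 65, 0.15)
book.report_grade('Albert Einstein', 'Math', 70, 0.80)
book.report_grade('Albert Einstein', 'Gym', 100, 0.40)
book.report_grade('Albert Einstein', 'Gym', 85, 0.60)
result = book.average_grade('Albert Einstein')
print(result)
```

Shims like these let existing callers migrate gradually; once no callers remain, the flat methods can be removed.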

5.1.2. Things to Remember

  • Avoid making dictionaries with values that are dictionaries, long tuples, or complex nestings of other built-in types.

  • Use namedtuple for lightweight, immutable data containers before you need the flexibility of a full class.

  • Move your bookkeeping code to using multiple classes when your internal state dictionaries get complicated.