6.6. Register Class Existence with __init_subclass__
Another common use of metaclasses is to automatically register types in a program. Registration is useful for doing reverse lookups, where you need to map a simple identifier back to a corresponding class.
For example, say that I want to implement my own serialized representation of a Python object using JSON. I need a way to turn an object into a JSON string. Here, I do this generically by defining a base class that records the constructor parameters and turns them into a JSON dictionary:
>>> import json
>>>
>>> class Serializable:
>>>     def __init__(self, *args):
>>>         self.args = args
>>>
>>>     def serialize(self):
>>>         return json.dumps({'args': self.args})
This class makes it easy to serialize simple, immutable data structures like Point2D to a string:
>>> class Point2D(Serializable):
>>>     def __init__(self, x, y):
>>>         super().__init__(x, y)
>>>         self.x = x
>>>         self.y = y
>>>
>>>     def __repr__(self):
>>>         return f'Point2D({self.x}, {self.y})'
>>>
>>> point = Point2D(5, 3)
>>> print('Object: ', point)
>>> print('Serialized:', point.serialize())
Object: Point2D(5, 3)
Serialized: {"args": [5, 3]}
Now, I need to deserialize this JSON string and construct the Point2D object it represents. Here, I define another class that can deserialize the data from its Serializable parent class:
>>> class Deserializable(Serializable):
>>>     @classmethod
>>>     def deserialize(cls, json_data):
>>>         params = json.loads(json_data)
>>>         return cls(*params['args'])
Using Deserializable makes it easy to serialize and deserialize simple, immutable objects in a generic way:
>>> class BetterPoint2D(Deserializable):
>>>     ...
>>>
>>> before = BetterPoint2D(5, 3)
>>> print('Before: ', before)
>>> data = before.serialize()
>>> print('Serialized:', data)
>>> after = BetterPoint2D.deserialize(data)
>>> print('After: ', after)
Before: <__main__.BetterPoint2D object at 0x7f94c4274b20>
Serialized: {"args": [5, 3]}
After: <__main__.BetterPoint2D object at 0x7f94c4274d60>
The problem with this approach is that it works only if you know the intended type of the serialized data ahead of time (e.g., Point2D, BetterPoint2D). Ideally, you’d have a large number of classes serializing to JSON and one common function that could deserialize any of them back to a corresponding Python object.
To do this, I can include the serialized object’s class name in the JSON data:
>>> class BetterSerializable:
>>>     def __init__(self, *args):
>>>         self.args = args
>>>
>>>     def serialize(self):
>>>         return json.dumps({
>>>             'class': self.__class__.__name__,
>>>             'args': self.args,
>>>         })
>>>
>>>     def __repr__(self):
>>>         name = self.__class__.__name__
>>>         args_str = ', '.join(str(x) for x in self.args)
>>>         return f'{name}({args_str})'
Then, I can maintain a mapping of class names back to constructors for those objects. The general deserialize function works for any classes passed to register_class:
>>> registry = {}
>>>
>>> def register_class(target_class):
>>>     registry[target_class.__name__] = target_class
>>>
>>> def deserialize(data):
>>>     params = json.loads(data)
>>>
>>>     name = params['class']
>>>     target_class = registry[name]
>>>     return target_class(*params['args'])
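One consequence of keying the registry on `__name__` alone is that two different classes sharing a name would silently overwrite each other. The following is a minimal sketch of a stricter variant, not part of the approach above; the function name `register_class_strict` and the duplicate `Point2D` classes are hypothetical and exist only to illustrate the collision.

```python
registry = {}  # stand-in for the registry defined above


def register_class_strict(target_class):
    """Hypothetical variant of register_class that refuses duplicate names."""
    name = target_class.__name__
    existing = registry.get(name)
    if existing is not None and existing is not target_class:
        raise ValueError(f'Class name {name!r} is already registered')
    registry[name] = target_class


class Point2D:
    pass


register_class_strict(Point2D)
first_point_class = Point2D  # keep a reference to the first class


class Point2D:  # a second, unrelated class that happens to reuse the name
    pass


error = None
try:
    register_class_strict(Point2D)
except ValueError as e:
    error = str(e)

print(error)
```

Whether a collision should raise, warn, or be resolved with a longer key (for example one that includes the module name) is a design choice that depends on the program.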
To ensure that deserialize always works properly, I must call register_class for every class I may want to deserialize in the future:
>>> class EvenBetterPoint2D(BetterSerializable):
>>>     def __init__(self, x, y):
>>>         super().__init__(x, y)
>>>         self.x = x
>>>         self.y = y
>>>
>>> register_class(EvenBetterPoint2D)
Now, I can deserialize an arbitrary JSON string without having to know which class it contains:
>>> before = EvenBetterPoint2D(5, 3)
>>> print('Before: ', before)
>>> data = before.serialize()
>>> print('Serialized:', data)
>>> after = deserialize(data)
>>> print('After: ', after)
Before: EvenBetterPoint2D(5, 3)
Serialized: {"class": "EvenBetterPoint2D", "args": [5, 3]}
After: EvenBetterPoint2D(5, 3)
The problem with this approach is that it’s possible to forget to call register_class:
>>> class Point3D(BetterSerializable):
>>>     def __init__(self, x, y, z):
>>>         super().__init__(x, y, z)
>>>         self.x = x
>>>         self.y = y
>>>         self.z = z
>>>
>>> # Forgot to call register_class! Whoops!
This causes the code to break at runtime, when I finally try to deserialize an instance of a class I forgot to register:
>>> point = Point3D(5, 9, -4)
>>> data = point.serialize()
>>> deserialize(data)
Traceback ...
KeyError: 'Point3D'
Even though I chose to subclass BetterSerializable, I don’t actually get all of its features if I forget to call register_class after the class statement body. This approach is error prone and especially challenging for beginners. The same omission can happen with class decorators (see Item 51: “Prefer Class Decorators Over Metaclasses for Composable Class Extensions” for when those are appropriate).
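To make the comparison with class decorators concrete, here is a minimal, self-contained sketch. The decorator name `register` and the classes `Point4D` and `Point5D` are hypothetical, and the registry is a stand-in for the one defined earlier. The point is only that a decorator, like the explicit register_class call, has to be remembered for every class:

```python
import json

registry = {}  # stand-in for the registry defined earlier


def register(target_class):
    """Hypothetical class decorator: record the class, then return it unchanged."""
    registry[target_class.__name__] = target_class
    return target_class


class BetterSerializable:
    def __init__(self, *args):
        self.args = args

    def serialize(self):
        return json.dumps({
            'class': self.__class__.__name__,
            'args': self.args,
        })


@register
class Point4D(BetterSerializable):  # hypothetical example class
    pass


class Point5D(BetterSerializable):  # decorator forgotten, so never registered
    pass


print(sorted(registry))
```

Only `Point4D` ends up in the registry; `Point5D` would fail at deserialization time in the same way as Point3D above.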
What if I could somehow act on the programmer’s intent to use BetterSerializable and ensure that register_class is called in all cases? Metaclasses enable this by intercepting the class statement when subclasses are defined (see Item 48: “Validate Subclasses with __init_subclass__” for details on the machinery). Here, I use a metaclass to register the new type immediately after the class’s body:
>>> class Meta(type):
>>>     def __new__(meta, name, bases, class_dict):
>>>         cls = type.__new__(meta, name, bases, class_dict)
>>>         register_class(cls)
>>>         return cls
>>>
>>> class RegisteredSerializable(BetterSerializable,
>>>                              metaclass=Meta):
>>>     pass
When I define a subclass of RegisteredSerializable, I can be confident that the call to register_class happened and deserialize will always work as expected:
>>> class Vector3D(RegisteredSerializable):
>>>     def __init__(self, x, y, z):
>>>         super().__init__(x, y, z)
>>>         self.x, self.y, self.z = x, y, z
>>>
>>> before = Vector3D(10, -7, 3)
>>> print('Before: ', before)
>>> data = before.serialize()
>>> print('Serialized:', data)
>>> print('After: ', deserialize(data))
Before: Vector3D(10, -7, 3)
Serialized: {"class": "Vector3D", "args": [10, -7, 3]}
After: Vector3D(10, -7, 3)
An even better approach is to use the __init_subclass__ special class method. This simplified syntax, introduced in Python 3.6, reduces the visual noise of applying custom logic when a class is defined. It also makes it more approachable to beginners who may be confused by the complexity of metaclass syntax:
>>> class BetterRegisteredSerializable(BetterSerializable):
>>>     def __init_subclass__(cls):
>>>         super().__init_subclass__()
>>>         register_class(cls)
>>>
>>> class Vector1D(BetterRegisteredSerializable):
>>>     def __init__(self, magnitude):
>>>         super().__init__(magnitude)
>>>         self.magnitude = magnitude
>>>
>>> before = Vector1D(6)
>>> print('Before: ', before)
>>> data = before.serialize()
>>> print('Serialized: ', data)
>>> print('After: ', deserialize(data))
Before: Vector1D(6)
Serialized: {"class": "Vector1D", "args": [6]}
After: Vector1D(6)
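One behavioral difference between the two approaches is worth noting: a metaclass's __new__ runs for the class that declares the metaclass as well as for its subclasses, whereas __init_subclass__ runs only for subclasses. The following self-contained sketch uses a stand-in registry and hypothetical class names (`MetaBase`, `MetaChild`, `HookBase`, `HookChild`) to show what ends up registered in each case:

```python
registry = {}  # stand-in for the registry defined earlier


def register_class(target_class):
    registry[target_class.__name__] = target_class


class Meta(type):
    def __new__(meta, name, bases, class_dict):
        cls = type.__new__(meta, name, bases, class_dict)
        register_class(cls)
        return cls


class MetaBase(metaclass=Meta):  # hypothetical base; Meta.__new__ runs here too
    pass


class MetaChild(MetaBase):
    pass


class HookBase:  # hypothetical base; the hook does not run for this class
    def __init_subclass__(cls):
        super().__init_subclass__()
        register_class(cls)


class HookChild(HookBase):
    pass


print(sorted(registry))
# ['HookChild', 'MetaBase', 'MetaChild']
```

If the base class itself should never be looked up by name, the __init_subclass__ version gives that behavior without extra checks.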
By using __init_subclass__ (or metaclasses) for class registration, you can ensure that you’ll never miss registering a class as long as the inheritance tree is right. This works well for serialization, as I’ve shown, and also applies to database object-relational mappings (ORMs), extensible plug-in systems, and callback hooks.
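As a rough illustration of the plug-in case, the sketch below applies the same pattern to a small text-processing example. Every name in it (`Plugin`, `plugins`, `UppercasePlugin`, `ReversePlugin`, `run`) is invented for this example and is not taken from any real plug-in framework:

```python
plugins = {}  # hypothetical registry mapping plug-in names to classes


class Plugin:
    """Hypothetical base class: every subclass registers itself when defined."""

    def __init_subclass__(cls):
        super().__init_subclass__()
        plugins[cls.__name__] = cls

    def run(self, text):
        raise NotImplementedError


class UppercasePlugin(Plugin):
    def run(self, text):
        return text.upper()


class ReversePlugin(Plugin):
    def run(self, text):
        return text[::-1]


# Look up a plug-in by name, as a configuration file might request it
result = plugins['UppercasePlugin']().run('hello')
print(result)  # HELLO
```

Adding a new plug-in then requires only defining a new subclass; no central list needs to be edited.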
6.6.1. Things to Remember
✦ Class registration is a helpful pattern for building modular Python programs.
✦ Metaclasses let you run registration code automatically each time a base class is subclassed in a program.
✦ Using metaclasses for class registration helps you avoid errors by ensuring that you never miss a registration call.
✦ Prefer __init_subclass__ over standard metaclass machinery because it’s clearer and easier for beginners to understand.