9.8. Use tracemalloc to Understand Memory Usage and Leaks
Memory management in the default implementation of Python, CPython, uses reference counting. This ensures that as soon as all references to an object have expired, the referenced object is also cleared from memory, freeing up that space for other data. CPython also has a built-in cycle detector to ensure that self-referencing objects are eventually garbage collected.
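For example (a minimal sketch of my own, not part of the example program used below), you can observe both behaviors directly with the sys, weakref, and gc built-in modules:

# refcounting_demo.py
import gc
import sys
import weakref

class Node:
    def __init__(self):
        self.next = None

# Reference counting frees an object as soon as its last reference expires
node = Node()
print(sys.getrefcount(node))  # At least 2: `node` plus the argument reference
alive = weakref.ref(node)     # A weak reference doesn't keep the object alive
del node
print(alive() is None)        # True: reclaimed immediately by reference counting

# A reference cycle keeps the count above zero, so the cycle detector is needed
node = Node()
node.next = node
alive = weakref.ref(node)
del node
print(alive() is None)        # Usually False: the cycle hasn't been collected yet
gc.collect()                  # Force the cycle detector to run
print(alive() is None)        # True: the self-referencing object was collected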
In theory, this means that most Python programmers don’t have to worry about allocating or deallocating memory in their programs; the language and the CPython runtime take care of it automatically. In practice, however, programs eventually do run out of memory because they hold on to references that are no longer useful. Figuring out where a Python program is using or leaking memory can be a challenge.
The first way to debug memory usage is to ask the gc built-in module to list every object currently known by the garbage collector. Although it’s quite a blunt tool, this approach lets you quickly get a sense of where your program’s memory is being used.
Here, I define a module that fills up memory by keeping references:
# waste_memory.py
import os

class MyObject:
    def __init__(self):
        self.data = os.urandom(100)

def get_data():
    values = []
    for _ in range(100):
        obj = MyObject()
        values.append(obj)
    return values

def run():
    deep_values = []
    for _ in range(100):
        deep_values.append(get_data())
    return deep_values
Then, I run a program that uses the gc built-in module to print out how many objects were created during execution, along with a small sample of allocated objects:
# using_gc.py
import gc

found_objects = gc.get_objects()
print('Before:', len(found_objects))

import waste_memory

hold_reference = waste_memory.run()

found_objects = gc.get_objects()
print('After: ', len(found_objects))
for obj in found_objects[:3]:
    print(repr(obj)[:100])
>>>
Before: 6207
After: 16801
<waste_memory.MyObject object at 0x10390aeb8>
<waste_memory.MyObject object at 0x10390aef0>
<waste_memory.MyObject object at 0x10390af28>
...
The problem with gc.get_objects is that it doesn’t tell you anything about how the objects were allocated. In complicated programs, objects of a specific class could be allocated many different ways. Knowing the overall number of objects isn’t nearly as important as identifying the code responsible for allocating the objects that are leaking memory.
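Before reaching for tracemalloc, the most you can usually squeeze out of gc on its own is a per-type tally. Here is a rough sketch (the count_live_objects helper is my own, not part of the original example), which tells you which types dominate but still not which call sites created them:

# count_by_type.py
import gc
from collections import Counter

def count_live_objects(limit=5):
    # Tally every object the collector knows about, grouped by type name
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    for name, count in counts.most_common(limit):
        print(f'{name}: {count}')

import waste_memory

hold_reference = waste_memory.run()
count_live_objects()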
Python 3.4 introduced a new tracemalloc built-in module for solving this problem. tracemalloc makes it possible to connect an object back to where it was allocated. You use it by taking before and after snapshots of memory usage and comparing them to see what’s changed. Here, I use this approach to print out the top three memory usage offenders in a program:
# top_n.py
import tracemalloc

tracemalloc.start(10)                      # Set stack depth
time1 = tracemalloc.take_snapshot()        # Before snapshot

import waste_memory

x = waste_memory.run()                     # Usage to debug
time2 = tracemalloc.take_snapshot()        # After snapshot

stats = time2.compare_to(time1, 'lineno')  # Compare snapshots
for stat in stats[:3]:
    print(stat)
>>>
waste_memory.py:5: size=2392 KiB (+2392 KiB), count=29994 (+29994), average=82 B
waste_memory.py:10: size=547 KiB (+547 KiB), count=10001 (+10001), average=56 B
waste_memory.py:11: size=82.8 KiB (+82.8 KiB), count=100 (+100), average=848 B
The size and count labels in the output make it immediately clear which objects are dominating my program’s memory usage and where in the source code they were allocated.
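As a quick sanity check to run alongside the per-line statistics (an aside of my own, not part of the example above), tracemalloc can also report the total amount of memory it has traced so far:

# total_usage.py
import tracemalloc

tracemalloc.start()

import waste_memory

x = waste_memory.run()
current, peak = tracemalloc.get_traced_memory()  # Totals in bytes
print(f'Current: {current / 1024:.0f} KiB')
print(f'Peak:    {peak / 1024:.0f} KiB')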
The tracemalloc module can also print out the full stack trace of each allocation (up to the number of frames passed to the tracemalloc.start function). Here, I print out the stack trace of the biggest source of memory usage in the program:
# with_trace.py
import tracemalloc

tracemalloc.start(10)
time1 = tracemalloc.take_snapshot()

import waste_memory

x = waste_memory.run()
time2 = tracemalloc.take_snapshot()

stats = time2.compare_to(time1, 'traceback')
top = stats[0]
print('Biggest offender is:')
print('\n'.join(top.traceback.format()))
>>>
Biggest offender is:
File "waste_memory.py", line 5
self.data = os.urandom(100)
A stack trace like this is most valuable for figuring out which particular usage of a common function or class is responsible for memory consumption in a program.
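On larger programs, the comparison can be dominated by allocations made inside the import machinery or by tracemalloc itself. One optional refinement (my own sketch, not part of the original example) is to filter those traces out before comparing snapshots:

# filtered_trace.py
import tracemalloc

tracemalloc.start(10)
time1 = tracemalloc.take_snapshot()

import waste_memory

x = waste_memory.run()
time2 = tracemalloc.take_snapshot()

# Exclude frames from the import system and from tracemalloc's own module
filters = [
    tracemalloc.Filter(False, '<frozen importlib._bootstrap>'),
    tracemalloc.Filter(False, tracemalloc.__file__),
]
stats = time2.filter_traces(filters).compare_to(
    time1.filter_traces(filters), 'traceback')
print('Biggest offender is:')
print('\n'.join(stats[0].traceback.format()))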
9.8.1. Things to Remember
✦ It can be difficult to understand how Python programs use and leak memory.
✦ The gc module can help you understand which objects exist, but it has no information about how they were allocated.
✦ The tracemalloc built-in module provides powerful tools for understanding the sources of memory usage.