7.2. Don’t Copy Mutable Values Without copy.copy() and copy.deepcopy()

It’s better to think of variables as labels or name tags that refer to objects rather than as boxes that contain objects. This mental model is especially useful when it comes to modifying mutable objects: objects such as lists, dictionaries, and sets whose value can mutate (that is, change). A common gotcha occurs when copying one variable that refers to a mutable object to another variable and thinking that the actual object is being copied. In Python, assignment statements never copy objects; they only copy the refer- ences to an object. (Python developer Ned Batchelder has a great PyCon 2015 talk on this idea titled, “Facts and Myths about Python Names and Values.” Watch it at https://youtu.be/_AEJHKGk9ns.)

For example, enter the following code into the interactive shell, and note that even though we change the spam variable only, the cheese variable changes as well:

>>> spam = ['cat', 'dog', 'eel']
>>> cheese = spam
>>> spam
['cat', 'dog', 'eel']
>>> cheese
['cat', 'dog', 'eel']
>>> spam[2] = 'MOOSE'
>>> spam
['cat', 'dog', 'MOOSE']
>>> cheese
['cat', 'dog', 'MOOSE']
>>> id(cheese), id(spam)
(139845815673728, 139845815673728)

A visualization of the execution of this code is at https://autbor.com/ listcopygotcha1. If you think that cheese = spam copied the list object, you might be surprised that cheese seems to have changed even though we only modified spam . But assignment statements never copy objects, only references to objects. The assignment statement cheese = spam makes cheese refer to the same list object in the computer’s memory as spam . It doesn’t duplicate the list object. This is why changing spam also changes cheese : both variables refer to the same list object.

The same principle applies to mutable objects passed to a function call. Enter the following into the interactive shell, and note that the global variable spam and the local parameter (remember, parameters are variables defined in the function’s def statement) theList both refer to the same object:

>>> def printIdOfParam(theList):
>>>
>>>     print(id(theList))
>>> eggs = ['cat', 'dog', 'eel']
>>> print(id(eggs))
139845815681728
>>> printIdOfParam(eggs)
139845815681728

A visualization of the execution of this code is at https://autbor.com/ listcopygotcha2. Notice that the identities returned by id() for eggs and theList are the same, meaning these variables refer to the same list object. The eggs variable’s list object wasn’t copied to theList ; rather, the reference was copied, which is why both variables refer to the same list. A reference is only a few bytes in size, but imagine if Python copied the entire list instead of just the reference. If eggs contained a billion items instead of just three, passing it to the printIdOfParam() function would require copying this giant list. This would eat up gigabytes of memory just to do a simple function call! That’s why Python assignment only copies references and never copies objects.

One way to prevent this gotcha is to make a copy of the list object (not just the reference) with the copy.copy() function. Enter the following into the interactive shell:

>>> import copy
>>> bacon = [2, 4, 8, 16]
>>> ham = copy.copy(bacon)
>>> id(bacon), id(ham)
(139845815682112, 139845815681536)
>>> bacon[0] = 'CHANGED'
>>> bacon
['CHANGED', 4, 8, 16]
>>> ham
[2, 4, 8, 16]
>>> id(bacon), id(ham)
(139845815682112, 139845815681536)

A visualization of the execution of this code is at https://autbor.com/ copycopy1. The ham variable refers to a copied list object rather than the original list object referred to by bacon , so it doesn’t suffer from this gotcha.

But just as variables are like labels or name tags rather than boxes that contain objects, lists also contain labels or name tags that refer to objects rather than the actual objects. If your list contains other lists, copy.copy() only copies the references to these inner lists. Enter the following into the interactive shell to see this problem:

>>> import copy
>>> bacon = [[1, 2], [3, 4]]
>>> ham = copy.copy(bacon)
>>> id(bacon), id(ham)
(139845815689472, 139845815683264)
>>> bacon.append('APPENDED')
>>> bacon
[[1, 2], [3, 4], 'APPENDED']
>>> ham
[[1, 2], [3, 4]]
>>> bacon[0][0] = 'CHANGED'
>>> bacon
[['CHANGED', 2], [3, 4], 'APPENDED']
>>> ham
[['CHANGED', 2], [3, 4]]
>>> id(bacon[0]), id(ham[0])
(139845815684608, 139845815684608)

A visualization of the execution of this code is at https://autbor.com/ copycopy2. Although bacon and ham are two different list objects, they refer to the same [1, 2] and [3, 4] inner lists, so changes to these inner lists get reflected in both variables, even though we used copy.copy() . The solution is to use copy.deepcopy() , which will make copies of any list objects inside the list object being copied (and any list objects in those list objects, and so on). Enter the following into the interactive shell:

>>> import copy
>>> bacon = [[1, 2], [3, 4]]
>>> ham = copy.deepcopy(bacon)
>>> id(bacon[0]), id(ham[0])
(139845815694080, 139845815694528)
>>> bacon[0][0] = 'CHANGED'
>>> bacon
[['CHANGED', 2], [3, 4]]
>>> ham
[[1, 2], [3, 4]]

A visualization of the execution of this code is at https://autbor.com/ copydeepcopy. Although copy.deepcopy() is slightly slower than copy.copy() , it’s safer to use if you don’t know whether the list being copied contains other lists (or other mutable objects like dictionaries or sets). My general advice is to always use copy.deepcopy() : it might prevent subtle bugs, and the slowdown in your code probably won’t be noticeable.