Initializing nested lists correctly
This post is part of the programming series.
If you want to initialize a list with a certain size in Python, you can use the following clean syntax:
>>> arr = [None]*3 >>> arr [None, None, None]
We can then fill the list with elements:
>>> arr = 1 >>> arr [None, 1, None]
But watch what happens when we try to use the same syntax for declaring a 2D dynamic array:
>>> arr2 = []*3 >>> arr2 [, , ] >>> arr2.append(1) >>> arr2 [, , ]
The desired result here was the ragged array
[, , ], but instead we accidentally filled all nested lists…
Well, we observe here that the
* operator creates a list where the elements reference the same underlying object!
>>> id(arr2) 1474142748360 >>> id(arr2) 1474142748360 >>> id(arr2) 1474142748360
So that explains why adjusting one sublist affects all sublists: they are the same object.
But why did we then not have the same issue when initializing the flat empty list with None objects?
* operator works exactly the same here and also creates a reference to the same object.
>>> id(arr) 140720689536224 >>> id(arr) 140720689536224
But if we inspect the element where we filled in a value, we do see that it is a new object:
>>> id(arr) 140720690012576
The crucial difference is that this
NoneType object is immutable.
The referenced object cannot be changed but is rather replaced with a new object.
The same reasoning holds when we have a list of integers or strings.
In case of a list of lists, or a list of dictionaries (any mutable data structure) however, we can adjust the referenced object and then the change will reflect onto all sublists.
Because something like
*3 works as expected, it can be hard to spot the difference in behavior when working with nested mutable data structures.
If we explicitly replace a whole sublist with a new object, there’s no issue:
>>> arr2 =  >>> arr2 [, , ]
This is not a practical solution though, because we want to be able to use functions like
append() on sublists correctly.
The general solution is to force Python to make a new object for each sublist, which means - however nice and clean the syntax looks - we have to avoid using
* for this!
Instead, create the sublists in an explicit loop or for example a list comprehension:
>>> arr3 = [ for _ in range(3)] >>> arr3 [, , ] >>> arr3.append(1) >>> arr3 [, , ]
This way each of the sublists is its own object, rather than being a reference to the same list, because we force Python to evaluate
 3 times instead of only once.