NumPy Indexing — the Lists and Tuples Gotcha

In a recent session of Python Foundations for Scientists & Engineers, a question came up about indexing a NumPy ndarray. Beyond getting and setting single values, NumPy enables some powerful efficiencies through slicing, which produces views of an array’s data without copying, and fancy indexing, which allows the use of more complex expressions to extract portions of arrays. We have written on the efficiency of array operations, and the details of slicing are pretty well covered elsewhere, from the NumPy docs on slicing to this chapter of “Beautiful Code” by Travis Oliphant, the original author of NumPy.

Slicing is pretty cool because it allows fast, efficient computation of things like finite differences for, say, computing numerical derivatives. Recall that the derivative of a function describes the change in one variable with respect to another:

\frac{dy}{dx}

And in numerical computations, we can use a discrete approximation:

\frac{dy}{dx} \approx \frac{\Delta y}{\Delta x}

And to find the derivative at any particular location i, you compute the ratio of differences:

\frac{\Delta y}{\Delta x}\big|_i = \frac{y_{i+1} - y_{i}}{x_{i+1} - x_{i}}

NumPy allows you to use slicing to avoid costly-for-Python for loops by specifying start, stop, and step values in the array indices. This lets you subtract all of the i indices from the i+1 indices at the same time by specifying one slice that starts at element 1 and goes to the end (the i+1 indices), and another that starts at 0 and goes up to but not including the last element (the i indices). No copies are made during the slicing operations. I use examples like this to show how you can get two, and sometimes three or more, orders of magnitude speedup over the same operation written with for loops.

>>> import numpy as np

>>> x = np.linspace(-np.pi, np.pi, 101)
>>> y = np.sin(x)

>>> dy_dx = (
...     (y[1:] - y[:-1]) /
...     (x[1:] - x[:-1])
... )
>>> np.sum(dy_dx - np.cos(x[:-1] + (x[1]-x[0]) / 2))  # compare to cos at the midpoints
np.float64(-6.245004513516506e-16)  # This is pretty close to 0

Fancy indexing is also well documented (though the NumPy docs now use the more staid term “advanced indexing”), but I wanted to draw attention to a “gotcha” that has bitten me a couple of times. With fancy indexing, you can either make a mask of Boolean values, typically built from some kind of Boolean expression:

>>> a = np.arange(10)
>>> evens_mask = a % 2 == 0
>>> odds_mask = a % 2 == 1
>>> print(a[evens_mask])
[0 2 4 6 8]

>>> print(a[odds_mask])
[1 3 5 7 9]

Or you can specify the indices you want with tuples or lists, and this is the gotcha: the behavior differs depending on which you use. Let’s construct an example like one we use in class. We’ll make a 2-D array b and construct a positional fancy index that specifies elements on a diagonal. Notice that it’s a tuple, as shown by the enclosing parentheses, and each element is a list of coordinates in the array.

>>> b = np.arange(25).reshape(5, 5)
>>> print(b)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
>>> upper_diagonal = (
...     [0, 1, 2, 3],  # row indices
...     [1, 2, 3, 4],  # column indices
... )
>>> print(b[upper_diagonal])
[ 1  7 13 19]

In this case, the tuple has as many elements as there are dimensions, and each element is a list (or tuple, or array) of the indices into that dimension. So in the example above, the first element comes from b[0, 1], the second from b[1, 2], and so on pair-wise through the lists of indices. The result is substantially different if you try to construct a fancy index from a list instead of a tuple:

>>> upper_diagonal_list = [
...     [0, 1, 2, 3],
...     [1, 2, 3, 4],
... ]
>>> b_with_a_list = b[upper_diagonal_list]
>>> print(b_with_a_list)
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]]

 [[ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]
  [20 21 22 23 24]]]

What just happened? In many places, lists and tuples have similar behaviors, but not here. With the list version, NumPy converts the whole list into a single 2-D integer index array and applies it to axis 0, so instead of selecting individual elements pair-wise, we’re selecting (and repeating) whole rows. Look at the shape of b_with_a_list:

>>> print(b_with_a_list.shape)
(2, 4, 5)

Notice that its dimension 0 has 2 elements, which is the same as the number of items in upper_diagonal_list. Notice that dimension 1 has 4 elements, corresponding to the size of each element in upper_diagonal_list. Then notice that dimension 2 matches the size of the rows of b, and hopefully it will be clear what’s happening. With upper_diagonal_list we’re constructing a new array by specifying the rows to use, so the first element of b_with_a_list (seen as the first block above) consists of rows 0, 1, 2, and 3 from b, and the second element is the rows from the second element of upper_diagonal_list. Let’s print it again with comments:

>>> print(b_with_a_list)
[[[ 0  1  2  3  4]   # b[0] \
  [ 5  6  7  8  9]   # b[1]  | indices are first element of
  [10 11 12 13 14]   # b[2]  | upper_diagonal_list
  [15 16 17 18 19]]  # b[3] /

 [[ 5  6  7  8  9]   # b[1] \
  [10 11 12 13 14]   # b[2]  | indices are second element of
  [15 16 17 18 19]   # b[3]  | upper_diagonal_list
  [20 21 22 23 24]]] # b[4] /
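
If what you want is the pair-wise element selection, the fix is simply to make the index a tuple. And, at least in recent NumPy versions, the list behaves exactly as if you had passed an explicit 2-D integer index array; here is a quick check using the objects defined above:

>>> print(b[tuple(upper_diagonal_list)])  # converting to a tuple restores pair-wise indexing
[ 1  7 13 19]
>>> np.array_equal(b[upper_diagonal_list], b[np.array(upper_diagonal_list)])
True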

Forgetting this convention has bitten me more than once, so I hope this explanation helps you resolve some confusion if you should ever run into it.

Arrays and Lists

I often get questions about the difference between the Python list and the NumPy ndarray, and we talk about this a lot in Python Foundations for Scientists & Engineers. But recently I had a question about the array object that ships in the Python standard library: why don’t we use it for computations? Why is NumPy necessary if Python has an array data type? Well, this is a great question!

First of all, let’s take a quick look at the Python array. It lives in the array module of the standard library, which we’ll import here as arr:

>>> import array as arr

Like a NumPy ndarray, it has a data type, and every element of the array has to conform to that type. Set the data type with the first argument when creating the array; 'l' in this case specifies a signed long integer (at least 4 bytes):

>>> a = arr.array('l', range(5))
>>> a
array('l', [0, 1, 2, 3, 4])

It has the same indexing and slicing behavior as a Python list, and it even has the same “multiplication” behavior as lists: “multiplying” means repetition, not element-wise arithmetic multiplication as with the NumPy ndarray.
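
For instance, continuing with the array a created above:

>>> import numpy as np
>>> a[1:3]             # slices like a list
array('l', [1, 2])
>>> a * 2              # “multiplication” repeats, like a list
array('l', [0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
>>> np.arange(5) * 2   # an ndarray multiplies element-wise
array([0, 2, 4, 6, 8])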

With that in mind, computations look the same for arrays as for lists:

>>> b = arr.array(a.typecode, (v + 1 for v in a))
>>> b
array('l', [1, 2, 3, 4, 5])

Let’s use IPython’s %timeit magic to see how array computations fare in comparison with Python lists and NumPy ndarrays. For the list and the array, we use a list comprehension as the most efficient way to do the computation, and a simple vectorized computation for the NumPy ndarray; our test is to increment a sequence of 10 integers:

In [1]: import array as arr
   ...: import numpy as np
   ...:
   ...: %timeit [val + 1 for val in list(range(10))]
   ...: %timeit arr.array('q', [val + 1 for val in arr.array('q', range(10))])
   ...: %timeit np.arange(10) + 1
279 ns ± 45.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
825 ns ± 5.22 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
1.03 μs ± 98.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

It’s surprising to note that the list comprehension is fastest, beating the array computation by a factor of about 3, and beating the ndarray computation by a factor of about 3.7! Suspecting that a 10-integer sequence may not be representative, let’s bump it up to 1000:

In [2]: %timeit [val + 1 for val in list(range(1000))]
   ...: %timeit arr.array('q', [val + 1 for val in arr.array('q', range(1000))])
   ...: %timeit np.arange(1000) + 1
28.4 μs ± 3.47 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
82.4 μs ± 9.05 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
1.82 μs ± 175 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

With more numbers to chew on, NumPy clearly pulls ahead, suggesting that the overhead of creating an ndarray may dominate for small arrays while the computation itself is relatively cheap. The Python array is still relatively inefficient compared to the Python list. Let’s make some helper functions to compare how long it takes to increment each value in lists, arrays, and NumPy ndarrays across a broad range of sizes. Each function will accept a list, array, or ndarray and return the time (in ms) required to increment each value by 1.

In [3]: from datetime import datetime
   ...:
   ...: def inc_list_by_one(lst):
   ...:     start = datetime.now()
   ...:     res = [val + 1 for val in lst]
   ...:     return (datetime.now() - start).total_seconds() * 1000
   ...:
   ...: def inc_array_by_one(a):
   ...:     start = datetime.now()
   ...:     res = arr.array(a.typecode, [val + 1 for val in a])
   ...:     return (datetime.now() - start).total_seconds() * 1000
   ...:
   ...: def inc_numpy_array_by_one(np_arr):  # avoid shadowing the arr module alias
   ...:     start = datetime.now()
   ...:     res = np_arr + 1
   ...:     return (datetime.now() - start).total_seconds() * 1000

Now let’s record the computation times for a range of array sizes and plot them using matplotlib and seaborn:

In [14]: import matplotlib.pyplot as plt
    ...: import seaborn as sns
    ...:
    ...: times = []
    ...: for size in range(1, 10_000_001, 200_000):
    ...:     times.append((
    ...:         size,
    ...:         inc_list_by_one(list(range(size))),
    ...:         inc_array_by_one(arr.array('q', range(size))),
    ...:         inc_numpy_array_by_one(np.arange(size)),
    ...:     ))
    ...: times_arr = np.array(times)
    ...:
    ...: sns.regplot(x=times_arr[:, 0], y=times_arr[:, 1], label='Python list')
    ...: sns.regplot(x=times_arr[:, 0], y=times_arr[:, 2], label='Python array')
    ...: sns.regplot(x=times_arr[:, 0], y=times_arr[:, -1], label='Numpy array')
    ...: plt.xlabel('Array Size (num elements)')
    ...: plt.ylabel('Time to Increment by 1 (ms)')
    ...: plt.legend()
    ...: plt.show()
[Figure: Computation time vs. array size]

…and now it’s clear that computation times for Python lists and arrays scale with the size of the object much faster than NumPy ndarrays do. And it’s also clear that the Python array is no improvement for computations, static typing notwithstanding. Let’s quantify:

In [26]: print(f'Python list {np.polyfit(times_arr[:, 0], times_arr[:, 1], 1)[0]:.2e} (ms/element)')
    ...: print(f'Python array {np.polyfit(times_arr[:, 0], times_arr[:, 2], 1)[0]:.2e} (ms/element)')
    ...: print(f'NumPy ndarray {np.polyfit(times_arr[:, 0], times_arr[:, -1], 1)[0]:.2e} (ms/element)')
Python list 4.02e-05 (ms/element)
Python array 6.50e-05 (ms/element)
NumPy ndarray 2.01e-06 (ms/element)

So if we take the slope of those curves as a gauge of performance, Python array computations take roughly 60% longer than list computations (6.50e-05 vs. 4.02e-05 ms/element), and NumPy ndarray computations run in about 95% less time than the list versions, on average, for arrays that are not trivially small. This is because NumPy takes advantage of knowing the data type of every element, setting up efficient strides through a contiguous block of memory, which can be optimized at the operating-system and hardware levels.
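
You can peek at that metadata directly (using a throwaway array c); on a typical 64-bit Linux or macOS build, each element is 8 bytes and the array steps 8 bytes per element through one contiguous buffer:

>>> c = np.arange(10)
>>> c.dtype, c.itemsize, c.strides
(dtype('int64'), 8, (8,))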

What, then, is the advantage of the Python array over a Python list? With helper functions similar to the ones above, we can use sys.getsizeof() to measure sample list and array objects over the same range of sizes.
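
Those size-measuring helpers might look something like this (a minimal sketch; the function names are my own). Note that for a list, sys.getsizeof() reports the list object and its internal pointer buffer, not the integer objects it references, while array.array and ndarray both report their data:

import sys

def list_size_bytes(n):
    # the list object plus its over-allocated pointer buffer
    return sys.getsizeof(list(range(n)))

def array_size_bytes(n):
    # array.array stores its values inline, so this includes the data
    return sys.getsizeof(arr.array('q', range(n)))

def numpy_size_bytes(n):
    # ndarray implements __sizeof__ to include the data buffer it owns
    return sys.getsizeof(np.arange(n))

Plotting the results, we see something interesting: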

[Figure: Memory consumption vs. array size]

Memory consumption for both Python arrays and NumPy ndarrays scales exactly linearly with the number of elements (the 'q' typecode specifies a signed integer of at least 8 bytes, which corresponds to the default int64 dtype for integer NumPy ndarrays). But notice how the Python list’s memory consumption increases stepwise. This is an implementation detail of the Python list type that allows it to quickly add elements without reallocating memory after every append operation. And this is likely the explanation for why lists can be faster and more efficient than arrays: when a value is appended to an array, it may often happen that some or all of the object has to be moved to find a large enough contiguous location in memory, but a list over-allocates, so a certain number of appends can happen before any reallocation is needed.
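
You can watch that over-allocation happen by appending to a list and printing its size whenever it changes; the exact numbers are a CPython implementation detail and vary by version, but the stepwise jumps are easy to see:

import sys

lst = []
last = sys.getsizeof(lst)
print(f'len  0: {last} bytes')
for i in range(32):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:            # only report when the capacity grows
        print(f'len {len(lst):>2}: {size} bytes')
        last = size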

To test this, we can compare the computation times for an array computed with a list comprehension against an array that is copied and overwritten:

from copy import copy

def inc_array_by_one_with_copy(a):
    start = datetime.now()
    res = copy(a)                  # pre-sized copy to overwrite in place
    for i, val in enumerate(a):
        res[i] = val + 1
    return (datetime.now() - start).total_seconds() * 1000

Plotted, we see that the copy-fill approach is marginally faster, and there is less variation, likely due to fewer memory relocations.

[Figure: Computation time vs. array size for arrays created with list-comprehension and copy-fill strategies]

In conclusion, it helps to understand that the array type is really intended as a thin Python wrapper around a C array, and some functions and libraries use it to efficiently return a Python object. But as for computations, I hope I’ve convinced you that the NumPy ndarray is the right type in terms of both memory and computational efficiency. As for Python arrays: now you know what they are, and you can safely ignore them in favor of the ndarray for your scientific computations.