NumPy Indexing — the lists and tuples Gotcha

In a recent session of Python Foundations for Scientists & Engineers, a question came up about indexing a NumPy ndarray. Beyond getting and setting single values, NumPy enables some powerful efficiencies through slicing, which produces views of an array’s data without copying, and fancy indexing, which allows use of more-complex expressions to extract portions of arrays. We have written on the efficiency of array operations, and the details of slicing are pretty well covered, from the NumPy docs on slicing, to this chapter of “Beautiful Code” by the original author of NumPy, Travis Oliphant.

Slicing is pretty cool because it allows fast efficient computations of things like finite difference, for say, computing numerical derivatives. Recall that the derivative of a function describes the change in one variable with respect to another:

\frac{dy}{dx}

And in numerical computations, we can use a discrete approximation:

\frac{dy}{dx} \approx \frac{\Delta x}{\Delta y}

And to find the derivative at any particular location i, you compute the ratio of differences:

\frac{\Delta x}{\Delta y}\big|_i = \frac{x_{i+1} - x_{i}}{y_{i+1} - y{i}}

NumPy allows you to use slicing to avoid setting up costly-for-Python for: loops by specifying start, stop, and step values in the array indices. This lets you subtracting all of the i indices from the i+1 indices at the same time by specifying one slice that starts at element 1 and goes to the end (the i+1 indices), and another that starts at 0 and goes up to but not including the last element. No copies are made during the slicing operations. I use examples like this to show how you can get 2 and sometimes 3 or more orders of magnitude speedups of the same operation with for loops.

>>> import numpy as np

>>> x = np.linspace(-np.pi, np.pi, 101)
>>> y = np.sin(x)

>>> dy_dx = (
...     (y[1:] - y[:-1]) /
...     (x[1:] - x[:-1])
... )
>>> np.sum(dy_dx - np.cos(x[:-1] + (x[1]-x[0]) / 2))  # compare to cos(x)
np.float64(-6.245004513516506e-16)  # This is pretty close to 0

Fancy indexing is also well documented (but the NumPy docs now use the more staid term “Advanced Integer Indexing“, but I wanted to draw attention to a “Gotcha” that has bitten me a couple of times. With fancy indexing, you can either make a mask of Boolean values, typically using some kind of boolean operator:

>>> a = np.arange(10)
>>> evens_mask = a % 2 == 0
>>> odds_mask = a % 2 == 1
>>> print(a[evens_mask])
[0 2 4 6 8]

>>> print(a[odds_mask])
[1 3 5 7 9]

Or you can specify the indices you want, and this is the Gotcha, with tuples or lists, but the behavior is different either way. Let’s construct an example like one we use in class. We’ll make a 2-D array b and construct at positional fancy index that specifies elements in a diagonal. Notice that it’s a tuple, as shown by the (,) and each element is a list of coordinates in the array.

>>> b = np.arange(25).reshape(5, 5)
>>> print(b)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
>>> upper_diagonal = (
...     [0, 1, 2, 3],  # row indices
...     [1, 2, 3, 4],  # column indices
... )
>>> print(b[upper_diagonal])
[ 1  7 13 19]

In this case, the tuple has as many elements as there are dimensions, and each element is a list (or tuple, or array) of the indices to that dimension. So in the example above, the first element comes from b[0, 1], the second from b[1, 2] so on pair-wise through the lists of indices. The result is substantially different if you try to construct a fancy index from a list instead of a tuple:

>>> upper_diagonal_list = [
    [0, 1, 2, 3],
    [1, 2, 3, 4]
]
>>> b_with_a_list = b[upper_diagonal_list]
>>> print(b_with_a_list)
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]]

 [[ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]
  [20 21 22 23 24]]]

What just happened?? In many places, lists and tuples have similar behaviors, but not here. What’s happening with the list version is different. This is in fact a form of broadcasting, where we’re repeating rows. Look at the shape of b_with_a_list:

>>> print(b_with_a_list.shape)
(2, 4, 5)

Notice that its dimension 0 has 2 elements, which is the same as the number of items in upper_diagoal_list. Notice the dimension 1 has 4 elements, corresponding to the size of each element in upper_diagoal_list. Then notice that dimension 2 matches the size of the rows of b, and hopefully it will be clear what’s happening. In upper_diagoal_list we’re constructing a new array by specifying the rows to use, so the first element of b_with_a_list (seen as the first block above) consist of rows 0, 1, 2, and 3 from b, and the second element is the rows from the second element of upper_diagonal_list. Let’s print it again with comments:

>>> print(b_with_a_list)
[[[ 0  1  2  3  4]   # b[0] \
  [ 5  6  7  8  9]   # b[1]  | indices are first element of
  [10 11 12 13 14]   # b[2]  | upper_diagonal_list
  [15 16 17 18 19]]  # b[3] /

 [[ 5  6  7  8  9]   # b[1] \
  [10 11 12 13 14]   # b[2]  | indices are second element of
  [15 16 17 18 19]   # b[3]  | upper_diagonal_list
  [20 21 22 23 24]]] # b[4] /

Forgetting this convention has bitten me more than once, so I hope this explanation helps you resolve some confusion if you should ever run into it.

Batching and Folding in Machine Learning – What’s the Difference?

In a recent session of Machine Learning for Scientists & Engineers, we were talking about the use of folds in cross-validation, and a student did one of my favorite things — he asked a perceptive question. “How is folding related to the concept of batching I’ve heard about for deep learning?” We had a good discussion about batching and folding in machine learning and what the differences and similarities are.

What is Machine Learning?

Terms like “AI” and “machine learning” have become nearly meaningless in casual conversation and advertising media—especially since the arrival of large language models like ChatGPT. At Diller Digital, we define AI (that is, “artificial intelligence”) as computerized decision-making, covering areas from robotics and computer vision to language processing and machine learning.

Machine learning refers to the development of predictive models that are configured, or trained, by exposure to sample data rather than by explicitly encoded interactions. For example, you can develop a classification model that sorts pictures into dogs and cats by showing it a lot of examples of photos of dogs and cats. (Sign up for the class to learn the details of how to do this.).

Or you can develop a regression model to predict the temperature at my house tomorrow by training the model on the last 10 years’ worth of measurements of temperature, pressure, humidity, etc. from my personal weather station.

Classical vs Deep Learning

Broadly speaking, there are two kinds of machine learning: what we at Diller Digital call classical machine learning and deep learning. Classical machine learning is characterized by relatively small data sets, and it requires a skilled modeler to do feature engineering to make the best use of the available (and limited) training data. This is the subject of our Machine Learning for Scientists & Engineers class. Deep Learning is a subset of machine learning that makes use of many-layered models that function in a rough analog to how the neurons in a human brain function. Training such models requires much more data but less manual feature engineering by the modeler. The skill in deep learning is that of configuring the architecture of the model, and that is the subject of our Deep Learning for Scientists & Engineers.

Parameters and Hyperparameters

There is one more pair of definitions we need to cover before we can talk about folding versus batching: parameters and hyperparameters.

At the heart of both kinds of machine learning is the adjustment of a model’s parameters, sometimes also called coefficients or weights. Simply stated, these are the coefficients of what boils down to a linear regression problem.

Each model also has what are called hyperparameters, or parameters that govern how the model behaves algorithmically. These might include things like how you score your model’s performance or what method you use to update the model weights.

The process of training a model is the process of adjusting the parameters until you get the best possible predictions from your model. For this reason, we typically divide our training data into two parts: one (the training data set) for adjusting the weights, the other (the testing data set) for assessing the performance of the model. It’s important to score your model on data that was not used in the training step because you’re testing its predictive power on things it hasn’t seen before.

What is Folding?

So this brings us finally to the subject of folding and batching. Folding typically arises in the context of cross-validation, when you’re trying to decide on the best hyperparameters to use for your model. That process involves fitting your model with different sets of hyperparameters and seeing which combination gives the best results. How can you do that without using your test data set? (If we used the test data set during training, that would be cheating because it would sacrifice the ability of your model to generalize for the short-term gain of a better result.) We divide our training data into folds and hold each fold back as a “mini-test” data set and train on the others. We successively hold each fold back and then average the scores across the folds. That becomes our cross-validation score and gives us a way to score that set of hyperparameters without dipping into the test data set.

Folds divide a training data set into sections, one of which is held out as a mini “test” section for scoring a combination of hyperparameters in cross-validation.

What is Batching?

Batching looks a lot like folding but is a distinct concept used in a different context. Batching arises in the context of training deep models, and it serves two purposes. First, training a deep learning model typically requires a lot of training data (orders of magnitude more data than classical methods), and except for trivial cases you can’t fit all the training data into working memory at the same time. You solve that problem by dividing the training data into batches in much the same way that you would divide it into folds for cross-validation, and then iteratively update the model parameters using each batch of data until you have used the entire training data set. One full pass through all of the batches is called an epoch. Training a deep learning model typically takes multiple epochs.

A training data set is divided into batches to reduce memory requirements and provide variation for model parameter refinement. Each batch is used once per training epoch.

Beyond considerations of working memory, there’s a second important reason to train a deep model on batches: because there are so many model parameters with so many possible configurations, and because of the way the layers of the model insulate some of the parameters from information in the test data set, it’s helpful that smaller batches are “noisier” and provide more variation for the training algorithm to use to adjust the model parameters. As a physical analogy, you might think of the way that shaking a pan while you poured sand into it would help it settle into a flat surface more quickly than just waiting for gravity to do the work for you, and without shaking you might end up with lumps and bumps.

So hopefully, by this point you can see how folding is similar to batching and how they are distinct concepts. They both similarly divide training data into segments. Folding is used in cross-validation for optimizing hyperparameters, and batching is used in training deep learning models to limit memory requirements and improve convergence for fitting model parameters.

Diller Digital offers Machine Learning for Scientists & Engineers and Deep Learning for Scientists & Engineers at least once per quarter. Sign up to join us, and bring your curiosity, questions, and toughest problems and see what you can learn! Maybe you’ll join the chorus of those who leave glowing feedback.