“Popping the Hood” in Python

One man holds the hood of a car open while he and his friend look at the engine together.

Last weekend found me elbow-deep in the guts of my car, re-aligning the timing chain after replacing a cam sprocket. As I reflected on the joys of working on a car with only 4 cylinders and a relatively spacious engine bay, I found myself reflecting on one of the things I love best about the Python programming language — that is the ability to proverbially “pop the hood” and see what’s going on behind the abstractions. (With a background in Mechanical Engineering, car metaphors come naturally to me.)

As an Open Source, well-documented, scripted language, Python is already accessible. But there are some tools that let you get pretty deeply into the inner workings in case you want to understand how things work or to optimize performance.

Use the Source!

The first and easiest way to see what’s going on is to look at the inline help using Python’s built-in help() function, which displays the docstring using a pager. But I almost always prefer using the ? and ?? in IPython or Jupyter to display the just the docstring or all of the source code if available. For example consider the relatively simple parseaddr function from email.utils:

In [1]: import email

In [2]: email.utils.parseaddr?
Signature: parseaddr(addr, *, strict=True)
Docstring:
Parse addr into its constituent realname and email address parts.

Return a tuple of realname and email address, unless the parse fails, in
which case return a 2-tuple of ('', '').

If strict is True, use a strict parser which rejects malformed inputs.
File:      /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/email/utils.py
Type:      function

In our Python Foundations course, I can usually elicit some groans by encouraging my students to “Use the Source” with the ?? syntax, which displays the source code, if available:

In [3]: email.utils.parseaddr??
Signature: parseaddr(addr, *, strict=True)
Source:   
def parseaddr(addr, *, strict=True):
    """
    Parse addr into its constituent realname and email address parts.

    Return a tuple of realname and email address, unless the parse fails, in
    which case return a 2-tuple of ('', '').

    If strict is True, use a strict parser which rejects malformed inputs.
    """
    if not strict:
        addrs = _AddressList(addr).addresslist
        if not addrs:
            return ('', '')
        return addrs[0]

    if isinstance(addr, list):
        addr = addr[0]

    if not isinstance(addr, str):
        return ('', '')

    addr = _pre_parse_validation([addr])[0]
    addrs = _post_parse_validation(_AddressList(addr).addresslist)

    if not addrs or len(addrs) > 1:
        return ('', '')

    return addrs[0]
File:      /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/email/utils.py
Type:      function

Looking at the next-to-last line, you see there’s a path to the source code. That’s available programmatically in the module‘s .__file__ attribute, so you could open and print the contents if you want. If we do that for Python’s this module, we can expose a fun little Easter Egg.

In [4]: import this
# <output snipped - but try it for yourself and see what's there.>

In [5]: with open(this.__file__, 'r') as f:
   ...:     print(f.read())
   ...: 
s = """Gur Mra bs Clguba, ol Gvz Crgref

Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr.
Ernqnovyvgl pbhagf.
Fcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf.
Nygubhtu cenpgvpnyvgl orngf chevgl.
Reebef fubhyq arire cnff fvyragyl.
Hayrff rkcyvpvgyl fvyraprq.
Va gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff.
Gurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg.
Nygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu.
Abj vf orggre guna arire.
Nygubhtu arire vf bsgra orggre guna *evtug* abj.
Vs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn.
Vs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn.
Anzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!"""

d = {}
for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)

print("".join([d.get(c, c) for c in s]))

Another way to do this is to use the inspect module from Python’s standard library. Among many other useful functions is getsource which returns the source code:

In [6]: import inspect
In [7]: my_source_code_text = inspect.getsource(email.utils.parseaddr)

This works for libraries and functions that are written in Python, but there is a class of functions that are implemented in C (for the most popular version of Python, known as CPython) and called builtins. Source code is not available for those in the same way. The len function is an example:

In [8]: len??
Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type:      builtin_function_or_method

For these functions, it takes a little more digging, but this is Open Source Software, so you can go to the Python source code on Github, and look in the module containing the builtins (called bltinmodule.c). Each of the builtin functions is defined there with the prefix builtin_, and the source code for len is at line 1866 (at least in Feb 2025 when I wrote this):

static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
    Py_ssize_t res;

    res = PyObject_Size(obj);
    if (res < 0) {
        assert(PyErr_Occurred());
        return NULL;
    }
    return PyLong_FromSsize_t(res);
}

There you can see that most of the work is done by another function PyObject_Size(), but you get the idea, and now you know where to look.

Step by Step

To watch the Python interpreter step through the code a line at a time and explore code execution, you can use the Python Debugger pdb, or its tab-completed and syntax-colored cousin ipdb. These allow you to interact with the code as it runs and execute arbitrary code in the context of any frame of execution, including printing out the value of variables. They are the basis for most of the Python debuggers built in to IDEs like Spyder, PyCharm, or VS Code. Since they are best demonstrated live, and since we walk through their use in our Software Engineering for Scientists & Engineers class, I’ll leave it at that.

Inside the Engine

Like Java and Ruby, Python runs in a virtual machine, commonly known as the “Interpreter” or “runtime”. So in contrast to compiling code in, say, C, where the result is an executable object file consisting of system- and machine-level instructions that can be run as an application by your operating system, when you execute a script in Python, your code gets turned into bytecode. Bytecode is a set of instructions for the Python virtual machine. It’s what we would write if we were truly writing for the computer (see my comments on why you still need to learn programming).

But while it’s written for the virtual machine, it’s not entirely opaque, and it can sometimes be instructive to take a look. In my car metaphor, this is a bit like removing the valve cover and checking the timing marks inside. Usually we don’t have to worry about it, but it can be interesting to see what’s going on there, as I learned when producing and answer for a Stack Overflow question.

In the example below, we make a simple function add. The bytecode is visible in the add.__code__.co_code attribute, and we can disassemble it using the dis library and turn the bytecode into something slightly more friendly for human eyes:

In [9]: import dis
In [10]: def add(x, y):
    ...:     return x + y
    ...: 
In [11]: add.__code__.co_code
Out[11]: b'\x95\x00X\x01-\x00\x00\x00$\x00'
In [12]: dis.disassemble(add.__code__)
  1           RESUME                   0

  2           LOAD_FAST_LOAD_FAST      1 (x, y)
              BINARY_OP                0 (+)
              RETURN_VALUE

In the output of disassemble, the number in the first column is the line number in the source code. The middle column shows the bytecode instruction (see the docs for their meaning), and the right-hand side shows the arguments. For example in line 2, LOAD_FAST_LOAD_FAST pushes references to x and y to the stack, and the next line BINARY_OP executes the + operation on them.

Incidentally, if you’ve ever noticed files with the .pyc extension or folders called __pycache__ (which are full of .pyc files) in your project directory, that’s where Python stores (or caches) bytecode when a module is imported so that next time, the import is faster.

In Conclusion

There’s obviously a lot more to say about bytecodes, the execution stack, the memory heap, etc. But my goal here is not so much to give a lesson in computer science as to give an appreciation for the accessibility of the Python language to curious users. Much as I think it’s valuable to be able to pop the hood on your car and point to the engine, the oil dipstick, the brake fluid reservoir, and the air filter, I believe it’s valuable to understand some of what’s going on “under the hood” of the Python code you may be using for data analysis or other kinds of scientific computing.

Arrays and Lists

I often get questions about the difference between the Python list and the NumPy ndarray, and we talk about this a lot in Python Foundations for Scientists & Engineers. But recently I had a question about the array object that is part of the Python language, and why don’t we use it for computations? Why is NumPy necessary if Python has an array data type? Well, this is a great question!

First of all, let’s take a quick look at the Python array. It’s found in the array standard library and usually imported as arr:

>>> import array as arr

Like a NumPy ndarray, it has a data type, and every element of the array has to conform to that type. Set the data type in in the first argument when creating the array. 'l' in this case specifies a 4-byte signed integer:

>>> a = arr.array('l', range(5))
>>> a
array('l', [0, 1, 2, 3, 4])

It has the same indexing and slicing behavior as a Python list, and we even see that it has the same “multiplication” behavior as lists: i.e. “multiplying” means repetition, not element-wise, arithmetic multiplication as with the NumPy ndarray.

With that in mind, computations look the same for arrays as for lists:

>>> b = arr.array(a.typecode, (v + 1 for v in a))
>>> b
array('l', [1, 2, 3, 4, 5])

Let’s use IPython’s %timeit magic to see how array computations fare in comparison with Python lists and NumPy ndarrays. For the list and array, we’ve specified a list comprehensions as the most efficient way to do the computations and a simple vector computation for the NumPy ndarray, and our test is to increment sequence of 10 integers:

In [1]: import array as arr
   ...: import numpy as np
   ...:
   ...: %timeit [val + 1 for val in list(range(10))]
   ...: %timeit arr.array('q', [val + 1 for val in arr.array('q', range(10))])
   ...: %timeit np.arange(10) + 1
279 ns ± 45.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
825 ns ± 5.22 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
1.03 μs ± 98.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

It’s surprising to note that list comprehension is fastest, beating array computation by a factor of 3 or so, and beating ndarray computation by a factor of 3.7! Suspecting that a 10-integer sequence may not be representative, let’s bump it up to 1000:

In [2]: %timeit [val + 1 for val in list(range(1000))]
   ...: %timeit arr.array('q', [val + 1 for val in arr.array('q', range(1000))])
   ...: %timeit np.arange(1000) + 1
28.4 μs ± 3.47 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
82.4 μs ± 9.05 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
1.82 μs ± 175 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

With more numbers to chew on, NumPy clearly pulls ahead, suggesting the overhead of creating an ndarray may dominate computation time for small arrays, but the actual computation time is relatively small. The Python array is still relatively inefficient compared to the Python list. Let’s make some helper functions to compare how long it takes to increment each value in lists, arrays, and NumPy ndarrays accross a broad range of different sizes. Each function will accept a list, array, or ndarray and return the time (in ms) required to increment each value by 1.

In [3]: from datetime import datetime
   ...:
   ...: def inc_list_by_one(l):
   ...:     start = datetime.now()
   ...:     res = [val + 1 for val in l]
   ...:     return (datetime.now() - start).total_seconds() * 1000
   ...:
   ...: def inc_array_by_one(a):
   ...:     start = datetime.now()
   ...:     res = arr.array(a.typecode, [val + 1 for val in a])
   ...:     return (datetime.now() - start).total_seconds() * 1000
   ...:
   ...: def inc_numpy_array_by_one(arr):
   ...:     start = datetime.now()
   ...:     res = arr + 1
   ...:     return (datetime.now() - start).total_seconds() * 1000

Now let’s record the computation times for a range of array sizes and plot them using matplotlib and seaborn:

In [14]: import matplotlib.pyplot as plt
    ...: import seaborn as sns
    ...:
    ...: times = []
    ...: for size in range(1, 10_000_001, 200_000):
    ...:     times.append((
    ...:         size,
    ...:         inc_list_by_one(list(range(size))),
    ...:         inc_array_by_one(arr.array('q', range(size))),
    ...:         inc_numpy_array_by_one(np.arange(size)),
    ...:     ))
    ...: times_arr = np.array(times)
    ...:
    ...: sns.regplot(x=times_arr[:, 0], y=times_arr[:, 1], label='Python list')
    ...: sns.regplot(x=times_arr[:, 0], y=times_arr[:, 2], label='Python array')
    ...: sns.regplot(x=times_arr[:, 0], y=times_arr[:, -1], label='Numpy array')
    ...: plt.xlabel('Array Size (num elements)')
    ...: plt.ylabel('Time to Increment by 1 (ms)')
    ...: plt.legend()
    ...: plt.show()
Computation time v array size

…and now it’s clear that computation times for Python lists and arrays scale with the size of the object much faster than NumPy ndarrays do. And it’s also clear that the Python array is no improvement for computations, static typing notwithstanding. Let’s quantify:

In [26]: print(f'Python list {np.polyfit(times_arr[:, 0], times_arr[:, 1], 1)[0]:.2e} (ms/element)')
    ...: print(f'Python array {np.polyfit(times_arr[:, 0], times_arr[:, 2], 1)[0]:.2e} (ms/element)')
    ...: print(f'NumPy ndarray {np.polyfit(times_arr[:, 0], times_arr[:, -1], 1)[0]:.2e} (ms/element)')
Python list 4.02e-05 (ms/element)
Python array 6.50e-05 (ms/element)
NumPy ndarray 2.01e-06 (ms/element)

So if we take the slope of those curves as a gauge of performance, Python arrays are roughly 40% slower in computation than lists, and the NumPy ndarray computations run in about 95% less time, on average for arrays that are not trivially small. This is because NumPy takes advantage of knowing the data type of each element and setting up efficient strides in memory for computations, which can be optimized at the operating system and hardware levels.

What, then, is the advantage of the Python array over a Python list? With helper functions similar to the ones above, we can return sys.getsizeof() sample list and array objects over the same range of sizes. Plotting the results, we see something interesting:

Memory consumption v array size

Memory consumption for both Python arrays and NumPy ndarrays scale exactly linearly with the number of elements (the q typcode specifies a signed integer with a minimum 8 bytes, which corresponds to the default int64 dytpe for integer NumPy ndarrays.) But notice how the Python list memory consumption increases stepwise. This is an implementation detail of the Python list type that allows it to quickly add elements without shifting memory after each append operation. And this is likely the explanation for why lists can be faster and more efficient than arrays. When a new value is written to an array, it likely often occurs that the some or all of the object needs to be shuffled in memory to find contiguous locations, but a list is optimized for appending, so a certain number of additions can be made before any reshuffling happens.

To test this, we can compare the computation times for an array computed with a list comprehension against an array that is copied and overwritten:

def inc_array_by_one_with_copy(a):
    start = datetime.now()
    res = copy(a)
    for i, val in enumerate(a):
        res[i] = val + 1
    return (datetime.now() - start).total_seconds() * 1000

Plotted, we see that the copy-fill approach is marginally faster, and there is less variation, likely due to fewer memory relocations.

Computation time v array size for arrays created with list comprehension and copy-fill strategies

In conclusion, it helps to understand that the array type is really intended as a Python wrapper around a C array, and there are some functions and codes that use it to efficiently return a Python object. But as for computations, I hope I’ve convinced you that the NumPy ndarray is the right type for computations in terms of both memory and computation efficiency. As for Python arrays, now you know what they are, and you can safely ignore them in favor of the ndarray for your scientific computations.