
Last weekend found me elbow-deep in the guts of my car, re-aligning the timing chain after replacing a cam sprocket. As I savored the joys of working on a car with only 4 cylinders and a relatively spacious engine bay, my thoughts turned to one of the things I love best about the Python programming language: the ability to proverbially “pop the hood” and see what’s going on behind the abstractions. (With a background in Mechanical Engineering, car metaphors come naturally to me.)
As an Open Source, well-documented, scripted language, Python is already accessible. But some tools let you get quite deep into its inner workings, whether you want to understand how things work or to optimize performance.
Use the Source!
The first and easiest way to see what’s going on is to look at the inline help using Python’s built-in help() function, which displays the docstring in a pager. But I almost always prefer the ? and ?? syntax in IPython or Jupyter, which displays just the docstring or the full source code (if available), respectively. For example, consider the relatively simple parseaddr function from email.utils:
In [1]: import email.utils
In [2]: email.utils.parseaddr?
Signature: parseaddr(addr, *, strict=True)
Docstring:
Parse addr into its constituent realname and email address parts.
Return a tuple of realname and email address, unless the parse fails, in
which case return a 2-tuple of ('', '').
If strict is True, use a strict parser which rejects malformed inputs.
File: /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/email/utils.py
Type: function
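Outside IPython, the inspect module can pull out the same pieces programmatically. This is a minimal sketch using inspect.signature() and inspect.getdoc(); the exact signature, docstring, and file path you see will depend on your Python version and installation.

```python
import email.utils
import inspect

# inspect.signature() and inspect.getdoc() return roughly what IPython's "?"
# shows in its "Signature:" and "Docstring:" fields.
sig = inspect.signature(email.utils.parseaddr)
doc = inspect.getdoc(email.utils.parseaddr)

print(f"Signature: parseaddr{sig}")
print(doc.splitlines()[0])
# The "File:" field; the path differs from one install to another.
print(inspect.getfile(email.utils.parseaddr))
```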
In our Python Foundations course, I can usually elicit some groans by encouraging my students to “Use the Source” with the ?? syntax, which displays the source code, if available:
In [3]: email.utils.parseaddr??
Signature: parseaddr(addr, *, strict=True)
Source:
def parseaddr(addr, *, strict=True):
    """
    Parse addr into its constituent realname and email address parts.
    Return a tuple of realname and email address, unless the parse fails, in
    which case return a 2-tuple of ('', '').
    If strict is True, use a strict parser which rejects malformed inputs.
    """
    if not strict:
        addrs = _AddressList(addr).addresslist
        if not addrs:
            return ('', '')
        return addrs[0]
    if isinstance(addr, list):
        addr = addr[0]
    if not isinstance(addr, str):
        return ('', '')
    addr = _pre_parse_validation([addr])[0]
    addrs = _post_parse_validation(_AddressList(addr).addresslist)
    if not addrs or len(addrs) > 1:
        return ('', '')
    return addrs[0]
File: /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/email/utils.py
Type: function
Looking at the next-to-last line, you can see the path to the source code. That path is also available programmatically in the module’s __file__ attribute, so you could open the file and print its contents if you want. If we do that for Python’s this module, we can expose a fun little Easter Egg.
In [4]: import this
# <output snipped - but try it for yourself and see what's there.>
In [5]: with open(this.__file__, 'r') as f:
   ...:     print(f.read())
   ...:
s = """Gur Mra bs Clguba, ol Gvz Crgref
Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr.
Ernqnovyvgl pbhagf.
Fcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf.
Nygubhtu cenpgvpnyvgl orngf chevgl.
Reebef fubhyq arire cnff fvyragyl.
Hayrff rkcyvpvgyl fvyraprq.
Va gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff.
Gurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg.
Nygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu.
Abj vf orggre guna arire.
Nygubhtu arire vf bsgra orggre guna *evtug* abj.
Vs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn.
Vs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn.
Anzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!"""
d = {}
for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)
print("".join([d.get(c, c) for c in s]))
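The loop at the bottom of this.py is a ROT13 cipher: each letter is shifted 13 places around the alphabet, so applying it twice gets you back where you started. The standard library also ships a "rot13" text codec, so this small sketch cross-checks the module’s hand-built table d against codecs.decode():

```python
import codecs
import this  # importing prints the decoded Zen once, as a side effect

# this.py keeps the encoded text in the module-level string `s` and its
# hand-built translation table in the dict `d`.
decoded = codecs.decode(this.s, "rot13")
print(decoded.splitlines()[0])  # The Zen of Python, by Tim Peters

# Both approaches perform the same letter substitution.
print(decoded == "".join(this.d.get(c, c) for c in this.s))  # True
```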
Another way to get at the source is to use the inspect module from Python’s standard library. Among its many useful functions is getsource, which returns the source code as a string:
In [6]: import inspect
In [7]: my_source_code_text = inspect.getsource(email.utils.parseaddr)
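getsource() has siblings that also report where a definition lives: getsourcelines() returns the lines plus the starting line number, and getsourcefile() returns the path. A minimal sketch; the variable names are just for illustration:

```python
import email.utils
import inspect

func = email.utils.parseaddr

# The full source text, as a single string (what IPython's "??" shows).
source = inspect.getsource(func)

# The same source as a list of lines, plus the line where the def starts,
# and the file it came from.
lines, start = inspect.getsourcelines(func)
path = inspect.getsourcefile(func)

print(source.splitlines()[0])
print(f"{path}:{start} ({len(lines)} lines)")
```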
This works for libraries and functions written in Python, but there is a class of functions, called builtins, that are implemented in C (for the most popular implementation of Python, known as CPython). Source code is not available for those in the same way. The len function is an example:
In [8]: len??
Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type: builtin_function_or_method
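The same limitation shows up programmatically. In this sketch (run on CPython), inspect recognizes len as a builtin, and asking it for source raises a TypeError instead of returning text:

```python
import inspect

# len is implemented in C, so inspect classifies it as a builtin rather than
# a Python function, and it carries no __code__ object.
print(inspect.isbuiltin(len))      # True
print(inspect.isfunction(len))     # False
print(hasattr(len, "__code__"))    # False

# With no Python source to find, getsource() raises TypeError.
try:
    inspect.getsource(len)
    has_source = True
except TypeError as exc:
    has_source = False
    print(f"TypeError: {exc}")
```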
For these functions, it takes a little more digging, but this is Open Source Software, so you can go to the Python source code on GitHub and look in the module containing the builtins (called bltinmodule.c). Each of the builtin functions is defined there with the prefix builtin_, and the source code for len is at line 1866 (at least in Feb 2025 when I wrote this):
static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
    Py_ssize_t res;
    res = PyObject_Size(obj);
    if (res < 0) {
        assert(PyErr_Occurred());
        return NULL;
    }
    return PyLong_FromSsize_t(res);
}
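There you can see that most of the work is done by another function, PyObject_Size(). If it returns a negative value, an exception has already been set, so builtin_len returns NULL to signal the error; otherwise the C-level size is converted to a Python int with PyLong_FromSsize_t(). You get the idea, and now you know where to look.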
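To connect that C code back to Python, here is a sketch with hypothetical classes (Basket, Broken, and NoLength are made up for illustration). For classes defined in Python, PyObject_Size() ends up calling the object’s __len__ method, and when that fails, the res < 0 branch is what turns the failure into a Python exception:

```python
# Hypothetical classes, made up to exercise the C code path behind len().
class Basket:
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        # For Python-level classes, PyObject_Size() ends up calling this method.
        return len(self.items)


class Broken:
    def __len__(self):
        return -1  # negative lengths are rejected with ValueError


class NoLength:
    pass  # no __len__ at all, so len() raises TypeError


print(len(Basket(["apple", "pear", "fig"])))  # 3

errors = []
for obj in (Broken(), NoLength()):
    try:
        len(obj)
    except (ValueError, TypeError) as exc:
        # In both cases PyObject_Size() sets the exception and returns -1, so
        # builtin_len takes its res < 0 branch and returns NULL.
        errors.append(type(exc).__name__)
        print(f"{type(exc).__name__}: {exc}")
```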
Step by Step
To watch the Python interpreter step through the code a line at a time and explore code execution, you can use the Python Debugger, pdb, or its cousin ipdb, which adds tab completion and syntax coloring. These let you interact with the code as it runs and execute arbitrary code in the context of any frame of execution, including printing the values of variables. Spyder’s debugger is built on them, and the graphical debuggers in IDEs like PyCharm and VS Code rely on similar interpreter hooks. Since they are best demonstrated live, and since we walk through their use in our Software Engineering for Scientists & Engineers class, I’ll leave it at that.
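pdb itself is interactive, so a static listing doesn’t do it justice, but the hook underneath it is easy to show. The Pdb class is built on the standard library’s bdb module, which (in the versions I’ve looked at) follows execution by registering a trace function with sys.settrace(). This sketch registers a tiny tracer of its own that only records events; demo() is a made-up function to trace:

```python
import sys

events = []

def tracer(frame, event, arg):
    # Record (event, line number) pairs for frames of the demo() function only.
    if frame.f_code.co_name == "demo":
        events.append((event, frame.f_lineno))
    # Returning the tracer keeps "line" events coming for this frame.
    return tracer

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(tracer)    # install the trace hook
try:
    result = demo(3)
finally:
    sys.settrace(None)  # always remove the hook again

for event, lineno in events:
    print(f"{event:<7} line {lineno}")
print("result:", result)
```

A real debugger does the same thing, except that on each "line" event it pauses and hands you a prompt instead of just appending to a list.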
Inside the Engine
Like Java and Ruby, Python runs in a virtual machine, commonly known as the “interpreter” or “runtime”. Compiling code in, say, C produces an executable file of machine-level instructions that your operating system can run directly as an application. In contrast, when you execute a script in Python, your code is first compiled into bytecode, a set of instructions for the Python virtual machine, which the virtual machine then executes. Bytecode is what we would write if we were truly writing for the computer (see my comments on why you still need to learn programming).
But while it’s written for the virtual machine, it’s not entirely opaque, and it can sometimes be instructive to take a look. In my car metaphor, this is a bit like removing the valve cover and checking the timing marks inside. Usually we don’t have to worry about it, but it can be interesting to see what’s going on there, as I learned when producing an answer for a Stack Overflow question.
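The built-in compile() function exposes that first step directly: it turns source text into a code object holding the bytecode, and exec() hands that object to the virtual machine. A minimal sketch; the source string and the "<example>" filename are arbitrary choices for illustration:

```python
source = "x = 6 * 7\nprint(x)"

# compile() turns source text into a code object, the same kind of object
# the interpreter builds from a script before running it.
code = compile(source, "<example>", "exec")

print(type(code).__name__)  # code
print(type(code.co_code))   # <class 'bytes'>
print(code.co_names)        # the names the bytecode refers to

# exec() runs the code object on the virtual machine.
namespace = {}
exec(code, namespace)       # prints 42
print(namespace["x"])       # 42
```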
In the example below, we define a simple function, add. Its bytecode is stored in the add.__code__.co_code attribute, and we can use the dis module to disassemble it into something slightly friendlier to human eyes:
In [9]: import dis
In [10]: def add(x, y):
    ...:     return x + y
    ...:
In [11]: add.__code__.co_code
Out[11]: b'\x95\x00X\x01-\x00\x00\x00$\x00'
In [12]: dis.disassemble(add.__code__)
  1           RESUME                   0
  2           LOAD_FAST_LOAD_FAST      1 (x, y)
              BINARY_OP                0 (+)
              RETURN_VALUE
In the output of disassemble, the number in the first column is the line number in the source code. The middle column shows the bytecode instruction (see the dis module docs for their meanings), and the right-hand column shows the arguments. For example, on line 2, LOAD_FAST_LOAD_FAST pushes references to x and y onto the stack, the next instruction, BINARY_OP, executes the + operation on them, and RETURN_VALUE returns the result.
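dis.disassemble() (and its more common shorthand, dis.dis(add)) prints text. For poking at bytecode from code, dis.get_instructions() yields one Instruction object per opcode, with fields such as offset, opname, and argrepr. A minimal sketch; the opcode names you get depend on your Python version, and the listing above is from 3.13:

```python
import dis

def add(x, y):
    return x + y

# One Instruction object per bytecode instruction, instead of printed text.
instructions = list(dis.get_instructions(add))
for instr in instructions:
    print(f"{instr.offset:>4}  {instr.opname:<22} {instr.argrepr}")

# Opcode names vary between versions (3.13 fuses the two loads into
# LOAD_FAST_LOAD_FAST, for example), so inspect them rather than hard-code them.
opnames = [instr.opname for instr in instructions]
print(opnames)
```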
Incidentally, if you’ve ever noticed files with the .pyc extension or folders called __pycache__ (which are full of .pyc files) in your project directory, that’s where Python stores (or caches) bytecode when a module is imported, so that the next import can skip recompiling the source and is faster.
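To see that a .pyc file really is just cached bytecode, this sketch compiles a throwaway module with py_compile, reads the file back, skips the header, and runs the code object it contains. The module name demo_mod is made up, and the 16-byte header layout is CPython-specific (PEP 552, Python 3.7+), so treat this as an exploration rather than a supported way to load modules.

```python
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway module; the name "demo_mod" is just for illustration.
    source = pathlib.Path(tmp) / "demo_mod.py"
    source.write_text("def add(x, y):\n    return x + y\n")

    # Compile it roughly the way the import system does; by default the
    # result is written to __pycache__/ next to the source.
    pyc_path = pathlib.Path(py_compile.compile(str(source), doraise=True))
    print(pyc_path.parent.name, pyc_path.name)  # e.g. __pycache__ demo_mod.cpython-313.pyc

    data = pyc_path.read_bytes()

# Header: 4-byte magic number, 4-byte flags field, then 8 bytes holding either
# the source mtime and size or a hash of the source.
magic_ok = data[:4] == importlib.util.MAGIC_NUMBER
print("magic number matches this interpreter:", magic_ok)

# Everything after the 16-byte header is a marshalled code object.
code = marshal.loads(data[16:])
namespace = {}
exec(code, namespace)
print(namespace["add"](2, 3))  # 5
```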
In Conclusion
There’s obviously a lot more to say about bytecodes, the execution stack, the memory heap, etc. But my goal here is not so much to give a lesson in computer science as to give an appreciation for the accessibility of the Python language to curious users. Much as I think it’s valuable to be able to pop the hood on your car and point to the engine, the oil dipstick, the brake fluid reservoir, and the air filter, I believe it’s valuable to understand some of what’s going on “under the hood” of the Python code you may be using for data analysis or other kinds of scientific computing.