On the Usefulness of LLMs and Other Deep Learning Models

Lately I’ve been thinking a lot about the state of “AI” and its implications for us embodied “human intelligences”. Hardly a week (or even a day) goes by without some Silicon Valley titan proclaiming that “AI is smarter than humans” and arguing about whether this is good or bad for us as a species. “It’s a white-collar apocalypse”, “There will be all kinds of new jobs!”, “We are now confident we know how to build AGI as we have traditionally understood it.” What’s missing from these statements are clear definitions of terms like “smarter” and “intelligent”, and when definitions are provided, they conflict with what we already know. Consider Sam Altman’s definition of AGI: “AI systems that can perform most economically valuable work as well as or better than humans.” Or think about the well-respected Turing Test, which judges machine intelligence by whether a human can distinguish the behavior of a machine from that of a human on specific intellectual tasks. That reduces human intelligence to competence at tasks that can be completed at a keyboard. I find the narrow scope of such definitions unsatisfying.

I recently returned from a mission trip to Guatemala, where I worked side-by-side with local masons, who mix concrete and plaster by hand and improvise solutions to deal with tricky build sites and keep homes dry in the rainy season. That was a humbling lesson in the limits of the kind of “intelligence” my PhD and digital skills afford me. Those guys are performing intelligent, economically valuable work. Then there are the nurses at the clinics I’ve visited recently, whose reading of a patient’s physical and emotional state includes layers of cultural and social nuance on top of the complex medical condition of the human body. In fact, scientists, engineers, technicians, nurses, farmers, and floral designers solve problems all the time in environments full of uncertainty and human need, applying forms of intelligence and performing economically valuable work that no LLM can touch. These are embodied, culturally embedded, and morally aware practices—not lines of text on a screen.

“But wait,” you say, “LLMs like ChatGPT and Claude are amazing! Why are you being such a curmudgeon?” I agree. In fact, ChatGPT helped me draft this piece, and although I ended up throwing away most of what it wrote, its ability to do research and summarization is excellent. It also pointed me to some resources faster than I would have found them on my own. So is ChatGPT “smarter” than me? I think the more interesting question is “When does ChatGPT, or any other LLM or AI, have an advantage over me?”

What started me down this path was a couple of articles I came across recently. Bruce Schneier and Will Anderson wrote at The Conversation about 4 axes, what they call “The 4 S’s”, of technology’s advantages over humans. The article is not long and is worth a read; in short, they point out that AI often has an advantage over humans when it comes to speed, scale, scope, and sophistication. When those are the barriers, it can make sense to implement AI. When they’re not, introducing AI can feel gratuitous, or even downright annoying; witness auto-completion for text messages, or the many customer service chatbots. Schneier and Anderson point out that companies implement them seeking to benefit from scale, but customers don’t see benefits from speed or sophistication, and they suffer from the loss of human communication in terms of empathy, sincerity, context, and problem-solving ability. But there are many contexts where AIs are able to surpass the performance of humans, such as playing chess or Go, analyzing protein folding structures, and identifying promising materials for engineering applications.

However, there are contexts and situations where the perception of speed-up is actually illusory. In July 2025, the folks at Model Evaluation & Threat Research (METR) published a study of 16 experienced senior developers of large open-source software projects, in which they recorded and analyzed the developers’ activity as they resolved issues from their projects’ issue trackers. The study controlled, issue by issue, whether the developers could use the AI tool of their choice. The key finding was that the developers generally reported believing that AI had sped them up by 20% or more, when in fact it took them on average 19% longer to resolve the issues. The authors point out that the benchmarks often used to measure the productivity gains of AI coding tools don’t reflect the kinds of tasks found “in the wild” and thus aren’t helpful. Even self-reporting by experienced developers is not a reliable guide to productivity impacts. Also of interest is this white paper from GitClear on the decline in code quality with the use of AI coding tools.

Developers generally reported believing that AI had sped them up by 20% or more, when in fact it took them on average 19% longer to resolve issues from large, mature, open-source projects.

Furthermore, there are limits to the level of sophistication even “reasoning” models can attain. In a refreshingly honest piece from Apple, published in June 2025, the authors discuss the strengths and weaknesses of standard models (LLMs) and large reasoning models (LRMs) in performing tasks of varying complexity. They find a hard limit on the complexity of problems for which LLMs and LRMs are capable of finding solutions, even when given arbitrarily more computing power.

The real danger of technology is not that it will become too intelligent and take over, but that it will become too convenient and seduce us into delegating the most human parts of our lives.

Andy Crouch, The Life We’re Looking For

So what’s my point in all of this? It’s surely not to reject the amazing tools available to us in the era of LLMs. It’s to recognize them as tools with strengths and weaknesses. And it’s also to remember something that Andy Crouch, an author whose commentary on the relationship of humans to technology I respect, writes about in his book The Life We’re Looking For: superpowers often take something of our humanity when we assume them. When we step on an airplane to assume the superpower of crossing a continent in a matter of hours, we have to remain very still and give up exercise and mobility for the time it takes to travel. When we use our mobile phone to assume the superpower of navigating a city we’ve never been to before, we erode our human ability to find our way on our own (with consequences for cognitive decline, as it turns out; see this book and this article among others for nuance on the subject and what to do about it). And perhaps most relevant for this post, when you hand over the job of writing (code, or blog posts, or novels) to an LLM, you are eroding your ability to think about problems. As I’ve said before, learning to code is really learning to think about problems, and writing code is actively engaging with the problem in constructive ways.

This is why I founded Diller Digital, and why I still passionately believe in teaching coding skills. This principle guides the way we teach: starting with foundational principles and building up practical knowledge through examples and exercises with increasing independence. This is why by the end of a class, we are teaching you how to find out the answers to your questions for yourself using the knowledge framework we’ve developed together. We value human intelligence—not because it’s flawless, but because it’s rooted in judgment, context, and a lived understanding of the world. We believe machine learning is most powerful when it extends what humans can already do well. We build our courses to empower you to apply these tools responsibly, creatively, and critically.

Batching and Folding in Machine Learning – What’s the Difference?

In a recent session of Machine Learning for Scientists & Engineers, we were talking about the use of folds in cross-validation, and a student did one of my favorite things — he asked a perceptive question. “How is folding related to the concept of batching I’ve heard about for deep learning?” We had a good discussion about batching and folding in machine learning and what the differences and similarities are.

What is Machine Learning?

Terms like “AI” and “machine learning” have become nearly meaningless in casual conversation and advertising media—especially since the arrival of large language models like ChatGPT. At Diller Digital, we define AI (that is, “artificial intelligence”) as computerized decision-making, covering areas from robotics and computer vision to language processing and machine learning.

Machine learning refers to the development of predictive models that are configured, or trained, by exposure to sample data rather than by explicitly encoded instructions. For example, you can develop a classification model that sorts pictures into dogs and cats by showing it a lot of examples of photos of dogs and cats. (Sign up for the class to learn the details of how to do this.)

Or you can develop a regression model to predict the temperature at my house tomorrow by training the model on the last 10 years’ worth of measurements of temperature, pressure, humidity, etc. from my personal weather station.
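To make the regression example concrete, here is a minimal sketch of the idea using scikit-learn and synthetic stand-in data; the library choice and the made-up numbers are mine for illustration, not the actual weather-station workflow from the class.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_days = 3650  # roughly ten years of daily readings

# Synthetic stand-ins for daily temperature (C), pressure (hPa), and humidity (%).
X = np.column_stack([
    rng.normal(15, 10, n_days),
    rng.normal(1013, 8, n_days),
    rng.uniform(20, 100, n_days),
])
# Target: the next day's temperature (here just a noisy function of today's).
y = X[:, 0] + rng.normal(0, 2, n_days)

model = LinearRegression()
model.fit(X, y)               # training: the model configures itself from sample data
print(model.predict(X[-1:]))  # a prediction for "tomorrow" given today's readings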

Classical vs Deep Learning

Broadly speaking, there are two kinds of machine learning: what we at Diller Digital call classical machine learning and deep learning. Classical machine learning is characterized by relatively small data sets, and it requires a skilled modeler to do feature engineering to make the best use of the available (and limited) training data. This is the subject of our Machine Learning for Scientists & Engineers class. Deep learning is a subset of machine learning that makes use of many-layered models that function as a rough analog to the neurons in a human brain. Training such models requires much more data but less manual feature engineering by the modeler. The skill in deep learning is configuring the architecture of the model, and that is the subject of our Deep Learning for Scientists & Engineers class.

Parameters and Hyperparameters

There is one more pair of definitions we need to cover before we can talk about folding versus batching: parameters and hyperparameters.

At the heart of both kinds of machine learning is the adjustment of a model’s parameters, sometimes also called coefficients or weights. Simply stated, these are the coefficients of what, in the simplest case, boils down to a linear regression problem.

Each model also has what are called hyperparameters, or parameters that govern how the model behaves algorithmically. These might include things like how you score your model’s performance or what method you use to update the model weights.
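As a concrete (and deliberately simple) illustration, here is a sketch using scikit-learn’s ridge regression; the choice of model and of alpha is mine, not something prescribed above. The alpha argument is a hyperparameter you choose before training, while the fitted coefficients are the parameters that training adjusts.

import numpy as np
from sklearn.linear_model import Ridge

X = np.random.rand(100, 3)          # made-up training inputs
y = X @ np.array([2.0, -1.0, 0.5])  # made-up targets

model = Ridge(alpha=1.0)  # alpha is a hyperparameter: set by you, never learned from data
model.fit(X, y)           # training adjusts the parameters ...
print(model.coef_)        # ... which show up here as the fitted coefficients (weights)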

The process of training a model is the process of adjusting the parameters until you get the best possible predictions from your model. For this reason, we typically divide our available data into two parts: one (the training data set) for adjusting the weights, and the other (the testing data set) for assessing the performance of the model. It’s important to score your model on data that was not used in the training step, because you’re testing its predictive power on things it hasn’t seen before.
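In code, that split is usually a one-liner. Here is a sketch using scikit-learn’s train_test_split on made-up data; the 80/20 split and the ridge model are illustrative choices of mine.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 3)          # made-up features
y = X @ np.array([2.0, -1.0, 0.5])  # made-up targets

# Hold back 20% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Ridge(alpha=1.0)
model.fit(X_train, y_train)         # adjust the parameters on the training set only
print(model.score(X_test, y_test))  # score on data the model has not seen before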

What is Folding?

So this brings us finally to the subject of folding and batching. Folding typically arises in the context of cross-validation, when you’re trying to decide on the best hyperparameters to use for your model. That process involves fitting your model with different sets of hyperparameters and seeing which combination gives the best results. How can you do that without using your test data set? (If we used the test data set during training, that would be cheating, because it would sacrifice your model’s ability to generalize for the short-term gain of a better score.) We divide the training data into folds, successively hold each fold back as a “mini-test” data set while training on the others, and then average the scores across the folds. That average becomes our cross-validation score and gives us a way to score that set of hyperparameters without dipping into the test data set.

Folds divide a training data set into sections, one of which is held out as a mini “test” section for scoring a combination of hyperparameters in cross-validation.
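In scikit-learn, this divide, hold back, and average pattern is wrapped up in helpers like cross_val_score. Here is a minimal sketch, again with synthetic data and hyperparameter values of my own choosing, comparing a few candidate settings of a ridge model’s alpha:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 3))  # made-up training data
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 200)

# Five-fold cross-validation: each candidate alpha gets one score per fold.
for alpha in (0.1, 1.0, 10.0):
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(alpha, scores.mean())  # the average across folds is the cross-validation score

Whichever candidate scores best is the one you would refit on the full training set and only then evaluate, once, against the held-out test data.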

What is Batching?

Batching looks a lot like folding but is a distinct concept used in a different context. Batching arises in the context of training deep models, and it serves two purposes. First, training a deep learning model typically requires a lot of training data (orders of magnitude more than classical methods need), and except for trivial cases you can’t fit all the training data into working memory at the same time. You solve that problem by dividing the training data into batches in much the same way that you would divide it into folds for cross-validation, and then iteratively updating the model parameters using each batch of data until you have used the entire training data set. One full pass through all of the batches is called an epoch. Training a deep learning model typically takes multiple epochs.

A training data set is divided into batches to reduce memory requirements and provide variation for model parameter refinement. Each batch is used once per training epoch.

Beyond considerations of working memory, there’s a second important reason to train a deep model on batches: because there are so many model parameters with so many possible configurations, and because of the way the layers of the model insulate some of the parameters from information in the training data set, it’s helpful that smaller batches are “noisier” and provide more variation for the training algorithm to use to adjust the model parameters. As a physical analogy, think of the way that shaking a pan while you pour sand into it helps the sand settle into a flat surface more quickly than just waiting for gravity to do the work for you; without shaking you might end up with lumps and bumps.
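Stripped of any particular deep learning framework, the batching loop looks roughly like the sketch below. The update_parameters function is a placeholder of my own for whatever optimizer step your framework performs; the point is the structure of shuffled batches within epochs.

import numpy as np

def update_parameters(model, X_batch, y_batch):
    """Placeholder for one optimizer step; a real framework does the gradient math here."""
    pass

X = np.random.rand(10_000, 20)  # synthetic training data
y = np.random.rand(10_000)
model = None                    # placeholder for a deep model
batch_size = 64
n_epochs = 5

rng = np.random.default_rng(0)
for epoch in range(n_epochs):                     # one epoch = one full pass through the data
    order = rng.permutation(len(X))               # shuffling supplies the helpful variation
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]     # only this batch is needed at once
        update_parameters(model, X[idx], y[idx])  # adjust the parameters from one batch

Frameworks such as Keras or PyTorch hide this loop behind a fit method or a data loader, but the epochs-of-batches structure is the same.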

So hopefully, by this point you can see how folding is similar to batching and how they are distinct concepts. Both divide training data into segments. Folding is used in cross-validation for optimizing hyperparameters, while batching is used in training deep learning models to limit memory requirements and improve convergence when fitting model parameters.

Diller Digital offers Machine Learning for Scientists & Engineers and Deep Learning for Scientists & Engineers at least once per quarter. Sign up to join us, and bring your curiosity, questions, and toughest problems and see what you can learn! Maybe you’ll join the chorus of those who leave glowing feedback.

You Still Need to Learn to Write Code in the Age of LLMs

Can we really delegate most or all of our coding tasks to LLMs? Should we tell our kids not to study computer science? What are the reasons we should still learn to write code in the age of LLMs?

There is a chorus of voices telling us that soon we will be able to hand all of our coding tasks over to an AI agent powered by an LLM. It will do all the tedious boring things for us, and we can focus on the important stuff. While the new generative AIs are amazing in their capabilities, I for one think we shouldn’t be so quick to dismiss the value of learning to code. Granted, I make my living teaching people to write software, so maybe I should call this “Why I don’t quit my job and tell everyone to use ChatGPT to write their code”, because I believe that learning to code is still necessary and good.

In early 2024 the founder and CEO of NVIDIA, Jensen Huang, participated in a discussion at the World Governments Summit that inspired countless blog posts and videos with titles like “Jensen Huang says kids shouldn’t learn to code!”. What he actually said is a bit different, but the message is essentially the same [click here to watch for yourself, it’s the last 4 minutes or so of the interview]: “It’s our job to make computing technology such that nobody has to program … and that the programming language is human. Everybody in the world is now a programmer…”

Photo of Jensen Huang speaking at the 2024 World Governments Summit.

He suggests that instead of learning to code, we should focus on the life sciences, because that’s the richest domain for discovery, and he thinks that developing nations that want to rise on the world stage should encourage their children to do the same. Oh, and he says we should build lots of infrastructure (with lots of NVIDIA chips, of course).

There is a core part of his message I actually agree with. At Diller Digital, and at Enthought where our roots are, we have always believed it’s easier to add programming skills to a scientist, engineer, or other domain expert than it is to train a computer scientist in one of the hard sciences. That’s why, if you’ve taken one of our courses, you’ve no doubt seen the graphic below touting the scientific credentials of the development staff. And for that reason, I agree that becoming an expert in one of the natural sciences or engineering disciplines is personally and socially valuable.

Image from the About Enthought slide in Enthought Academy course material showing that 85% of the developers have advanced degrees, and 70% hold a PhD.

At Enthought, almost no one was formally trained as a developer. Most of us (including me) studied some other domain (in my case it was Mechanical Engineering, the thermal and fluid sciences, and combustion in particular) but fell in love with writing software. And although there is a special place in my heart for the BASIC I learned on my brother’s TI-99/4A, or Pascal for the Macintosh 512K my Dad brought home to use for work, or C, which I taught myself in high school and college, it was really Python that let me do useful stuff in engineering. Python has become a leading language for scientific computing, and a rich ecosystem has developed around the SciPy and NumPy packages and the SciPy conference. One of the main reasons is that it is pragmatic, easy to learn, and expressive for scientific computing.

And that brings me to my first beef with Huang’s message. While the idea of using “human language” to write software has some appeal (by which I believe he means the language we use to communicate with other humans, otherwise known as natural language), it ignores the fact that we already use human language to program computers. If we were writing software using computer language, we’d be writing with 1s and 0s or hexadecimal codes. Although there are still corners of the world where specialists do that, it hasn’t been mainstream practice since the days of punch cards.

Image of human hands holding a stack of punch cards.  Image originally appears on IBM's web page describing the history of the punch card.

Modern computer languages like Python are designed to be expressive in the domain space, and they allow you to write code that is clear and unambiguous. For example, do you have any doubts about what is happening in this code snippet borrowed from the section on Naming Variables in our Software Engineering for Scientists & Engineers?

gold_watch_orders = 0
for employee in employee_list:
    gold_watch_orders += will_retire(employee.name)

Even a complete newcomer could see that we’re checking to see who’s about to retire, and we’re preparing to order gold watches. It is clearly for human consumption, but there are also decisions about data types and data structures that had to be made. The act of writing the code causes you to think about your problem more clearly. The language supports a way of thinking about the problem. When you give up learning the language, you inevitably give up learning a particular way of thinking.

This brings me to my second beef with the idea that we don’t need to learn programming. Using what Huang calls a “human language” in fact devolves pretty quickly into an exercise called “prompt engineering”, where the new skill is knowing how to precisely specify your goal to a generative model using a language that is not really designed for that. You end up needing to work through another layer of abstraction that doesn’t necessarily help, or that is useful right up to the point where it isn’t, and then you’re stuck.

I often point my students to an article by Joel Spolsky called “The Law of Leaky Abstractions”, in which the author talks about “what computer scientists like to call an abstraction: a simplification of something much more complicated that is going on under the covers.” His point is that abstractions are useful and allow us to do all sorts of amazing things, like send messages across the internet, or, to our point, use a chat agent to write code. His central premise is that there is no perfect abstraction.

All non-trivial abstractions, to some degree, are leaky.

Joel Spolsky

By that, he means that eventually the abstraction fails, and you are required to understand what’s going on beneath it to solve some tricky problem that emerges. By the time he wrote the article in 2002, there was already a long history of code generation tools attempting to abstract away the complexity of getting a computer to do the thing you want it to do. But inevitably the abstraction leaks, and to move forward you have to understand what’s going on behind it.

… the abstractions save us time working, but they don’t save us time learning.
And all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder.

Joel Spolsky

For example, I’m grateful for the WYSIWYG editor WordPress provides for producing this blog post, but without understanding the underlying HTML tags it produces and the CSS it relies on, I’d be frustrated by some of the formatting problems I’ve had to solve. The WYSIWYG abstraction leaks, so I learn how HTML works and how to find the offending CSS class, and that makes solving the image alignment problem much, much easier.

But it’s not only the utility of the tool. There’s a cognitive benefit to learning to code. In my life as a consultant for Enthought, and especially during my tenure as a Director of Digital Transformation Services, I would frequently recommend that managers, and sometimes even directors, take our Python Foundations for Scientists & Engineers class, not because they needed to learn to code, but because they needed to learn how to think about what software can and can’t do. And with Diller Digital, the story is the same. Especially in the Machine Learning and Deep Learning classes, managers join because they want to know how it works, what’s hype and what’s real, and they want to know how to think about the class of problems those technologies address. People are learning to code as a way of learning how to think about problems.

The best advice I’ve heard says this:

Learn to code manually first, then use a tool to save time.

I’ll say in summary: the best reason to learn to code, especially for the scientists, engineers, and analysts who take our classes, is that you are learning how to solve problems in a clear, unambiguous way. And even more so, you learn how to think about a problem, what’s possible, and what’s realistic. Don’t give that up. See also this article by Nathan Anacone.

What do you think? Let me know in the comments.