Batching and Folding in Machine Learning – What’s the Difference?

In a recent session of Machine Learning for Scientists & Engineers, we were talking about the use of folds in cross-validation, and a student did one of my favorite things — he asked a perceptive question. “How is folding related to the concept of batching I’ve heard about for deep learning?” We had a good discussion about batching and folding in machine learning and what the differences and similarities are.

What is Machine Learning?

Terms like “AI” and “machine learning” have become nearly meaningless in casual conversation and advertising media—especially since the arrival of large language models like ChatGPT. At Diller Digital, we define AI (that is, “artificial intelligence”) as computerized decision-making, covering areas from robotics and computer vision to language processing and machine learning.

Machine learning refers to the development of predictive models that are configured, or trained, by exposure to sample data rather than by explicitly encoded interactions. For example, you can develop a classification model that sorts pictures into dogs and cats by showing it a lot of examples of photos of dogs and cats. (Sign up for the class to learn the details of how to do this.)

Or you can develop a regression model to predict the temperature at my house tomorrow by training the model on the last 10 years’ worth of measurements of temperature, pressure, humidity, etc. from my personal weather station.

Classical vs Deep Learning

Broadly speaking, there are two kinds of machine learning: what we at Diller Digital call classical machine learning and deep learning. Classical machine learning is characterized by relatively small data sets, and it requires a skilled modeler to do feature engineering to make the best use of the available (and limited) training data. This is the subject of our Machine Learning for Scientists & Engineers class. Deep learning is a subset of machine learning that makes use of many-layered models that work in rough analogy to the neurons in a human brain. Training such models requires much more data but less manual feature engineering by the modeler. The skill in deep learning is that of configuring the architecture of the model, and that is the subject of our Deep Learning for Scientists & Engineers class.

Parameters and Hyperparameters

There is one more pair of definitions we need to cover before we can talk about folding versus batching: parameters and hyperparameters.

At the heart of both kinds of machine learning is the adjustment of a model’s parameters, sometimes also called coefficients or weights. Simply stated, these are the coefficients of what boils down to a linear regression problem.

Each model also has what are called hyperparameters, or parameters that govern how the model behaves algorithmically. These might include things like how you score your model’s performance or what method you use to update the model weights.

The process of training a model is the process of adjusting the parameters until you get the best possible predictions from your model. To check those predictions honestly, we typically divide our available data into two parts: one (the training data set) for adjusting the weights, the other (the testing data set) for assessing the performance of the model. It’s important to score your model on data that was not used in the training step because you’re testing its predictive power on things it hasn’t seen before.
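As a minimal sketch of that split, assuming a simple shuffled 80/20 division (the function name `split_train_test`, the 20% test fraction, and the fixed seed are illustrative choices, not something prescribed above):

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test) lists.

    The function name, the default 20% test fraction, and the fixed seed
    are hypothetical choices made for this illustration.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled samples are held back for testing;
    # everything else is available for adjusting the weights.
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(100))  # stand-in for 100 labeled observations
train_set, test_set = split_train_test(samples)
print(len(train_set), len(test_set))
```

Every sample lands in exactly one of the two sets, so the model is never scored on something it was trained on.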

What is Folding?

So this brings us finally to the subject of folding and batching. Folding typically arises in the context of cross-validation, when you’re trying to decide on the best hyperparameters to use for your model. That process involves fitting your model with different sets of hyperparameters and seeing which combination gives the best results. How can you do that without using your test data set? (Using the test data set during training would be cheating, because it sacrifices your model’s ability to generalize for the short-term gain of a better result.) You divide your training data into folds, hold one fold back as a “mini-test” data set, and train on the others. You do this for each fold in turn and then average the scores across the folds. That average becomes your cross-validation score, and it gives you a way to score that set of hyperparameters without dipping into the test data set.

Folds divide a training data set into sections, one of which is held out as a mini “test” section for scoring a combination of hyperparameters in cross-validation.
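The procedure can be sketched in plain Python. This is an illustration under stated assumptions rather than a recommended implementation: the toy model is a one-parameter line `y = w * x`, the hyperparameter is a ridge penalty called `alpha`, and the helper names (`k_fold_indices`, `fit_weight`, `cross_val_score`) and the synthetic data are all made up for the example.

```python
import random


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, held_out_indices), holding out each fold once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds.
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size


def fit_weight(xs, ys, alpha):
    """Fit the parameter w of y = w * x by ridge-penalized least squares."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mean_squared_error(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_val_score(xs, ys, alpha, n_folds=5):
    """Average the held-out error over all folds for one hyperparameter value."""
    scores = []
    for train, held_out in k_fold_indices(len(xs), n_folds):
        w = fit_weight([xs[i] for i in train], [ys[i] for i in train], alpha)
        scores.append(mean_squared_error(
            w, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(scores) / len(scores)


# Synthetic training data (an assumption for the demo): y = 3x plus noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(50)]
ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]

# Score each candidate hyperparameter without touching any test data.
cv_scores = {alpha: cross_val_score(xs, ys, alpha)
             for alpha in (0.0, 1.0, 10.0, 100.0)}
best_alpha = min(cv_scores, key=cv_scores.get)  # lower error is better
print(best_alpha)
```

Note the two roles: `w` is a parameter, refit inside every fold, while `alpha` is a hyperparameter, held fixed during fitting and chosen by comparing cross-validation scores.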

What is Batching?

Batching looks a lot like folding but is a distinct concept used in a different context. Batching arises in the context of training deep models, and it serves two purposes. First, training a deep learning model typically requires a lot of training data (orders of magnitude more data than classical methods), and except for trivial cases you can’t fit all the training data into working memory at the same time. You solve that problem by dividing the training data into batches in much the same way that you would divide it into folds for cross-validation, and then iteratively update the model parameters using each batch of data until you have used the entire training data set. One full pass through all of the batches is called an epoch. Training a deep learning model typically takes multiple epochs.

A training data set is divided into batches to reduce memory requirements and provide variation for model parameter refinement. Each batch is used once per training epoch.
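A minimal sketch of batches and epochs in plain Python follows. It is not a deep model: to stay self-contained it trains a single weight `w` for `y = w * x` with gradient descent. The helper `iterate_batches`, the synthetic data, the learning rate, the batch size, and the choice to reshuffle before every epoch are all assumptions made for the illustration.

```python
import random


def iterate_batches(n_samples, batch_size, rng):
    """Yield lists of sample indices, one list per batch.

    Reshuffling on every call (that is, every epoch) is an assumption of
    this sketch; each sample still appears in exactly one batch per epoch.
    """
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


# Synthetic training data (an assumption for the demo): y = 3x plus noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]

w = 0.0              # the model parameter being trained
learning_rate = 0.1  # hyperparameter (illustrative value)
batch_size = 20      # hyperparameter (illustrative value)
n_epochs = 30        # number of full passes through the training data

for epoch in range(n_epochs):
    for batch in iterate_batches(len(xs), batch_size, rng):
        # Gradient of the mean squared error with respect to w,
        # computed from this batch only.
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
        w -= learning_rate * grad  # one parameter update per batch

print(round(w, 2))
```

Only one batch needs to be in memory for each update, and each batch gives a slightly different gradient estimate than the full data set would.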

Beyond considerations of working memory, there’s a second important reason to train a deep model on batches. Because there are so many model parameters with so many possible configurations, and because of the way the layers of the model insulate some of the parameters from information in the training data set, it’s helpful that smaller batches are “noisier” and provide more variation for the training algorithm to use to adjust the model parameters. As a physical analogy, think of the way that shaking a pan while you pour sand into it helps the sand settle into a flat surface more quickly than just waiting for gravity to do the work. Without shaking, you might end up with lumps and bumps.

So hopefully, by this point you can see how folding is similar to batching and how they are distinct concepts. They both similarly divide training data into segments. Folding is used in cross-validation for optimizing hyperparameters, and batching is used in training deep learning models to limit memory requirements and improve convergence for fitting model parameters.

Diller Digital offers Machine Learning for Scientists & Engineers and Deep Learning for Scientists & Engineers at least once per quarter. Sign up to join us, and bring your curiosity, questions, and toughest problems and see what you can learn! Maybe you’ll join the chorus of those who leave glowing feedback.

Meet Your Instructors Series – Tim

Hello! This is Rachel and I will be hosting a series of Q+As with your valued instructors so that you can get a glimpse into their specific career backgrounds, teaching styles and processes.

We will be starting with our President and Founder of Diller Digital, Tim Diller, pictured here.


 
1) What is your name and where are you currently located? 

My name is Tim Diller, and I live in Austin, TX with my wife Hannah and my dog Stella.  I have three adult children who are all out of the house now. 

2) How did you end up in engineering education?

For me it has been a long, winding, and nearly closed-loop path. My reference point is my father, who has spent his entire career in academia, combining research and teaching in Biomedical Engineering at The University of Texas at Austin. From him and others in my family, I developed a high regard for education, and from an early age I aspired to a professorship. That vision, plus a deep-seated curiosity about mechanical things (especially cars and airplanes), guided my steps through high school, my bachelor’s degree in Mechanical Engineering at The University of Texas, and into the first semester of a Master’s degree program at MIT, where I hit an academic wall, nearly failing out of the program. On academic probation, I did some deep soul searching and realized that the math-heavy robotics program I had been pursuing was not a good fit for my natural talents and inclinations. Instead, with some guidance, I pivoted to project-based courses in manufacturing and production systems design, where I thrived. That led me to my first “real” job, at the Michelin Americas R&C Corporation.

My time at Michelin helped me realize a few more things:  I love the collaborative team environment, I love learning about what people are doing in industry (working for a Tier 1 supplier in the automotive industry is great for that), and I love teaching (I had the opportunity to pick up, revamp, and deliver Michelin’s course on tire performance for vehicle handling during my time there).  I also found myself gravitating to software development projects and had my first exposure to Python at that time.  I spent 5 years there before a growing sense of “unfinished business” led me to return to graduate school for a doctoral degree at The University of Texas. 

During my doctoral program, I spent a lot of time instrumenting an engine and analyzing data on exhaust gases (if you’ve taken a class from me or read some of my other posts, you’ll notice I use a lot of automotive references). I also had many opportunities to teach, substituting for my advising professor from time to time and delivering guest lectures on tire performance for another professor in the department. By the time I had finished my degree and was working as a postdoctoral researcher, this was a regular occurrence. During those years, I also made the transition from MATLAB and became thoroughly hooked on Python. At the same time that I was seeking employment in academia, my love of using software for scientific computing was growing. Thus it was that during another round of academic hiring (this was during the period after the downturn in 2008), I found that Enthought was hiring, and right in Austin, where I was located.

The opportunity at Enthought included working on interesting coding and consulting projects across a broad swath of industry, working collaboratively in small teams, and teaching a 40-hr, week-long class called Python for Scientists & Engineers, which I would eventually teach over 50 times for Enthought. 

When in 2023 Enthought retired their training department during a reorganization, I founded Diller Digital to provide continuity of service to the customers they had served for decades.  I get a lot of joy working with smart, motivated engineers, scientists, and analysts to help them increase their digital skills in scientific computing. 

3) How do you stay current with the latest advancements in engineering technology and industry practices?

I read papers and a lot of tech-oriented news sources. It’s a lot of fun for me to do that. I buy (paper!) books on the topics I teach about and mark them up, code the demo examples, and play around. For example, at present (mid 2025) I’m in the middle of reading and coding my way through Sebastian Raschka’s Build a Large Language Model (From Scratch). From time to time I will do small consulting jobs to stay engaged. And it turns out I learn a lot from my students when they ask good questions. I’ve learned that often the best answer is “I don’t know, but let me look into that”, and I’ll do enough of a deep dive to get an answer the next day, but often I’ll keep going. And sometimes I’ll incorporate new material into the course based on that. Or post about it.

4) Can you describe your teaching philosophy and how it aligns with Diller Digital’s mission and values?

I believe that technology should be used to elevate the value and dignity of humanity’s work.  I also believe that a thorough understanding of fundamental principles is critical as a solid foundation for future self-learning in scientific computing and solving problems with software.  Because of that, I emphasize lots of hands-on experience, getting students to do basic things by hand and on their own before teaching them how to automate work with higher-level tools.  Although they might articulate it a little differently, this is close to the philosophy Enthought used to develop the materials we deliver at Diller Digital.  Enthought was clearly formative for me in my approach to teaching. 

5) What engineering software and tools do you have experience with, and how do you incorporate them into your teaching? 

My day-to-day coding work takes place in two contexts. In the classroom, I use JupyterLab, which is just about the perfect tool for that environment: it’s simple enough to get everybody on the same platform quickly, even if someone has never used it before. For maintaining the demos, exercises, automation scripts, or any other more-involved coding work, I’ll use VS Code with the Flake8 and Diff extensions installed.

In the past, I liked Sublime Text because of its multiple-cursor and block-editing capabilities, and before that I was a proud (and probably obnoxious) fan of emacs, which I’ll still use on occasion when logged into a server with a text-only interface. But for that environment, I have come to appreciate the lighter-weight nano editor, which is available in pretty much every text-only environment I use these days.

In addition to Python and the scientific computing libraries we teach, I’ve spent substantial time with MATLAB, C, LabView, and Visual Basic.  My first programming language was BASIC for the TI-99/4A, whose CALL SPRITE was the key technology that let me write my own video games. 

6) How do you balance theoretical knowledge with practical, hands-on learning in your classes? 

I try to teach in a way that theory and practice complement each other. I use theory to explain the “Why”, and practical, hands-on learning to explain the “How”. For example, when it comes to talking about lists and sets, the theory is important for understanding why sets have much faster look-up times, but I make sure that knowledge is accessible by demonstrating and having students follow along with %timeit commands.
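A minimal sketch of that kind of demonstration, written with the standard-library timeit module so that it also runs outside a notebook (in Jupyter, the %timeit magic plays the same role). The container size, the target value, and the number of repetitions are illustrative assumptions, not taken from the course materials.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# A list is searched element by element; a set uses a hash lookup.
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(f"list lookup: {list_time:.6f} s for 100 lookups")
print(f"set lookup:  {set_time:.6f} s for 100 lookups")
```

The exact timings depend on the machine, but the gap widens as `n` grows.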

7) Can you discuss your experience with project-based learning and how you guide students through the data analysis workflow? 

As I talked about earlier, in graduate school I really struggled with theory-heavy instruction and thrived in project-based classes. In addition, I have watched my father (an engineering professor) develop a course for graduate students in designing inquiry-based instruction, which is closely related to project-based learning. Over the decades, we have had many long discussions about the subject and have developed a shared passion for it.

With that in mind, I start each class by asking students what goals they have in mind and then probing as much as time allows. On the one hand, I’m genuinely interested in finding out what people do, but on the other hand, getting a student to articulate their problem and the value they are bringing to their organization is critical to creating context for learning. That was a big part of what I did as a consultant for Enthought, and I have carried that into the classroom.

Once context is established, I assume that the students need to see and hear, to follow along on their own machine, then to do on their own and exercise recall before they’ll master a concept. We do this at multiple scales, typically building up from using and mastering data types, moving on to useful code segments, and ending with some kind of realistic capstone project that ties everything together. Each of Python Foundations, Data Analysis with Pandas, Machine Learning, and Deep Learning follows this arc, and in choosing exercises, we have worked hard to make sure the problem is simple enough to be tractable in the relatively short time we have in class yet also complex enough to provide experience that will be useful in day-to-day work.

8) What strategies do you use to assess student understanding and provide constructive feedback on their work? 

We have lots of small “Give It A Try” exercises that are designed to cement understanding and surface any confusion.  In virtual courses, it’s a bit more of a challenge, because I have to rely on self-reporting, and not everyone likes to unmute and ask a question.  For in-person classes, I walk the room during such exercises and look for the pink background of error messages.  The barrier to asking questions in that environment is lower.  But in either case, I tend to get good questions. 

The Give It A Try exercises are conducive to having students share their code, so sometimes I’ll use someone’s answer to explain the solution and ask for peer-suggestions. 

9) What strategies do you use to communicate complex engineering concepts to students with varying levels of understanding? 

This is the real challenge, because there is always a good diversity of backgrounds and experience.  One thing I do is try to provide a meta-level discussion, letting students know how important the following concept is and whether understanding it is critical or they can ignore it if needed. 

Another thing I do is to treat every question like pure gold, no matter the level.  If they ask about something fundamental I’ve already explained 5 times, great!  Because in that case, they finally have the context to get it, and by asking the question, they’ve owned the concept.  And if they ask a real stumper, and I have to do homework afterward to figure it out, that’s great too. 

Finally, I use a lot of physical analogies.  Students who have been in my classes may recall I tend to use a lot of automotive analogies, referring to “popping the hood” or talking about engines, brakes, and clutches.  But if someone mentions something like owning a hobby farm during introductions, we’ll talk about examples from the farm during class. 
 

10) What is your favorite way to spend a Saturday? Favorite meal?

My ideal Saturday starts in the yard, mowing, weeding, or trimming. Once that’s done and the house is clean, maybe my wife and I will take our dog Stella for a walk on the local greenbelt trail. After that, I work in the shop, where I like to build or restore furniture, make picture frames, or turn a bowl on my lathe. Double the points if one of my kids is working with me. If I can fire up the smoker and keep some ribs (when there’s more time) or fish (if there’s less) going while I’m in the shop, that completes the perfect Saturday.

Tim Diller with his Family

Thanks for answering these questions, Tim, and for helping us get to know you better as one of our respected instructors.