I have read in several threads that some people tune on their data with so-called "mini-batches", i.e. subsets of the total dataset. What is the idea, and how can it be used?

A link to an easy introduction to the topic would already be helpful.

Thanks in advance.

## Using Mini-Batch for tuning


### Re: Using Mini-Batch for tuning

Stochastic gradient descent can be seen as mini-batch training with a batch size of 1: you iterate over your data one position at a time and update the weights after each position. With mini-batches, you divide your training data into parts of size N (typically in the range 32..1024 or more) and update the weights once per mini-batch. This is usually more effective than SGD and easier to parallelize, whether on GPU or CPU, so in practice it is much faster.
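To make the difference concrete, here is a minimal sketch of mini-batch gradient descent in plain Python. It is not taken from any engine tuner: the toy model (a line `y = w*x + b`), the function name `minibatch_gd`, and all hyperparameter values are assumptions chosen for illustration only.

```python
import random

def minibatch_gd(data, batch_size, epochs, lr, seed=0):
    """Fit y ~ w*x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # new batch composition in every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average the gradient over the batch, then update the weights once.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy data generated from y = 3x + 1 without noise (an assumption for the demo).
points = [(x / 100.0, 3.0 * x / 100.0 + 1.0) for x in range(-100, 100)]
w, b = minibatch_gd(points, batch_size=32, epochs=200, lr=0.1)
```

With `batch_size=1` the same loop behaves like plain SGD, and with `batch_size=len(points)` it becomes full-batch gradient descent, so the batch size is the only thing that changes between the variants.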

### Re: Using Mini-Batch for tuning

> derjack wrote (Tue Jan 12, 2021 8:11 pm):
> Stochastic gradient descent could be seen as mini-batch of size 1. So you iterate over your data, one by one and update the weights after one position. In mini-batch you divide your training data into parts of size N (typically in range 32..1024 or more) and update weights once per mini-batch. It is usually more effective than SGD and more parallelizable be it via GPU or CPU, so in practice much faster.

Splitting the training data into parts doesn't sound like it depends on a tuning algorithm itself, so what is the general idea?

I can imagine that this results in more updates of the parameter vector, so that you reach a semi-good solution very quickly.

Further, I would guess the challenge is to calm the updates down to avoid fluctuations or divergence.

In basic algorithms where I don't have something like a learning rate, I would control it with stepcount * stepsize when updating a parameter.

Additionally, one could operate on every parameter in the beginning and later pick only a subset of the parameter vector.

What do you think? Sorry, this is a completely new field for me.

### Re: Using Mini-Batch for tuning

> Desperado wrote (Tue Jan 12, 2021 8:30 pm):
> Splitting the training data into parts doesn't sound like it depends on a tuning algorithm itself, so what is the general idea.

Actually it is algorithm specific, or gradient descent specific. In mini-batches you still operate on the entire data in each epoch.

Maybe you confused this with something like using only a subset of positions and training on them, then using another subset of positions, and so on: training separately on each subset, with the training being faster because the subsets are smaller.
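The point that every epoch still covers the entire data can be sketched as follows. The helper name `minibatches` and the sizes used are hypothetical; the sketch only shows how one epoch is split into batches that together contain every position exactly once.

```python
import random

def minibatches(n_items, batch_size, rng):
    """Yield index lists that together cover range(n_items) exactly once."""
    order = list(range(n_items))
    rng.shuffle(order)  # different split in every epoch
    for start in range(0, n_items, batch_size):
        yield order[start:start + batch_size]

rng = random.Random(42)
epoch = list(minibatches(10, 4, rng))           # one epoch over 10 positions
batch_sizes = [len(batch) for batch in epoch]   # last batch may be smaller
seen = sorted(i for batch in epoch for i in batch)
```

Here `seen` contains every index once, so the mini-batches are just a partition of one pass over the data, not separate training runs on separate subsets.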

### Re: Using Mini-Batch for tuning

> derjack wrote (Tue Jan 12, 2021 9:07 pm):
> Actually it is algorithm specific, or gradient descent specific. In mini-batches you still operate on the entire data in each epoch.
>
> Maybe you misunderstood with something like using only a subset of positions and training on them, then using other subset of positions etc. training separately for each subsets, and since they are smaller, the training would be faster.

What I mean is that you accept changes during an epoch, so you update the reference fitness too (in the hope that you are on the right track).

That would speed things up at the beginning, but you would need to "cool down" the learning rate in later epochs, so that you don't begin to fluctuate or diverge because of too many updates of the reference fitness.
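One possible reading of this idea, as a rough sketch rather than an established method: a gradient-free local search that nudges each parameter by a fixed step, accepts a change as soon as it lowers the error on the current mini-batch (thereby updating the reference fitness within the epoch), and shrinks the step after every epoch. The names `batch_error` and `local_search`, the toy model `y = w*x + b`, and the values of `step` and `decay` are all assumptions made for this illustration.

```python
import random

def batch_error(params, batch):
    """Mean squared error of the toy model y = w*x + b on a batch."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)

def local_search(data, batch_size, epochs, step, decay, seed=0):
    rng = random.Random(seed)
    params = [0.0, 0.0]
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            best = batch_error(params, batch)  # reference fitness for this batch
            for i in range(len(params)):
                old = params[i]
                for delta in (step, -step):
                    params[i] = old + delta
                    err = batch_error(params, batch)
                    if err < best:
                        best = err       # accept immediately, mid-epoch
                        break
                    params[i] = old      # no improvement: revert
        step *= decay  # "cool down" so late epochs fluctuate less
    return params

# Noise-free toy data from y = 3x + 1 (an assumption for the demo).
points = [(x / 100.0, 3.0 * x / 100.0 + 1.0) for x in range(-100, 100)]
w, b = local_search(points, batch_size=32, epochs=60, step=0.5, decay=0.9)
final_error = batch_error([w, b], points)
```

The large initial step gives fast early progress, while the decaying step limits how far a single unrepresentative batch can pull the parameters in later epochs.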