Optimization algorithm for training a neural net

Discussion of chess software programming and technical issues.

Moderator: Ras

Fabio Gobbato
Posts: 219
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Optimization algorithm for training a neural net

Post by Fabio Gobbato »

So far for training the neural network of my engine I have used batch gradient descent. I've read that Adam gives faster convergence and lower error. Have you tried both and which one works better? Are there big differences between the two?
Joost Buijs
Posts: 1625
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Optimization algorithm for training a neural net

Post by Joost Buijs »

Fabio Gobbato wrote: Thu Aug 29, 2024 7:12 pm So far for training the neural network of my engine I have used batch gradient descent. I've read that Adam gives faster convergence and lower error. Have you tried both and which one works better? Are there big differences between the two?
It's true that with adaptive optimization methods like Adam you need only a fraction of the number of iterations to reach convergence. That Adam would give you a lower error is simply not true; the consensus is that plain gradient descent is a more 'stable' algorithm than adaptive methods. You could also try adding 'momentum' to gradient descent, which will make it converge faster too.
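For reference, a minimal sketch of what adding momentum looks like next to a plain gradient descent step. The function names, the 0.9 momentum coefficient and the learning rate are just illustrative assumptions, not taken from any particular trainer.

Code:

def gd_step(weights, grads, lr=0.01):
    # Plain (batch) gradient descent: step straight down the gradient.
    return [w - lr * g for w, g in zip(weights, grads)]

def momentum_step(weights, grads, velocity, lr=0.01, beta=0.9):
    # Momentum keeps an exponentially decaying sum of past gradients,
    # which damps oscillations and speeds up movement along directions
    # where the gradient keeps pointing the same way.
    velocity = [beta * v + g for v, g in zip(velocity, grads)]
    weights = [w - lr * v for w, v in zip(weights, velocity)]
    return weights, velocity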

Most of the time I use AdamW to train my neural networks; this is a slightly modified Adam that handles weight decay somewhat differently. I'm happy with its performance.
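To show roughly where AdamW differs from plain Adam, here is a sketch of a single-parameter update step; the only real change is that the weight decay is applied as a separate, decoupled step instead of being folded into the gradient. The hyperparameter values are just the usual published defaults, not anything specific to my trainer.

Code:

import math

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    # Sketch of one AdamW update for one parameter (illustrative defaults).
    m = beta1 * m + (1 - beta1) * g             # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g         # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive Adam step
    w -= lr * weight_decay * w                  # decoupled weight decay (the "W" in AdamW)
    return w, m, v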