Optimization algorithm for training a neural net

Discussion of chess software programming and technical issues.

Moderator: Ras

Fabio Gobbato
Posts: 219
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Optimization algorithm for training a neural net

Post by Fabio Gobbato »

So far for training the neural network of my engine I have used batch gradient descent. I've read that Adam gives faster convergence and lower error. Have you tried both and which one works better? Are there big differences between the two?
Joost Buijs
Posts: 1625
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Optimization algorithm for training a neural net

Post by Joost Buijs »

Fabio Gobbato wrote: Thu Aug 29, 2024 7:12 pm So far for training the neural network of my engine I have used batch gradient descent. I've read that Adam gives faster convergence and lower error. Have you tried both and which one works better? Are there big differences between the two?
It's true that with adaptive optimization methods like Adam you need only a fraction of the number of iterations to reach convergence. That Adam would give you a lower error is simply not true; the consensus is that plain gradient descent is a more 'stable' algorithm than adaptive methods. You could also try adding 'momentum' to gradient descent, which will make it converge faster too.
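For reference, a minimal sketch of what adding momentum looks like next to a plain gradient descent step. The function names, the 0.9 momentum coefficient and the learning rate are just illustrative assumptions, not taken from any particular trainer.

Code:

def gd_step(weights, grads, lr=0.01):
    # Plain (batch) gradient descent: step straight down the gradient.
    return [w - lr * g for w, g in zip(weights, grads)]

def momentum_step(weights, grads, velocity, lr=0.01, beta=0.9):
    # Momentum keeps an exponentially decaying sum of past gradients,
    # which damps oscillations and speeds up movement along directions
    # where the gradient keeps pointing the same way.
    velocity = [beta * v + g for v, g in zip(velocity, grads)]
    weights = [w - lr * v for w, v in zip(weights, velocity)]
    return weights, velocity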

Most of the time I use AdamW to train my neural networks; this is a slightly modified Adam that handles weight decay somewhat differently. I'm happy with its performance.
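To show roughly where AdamW differs from plain Adam, here is a sketch of a single-parameter update step; the only real change is that the weight decay is applied as a separate, decoupled step instead of being folded into the gradient. The hyperparameter values are just the usual published defaults, not anything specific to my trainer.

Code:

import math

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    # Sketch of one AdamW update for one parameter (illustrative defaults).
    m = beta1 * m + (1 - beta1) * g             # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g         # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive Adam step
    w -= lr * weight_decay * w                  # decoupled weight decay (the "W" in AdamW)
    return w, m, v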