Choice of loss function to train a neural network evaluation

Discussion of chess software programming and technical issues.

Fabio Gobbato
Posts: 219
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Choice of loss function to train a neural network evaluation

Post by Fabio Gobbato »

If you convert the centipawn score to a probability, what is the best loss function you have tried for training a neural network evaluation?
I have tried mean squared error and it gives good results, but the best net I've found so far comes from |error|^2.5.
The best networks generated with the two loss functions differ by about 2 Elo, so there isn't much to gain.
Have you experimented with other loss functions? With what results?
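
To make the comparison concrete, here is a minimal PyTorch sketch of what I mean (the cp_to_prob name and the scale of 400 are my own; every engine uses its own mapping constant):

import torch

def cp_to_prob(cp, scale=400.0):
    # Logistic mapping from centipawns to an expected score in [0, 1];
    # the scale constant is engine-specific.
    return torch.sigmoid(cp / scale)

def power_loss(pred, target, exponent=2.5):
    # Mean of |error|^exponent; exponent = 2.0 is plain MSE.
    return (pred - target).abs().pow(exponent).mean()

With exponent = 2.0 this reduces to plain MSE, so the two runs I compared differ only in that one constant.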
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Choice of loss function to train a neural network evaluation

Post by dangi12012 »

cp is pure garbage.
Train for a WDL metric with a logistic function.
Don't forget that WL is much worse than WDL, because with WDL you can discriminate forced draws from merely drawish positions, etc.

If you just mean the loss itself, the sum of squared differences works quite well.
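
Roughly, as a PyTorch sketch (the three-logit output head is an assumption about your net; a single logistic output works the same way for plain W/L):

import torch.nn.functional as F

def wdl_loss(logits, outcome):
    # logits: (batch, 3) raw network outputs for win/draw/loss;
    # outcome: (batch,) class index, 0 = win, 1 = draw, 2 = loss.
    # Softmax cross-entropy over the three outcomes.
    return F.cross_entropy(logits, outcome)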
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Choice of loss function to train a neural network evaluation

Post by JoAnnP38 »

dangi12012 wrote: Sun Feb 12, 2023 3:21 pm cp is pure garbage.
Train for a WDL metric with a logistic function.
Don't forget that WL is much worse than WDL, because with WDL you can discriminate forced draws from merely drawish positions, etc.

If you just mean the loss itself, the sum of squared differences works quite well.
I have been looking into that over the past couple of days. A logistic regression or logit model seems like a nice stepping stone to ML without going all the way to a NN. I noticed that some engines that use this map the probability back onto cp purely for the purpose of reporting the score via UCI (or XBoard, I would assume). So instead of using it as an intermediary for training a NN, why not use it as the evaluation function itself? Maximizing the probability of a win seems more straightforward than trying to maximize a cp advantage.
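
For the reporting side, the mapping back to cp is just the inverse of the logistic; a sketch (the scale of 400 and the clamping epsilon are assumptions, not any particular engine's constants):

import math

def prob_to_cp(p, scale=400.0, eps=1e-6):
    # Inverse of the logistic p = 1 / (1 + exp(-cp / scale)),
    # clamped away from 0 and 1 so the log stays finite.
    p = min(max(p, eps), 1.0 - eps)
    return scale * math.log(p / (1.0 - p))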
jdart
Posts: 4398
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Choice of loss function to train a neural network evaluation

Post by jdart »

As I understand it, Stockfish currently tunes with lambda = 1.0, so only using scores, not WDL.
Fabio Gobbato
Posts: 219
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Choice of loss function to train a neural network evaluation

Post by Fabio Gobbato »

jdart wrote: Tue Feb 14, 2023 11:06 pm As I understand it, Stockfish currently tunes with lambda = 1.0, so only using scores, not WDL.
I don't know how Stockfish does it, but from my experience you should convert the score of the search to a probability and train the neural network on that. I think that lambda = 1.0 means using only the probability from the search score and not the game result of that position. But as I said, I don't know exactly how the Stockfish trainer works.
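
What I mean is something like this (a sketch; the function name and the 400 scale are mine, not taken from the Stockfish trainer):

import math

def training_target(search_cp, game_result, lam, scale=400.0):
    # game_result from the side to move's point of view:
    # 1.0 = win, 0.5 = draw, 0.0 = loss.
    p_search = 1.0 / (1.0 + math.exp(-search_cp / scale))
    # lam = 1.0 -> pure search score; lam = 0.0 -> pure game result.
    return lam * p_search + (1.0 - lam) * game_result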