Sigmoid scale in neural network training question

Discussion of chess software programming and technical issues.

Moderator: Ras

sedicla
Posts: 182
Joined: Sat Jan 08, 2011 12:51 am
Location: USA
Full name: Alcides Schulz

Sigmoid scale in neural network training question

Post by sedicla »

Hello,

I changed the tucano network to a 768x512x1 architecture and was able to do the training; soon I will be releasing version 11. Version 10 had a bigger architecture similar to Stockfish's HalfKP. I decided to do that so I could build the training code and network eval. It has been a good learning experience.
Anyway, the only parameter I'm not understanding is the sigmoid scale, which is used in the formula below:

Code: Select all

double tnn_sigmoid(double value)
{
    return 1.0 / (1.0 + exp(-value * SIGMOID_SCALE));
}
It is also used in the backpropagation sigmoid prime formula:

Code: Select all

double sigmoid_prime = output_sigmoid * (1.0 - output_sigmoid) * SIGMOID_SCALE;
What is the goal of this parameter? How do I select a value? Can it have a significant impact on training? I'm using SIGMOID_SCALE = 4.0f / 1024.0f;
I see other values in other training code, and I read somewhere that the right value can depend on the data itself.

I'd appreciate it if someone could shed some light on that. Thanks!

Alcides.
alvinypeng
Posts: 36
Joined: Thu Mar 03, 2022 7:29 am
Full name: Alvin Peng

Re: Sigmoid scale in neural network training question

Post by alvinypeng »

A completely winning evaluation like 1000 cp should give a sigmoid value close to 1.0. If you don't scale the sigmoid, a position that is barely winning (e.g. 10 cp) will also give a sigmoid value close to 1.0. If you scale the sigmoid down, the transition is less harsh. So a 10 cp evaluation might give 0.51, a 150 cp evaluation might give 0.60, and the 1000 cp position will still give a value close to 1.0.

I know Stockfish previously divided by 410. I think Koivisto divides by 160. Dividing by 256 probably works too. The only way of knowing which scaling works best with your dataset is trial and error. Keep in mind that changing the scale also changes how your loss looks, so you cannot compare loss values across scales to decide which is better. You will need to test the networks through play testing.
sedicla
Posts: 182
Joined: Sat Jan 08, 2011 12:51 am
Location: USA
Full name: Alcides Schulz

Re: Sigmoid scale in neural network training question

Post by sedicla »

Thanks for the explanation Alvin, that makes sense.
I think I will do some testing with that in mind.