Why computing K that minimizes the sigmoid func. value?...

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Post Reply
User avatar
Desperado
Posts: 638
Joined: Mon Dec 15, 2008 10:45 am

Why computing K that minimizes the sigmoid func. value?...

Post by Desperado » Thu Nov 19, 2015 9:35 pm

Hello,

i just read the chessprogramming article about Texels tuning algorithm and found the following part which i do not understand.

Code: Select all


Sigmoid(s)=1/(1+10^(-K/400))

K is a scaling constant.

Compute the K that minimizes E. K is never changed again by the algorithm.

Can someone explain why this step will help or needs to be done ?

My error computation attempts only pass an evaluation score to the sigmoid function. So, if i get the result of 0.6 i can square the difference to 0.0/0.5/1.0 or just another reference value computed by the sigmoid function.

So, how does it help, if i just want the minimized sum of squared errors ?

Thanks in advance.

Sven
Posts: 3830
Joined: Thu May 15, 2008 7:57 pm
Location: Berlin, Germany
Full name: Sven Schüle
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by Sven » Thu Nov 19, 2015 9:49 pm

Desperado wrote:Hello,

i just read the chessprogramming article about Texels tuning algorithm and found the following part which i do not understand.

Code: Select all

Sigmoid(s)=1/(1+10^(-K/400))

K is a scaling constant.

Compute the K that minimizes E. K is never changed again by the algorithm.
Can someone explain why this step will help or needs to be done ?

My error computation attempts only pass an evaluation score to the sigmoid function. So, if i get the result of 0.6 i can square the difference to 0.0/0.5/1.0 or just another reference value computed by the sigmoid function.

So, how does it help, if i just want the minimized sum of squared errors ?

Thanks in advance.
It is Ks/400, not K/400. In the article you refer to, "s" is the qSearch score returned by the engine, which is usually not in the range [0..1]. Sigmoid(s) maps it to a kind of winning probability, I think.

User avatar
Desperado
Posts: 638
Joined: Mon Dec 15, 2008 10:45 am

Re: Why computing K that minimizes the sigmoid func. value?.

Post by Desperado » Thu Nov 19, 2015 10:09 pm

Sven Schüle wrote:
Desperado wrote:Hello,

i just read the chessprogramming article about Texels tuning algorithm and found the following part which i do not understand.

Code: Select all

Sigmoid(s)=1/(1+10^(-K/400))

K is a scaling constant.

Compute the K that minimizes E. K is never changed again by the algorithm.
Can someone explain why this step will help or needs to be done ?

My error computation attempts only pass an evaluation score to the sigmoid function. So, if i get the result of 0.6 i can square the difference to 0.0/0.5/1.0 or just another reference value computed by the sigmoid function.

So, how does it help, if i just want the minimized sum of squared errors ?

Thanks in advance.
It is Ks/400, not K/400. In the article you refer to, "s" is the qSearch score returned by the engine, which is usually not in the range [0..1]. Sigmoid(s) maps it to a kind of winning probability, I think.
Hi Sven,

ok, no doubt that the q-score or any other score will be mapped into the range between 0...1. But currently i don't understand

"Compute the K that minimizes E. K is never changed again by the algorithm"

I can follow that i should:

1. map the score into the range 0...1 (eg 0.6)
2. compute a difference (eg for a a result score this might be 1.0 - 0.6)
3. square the result, which is finally the squared error
4. sum it together for all positions that will be analyzed.

5. the tuner will repeat step 1 to 4 to find configurations (lower squared error sums)

"Computing the K that minimizes E" is not the same as minimizing the total error based on the squared errors (step 1 to 4), or should it mean exactly this. If so, the second part does not make sense to me, that K needs to be constant then.

Sven
Posts: 3830
Joined: Thu May 15, 2008 7:57 pm
Location: Berlin, Germany
Full name: Sven Schüle
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by Sven » Thu Nov 19, 2015 10:35 pm

Desperado wrote:But currently i don't understand

"Compute the K that minimizes E. K is never changed again by the algorithm"

I can follow that i should:

1. map the score into the range 0...1 (eg 0.6)
2. compute a difference (eg for a a result score this might be 1.0 - 0.6)
3. square the result, which is finally the squared error
4. sum it together for all positions that will be analyzed.

5. the tuner will repeat step 1 to 4 to find configurations (lower squared error sums)

"Computing the K that minimizes E" is not the same as minimizing the total error based on the squared errors (step 1 to 4), or should it mean exactly this. If so, the second part does not make sense to me, that K needs to be constant then.
What I understood is that you determine K once based on your initial evaluation parameters, then keep it constant while repeating 1-4. I think that always recalculating K would invalidate the algorithm. It is obviously meant as a constant. I don't know whether it has any influence on the convergence behavior of the tuning algorithm.

Sven
Posts: 3830
Joined: Thu May 15, 2008 7:57 pm
Location: Berlin, Germany
Full name: Sven Schüle
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by Sven » Thu Nov 19, 2015 10:39 pm

Sven Schüle wrote:
Desperado wrote:But currently i don't understand

"Compute the K that minimizes E. K is never changed again by the algorithm"

I can follow that i should:

1. map the score into the range 0...1 (eg 0.6)
2. compute a difference (eg for a a result score this might be 1.0 - 0.6)
3. square the result, which is finally the squared error
4. sum it together for all positions that will be analyzed.

5. the tuner will repeat step 1 to 4 to find configurations (lower squared error sums)

"Computing the K that minimizes E" is not the same as minimizing the total error based on the squared errors (step 1 to 4), or should it mean exactly this. If so, the second part does not make sense to me, that K needs to be constant then.
What I understood is that you determine K once based on your initial evaluation parameters, then keep it constant while repeating 1-4. I think that always recalculating K would invalidate the algorithm. It is obviously meant as a constant. I don't know whether it has any influence on the convergence behavior of the tuning algorithm.
In the same article Peter wrote in the section "Results":
Peter Österlund wrote:When I started using this method, I calculated that K=1.13 best matched the then current evaluation function in my engine. I have not changed the value of K since then and do not intend to change it in the future either, since it is just an arbitrary scaling constant.
That should perfectly answer your question.

mar
Posts: 2015
Joined: Fri Nov 26, 2010 1:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Why computing K that minimizes the sigmoid func. value?.

Post by mar » Fri Nov 20, 2015 2:47 am

Desperado wrote:Can someone explain why this step will help or needs to be done ?
I'm sure Peter will explain this in detail.
The sigmoid maps eval to winning probability.
This step is crucial. In my first attempt I didn't bother computing K and I couldn't improve.
After that I was able to gain around 100 elo (in several steps), which is a fantastic result.
My current K is 1.121 and current error used for tuning is around 0.0834;
YMMV but you should probably get values in a similar range. Hope it helps.

Ferdy
Posts: 4113
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: Why computing K that minimizes the sigmoid func. value?.

Post by Ferdy » Fri Nov 20, 2015 5:20 am

There is this relationship between win probablity and pawn advantage,
where as the score increases the win probability also increases.
But the figure defers from engine to engine, Sf might be winning all the time (100%)
if its score is already 400 cp or more. This is where the K comes in, if your
qscore returns 600 cp, what error would you like it to assign? Say
error is 0, that would mean a 100% winning probability. You can adjust the K to reflect
this. But if you are not convinced that 600 cp score is not 100% winning then there
is error in it, so adjust your K to produce your expected error, or adjust K to define
win probability (not 100%) based from your engine.

Kai had some post regarding expected score from some engines, showing that
different engines may have different expected score depending on the evaluation and
game phase. Game phase is tricky at ending positions higher score is required to
get higher win probability.

See sample below, K can be adjusted to define errors, as K decreases, the error decreases.
I think as the engine gets stronger this K should be adjusted so that even when its score is not that big the engine is able to win some more games.

Image

User avatar
Desperado
Posts: 638
Joined: Mon Dec 15, 2008 10:45 am

Re: Why computing K that minimizes the sigmoid func. value?.

Post by Desperado » Fri Nov 20, 2015 7:41 am

Good morning,

thanks for your replies, but i still don't understand it.

I think it is absolutely clear to me that the absolute score of the mapped cp value does change with different K. But is it important ?

The mapped value of 600cp will always be greater than for 100cp as long both are computed with the same K.
Because of that, the tuner will always make the same decisions and it does not matter which K you use, it only matters that K needs to be constant.

What do i overlook in this picture ?

User avatar
Desperado
Posts: 638
Joined: Mon Dec 15, 2008 10:45 am

Re: Why computing K that minimizes the sigmoid func. value?.

Post by Desperado » Fri Nov 20, 2015 7:48 am

Sven Schüle wrote:
Sven Schüle wrote:
Desperado wrote:But currently i don't understand

"Compute the K that minimizes E. K is never changed again by the algorithm"

I can follow that i should:

1. map the score into the range 0...1 (eg 0.6)
2. compute a difference (eg for a a result score this might be 1.0 - 0.6)
3. square the result, which is finally the squared error
4. sum it together for all positions that will be analyzed.

5. the tuner will repeat step 1 to 4 to find configurations (lower squared error sums)

"Computing the K that minimizes E" is not the same as minimizing the total error based on the squared errors (step 1 to 4), or should it mean exactly this. If so, the second part does not make sense to me, that K needs to be constant then.
What I understood is that you determine K once based on your initial evaluation parameters, then keep it constant while repeating 1-4. I think that always recalculating K would invalidate the algorithm. It is obviously meant as a constant. I don't know whether it has any influence on the convergence behavior of the tuning algorithm.
In the same article Peter wrote in the section "Results":
Peter Österlund wrote:When I started using this method, I calculated that K=1.13 best matched the then current evaluation function in my engine. I have not changed the value of K since then and do not intend to change it in the future either, since it is just an arbitrary scaling constant.
That should perfectly answer your question.
Good morning, Sven.

Peter's comment just tells it is an arbitrary scaling constant. So, why to compute it anyway ? I guess there is no need to do it, so i am even more suprised that some people consider it as relevant.

Finally starting with the arbitrary K=1 and keep using it is enough imho.
As Peter pointed out, computing the K today would lead to another initial K, so a pointless process...

mvk
Posts: 589
Joined: Tue Jun 04, 2013 8:15 pm

Re: Why computing K that minimizes the sigmoid func. value?.

Post by mvk » Fri Nov 20, 2015 8:19 am

Desperado wrote:

Code: Select all

Sigmoid(s)=1/(1+10^(-K/400))

K is a scaling constant.

Compute the K that minimizes E. K is never changed again by the algorithm.
Can someone explain why this step will help or needs to be done ?
.
My interpretation is that measuring and fixating `K' is only important if you start with an existing hand-crafted and weighted evaluation, and then want to use this method to tune a few parameters and leave others unchanged. That way you can preserve some of the bias of the old evaluation. Otherwise, just use K=1 (or anything else: K=1 just makes it easiest to compare evaluations with other programs where a pawn is ~100).

I have always tuned from the ground up, so I have K=1 in Rookie and Floyd. See here the snippet from tune.py:

Code: Select all

def scoreToP(score):
        return 1 / (1 + 10 ** (-score/4.0))
Given that, it seems to me that the formula you quote from the wiki should read

Code: Select all

Sigmoid(s)=1/(1+10^(-K*s/400))
(`s' is missing). Maybe that explains the confusion.

BTW, one of the nice things about this method is that fixating the score-to-P mapping is far more elegant than pegging one of the parameters to an arbitrary value, as other methods do (such as pawn=100 or a knight=320 or whatever).
[Account deleted]

Post Reply