The future of chess and elo ratings

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The future of chess and elo ratings

Post by Michel »

can you repeat for drawelo=240?
I did. Balanced is still best according to the model I am using, although the difference is small.

When I have time I will redo the computation for Davidson. Do you know what a sensible value for d/sqrt(w*l) would be? I have no experience with Davidson.
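For clarity, this is the Davidson-style parametrization I have in mind (only a sketch in plain Python; the function name is made up, and theta stands for the ratio d/sqrt(w*l)):

Code: Select all

import math

def davidson_probs(elo_diff, theta):
    # Davidson-type draw model: the unnormalized draw weight is
    # theta*sqrt(w*l), so theta is exactly the ratio d/sqrt(w*l).
    bb = math.log(10) / 400
    w_ = 1 / (1 + math.exp(-bb * elo_diff))   # unnormalized win weight
    l_ = 1 / (1 + math.exp(+bb * elo_diff))   # unnormalized loss weight
    d_ = theta * math.sqrt(w_ * l_)           # unnormalized draw weight
    s = w_ + l_ + d_                          # normalize to probabilities
    return w_ / s, l_ / s, d_ / s

# Example: equal engines with theta=6 give a draw rate of 75%.
print(davidson_probs(0, 6))    # (0.125, 0.125, 0.75)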
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The future of chess and elo ratings

Post by Laskos »

Michel wrote:
can you repeat for drawelo=240?
I did. Balanced is still best according to the model I am using, although the difference is small.

When I have time I will redo the computation for Davidson. Do you know what a sensible value for d/sqrt(w*l) would be? I have no experience with Davidson.
Say 6 would be fine. I am not sure about the model: you seem to use drawelo both for the draw model and for the unbalanced-positions model, and you use it at high values of drawelo, where the model becomes dubious.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The future of chess and elo ratings

Post by Michel »

Laskos wrote:
Michel wrote:
can you repeat for drawelo=240?
I did. Balanced is still best according to the model I am using, although the difference is small.

When I have time I will redo the computation for Davidson. Do you know what a sensible value for d/sqrt(w*l) would be? I have no experience with Davidson.
Say 6 would be fine. I am not sure about the model: you seem to use drawelo both for the draw model and for the unbalanced-positions model, and you use it at high values of drawelo, where the model becomes dubious.
Thanks! I'll try 6 then.

I am just using the standard BE (BayesElo) model. The "Advantage" parameter, which is supposed to measure the white advantage in the opening position, is now used to measure the unbalancedness of the positions that are being used.
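For concreteness, here is the form of the BE model I am assuming (a minimal Python sketch of my understanding, not the actual implementation; drawelo and advantage stand for the drawelo and "Advantage" parameters):

Code: Select all

def bayeselo_probs(elo_delta, drawelo, advantage):
    # Sketch of the BE model as I understand it: the Advantage parameter
    # shifts the effective elo difference, drawelo controls how much
    # probability mass goes to the draw.
    f = lambda x: 1 / (1 + 10 ** (x / 400))
    delta = elo_delta + advantage
    w = f(drawelo - delta)   # win probability of the advantaged side
    l = f(drawelo + delta)   # its loss probability
    d = 1 - w - l            # the rest is the draw probability
    return w, l, d

# Example: equal engines from a balanced position, drawelo=240.
print(bayeselo_probs(0, 240, 0))   # roughly (0.20, 0.20, 0.60)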
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The future of chess and elo ratings

Post by Michel »

With Davidson it doesn't seem to make any difference: the resolution for balanced and for unbalanced positions is indistinguishable.

Code: Select all

# Sage script: per-game resolution (w-l)/sigma under a Davidson-type draw
# model d = theta*sqrt(w*l), for a pair of games with opening bias +un/-un.
bb=log(10)/400            # elo-to-logistic scale factor
theta=6                   # Davidson parameter d/sqrt(w*l)
var("un")                 # unbalancedness of the opening, in elo
L(x)=1/(1+exp(-bb*x))     # logistic curve; x is the elo difference

# Game 1: engine 1 has the opening bias +un in its favour.
w1_=L(x+un)
l1_=L(-x-un)
d1_=theta*sqrt(w1_*l1_)
s1=w1_+l1_+d1_            # normalization
w1=w1_/s1
l1=l1_/s1

# Game 2: colours reversed, so the bias is -un.
w2_=L(x-un)
l2_=L(-x+un)
d2_=theta*sqrt(w2_*l2_)
s2=w2_+l2_+d2_
w2=w2_/s2
l2=l2_/s2

# Average the game pair and compute the per-game resolution (w-l)/sigma.
avw=(w1+w2)/2
avl=(l1+l2)/2
d=avw-avl
s=sqrt(avw*(1-avw)+avl*(1-avl)+2*avl*avw)
res=d/s

# Plot the resolution against the elo difference x for several bias values.
A=plot(res(un=0),[x,-200,200],color="blue")
B=plot(res(un=50),[x,-200,200],color="yellow")
C=plot(res(un=100),[x,-200,200],color="red")
D=plot(res(un=200),[x,-200,200],color="green")
show(A+B+C+D)
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The future of chess and elo ratings

Post by Laskos »

Michel wrote:With Davidson it doesn't seem to make any difference: the resolution for balanced and for unbalanced positions is indistinguishable.

Code: Select all

bb=log(10)/400
theta=6
var("un")
L(x)=1/(1+exp(-bb*x))
w1_=L(x+un)
l1_=L(-x-un)
d1_=theta*sqrt(w1_*l1_)
s1=w1_+l1_+d1_
w1=w1_/s1
l1=l1_/s1

w2_=L(x-un)
l2_=L(-x+un)
d2_=theta*sqrt(w2_*l2_)
s2=w2_+l2_+d2_
w2=w2_/s2
l2=l2_/s2

avw=(w1+w2)/2
avl=(l1+l2)/2
d=avw-avl
s=sqrt(avw*(1-avw)+avl*(1-avl)+2*avl*avw)
res=d/s
A=plot(res(un=0),[x,-200,200],color="blue")
B=plot(res(un=50),[x,-200,200],color="yellow")
C=plot(res(un=100),[x,-200,200],color="red")
D=plot(res(un=200),[x,-200,200],color="green")
show(A+B+C+D)
Thanks!
Even if I trust Davidson more, and even if it correctly describes the draw model, the opening-positions model here is still drawelo-based. As these things seem to be model dependent, I took a database of 30,000 games between closely related recent Stockfish versions and performed some manipulations with the Python tool Ferdinand provided several months ago. The initial openings are from 2moves_v2.pgn, as used in the SF testing framework.

Code: Select all

Balanced openings:

Summary:
                 players      min      max     Gcnt     Wcnt     Lcnt     Dcnt     perf
              Stockfish1     0.00     0.40    25819     6080     5392    14347   51.33%
              Stockfish2     0.00     0.40    26283     6956     4981    14346   53.76%

Total real games: 30000
Eval window: 0.00 to 0.40
Divisor: 1
Move range: 6 to 8
Elapsed time: 0.64m



Unbalanced openings:

Summary:
                 players      min      max     Gcnt     Wcnt     Lcnt     Dcnt     perf
              Stockfish1     1.20     1.40      957      501      121      335   69.85%
              Stockfish2     1.20     1.40     1174      670       97      407   74.40%

Total real games: 30000
Eval window: 1.20 to 1.40
Divisor: 1
Move range: 6 to 8
Elapsed time: 0.63m
---------------------------------------------------------------------------------


Balanced: w-l = 2.43%
sigma = sqrt(w*(1-w)+l*(1-l)+2*w*l) = 0.668
(w-l)/sigma = 3.64%


Unbalanced: w-l = 4.55% 
sigma = sqrt(w*(1-w)+l*(1-l)+2*w*l) = 0.675
(w-l)/sigma = 6.74%
The empirical result shows a clear improvement in (w-l)/sigma for the _unbalanced_ positions compared to the balanced ones (6.74% versus 3.64%).
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The future of chess and elo ratings

Post by Michel »

I can confirm your findings, although I do not quite understand your numbers. Perhaps I made an arithmetic mistake, since I did it quickly before going to work. Here is what I get (with hopefully self-explanatory notation):

Code: Select all

# Balanced positions (eval window 0.00-0.40)
W1=11061
D1=28693
L1=12348
N1=52102
w1=W1/N1=0.212295113431346
l1=L1/N1=0.236996660396914
w1-l1=-0.0247015469655675
sigma(w1-l1)=sqrt((1-w1)*w1+(1-l1)*l1+2*w1*l1)=0.669837000624605
(w1-l1)/sigma=-0.0368769520682405

# Unbalanced positions (eval window 1.20-1.40)
W2=598
D2=742
L2=791
N2=2131
w2=W2/N2=0.280619427498827
l2=L2/N2=0.371187236039418
w2-l2=-0.0905678085405913
sigma(w2-l2)=0.802249422308548
(w2-l2)/sigma=-0.112892332511719
In any case, my computation also suggests that the second option (the unbalanced openings) is better (I checked that the difference is actually significant).

So it seems that assigning an elo to a position is incorrect. This is strange, since what else could you do?
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The future of chess and elo ratings

Post by Michel »

Actually now I must confess I do not understand your data. I assumed that in the matrix

Code: Select all

   players      min      max     Gcnt     Wcnt     Lcnt     Dcnt 
Stockfish1     0.00     0.40    25819     6080     5392    14347
Stockfish2     0.00     0.40    26283     6956     4981    14346 
you were giving the games where the bias for SF1 was in [0.00,0.40] and in [-0.40,0.00], respectively. But now I see that this is impossible, since the total number of games is only 30,000.

However, if it concerns the games where the bias is in the interval [-0.40,0.40], then we should have wins(SF1)=losses(SF2), which is not satisfied.

Can you clarify?
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The future of chess and elo ratings

Post by Laskos »

Michel wrote:Actually now I must confess I do not understand your data. I assumed that in the matrix

Code: Select all

   players      min      max     Gcnt     Wcnt     Lcnt     Dcnt 
Stockfish1     0.00     0.40    25819     6080     5392    14347
Stockfish2     0.00     0.40    26283     6956     4981    14346 
you were giving the games where the bias for SF1 was in [0.00,0.40] and in [-0.40,0.00], respectively. But now I see that this is impossible, since the total number of games is only 30,000.

However, if it concerns the games where the bias is in the interval [-0.40,0.40], then we should have wins(SF1)=losses(SF2), which is not satisfied.

Can you clarify?
Actually, Gcnt shows how many times a position with SF1 eval (respectively SF2 eval) in the interval [0.00,0.40] occurred in the 30,000 "real" games between moves 6 and 8. If an eval in [0.00,0.40] occurs at each of moves 6, 7 and 8 of a game, it is counted in Gcnt all three times, as three independent positions. Evals in [-0.40,0.00] do not appear at all. Then, for each position counted in Gcnt, the tool assigns the result of the game in which that position occurred.
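Schematically, the counting works like the sketch below (a toy Python illustration with a made-up data layout, not Ferdinand's actual tool; each game record is assumed to carry the result and the engine's eval at every move):

Code: Select all

def count_positions(games, eval_min, eval_max, first_move=6, last_move=8):
    # games: list of (result, evals_by_move), with result 1/0.5/0 for the
    # engine and evals_by_move a dict {move_number: eval_in_pawns}.
    gcnt = wcnt = lcnt = dcnt = 0
    for result, evals_by_move in games:
        for move in range(first_move, last_move + 1):
            ev = evals_by_move.get(move)
            if ev is None or not (eval_min <= ev <= eval_max):
                continue
            # every qualifying position counts separately, even if
            # several of them come from the same game
            gcnt += 1
            if result == 1:
                wcnt += 1
            elif result == 0:
                lcnt += 1
            else:
                dcnt += 1
    return gcnt, wcnt, lcnt, dcnt

# Toy example: one win with two in-window positions, one out-of-window draw.
games = [(1, {6: 0.20, 7: 0.30, 8: 0.50}),
         (0.5, {6: 1.30, 7: 1.20, 8: 1.10})]
print(count_positions(games, 0.00, 0.40))   # (2, 2, 0, 0)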
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: The future of chess and elo ratings

Post by Michel »

Evals in [-0.40,0.00] do not appear at all.
OK, I understand. I was assuming the eval was provided by a referee engine, to avoid bias. But what you are doing might be OK.

But if you are really taking consecutive positions from the same game and counting them as different games, then I think there are too many unknowns to do a valid statistical analysis. For one thing, the computation I did to check that the result is significant would no longer be valid.

The (theoretical) statistical analysis I did is for an eng1-eng2 match, with the unbalancedness of the end-of-book position being ±A with equal probability. (*)

If we want to refute the theoretical result, we should do it under those conditions.

(*) The reason for these conditions is that the analysis can be tested empirically without reference to any elo model: the outcome of the match is controlled by a trinomial distribution, and under the null hypothesis we have w=l. The exact value of ±A is not so important, but care should be taken that the average bias is zero from the POV of eng1, for example by assigning sides randomly. Replaying each game with reversed colors is also OK (with the caveat that has already been discussed).
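To make the footnote concrete, the kind of significance check I mean is a simple sign-type test on the trinomial counts (a minimal sketch; under the null hypothesis w=l the draws carry no information and Var(W-L) is approximately W+L):

Code: Select all

import math

def wl_zscore(wins, losses):
    # z-score for the null hypothesis w = l in a trinomial match outcome;
    # sketch only, and it assumes independent games (so it does not apply
    # to overlapping positions taken from the same game).
    return (wins - losses) / math.sqrt(wins + losses)

# Hypothetical example: 1100 wins versus 1000 losses in one match.
print(wl_zscore(1100, 1000))   # about 2.2 standard deviations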
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
jhellis3
Posts: 546
Joined: Sat Aug 17, 2013 12:36 am

Re: The future of chess and elo ratings

Post by jhellis3 »

I don't really buy the premise of the OP.

1) Engines still have a long, long way to go yet.

2) Humans still play the game (without consulting the engines).

3) There are many, many paths to all three results.


Exhibit A: Magnus Carlsen - he has made his way to World Champion and the number 1 rating while often using "sub-optimal" moves in the opening. He has stated that he wants to just "play chess", and I would say he has thus far proven very capable of achieving exactly that.

The only problem I can foresee is a savant with a photographic memory who is capable of memorizing billions of computer evals and is also GM strength in natural chess playing ability. That seems pretty unlikely, but even if someone like that were to come along.... fair play to them.