H4 or S5 !?

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: H4 or SF5!?

Post by Laskos »

Ajedrecista wrote:
The following thread is probably related to that:

1 draw=1 win + 1 loss (always!)

It is a little long but worthwhile.

Regards from Spain.

Ajedrecista.
I think I missed this thread, thanks Jesus.
I fried my main quad desktop, so I ran a simple test on a crappy notebook. With Cutechess-Cli, I played a gauntlet against Houdini 4 at 10,000 nodes per move, with Stockfish at 1250, 2500, 5000, 10,000, 20,000, 40,000 and 80,000 nodes per move, 1000 games per match. Daniel and Adam used many more data points from rating-list databases to show how the draw rate behaves, but my test, with few points, is more clinical: more games per data point.
w and d are win and draw rates:

Rao-Kupper:

Code: Select all

1 draw = 1 win + 1 loss
d = C1*w*(1 - w - d)
Solution: d -> (C1 w - C1 w^2)/(1 + C1 w)
Fitting to data points (least squares): C1 = 1.6821
Davidson:

Code: Select all

2 draws = 1 win + 1 loss
C2*d^2 == w*(1 - w - d)
Solution: d -> (-w + Sqrt[4 C2 w + w^2 - 4 C2 w^2])/(2 C2)
Fitting to data points (least squares): C2 = 3.4133
Empirical comparison of the two models with the data (SF vs H 1,000 games matches):
Image

It seems the Davidson model fits the data points better than Rao-Kupper. The standard deviation in draw rate is 0.8% for the Davidson model and 2.1% for Rao-Kupper.

Conclusion: from my small sample, it seems that 2 draws = 1 win + 1 loss fits the data better than 1 draw = 1 win + 1 loss (which BayesElo uses).
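
The two closed-form solutions quoted above can be sanity-checked numerically. The following is a minimal Python sketch (the function names are illustrative, not taken from any rating tool): it plugs each closed-form d back into its implicit equation and prints the residual, which should be zero up to rounding.

```python
import math

def rao_kupper_draw(w, c1):
    """Closed-form d solving d = C1*w*(1 - w - d)."""
    return c1 * w * (1.0 - w) / (1.0 + c1 * w)

def davidson_draw(w, c2):
    """Positive root of C2*d^2 + w*d - w*(1 - w) = 0."""
    return (-w + math.sqrt(w * w + 4.0 * c2 * w * (1.0 - w))) / (2.0 * c2)

C1, C2 = 1.6821, 3.4133  # fitted constants quoted in the post

for w in (0.05, 0.35, 0.89):
    d1 = rao_kupper_draw(w, C1)
    d2 = davidson_draw(w, C2)
    # Residuals of the implicit equations; both should be ~0.
    r1 = d1 - C1 * w * (1.0 - w - d1)
    r2 = C2 * d2 * d2 - w * (1.0 - w - d2)
    print(f"w={w:.2f}  RK d={d1:.4f} (res {r1:.1e})  Davidson d={d2:.4f} (res {r2:.1e})")
```

Both models give d = 0 at w = 0 and w = 1; they differ in how quickly the draw rate falls off towards the tails.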
lkaufman
Posts: 6224
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: H4 or SF5!?

Post by lkaufman »

Laskos wrote: ...
Very interesting! The burden of proof is on BayesElo to prove that its nonstandard assumption is better than the Elo system assumption (which corresponds to how tournaments are actually scored), so unless someone has a more conclusive study to back up that assumption, we should all be using Ordo.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: H4 or SF5!?

Post by IWB »

Laskos wrote: ...
Empirical comparison of the two models with the data (SF vs H 1,000 games matches):
Image

It seems Davidson model fits better the data points than Rao-Kupper. The standard deviation in draw rate for Davidson model is 0.8%, for Rao-Kupper 2.1%.

Conclusion: from my small sample, it seems that 2 draws = 1 win + 1 loss fits better data than 1 draw = 1 win + 1 loss (which BayesElo uses).
Thx Kai

For my 4th grade mathematical understanding especially the graph is very convincing.
Somehow I hesitate, but maybe I should rethink my calculation method.

Bye
Ingo
Ajedrecista
Posts: 2098
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: H4 or SF5!?

Post by Ajedrecista »

Hello Kai:
Laskos wrote: ...
Thank you very much for your effort!

I think I did not fully understand the concept, but could it be D^a = C*W*L, meaning something like a draws = 1 win + 1 loss? My common sense says that the logical values of a are between 1 and 2. I took seven pairs of (W, D) values from your graph: (0.05, 0.1), (0.07, 0.13), (0.19, 0.19), (0.35, 0.21), (0.62, 0.19), (0.81, 0.14) and (0.89, 0.08). For each pair I computed C = (D^a)/(W*L), and then I ran computations over many values of a and C, searching for the minimum of SUM{[(C*W_i*L_i)^(1/a) - D_i]²; i = 1, ..., 7}. The minimum of that sum occurs at a > 2. Here is my initial Fortran code:

Code: Select all

program b
  ! Grid search for the exponent a and the constant C in D**a = C*W*L,
  ! minimising sum_i ((C*W_i*L_i)**(1/a) - D_i)**2 over seven (W, D) points.
  implicit none
  integer, parameter :: dp = kind(1d0)   ! portable double precision kind
  integer, parameter :: imax = 7, pasos_a = 1000, pasos_c = 1000
  integer :: a0, c0
  real(dp) :: w(imax), d(imax), l(imax), ci(imax)
  real(dp) :: a, c, cimedio, k, suma
  real(dp) :: a_min, c_min, suma_min

  ! Approximate win and draw rates read from the graph; losses follow.
  w = [5d-2, 7d-2, 1.9d-1, 3.5d-1, 6.2d-1, 8.1d-1, 8.9d-1]
  d = [1d-1, 1.3d-1, 1.9d-1, 2.1d-1, 1.9d-1, 1.4d-1, 8d-2]
  l = 1d0 - w - d

  suma_min = huge(1d0)
  open(unit=10, file='a_c_suma.txt', status='unknown', action='write')
  do a0 = 0, pasos_a
    a = 1d0 + 2d0*a0/pasos_a             ! a runs from 1 to 3
    ! Per-point estimates of C for this a; they centre the scan over C.
    ci = (d**a)/(w*l)
    cimedio = sum(ci)/imax
    k = max(cimedio - minval(ci), maxval(ci) - cimedio)
    do c0 = -pasos_c/2, pasos_c/2
      c = cimedio + c0*k/pasos_c         ! c spans cimedio -/+ k/2
      suma = sum(((c*w*l)**(1d0/a) - d)**2)
      write(10,*) a, c, suma
      if (suma < suma_min) then          ! remember the best grid point
        suma_min = suma; a_min = a; c_min = c
      end if
    end do
    write(10,*)
  end do
  close(10)
  write(*,*) 'Min.: ', suma_min, ' at a = ', a_min, ', C = ', c_min
end program b
Then I narrowed the bounds of a to get better resolution, from 2.05 to 2.1, because my first try indicated that the minimum is close to a = 2.075. Later I tried a between 2.075 and 2.076 because my second run indicated a minimum near a = 2.07555. I finally got:

Code: Select all

a = 2.07555700000            C ~ 0.264418857063          sum ~ 1.156193741653E-03
This sum should be understood as a kind of error, which is why I searched for its minimum. Does a > 2 make sense?
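
The same search can be sketched in Python without the inner loop over C. This is a reformulation, not the method of the Fortran program: for a fixed a, writing t = C^(1/a) makes each residual t*(W_i*L_i)^(1/a) - D_i linear in t, so the best C for that a has a closed form and only a needs scanning. The function names are illustrative; the points are the approximate (W, D) pairs read from the graph.

```python
# Approximate (W, D) pairs read from the graph; L = 1 - W - D.
POINTS = [(0.05, 0.10), (0.07, 0.13), (0.19, 0.19), (0.35, 0.21),
          (0.62, 0.19), (0.81, 0.14), (0.89, 0.08)]

def best_c_for(a, points=POINTS):
    """Return (C, sse) minimising sum(((C*W*L)**(1/a) - D)**2) for fixed a.

    With u_i = (W_i*L_i)**(1/a) and t = C**(1/a) the residual is t*u_i - D_i,
    so ordinary least squares gives t = sum(u*D) / sum(u*u).
    """
    u = [(w * (1.0 - w - d)) ** (1.0 / a) for w, d in points]
    ds = [d for _, d in points]
    t = sum(ui * di for ui, di in zip(u, ds)) / sum(ui * ui for ui in u)
    sse = sum((t * ui - di) ** 2 for ui, di in zip(u, ds))
    return t ** a, sse

def scan_a(lo=1.0, hi=3.0, steps=2000, points=POINTS):
    """Scan a on a uniform grid and return (a, C, sse) at the smallest sse."""
    best = None
    for k in range(steps + 1):
        a = lo + (hi - lo) * k / steps
        c, sse = best_c_for(a, points)
        if best is None or sse < best[2]:
            best = (a, c, sse)
    return best

if __name__ == "__main__":
    for a in (1.0, 2.0):
        print(a, *best_c_for(a))
    print(scan_a())
```

Because the error surface is very flat near a = 2 for these seven points, the exact location of the minimum is sensitive to small changes in the input values.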

------------------------

Just to verify the correctness of my results: my minimum error for a = 2 (Davidson) occurs at C ~ 0.302587148790. In your notation, C2 = 1/C ~ 3.3048, while you got 3.4133. This difference can be explained by my taking approximate pairs of values from the graph rather than the exact ones. I do not know how you computed the standard deviation. For a = 1 (Rao-Kupper), I got C ~ 1.64111427372 while you got C1 ~ 1.6821, with the same explanation as before. This time, I only computed numbers for a = 1 and a = 2, but with many more possible C values.

I plotted the errors in a 3D graph using Derive 6 (axes: a, C, sum) for a between 1 and 2, and it looked like a smooth surface, with the error decreasing (a better fit) as a increases and C decreases. I hope I am right.

Of course, my calculations are valid only for these pairs of values. I am clueless in a more general scenario.

Regards from Spain.

Ajedrecista.
lkaufman
Posts: 6224
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: H4 or SF5!?

Post by lkaufman »

IWB wrote: ...
Unless the "right" value is far from 2, we should stick with the Elo system, which in turn means Ordo. Maybe for this data the right value is slightly above 2, but you are the first one to suggest that the "right" value might actually be above 2. My own belief is that the right value is something like 1.7 or so, still close enough to 2 to stick with the Elo system, since tournaments are scored using "2" as the value.
Norm Pollock
Posts: 1070
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: H4 or S5 !?

Post by Norm Pollock »

michiguel wrote:Ingo,

You used to provide the pgn file with only the results. Can you do that again? In that way, we can toy around with the rating programs and/or algorithms.

Miguel
Easily done with one of the tools in my "40H-PGN" (see www below):

truncate alpha.pgn 0
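
How the 40H-PGN tool works internally is not shown in the thread. As a rough illustration of what a results-only PGN looks like, a minimal Python sketch (hypothetical helper names, not part of 40H-PGN) could keep each game's tag pairs and replace the movetext with the result token:

```python
import re

RESULT_TAG = re.compile(r'^\[Result\s+"([^"]*)"\]$')

def _emit(tags):
    """Format one game: its tag pairs, a blank line, then the result token."""
    result = "*"  # PGN token for an unknown result
    for tag in tags:
        m = RESULT_TAG.match(tag)
        if m:
            result = m.group(1)
    return "\n".join(tags) + "\n\n" + result + "\n"

def strip_moves(pgn_text):
    """Return pgn_text with every game's movetext replaced by its result.

    Assumes tag pairs sit on their own lines starting with '[' and that no
    movetext line starts with '['.
    """
    games, tags, in_moves = [], [], False
    for raw in pgn_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            if in_moves:  # a tag after movetext starts the next game
                games.append(_emit(tags))
                tags, in_moves = [], False
            tags.append(line)
        elif line:
            in_moves = True
    if tags:
        games.append(_emit(tags))
    return "\n".join(games)
```

The output keeps everything a rating program needs (players and results) while dropping the moves.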
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: H4 or SF5!?

Post by Laskos »

Ajedrecista wrote: ...
Hello Jesus,
I played another gauntlet against Houdini 4 at 10,000 nodes per move, with Stockfish at 625, 1250, 2500, 5000, 10,000, 20,000, 40,000, 80,000 and 160,000 nodes per move, 1000 games per match. So now I have 9 points instead of 7 and, more importantly, more points on the tails. I also set the contempt of Houdini 4 to 0; I was unsure whether I did that in the previous test, and it could matter. The results are pretty much the same, so nothing was a statistical fluke.

As to your fit with slightly more than 2 draws, this can certainly happen, because each point is built from only 1,000 games. Testing with 10,000 games per point on my notebook would be tedious overkill; the results are pretty relevant anyway.
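
To put a rough number on how noisy a single point is, here is a minimal Python sketch. It assumes the games are independent, so that an observed draw rate d over N games has the usual binomial standard error sqrt(d*(1 - d)/N); the helper name is illustrative.

```python
import math

def draw_rate_se(d, n_games):
    """Binomial standard error of an observed draw rate d over n_games games."""
    return math.sqrt(d * (1.0 - d) / n_games)

for n in (1000, 3000, 10000):
    print(n, round(draw_rate_se(0.21, n), 4))
```

For a draw rate around 0.21 this comes to roughly 1.3% per point at 1,000 games, the same order as the model residuals discussed in the thread, which is why a small shift in the fitted exponent from one data set to the next is not surprising.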

My guess (not rigorous) of the error margins is that the test shows (for one pair of engines only) that:
1 win + 1 loss = 2 +/- 0.2 draws (the error is 1 standard deviation).
So Larry's 1.7 could well be true, but the 1-draw behavior that BayesElo assumes is outside the error margins.

The exact vector of results in the previous test was, in {wins, draws}:
{{.350, .210}, {.617, .194}, {.812, .135}, {.894, .085}, {.189, .191}, {.078, .130}, {.055, .101}}
You can play and fit these results.
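
One way to fit these seven points is sketched below in Python. It is an assumption about the setup, not a record of the original calculation: each model's closed-form d(w) is compared with the observed draw rate, and the single constant is found by a brute-force scan rather than a library optimiser. All names are illustrative.

```python
import math

# {wins, draws} pairs from the post.
DATA = [(0.350, 0.210), (0.617, 0.194), (0.812, 0.135), (0.894, 0.085),
        (0.189, 0.191), (0.078, 0.130), (0.055, 0.101)]

def rao_kupper_draw(w, c1):
    """d solving d = C1*w*(1 - w - d)."""
    return c1 * w * (1.0 - w) / (1.0 + c1 * w)

def davidson_draw(w, c2):
    """Positive d solving C2*d^2 = w*(1 - w - d)."""
    return (-w + math.sqrt(w * w + 4.0 * c2 * w * (1.0 - w))) / (2.0 * c2)

def sse(model, c, data=DATA):
    """Sum of squared differences between predicted and observed draw rates."""
    return sum((model(w, c) - d) ** 2 for w, d in data)

def fit(model, lo=0.01, hi=10.0, step=0.001, data=DATA):
    """Scan c on a uniform grid; return (best c, RMS error in draw rate)."""
    n = int(round((hi - lo) / step))
    grid = (lo + k * step for k in range(n + 1))
    best_c = min(grid, key=lambda c: sse(model, c, data))
    return best_c, math.sqrt(sse(model, best_c, data) / len(data))

if __name__ == "__main__":
    print("Rao-Kupper:", fit(rao_kupper_draw))
    print("Davidson:  ", fit(davidson_draw))
```

Note that this objective predicts d from w alone. A fit that instead uses the observed loss rate, for example minimising (C*W*L)^(1/a) - D, is a different objective and may give somewhat different constants from the same data.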

New results:

Code: Select all

S1                            : 1000 (+410,=210,-380), 51.5 %
S2                            : 1000 (+157,=205,-638), 26.0 %
S3                            : 1000 (+ 59,=146,-795), 13.2 %
S4                            : 1000 (+ 18,= 74,-908),  5.5 %
S5                            : 1000 (+  2,= 35,-963),  1.9 %
S12                           : 1000 (+627,=186,-187), 72.0 %
S13                           : 1000 (+791,=123,- 86), 85.2 %
S14                           : 1000 (+893,= 77,- 30), 93.2 %
S15                           : 1000 (+882,= 82,- 36), 92.3 %
Where "-" stands for wins of SF, and "=" for draws (i.e. Houdini's point of view)
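
To turn result lines like the ones above into win, draw and loss rates for fitting, a small parser is enough. This is a minimal Python sketch with illustrative names; it assumes the exact line layout shown in the table, where "+" counts Houdini's wins and "-" counts Stockfish's.

```python
import re

# name : games (+plus,=draws,-minus); spaces may pad the counts.
LINE = re.compile(r"^(\S+)\s*:\s*(\d+)\s*\(\+\s*(\d+),=\s*(\d+),-\s*(\d+)\)")

def parse_result(line):
    """Parse one result line into rates from the '+' side's point of view."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    games, plus, draws, minus = (int(x) for x in m.groups()[1:])
    if plus + draws + minus != games:
        raise ValueError("counts do not add up to the number of games")
    return {
        "name": m.group(1),
        "w": plus / games,
        "d": draws / games,
        "l": minus / games,
        "score": (plus + 0.5 * draws) / games,
    }

row = parse_result("S3                            : 1000 (+ 59,=146,-795), 13.2 %")
print(row["w"], row["d"], round(100 * row["score"], 1))
```

The computed score for S3 agrees with the 13.2 % printed in the table, which is a quick check that the counts were read correctly.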

w and d are win and draw rates:

Rao-Kupper:

Code: Select all

1 draw = 1 win + 1 loss
d = C1*w*(1 - w - d)
Solution: d -> (C1 w - C1 w^2)/(1 + C1 w)
Fitting to data points (least squares): C1 = 1.709
Davidson:

Code: Select all

2 draws = 1 win + 1 loss
C2*d^2 == w*(1 - w - d)
Solution: d -> (-w + Sqrt[4 C2 w + w^2 - 4 C2 w^2])/(2 C2)
Fitting to data points (least squares): C2 = 3.473
Empirical comparison of the two models with the data (SF vs H 1,000 games matches, 9 points now):
Image

It seems the Davidson model again fits the data points better than Rao-Kupper. The standard deviation in draw rate is 1.1% for the Davidson model and 2.1% for Rao-Kupper.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: H4 or SF5!?

Post by Laskos »

IWB wrote:
Thx Kai

For my 4th grade mathematical understanding especially the graph is very convincing.
Somehow I hesitate, but maybe I should rethink my calculation method.

Bye
Ingo
Yes. Not only does BayesElo use _BayesElos_, which cannot be converted into logistic Elos, but the hallmark of BayesElo, its draw model, also seems not to fit the empirical data. Ordo is much safer to use.
Ajedrecista
Posts: 2098
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: H4 or SF5!?

Post by Ajedrecista »

Hello Kai:
Laskos wrote:The exact vector of results in previous test was in {wins, draws}:
{{.350, .210}, {.617, .194}, {.812, .135}, {.894, .085}, {.189, .191}, {.078, .130}, {.055, .101}}
You can play and fit these results.
Thank you very much. I fitted them in my way and I got somewhat different results than yours (?):

Code: Select all

a = 1 (Rao-Kupper)     C ~ 1.64583939508
a = 2 (Davidson)       C ~ 0.304427133114; C2 = 1/C ~ 3.2849

My optimum:

a = 2.129844           C ~ 0.241742100867
I thought I would get more similar results but I was wrong.

------------------------
Laskos wrote:New results:

Code: Select all

S1                            : 1000 (+410,=210,-380), 51.5 % 
S2                            : 1000 (+157,=205,-638), 26.0 % 
S3                            : 1000 (+ 59,=146,-795), 13.2 % 
S4                            : 1000 (+ 18,= 74,-908),  5.5 % 
S5                            : 1000 (+  2,= 35,-963),  1.9 % 
S12                           : 1000 (+627,=186,-187), 72.0 % 
S13                           : 1000 (+791,=123,- 86), 85.2 % 
S14                           : 1000 (+893,= 77,- 30), 93.2 % 
S15                           : 1000 (+882,= 82,- 36), 92.3 %
Where "-" stands for wins of SF, and "=" for draws (i.e. Houdini's point of view)
Now, trying your nine new points:

Code: Select all

a = 1 (Rao-Kupper)     C ~ 1.69165509087
a = 2 (Davidson)       C ~ 0.307671793421; C2 = 1/C ~ 3.2502

My optimum:

a = 1.99922489         C ~ 0.30810872973965895
Our C1 = C constant is more similar this time. :) After several runs, my optimum was very, very near to Davidson's model!

Regards from Spain.

Ajedrecista.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: H4 or SF5!?

Post by Laskos »

Ajedrecista wrote: ...
Hello Jesus,

I added games to the matches, so now we have:

Code: Select all

S1                            : 3000 (+1225,= 631,-1144), 51.4 %
S2                            : 3000 (+ 513,= 602,-1885), 27.1 %
S3                            : 3000 (+ 191,= 405,-2404), 13.1 %
S4                            : 3000 (+  61,= 227,-2712),  5.8 %
S5                            : 3000 (+   8,= 118,-2874),  2.2 %
S12                           : 3000 (+1845,= 578,- 577), 71.1 %
S13                           : 3000 (+2398,= 353,- 249), 85.8 %
S14                           : 3000 (+2608,= 262,- 130), 91.3 %
S15                           : 3000 (+2659,= 233,- 108), 92.5 %
Fitted function is:

Code: Select all

d^a = C*w*(1-d-w)
I get a contour plot for least squares:
Image
And the values at the minimum:
a = 1.83
C = 0.40
With this many games one can safely assume:
1 win + 1 loss = 1.85 +/- 0.2 draws (this time the error is 2 standard deviations).
The BayesElo model (Rao-Kupper) is pretty much ruled out.

Can you confirm my results?
Thanks!