Daniel Shawul wrote:
You didn't do regressions on the 3 data sets; you just took a numerical average to get the average a=2. At least your later test followed my suggestion to fix a=2 and a=1, which is better, but it still doesn't fill a glaring hole: this is no more than an exercise in data selection to fit a favourite model. I am not sure we would see the same difference if I used half the data for training and half for prediction. I can even see from one of your other plots that it is just one data point at the top that is making all the 'difference'. You need to predict results (draw/win ratios) from calculated Elos, which is why the deltas are important. Your methodology is seriously flawed.
Please don't call your 'data' data. You have what, like 6 data points to make your regressions with, and you call that more clinical? And on top of that they are all correlated. And then you didn't do predictions with your draw model; you just computed how well it fits. That is completely meaningless.
BayesElo uses three different draw models now. Ordo didn't know shit about draw models before we even showed that Davidson was the better model for computer games. Then the bandwagoners jumped in about how Ordo is this or that... spare me.
What are you bragging about? You again seem not to understand that I am not using Elos, LogisticElos, BayesElos, whatever. Comparing a=1 and a=2 was my first comparison between models, before your smart "advice" (check the other thread in "Tournaments"). I do NOT have to train anything, and I am NOT using ratings; do you understand that?
My database is smaller, some 50,000 games, for some 50 points to fit. But you are again mistaken that the middle points are making the difference; I was very careful to include the tails. My Elo span between engines covers 1,500 Elo points across thousands of games; how many points like that are in the CCRL or CEGT databases? Each of my points has 1,000 games behind it; how many games does each point have in CCRL?
Rao-Kupper (the model BayesElo used) is completely ruled out by my tests, as the chi-square tests show. Do you know how to interpret those tests? Overfitting again?
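For readers following along, the kind of test being invoked here is a Pearson chi-square goodness-of-fit check of a model's predicted win/draw/loss probabilities against observed counts. A minimal sketch (the counts and probabilities below are made-up illustrations, not anyone's actual data):

```python
# Hypothetical illustration: chi-square goodness-of-fit of a draw
# model's predicted W/D/L probabilities against observed game counts.

def chi_square(observed, expected_probs):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over cells."""
    n = sum(observed)
    stat = 0.0
    for obs, p in zip(observed, expected_probs):
        exp = n * p  # expected count under the model
        stat += (obs - exp) ** 2 / exp
    return stat

# 1,000 games at one Elo gap (hypothetical): wins, draws, losses
observed = [450, 400, 150]
# W/D/L probabilities some candidate draw model predicts for that gap
model_probs = [0.44, 0.42, 0.14]

stat = chi_square(observed, model_probs)
print(round(stat, 3))
```

With three outcome cells and one fitted draw parameter there is 1 degree of freedom left, and the 5% critical value of chi-square(1) is about 3.84; a statistic well above that is what would "rule out" a model in this sense.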
Then, until I revealed that Davidson (1 win + 1 loss = 2 draws) does apply to computer chess draw models, you and Rémi stayed silent as rats about BayesElo using the wrong draw model, although you had known it for months. I am very happy that BayesElo will have 3 draw models from now on, and your advice (on which we agree) would be: use Davidson. BayesElo could finally offer a good challenge to Ordo. I will check the new BayesElo; Davidson should give quite different results from the previously used Rao-Kupper.
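To make the contrast concrete, here is a sketch of the two draw models' W/D/L probabilities in a common parameterisation (my own notation, not BayesElo's source code; `theta` and `nu` values are illustrative):

```python
# Sketch: Rao-Kupper vs Davidson draw models on the Elo scale.
import math

def gamma(elo):
    """Strength parameter on the logistic Elo scale."""
    return 10.0 ** (elo / 400.0)

def rao_kupper(elo_i, elo_j, theta=1.5):
    """Rao-Kupper: a draw parameter theta >= 1 inflates the opponent's
    strength in each win probability; the leftover mass is the draw."""
    gi, gj = gamma(elo_i), gamma(elo_j)
    p_win = gi / (gi + theta * gj)
    p_loss = gj / (gj + theta * gi)
    return p_win, 1.0 - p_win - p_loss, p_loss

def davidson(elo_i, elo_j, nu=2.0):
    """Davidson: draw strength proportional to sqrt(gi * gj)."""
    gi, gj = gamma(elo_i), gamma(elo_j)
    denom = gi + gj + nu * math.sqrt(gi * gj)
    p_win, p_loss = gi / denom, gj / denom
    return p_win, 1.0 - p_win - p_loss, p_loss

# Davidson's defining property: P(draw)^2 = nu^2 * P(win) * P(loss),
# so (at nu = 1) one win plus one loss is exactly as likely as two
# draws -- the "1 win + 1 loss = 2 draws" equivalence referred to above.
w, d, l = davidson(100.0, 0.0, nu=2.0)
assert abs(d * d - 4.0 * w * l) < 1e-12
```

The two formulas give genuinely different draw-rate curves as a function of Elo gap, which is why switching BayesElo from Rao-Kupper to Davidson should move the ratings.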