Advantage for White; Bayeselo (to Rémi Coulom)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by Rémi Coulom »

hgm wrote:In that light, what do you think of the fact that the data points seem to stipulate a steeper-rising curve than the green logistic on which they are supposed to be based? Is this an indication that BayesElo's default approach is not the optimal way to extract the ratings?
I am not really sure, but the steeper-rising curve might be an effect of the prior.

Rémi
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by Adam Hair »

hgm wrote:It is interesting that the Gaussian seems to give a better fit, despite the fact that the ratings were derived using the logistic. (Now hope the logistic doesn't give a better fit on ratings derived with the Gaussian...) It could be that for this large data set based mostly on low delta-Elo data the obtained ratings are not very sensitive to the model used.

What worries me is that the empirical data seems steeper than the curve of the model from which they were derived. That means that Elo differences you have to put into the logistic formula to get the true score percentage are larger than those spit out by BayesElo. In other words, BayesElo systematically underestimates rating differences, compressing the rating scale.

I wonder if this is an artifact caused by the prior. Do you calculate the ratings for this data set yourself? If so, could you recalculate them using a smaller prior (e.g. 0.1 instead of the standard 2.0)?
I think that the compression is due more to the use of the default eloAdvantage and eloDraw values.

For the CCRL 40/40 database that Edmund was studying, the Elo ratings were computed using the default values (prior=2; eloAdvantage=32.8; eloDraw=97.3). The spread (Elo max - Elo min) of the Elo values is 1392. If prior is set to 0.1 instead of 2, the spread becomes 1400, a 0.57% difference. However, if eloAdvantage and eloDraw are computed from the database (eloAdvantage=32.967, not much different; eloDraw=137.113, a big difference), then the spread with prior=2 becomes 1498, a 7.6% difference. Setting prior=0.1 as well makes the spread 1505 (8.1%).
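For reference, here is a small sketch of my reading of the Bayeselo model (an assumption on my part, not the program's actual source): eloAdvantage shifts the expected-score curve toward White, and eloDraw carves a draw band out of the middle. It shows why a larger eloDraw flattens the curve and so forces the fitted ratings to spread further apart:

```python
import math

def f(x):
    # standard logistic expected-score curve on the Elo scale
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def wdl(delta, elo_advantage, elo_draw):
    # Bayeselo-style win/draw/loss probabilities (as I understand the model)
    p_win = f(delta + elo_advantage - elo_draw)
    p_loss = f(-delta - elo_advantage - elo_draw)
    return p_win, 1.0 - p_win - p_loss, p_loss

def expected_score(delta, elo_advantage, elo_draw):
    w, d, l = wdl(delta, elo_advantage, elo_draw)
    return w + 0.5 * d

# same 100-point gap, default vs recomputed eloDraw
default_score = expected_score(100, 32.8, 97.3)
recomputed_score = expected_score(100, 32.8, 137.113)
```

With the larger eloDraw, the same 100-point gap yields a score closer to 50%, so the model must spread the ratings further apart to explain the same game results.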
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by Adam Hair »

Edmund wrote:Thanks to hgm and lucas for the suggestions.

Below you find updated graphs representing the same data.
1) bin-size is 4 elo-points
2) minimum bin-size is 4 samples
3) in the elo-delta graph I shifted all models by 20 elo points to compensate for the white to move advantage
4) I added the cdf of the normal distribution with sd=250
5) I added the function hgm suggested to estimate draws scaling by 40/25

[graph]

Agreed, the Gauss function is a better fit than either the linear function or the logistic function.
Looking at the new graphs I am not so sure about hgm's suggestion regarding the progression of the avg-elo score function. You are right that the next step is to take elo-delta into the equation.
I have replicated Edmund's methods with the CCRL 40/4 database. I too used 4-Elo bins. I do have 4 bins with fewer than 4 samples, but there are 294 bins with a mean of 2946 samples, and only 9 bins in total have fewer than 10 samples. With the weighted regression, I judge that these 4 bins have little effect. I shifted the models by 26.973 (the computed eloAdvantage). I also compared the logistic model with a Gaussian cdf, though I was forced to use an approximation: the site I am using to perform the regressions does not recognize erf(). However, the approximation is quite good (1/(1 + exp(-0.07056*(X**3) - 1.5976*X))).
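As a sanity check on that substitute, here is a small sketch comparing the quoted formula with the exact normal cdf (the formula appears in the literature as a logistic approximation due to Bowling et al.; treat that attribution as my assumption):

```python
import math

def phi_approx(x):
    # the logistic-style approximation used in place of erf() above
    return 1.0 / (1.0 + math.exp(-0.07056 * x**3 - 1.5976 * x))

def phi_exact(x):
    # exact standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# worst-case absolute error over roughly +/- 4 standard deviations
max_err = max(abs(phi_approx(k / 100.0) - phi_exact(k / 100.0))
              for k in range(-400, 401))
```

The error stays at a few parts in ten thousand, far smaller than the scatter in the binned data, so the substitution should not affect the fits.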

This is White Score versus Average Elo:

[graph]

The regression equation is White Score = 46.27 + 0.00285 Average Elo

This is Draw Ratio versus Average Elo:

[graph]

The regression equation is Draw Ratio = -17.23 + 0.0179 Average Elo . I checked to see if the outliers exerted much leverage on the regression line by trimming the points outside the interval (1900, 3100). The confidence interval for the slope of the regression line for the trimmed data includes 0.0179. Therefore, given the weighted data, I believe that the draw ratio regression line is not affected much by the outliers.
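For convenience, the two regression lines above written as plain functions (coefficients taken verbatim from the fits; the outputs are percentage points):

```python
def white_score_pct(avg_elo):
    # White Score vs Average Elo regression quoted above
    return 46.27 + 0.00285 * avg_elo

def draw_ratio_pct(avg_elo):
    # Draw Ratio vs Average Elo regression quoted above
    return -17.23 + 0.0179 * avg_elo
```

Per 100 points of average Elo, the draw ratio rises about 1.79 percentage points, while White's score rises only about 0.29.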

Here is Draw Ratio vs Elo Delta (Elo Diff):

[graph]

Finally, here is White Score vs Elo Delta:

[graph]

The logistic equation is 100/(1+10^(-(x+26.973)/~380)).

This is the data with the Gaussian model:

[graph]

sd=278.18. The approximation used was 100/(1 + exp((-0.07056*(((X+26.973)/278.18)**3)-1.5976*((X+26.973)/278.18))))


The two equations model the data equally well. The logistic model is compressed by ~5%. I believe this is related to the fact that the eloDraw computed from the data is 102.647, slightly higher than the default value.
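To see how close the two fits are, this sketch evaluates both models over the observed range (using ~380 as the logistic scale quoted above, and the sign convention that a positive delta favors White):

```python
import math

ADV = 26.973     # computed eloAdvantage from the post
SCALE = 380.0    # approximate logistic scale
SD = 278.18      # fitted Gaussian standard deviation

def logistic_score(x):
    # fitted logistic model, score in percent
    return 100.0 / (1.0 + 10.0 ** (-(x + ADV) / SCALE))

def gauss_score(x):
    # fitted Gaussian-cdf model, score in percent
    z = (x + ADV) / SD
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# largest disagreement between the two fitted curves over -600..600 Elo
max_gap = max(abs(logistic_score(x) - gauss_score(x)) for x in range(-600, 601))
```

Over that range the two curves differ by only about one point of score, consistent with them fitting the binned data equally well.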

I believe my next step is to recompute the Elo ratings for the 40/40 database and examine the cause of its compression. It is compressed noticeably more than the 40/4 ratings, so any results will be more definitive with this database. As I said in the previous post, I believe the cause is the use of the default eloDraw value (the default eloAdvantage causes the shift). It will take me a day or two to do this: there is one part of the data extraction that I have to do by hand, and it took several hours for the 40/4 database.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by mcostalba »

hgm wrote: A second point is that the draw probability vs Elo-difference looks like a parabola, i.e. it seems indeed proportional to the product score(deltaE)*(1-score(deltaE)). This indeed justifies the analysis of BayesElo, where a single draw is equivalent to one win + one loss (i.e. counts as 2 games).
I am not an expert, but it seems to me there is an asymmetry in this graph.

When White is stronger, the draw ratio decreases quickly, whereas when Black is stronger, the draw ratio stays near the maximum for longer.

It is as if a weaker player with the white pieces is better able to hold the draw than in the opposite case.

IOW, draw probability vs Elo-difference is correlated with the stronger player's color.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by Adam Hair »

The following is a graph of the CCRL 40/40 data, prepared exactly as Edmund did. However, I have recomputed the ratings to take into account the eloAdvantage and eloDraw computed from these games. I left the prior equal to 2.

[graph]

The equation for the logistic model seen in the graph is:

White Score = 100/(1+10^(-1.028*(X+32.976)/400))

where eloAdvantage = 32.976. Also, R²=0.972 for this model.

I guess that if the prior were adjusted down, the 2.8% compression might go away completely. However, I believe this shows that the discrepancy between the CCRL Elo ratings and the Bayeselo model is due to the default values for eloAdvantage and eloDraw being used to compute the ratings. If the computed values are used (they act like location and scale parameters for White Score), the resulting logistic model matches the data quite well (as it should).
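A small identity makes the 2.8% figure concrete: a slope A in the fitted logistic is equivalent to rescaling the rating axis by A, so A = 1.028 says the observed results behave as if the BayesElo differences were 2.8% larger than reported (the sample differences below are hypothetical):

```python
def f(x, a=1.0):
    # fitted logistic from above, with slope parameter a
    return 100.0 / (1.0 + 10.0 ** (-a * x / 400.0))

# f with slope A at X equals the standard (A=1) logistic at A*X,
# so A > 1 means the reported rating scale is compressed by a factor 1/A
A = 1.028
checks = [(f(x, A), f(A * x)) for x in (-400.0, -100.0, 0.0, 150.0, 500.0)]
```

In other words, multiplying every BayesElo rating difference by 1.028 would put the data back on the standard 400-scale logistic.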
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by Edmund »

Interesting follow-up, Adam. Thanks for sharing.

I never played around much with the settings of bayeselo, so I wasn't aware of what could be achieved by changing the parameters and recalculating the Elo values.

You also showed that eloadvantage and elodraw are correlated with absolute elo, so the whole model could benefit from making these parameters dynamic.

Is CCRL planning to change its model to use your new adjustment parameters?
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by Adam Hair »

Edmund wrote:Interesting follow-up, Adam. Thanks for sharing.

I never played around much with the settings of bayeselo, so I wasn't aware of what could be achieved by changing the parameters and recalculating the Elo values.

You also showed that eloadvantage and elodraw are correlated with absolute elo, so the whole model could benefit from making these parameters dynamic.
Thanks, Edmund. I just happened to remember that changing the prior did not have a large effect on the CCRL ratings, but the eloDraw value did. The discussion you, HGM, and Rémi had made the connection for me.

I do agree with you that making the parameters dynamic would improve the model.
Edmund wrote:Is CCRL planning to change its model to use your new adjustment parameters?
I have not brought this up with the group yet, but I plan to.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by Adam Hair »

One last thing to add. As I stated above, using 'mm 1 1' removes most of the compression from the CCRL ratings. I speculated that lowering the prior could remove the rest of the compression, given that prior=0.1 causes the ratings to spread a bit more. However, lowering the prior further has no effect on the spread of the CCRL 40/40 ratings. Even if prior=0, the ratings remain the same as if prior=0.1.

However, there is one thing that I forgot to report that may finish the question about compression. The logistic model I actually found for the CCRL 40/40 ratings with prior=2 and the computed values for eloAdvantage and eloDraw was :

100/(1+10^(-A*(X+32.976)/400))

where A = 1.028 (approximate)

The 95% confidence interval for A was (0.992, 1.067).

Setting the prior equal to 0.1 (or even 0) would increase our confidence that A = 1, but it already looks fairly likely that it should be 1.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by Adam Hair »

Adam Hair wrote:
Edmund wrote:Interesting follow-up, Adam. Thanks for sharing.

I never played around much with the settings of bayeselo, so I wasn't aware of what could be achieved by changing the parameters and recalculating the Elo values.

You also showed that eloadvantage and elodraw are correlated with absolute elo, so the whole model could benefit from making these parameters dynamic.
Thanks, Edmund. I just happened to remember that changing the prior did not have a large effect on the CCRL ratings, but the eloDraw value did. The discussion you, HGM, and Rémi had made the connection for me.

I do agree with you that making the parameters dynamic would improve the model.
Edmund wrote:Is CCRL planning to change its model to use your new adjustment parameters?
I have not brought this up with the group yet, but I plan to.
We will be making the change.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Advantage for White; Bayeselo (to Rémi Coulom)

Post by Daniel Shawul »

In my tests I found a significant correlation with the average Elo of the two players.
I am getting the best fit with:
EloDraw = avg * 0.096 -135
EloAdvantage = avg * 0.0108 -2.4
I implemented this so that thetaD and thetaW vary linearly with the average strength of the players. While I get some improvement on some PGN files, in most cases there is a lot of correlation between the parameters.

Code: Select all

thetaD = thetaDSlope * (gammaP + gammaO) / 2 + thetaDY
thetaW = thetaWSlope * (gammaP + gammaO) / 2 + thetaWY
So thetaDSlope and thetaDY tend to increase/decrease by the same amount, hinting that one parameter may suffice. Thus I have huge problems making the method converge on bigger files. When it converges, I get better results by maximizing the likelihood. The relative magnitude of the gammas with respect to thetaD (Elos) has a significant effect. If I form the gammas by dividing by 4000 instead of 400 to make it more stable, the slope and y-intercept parameters vary by the same amount from iteration to iteration, but still with the same end result for thetaD.
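For what it's worth, the linear relations from the fits above can be written out directly (a sketch of the quoted best-fit relations, not of the actual implementation):

```python
def elo_draw(avg_elo):
    # EloDraw = avg * 0.096 - 135, the best fit quoted above
    return 0.096 * avg_elo - 135.0

def elo_advantage(avg_elo):
    # EloAdvantage = avg * 0.0108 - 2.4, the best fit quoted above
    return 0.0108 * avg_elo - 2.4
```

At an average Elo of 2400 these give roughly eloDraw 95 and eloAdvantage 24, which sit in the same neighborhood as the values computed from the CCRL databases earlier in the thread.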
Just letting you know of an attempt before I give up.