Derivation of bayeselo formula

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Derivation of bayeselo formula

Post by Daniel Shawul »

Ok, after speeding up the objective function by about 1000x, I finally produced a result for the Glenn-David model. When the likelihood is computed for a player, the original value is updated incrementally, which makes the computation much faster. I didn't need to switch to another method with this change (so still CG). Here is the result, and it looks similar to what I expected before. But I will check whether the variance I used for the GD model is the correct one.
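The exact bookkeeping isn't shown here, so the following is only a minimal sketch of the incremental idea, assuming the Davidson draw model; the names davidson_ll, move_player and games_by_player are illustrative, not from bayeselo. Because the log-likelihood is a sum over games, changing one player's gamma only affects the terms for that player's games:

Code:

import numpy as np

def davidson_ll(gamma, nu, games):
    # Davidson model: sum of log-probabilities over (i, j, result) tuples,
    # where result is 1 (i wins), 0 (draw) or -1 (i loses).
    ll = 0.0
    for i, j, result in games:
        draw_term = nu * np.sqrt(gamma[i] * gamma[j])
        denom = gamma[i] + gamma[j] + draw_term
        if result == 1:
            ll += np.log(gamma[i] / denom)
        elif result == -1:
            ll += np.log(gamma[j] / denom)
        else:
            ll += np.log(draw_term / denom)
    return ll

def move_player(ll, gamma, nu, p, new_gamma_p, games_by_player):
    # Incremental update: subtract the old terms for player p's games,
    # move gamma[p], then add the new terms back. Only p's games are
    # re-evaluated, not the whole sum.
    ll -= davidson_ll(gamma, nu, games_by_player[p])
    gamma[p] = new_gamma_p
    ll += davidson_ll(gamma, nu, games_by_player[p])
    return ll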
Later.
[images: plots of the Glenn-David model fit]
A summary of the results so far: Glenn-David and Rao-Kupper seem to overlap a lot.
[images: summary plots of all fitted models]
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Derivation of bayeselo formula

Post by diep »

Rémi Coulom wrote:
diep wrote:What seems very popular nowadays is that all sorts of engines do learning in a rather hard manner. By hard I mean: difficult to turn off.
Hi Vincent,

Bayeselo assumes players of constant strength. Measuring changes in strength caused by learning is much more difficult. It may be possible to adapt WHR to do it:
http://remi.coulom.free.fr/WHR/
But if you want to measure accurately the change in strength caused by a change in your algorithm, it is considerably more efficient to use bayeselo with opponents that don't learn.

Rémi
The current system might work pretty well for human players; humans also learn, of course, but they usually don't play long matches against a single opponent.

In most engines you simply cannot turn off learning. If you can't turn it off, they rank higher on the Elo list simply because they have learned.

Additionally, we must realize that most human players' Elo is not so accurate either. It is not uncommon for top players to gain 50 Elo points and then drop again.

For engine-engine play we would like to measure very accurately, and that accuracy is simply not there in the current system.
Uri Blass
Posts: 10309
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Derivation of bayeselo formula

Post by Uri Blass »

diep wrote:
In most engines you simply cannot turn off learning. If you can't turn it off, they rank higher on the Elo list simply because they have learned.
Nonsense

1) I believe that most engines do not have learning.
2) It is easy to turn off learning for engines that learn: simply delete the engine and the files it generated from your computer, and download it again.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Derivation of bayeselo formula

Post by diep »

Uri Blass wrote:
diep wrote:
In most engines you simply cannot turn off learning. If you can't turn it off, they rank higher on the Elo list simply because they have learned.
Nonsense

1) I believe that most engines do not have learning.
2) It is easy to turn off learning for engines that learn: simply delete the engine and the files it generated from your computer, and download it again.
Deleting the learn files does help indeed. No one deletes them, however.
In the Fritz GUI (not sure how they do it in the latest GUIs), if you click 'new game' all learn values get reset and learning is therefore turned back on.

Everyone I know makes mistakes there.

"i turned off learning"

BS, you can't turn it off if you play more than one game.

Most of the matches that I see public statistics about are played in the Fritz GUI.

You cannot turn off learning there.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Derivation of bayeselo formula

Post by Daniel Shawul »

Even though it is fast now, I would like to speed it up further by fitting the likelihood function with a radial basis function. I did some experiments in the past using a multiquadric RBF for black-box optimization. The results were good, and it significantly sped up the objective function for the project I worked on. Anyway, I don't know whether a different RBF such as a Gaussian would be more appropriate here, but I will try it sometime in the future.
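To be clear about what a surrogate like that could look like: the following is a minimal sketch, assuming scipy's multiquadric RBF interpolator and a made-up quadratic as a stand-in for the expensive log-likelihood values.

Code:

import numpy as np
from scipy.interpolate import Rbf

# Expensive log-likelihood evaluated at a few sample ratings; the quadratic
# below is only a stand-in for real measured values.
samples = np.linspace(-200.0, 200.0, 15)        # ratings in Elo
values = -0.5 * (samples / 80.0) ** 2           # stand-in log-likelihood

surrogate = Rbf(samples, values, function='multiquadric')

# The surrogate is cheap to evaluate, so an optimizer can query it densely.
grid = np.linspace(-200.0, 200.0, 1001)
print(grid[np.argmax(surrogate(grid))])         # close to 0, the true maximum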
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Derivation of bayeselo formula

Post by Rémi Coulom »

Daniel Shawul wrote:Even though it is fast now, I would like to speed it up further by fitting the likelihood function with a radial basis function. I did some experiments in the past using a multiquadric RBF for black-box optimization. The results were good, and it significantly sped up the objective function for the project I worked on. Anyway, I don't know whether a different RBF such as a Gaussian would be more appropriate here, but I will try it sometime in the future.
I don't expect this could improve performance much.

What may be important, if you are not already doing it, is to perform CG in the space of ratings (log(gamma)) instead of the space of gamma. The posterior distribution should have an approximately Gaussian shape in the space of ratings, so its logarithm is approximately quadratic. Maybe you can use the Hessian computed for the LOS to do the optimization. But the Hessian does not scale to a huge number of players, so CG might be better.
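A minimal sketch of what optimizing in rating space can look like, assuming a plain Bradley-Terry model without draws (bayeselo's actual model is Rao-Kupper) and scipy's CG minimizer with a numerical gradient:

Code:

import numpy as np
from scipy.optimize import minimize

def nll(r, wins):
    # Bradley-Terry negative log-likelihood in rating space r = log(gamma);
    # wins[i, j] = number of games player i won against player j.
    gamma = np.exp(r)
    total = 0.0
    n = len(r)
    for i in range(n):
        for j in range(n):
            if i != j and wins[i, j] > 0:
                total -= wins[i, j] * np.log(gamma[i] / (gamma[i] + gamma[j]))
    return total

wins = np.array([[0., 7., 9.],
                 [3., 0., 6.],
                 [1., 4., 0.]])

res = minimize(nll, np.zeros(3), args=(wins,), method='CG')
elo = 400.0 * res.x / np.log(10.0)     # gamma = 10^(elo/400)
print(elo - elo.mean())                # only rating differences matter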

I hope you can show that Davidson fits the data better than a Gaussian :-)

Rémi
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Derivation of bayeselo formula

Post by Daniel Shawul »

Rémi Coulom wrote:
Daniel Shawul wrote:Even though it is fast now, I would like to speed it up further by fitting the likelihood function with a radial basis function. I did some experiments in the past using a multiquadric RBF for black-box optimization. The results were good, and it significantly sped up the objective function for the project I worked on. Anyway, I don't know whether a different RBF such as a Gaussian would be more appropriate here, but I will try it sometime in the future.
I don't expect this could improve performance much.

What may be important, if you are not already doing it, is to perform CG in the space of ratings (log(gamma)) instead of the space of gamma. The posterior distribution should have an approximately Gaussian shape in the space of ratings, so its logarithm is approximately quadratic. Maybe you can use the Hessian computed for the LOS to do the optimization. But the Hessian does not scale to a huge number of players, so CG might be better.

I hope you can show that Davidson fits the data better than a Gaussian :-)

Rémi
Ok, after some break, I continued working on it. I did a goodness-of-fit test using Pearson's chi-squared test. The result shows Davidson is the best fit, followed by Glenn-David. Both seem to be significantly better than the default bayeselo model, i.e. Rao-Kupper. I did the computation on two PGNs, now including CEGT blitz with about 1 million games.

Code:

             CCRL   CEGT
Davidson      692    628
GlennDavid    748    777
RaoKupper    1079   1432
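The statistics above come from the real PGNs; as a toy illustration only, with made-up counts, a Pearson statistic over win/draw/loss frequencies can be computed like this:

Code:

import numpy as np

def pearson_chi2(observed, expected):
    # Pearson's chi-squared statistic; both arrays have shape (pairings, 3)
    # for (win, draw, loss) counts.
    mask = expected > 0
    return np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask])

# Made-up example: two pairings with model-predicted (win, draw, loss) probabilities.
probs = np.array([[0.45, 0.30, 0.25],
                  [0.20, 0.50, 0.30]])
games = np.array([100.0, 60.0])            # games played per pairing
observed = np.array([[48.0, 27.0, 25.0],
                     [11.0, 33.0, 16.0]])
expected = probs * games[:, None]          # expected counts under the model
print(pearson_chi2(observed, expected))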
Next I will randomly select half the games for each player to construct a model and then test for goodness of fit again on the rest of the data.

Btw, computing the Hessian has become so cheap that I am going to try Newton-Raphson now. It can be updated incrementally very well since the log-likelihood is a sum: I only do N full-blown likelihood evaluations instead of O(N^3); the rest just update one or two indices. And the covariance matrix computed using a finite-difference approach is very close to the one found analytically, even though I used a coarse delta of 1 Elo.
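As a sketch of one Newton-Raphson step, assuming grad_fn and hess_fn are hypothetical callbacks returning the gradient and Hessian of the log-likelihood (one rating has to be pinned because the Hessian is singular, as Rémi explains further down):

Code:

import numpy as np

def newton_step(r, grad_fn, hess_fn, fixed=0):
    # One Newton-Raphson step on the rating vector r. One rating is pinned
    # because the Hessian is singular along the all-ones direction.
    g, H = grad_fn(r), hess_fn(r)
    free = np.arange(len(r)) != fixed          # drop the pinned row/column
    r_new = r.copy()
    r_new[free] -= np.linalg.solve(H[np.ix_(free, free)], g[free])
    return r_new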

Question: I don't quite understand how the A matrix was derived. Can you explain, please? I thought the covariance matrix was just the inverse of the negative Hessian, C = -H^-1, but doing that clearly fails most of the time. Later on, at the end of the page, you also tried to use Cholesky, but it didn't work for me, probably because the matrix is not positive definite. What are the properties of the covariance matrix?
Last edited by Daniel Shawul on Thu Aug 23, 2012 1:23 pm, edited 1 time in total.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Derivation of bayeselo formula

Post by Rémi Coulom »

Daniel Shawul wrote:Ok, after some break, I continued working on it. I did a goodness-of-fit test using Pearson's chi-squared test. The result shows Davidson is the best fit, followed by Glenn-David. Both seem to be significantly better than the default bayeselo model, i.e. Rao-Kupper. I did the computation on two PGNs, now including CEGT blitz with about 1 million games.

Code:

             CCRL   CEGT
Davidson      692    628
GlennDavid    748    777
RaoKupper    1079   1432
Great!
Next I will randomly select half the games for each player to construct a model and then test for goodness of fit again on the rest of the data.
I recommended half of the games, but a more usual statistical procedure is 10-fold cross-validation.
http://en.wikipedia.org/wiki/Cross-vali ... validation
Anyway, it does not matter much.

I am very glad you did all that work. I had been waiting to find someone to do it for a long time. Please tell me if you are interested in finishing the paper I started.

Rémi
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Derivation of bayeselo formula

Post by Daniel Shawul »

Hi Remi
I am very glad you did all that work. I had been waiting to find someone to do it for a long time. Please tell me if you are interested in finishing the paper I started.
Sure, that is what I had in mind. But you should know that I am not very knowledgeable about this, so you will have to tell me what to do :)
cheers
Daniel
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Derivation of bayeselo formula

Post by Rémi Coulom »

Daniel Shawul wrote:Question: I don't quite understand how the A matrix was derived. Can you explain, please? I thought the covariance matrix was just the inverse of the negative Hessian, C = -H^-1, but doing that clearly fails most of the time. Later on, at the end of the page, you also tried to use Cholesky, but it didn't work for me, probably because the matrix is not positive definite. What are the properties of the covariance matrix?
I had not noticed that question. Because the ratings can be offset by a constant without changing the log-likelihood, the Hessian is singular: its eigenvalue for the direction (1, 1, 1, ..., 1) is zero, which would mean an infinite variance. So what I do is fix the rating of one player and get the covariance of all the others.
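For concreteness, a small made-up example of that fix: a 3-player Hessian whose rows sum to zero, so inverting -H directly fails, but pinning one player works:

Code:

import numpy as np

# Toy Hessian of a log-likelihood at its maximum: row sums are zero
# because shifting every rating by a constant changes nothing.
H = np.array([[-4.0,  3.0,  1.0],
              [ 3.0, -5.0,  2.0],
              [ 1.0,  2.0, -3.0]])
print(H @ np.ones(3))                 # [0 0 0]: the all-ones eigenvalue is zero

fixed = 0                             # pin player 0's rating
free = np.arange(H.shape[0]) != fixed
cov = np.linalg.inv(-H[np.ix_(free, free)])   # covariance of the other ratings
print(cov)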

By the way, that is partly because I don't use a really proper prior. Bayeselo should be modified to use a fixed-rating virtual opponent against which the others get virtual draws.
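A sketch of what that could look like, reusing the hypothetical (i, j, result) game encoding from the earlier snippet; pinning the virtual opponent at a fixed rating is an assumption here, not a description of bayeselo's current code:

Code:

# One virtual draw per real player against a virtual opponent whose
# rating stays fixed; the extra games act as a proper prior.
n_players = 3                                  # real players are 0 .. n-1
VIRTUAL = n_players                            # extra index for the anchor
prior_games = [(p, VIRTUAL, 0) for p in range(n_players)]   # 0 = draw
# Fit on real_games + prior_games, never updating gamma[VIRTUAL].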

For the details of preparing the paper, I will contact you now at the email address you use for GitHub.

Rémi