Houdini 2.0 running for the IPON


Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Houdini 2.0 running for the IPON

Post by Laskos »

ernest wrote:Hi Kai,

I figure maybe you have an answer to this question, pertaining to Elo calculation:

If you look at Ingo's result
http://forum.computerschach.de/cgi-bin/ ... 1#pid41321
can you explain why Houdini's calculated Elo (3016, resulting from Elostat or Bayeselo) differs so much from the average of the individual match Elos (the so-called Perfs, at right), which I calculated to be 3045?
I do not entirely understand it myself. The difference from an average would arise from the fact that a result of 95:5 has larger errors than 55:45, so 95:5 will be weighted less in the Elostat or Bayeselo algorithm (does Ingo use a general offset Elo or a predefined Elo for each engine in Bayeselo?). Still, it seems a little strange to me; maybe it is just an impression.

Kai

EDIT: I just saw Ingo's post regarding the calculation (general opposition and performance). If that is the way the rating is calculated, it seems extremely superficial. Better to take the average, or better still, to weight by hand, the weights being the inverse squares of the errors (see the sketch below).
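
For illustration, a minimal Python sketch of such inverse-variance weighting; the opponent ratings, scores, and game counts below are made up, not Ingo's actual numbers:

Code: Select all

import math

def perf_and_error(opp_elo, score, n_games):
    # per-match performance rating: the standard Elo expectancy, inverted
    elo = opp_elo - 400 * math.log10(1 / score - 1)
    # propagate the binomial error of the score into Elo units
    slope = (400 / math.log(10)) / (score * (1 - score))
    se = slope * math.sqrt(score * (1 - score) / n_games)
    return elo, se

# hypothetical matches: (average opponent Elo, score, number of games)
matches = [(2850, 0.55, 100), (2500, 0.95, 100)]
results = [perf_and_error(*m) for m in matches]
weights = [1 / se ** 2 for _, se in results]   # inverse squares of the errors
rating = sum(w * e for w, (e, _) in zip(weights, results)) / sum(weights)

print([round(e) for e, _ in results])  # per-match perfs: ~[2885, 3012]
print(round(rating))                   # weighted mean: ~2905

The 95:5 perf carries more than twice the standard error of the 55:45 perf, so it gets less than a fifth of the weight.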
Last edited by Laskos on Sun Sep 04, 2011 11:44 am, edited 1 time in total.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Houdini 2.0 running for the IPON

Post by IWB »

Hi
Laskos wrote:
ernest wrote:Hi Kai,

I figure maybe you have an answer to this question, pertaining to Elo calculation:

If you look at Ingo's result
http://forum.computerschach.de/cgi-bin/ ... 1#pid41321
can you explain why Houdini's calculated Elo (3016, resulting from Elostat or Bayeselo) differs so much from the average of the individual match Elos (the so-called Perfs, at right), which I calculated to be 3045?
I do not entirely understand it myself. The difference from an average would arise from the fact that a result of 95:5 has larger errors than 55:45, so 95:5 will be weighted less in the iterative Elostat or Bayeselo algorithm (does Ingo use a general offset Elo or a predefined Elo for each engine in Bayeselo?). Still, it seems a little strange to me; maybe it is just an impression.

Kai
Predefined for every engine, but the overall Elo is calculated by the GUI from the average of all (predefined) engine Elos.

The overall result matches Elostat perfectly and differs only slightly for Bayes (the GUI uses the Elo formula internally).

Bye
Ingo
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Houdini 2.0 running for the IPON

Post by Laskos »

IWB wrote:Hi
Laskos wrote:
ernest wrote:Hi Kai,

I figure maybe you have an answer to this question, pertaining to Elo calculation:

If you look at Ingo's result
http://forum.computerschach.de/cgi-bin/ ... 1#pid41321
can you explain why Houdini's calculated Elo (3016, resulting from Elostat or Bayeselo) differs so much from the average of the individual match Elos (the so-called Perfs, at right), which I calculated to be 3045?
I do not entirely understand it myself. The difference from an average would arise from the fact that a result of 95:5 has larger errors than 55:45, so 95:5 will be weighted less in the iterative Elostat or Bayeselo algorithm (does Ingo use a general offset Elo or a predefined Elo for each engine in Bayeselo?). Still, it seems a little strange to me; maybe it is just an impression.

Kai
Predefined for every engine, but the overall Elo is calculated by the GUI from the average of all (predefined) engine Elos.

The overall result matches Elostat perfectly and differs only slightly for Bayes (the GUI uses the Elo formula internally).

Bye
Ingo
Thanks. Doesn't at least Elostat use some iterative routine to get the rating? It seems odd to take the average opposition and the average result to calculate the rating; even I could provide a _much_ better algorithm.

Kai
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Houdini 2.0 running for the IPON

Post by IWB »

Laskos wrote: ...

Thanks. Doesn't at least Elostat use some iterative routine to get the rating? It seems odd to take the average opposition and the average result to calculate the rating; even I could provide a _much_ better algorithm.

Kai
No. Take the average Elo of my test set (2768 Elo) and put that into the Elo formula with 80.67%, and you get exactly 3016 Elo, just as Elostat itself does when calculating from the games! Even Bayeselo, which seems more sophisticated and usually differs by a few Elo, gets exactly 3016 Elo when calculating from the real games this time.
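
A quick cross-check of these numbers with the standard Elo formula, as a Python sketch (the 2768 average and 80.67% score are taken from the post above):

Code: Select all

import math

# performance rating from average opposition and overall score,
# using the inverted 400*log10 Elo expectancy
avg_opposition = 2768
score = 0.8067

perf = avg_opposition - 400 * math.log10(1 / score - 1)
print(round(perf))  # -> 3016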

Bye
Ingo
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Houdini 2.0 running for the IPON

Post by Don »

IWB wrote:
Laskos wrote: ...

Thanks. Doesn't at least Elostat use some iterative routine to get the rating? It seems odd to take the average opposition and the average result to calculate the rating; even I could provide a _much_ better algorithm.

Kai
No. Take the average Elo of my test set (2768 Elo) and put that into the Elo formula with 80.67%, and you get exactly 3016 Elo, just as Elostat itself does when calculating from the games! Even Bayeselo, which seems more sophisticated and usually differs by a few Elo, gets exactly 3016 Elo when calculating from the real games this time.

Bye
Ingo
If you use Bayeselo on ALL the games of all the players and set Shredder to 2800, do you get ratings that are close to what you see now?
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Houdini 2.0 running for the IPON

Post by IWB »

Don wrote:
If you use Bayeselo on ALL the games of all the players and set Shredder to 2800, do you get ratings that are close to what you see now?
Yes! The "wrong" games I played yesterday I put into the full set of PGN games and made a BayesElo calculation. The result was 100% the same as the 3016 which are calculated automaticaly in the Classic GUI.
Of course this was just a coincident as usually the Bayeselo calculation differ slightly (depending on the draw rate). But this 'slightly' is just somewhere up to a max of 7 Elo.

Bye
Ingo
lkaufman
Posts: 6215
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Houdini 2.0 running for the IPON

Post by lkaufman »

IWB wrote:Hello Uri,

I know that your mathematical skills are far beyond my capabilities, but I know as well that the calculation done by the Shredder-Classic-GUI is always right for Elostat and differs just slightly for BayesElo. In the current discussion it is even right for both. When I take the games and throw them into Bayes or Elostat, I get 3016.

If this is "right" or not is an intersting discussion, but it is the way Elos are calculated. Nonetheless I still hope for something better which will be accepted ... ! :-)

Bye
Ingo
Maybe I can clear things up a bit. I know something about ratings as I was a long-time chairman of the USCF ratings committee.
Uri is right that it is mathematically wrong to base anything on the average rating of the opponents. This is in fact what Elostat does, and I think this was a major reason for the creation of BayesElo. Using the Elostat averaging causes all the ratings to "contract" towards their average value, with the percentage contraction depending on the spread of the ratings of the players. This is what we observe in the present example.
I believe BayesElo handles this issue properly. However, due to the use of a "prior" assumed result, and perhaps also to the special treatment of draws, its ratings also contract towards the mean, for entirely different reasons. It just so happens that, given the spread of ratings in your field, the draw percentage, and the size of the sample, the two methods produce very similar ratings. If you played a million games, or if you included engines a thousand points weaker than most, the two methods might not be close at all.
I think the fact that you have all the engines play the same field makes the BayesElo calculations fair. The resulting ratings are somewhat "contracted" from what they would be if BayesElo did things like Elostat but without using the average of the field, but I regard that as a good thing, since it is clear that computer vs. computer ratings overstate rating differences anyway. The contraction is fair to all; it does not distort the rankings. I think the use of BayesElo causes more problems for those testing organizations that have widely varying sample sizes and opposition strength for different engines, but Elostat is worse.
Bottom line: don't change anything!
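
To put a rough number on this contraction, a minimal Python sketch; the 3016 rating and the 2768 field average are from Ingo's post, while the +-200 spread is an assumption for illustration, not the real IPON field:

Code: Select all

def expected(r_player, r_opp):
    # standard logistic Elo expectancy
    return 1 / (1 + 10 ** ((r_opp - r_player) / 400))

player = 3016
vs_average = expected(player, 2768)  # predicted score vs one average opponent
vs_spread = (expected(player, 2568)
             + expected(player, 2968)) / 2  # same mean, +-200 spread

print(round(vs_average, 3))  # ~0.807
print(round(vs_spread, 3))   # ~0.749

Because the Elo curve flattens away from 50%, the strong engine scores less against the spread field than the average-opponent shortcut predicts, so a rating computed from the average opposition comes out too low and the whole list contracts toward the mean.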

Larry
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Houdini 2.0 running for the IPON

Post by michiguel »

lkaufman wrote:
IWB wrote:Hello Uri,

I know that your mathematical skills are far beyond my capabilities, but I know as well that the calculation done by the Shredder-Classic-GUI is always right for Elostat and differs just slightly for BayesElo. In the current discussion it is even right for both. When I take the games and throw them into Bayes or Elostat, I get 3016.

If this is "right" or not is an intersting discussion, but it is the way Elos are calculated. Nonetheless I still hope for something better which will be accepted ... ! :-)

Bye
Ingo
Maybe I can clear things up a bit. I know something about ratings as I was a long-time chairman of the USCF ratings committee.
Uri is right that it is mathematically wrong to base anything on the average rating of the opponents. This is in fact what Elostat does, and I think this was a major reason for the creation of BayesElo. Using the Elostat averaging causes all the ratings to "contract" towards their average value, with the percentage contraction depending on the spread of the ratings of the players. This is what we observe in the present example.
I believe BayesElo handles this issue properly. However, due to the use of a "prior" assumed result, and perhaps also to the special treatment of draws, its ratings also contract towards the mean, for entirely different reasons. It just so happens that, given the spread of ratings in your field, the draw percentage, and the size of the sample, the two methods produce very similar ratings. If you played a million games, or if you included engines a thousand points weaker than most, the two methods might not be close at all.
I think the fact that you have all the engines play the same field makes the BayesElo calculations fair. The resulting ratings are somewhat "contracted" from what they would be if BayesElo did things like Elostat but without using the average of the field, but I regard that as a good thing, since it is clear that computer vs. computer ratings overstate rating differences anyway. The contraction is fair to all; it does not distort the rankings. I think the use of BayesElo causes more problems for those testing organizations that have widely varying sample sizes and opposition strength for different engines, but Elostat is worse.
Bottom line: don't change anything!

Larry
Is the PGN of all the games available? I could not see it on the website.
I can run my rating program (which does a global, iterative analysis) to see what we get.
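
For reference, the kind of global, iterative fit Miguel describes can be sketched in a few lines of Python. This is a generic iterative fit of the Elo model, not his actual program, and the games below are made up:

Code: Select all

def fit_elo(names, games, iters=500, step=10.0):
    # games: (player_a, player_b, score_for_a); all ratings start at 0
    ratings = {n: 0.0 for n in names}
    for _ in range(iters):
        for n in names:
            expected, actual = 0.0, 0.0
            for a, b, s in games:
                if n == a:
                    expected += 1 / (1 + 10 ** ((ratings[b] - ratings[a]) / 400))
                    actual += s
                elif n == b:
                    expected += 1 / (1 + 10 ** ((ratings[a] - ratings[b]) / 400))
                    actual += 1 - s
            # nudge the rating toward the value that reproduces the actual score
            ratings[n] += step * (actual - expected)
    return ratings

games = [("Houdini", "Shredder", 1.0),
         ("Houdini", "Critter", 0.5),
         ("Shredder", "Critter", 0.5)]
print(fit_elo(["Houdini", "Shredder", "Critter"], games))

Unlike the average-opposition shortcut, every game here is compared against the specific opponent's current rating, and the whole set is iterated until it is self-consistent.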

Miguel
ernest
Posts: 2045
Joined: Wed Mar 08, 2006 8:30 pm

Re: Houdini 2.0 running for the IPON

Post by ernest »

lkaufman wrote:Using the Elostat averaging causes all the ratings to "contract" towards their average value, with the percentage contraction depending on the spread of the ratings of the players. This is what we observe in the present example.
I believe BayesElo handles this issue properly. However, due to the use of a "prior" assumed result, and perhaps also to the special treatment of draws, its ratings also contract towards the mean, for entirely different reasons.
Hi Larry,

I understand very well what you say.
But can this "contraction" go as far as to explain the difference between 3016 and 3045, which is what I asked in
http://www.talkchess.com/forum/viewtopi ... 196#422196
that is:
If you look at Ingo's result
http://forum.computerschach.de/cgi-bin/ ... 1#pid41321
can you explain why Houdini's calculated Elo (3016, resulting from Elostat or Bayeselo) differs so much from the average of the individual match Elos (the so-called Perfs, at right), which I calculated to be 3045?
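
One way to see that an effect of this size is plausible: the perf formula D(p) = -400*log10(1/p - 1) is convex above 50%, so the average of per-match perfs is generally higher than the perf of the pooled score whenever the individual scores are spread out. A Python sketch with made-up scores against two equally rated 2768 opponents (not the actual IPON data):

Code: Select all

import math

def perf(opp, p):
    # performance rating from score p against opposition rated opp
    return opp - 400 * math.log10(1 / p - 1)

p1, p2 = 0.95, 0.55                 # two very different match scores
avg_of_perfs = (perf(2768, p1) + perf(2768, p2)) / 2
pooled = perf(2768, (p1 + p2) / 2)  # the same games pooled into one score

print(round(avg_of_perfs))  # ~3041
print(round(pooled))        # ~2959

Whether this fully explains the 3045 vs 3016 gap depends on the actual score distribution, but the nonlinearity alone can produce differences of this order.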
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Houdini 2.0 running for the IPON

Post by Don »

michiguel wrote:
lkaufman wrote:
IWB wrote:Hello Uri,

I know that your mathematical skills are far beyond my capabilities, but I know as well that the calculation done by the Shredder-Classic-GUI is always right for Elostat and differs just slightly for BayesElo. In the current discussion it is even right for both. When I take the games and throw them into Bayes or Elostat, I get 3016.

If this is "right" or not is an intersting discussion, but it is the way Elos are calculated. Nonetheless I still hope for something better which will be accepted ... ! :-)

Bye
Ingo
Maybe I can clear things up a bit. I know something about ratings as I was a long-time chairman of the USCF ratings committee.
Uri is right that it is mathematically wrong to base anything on the average rating of the opponents. This is in fact what Elostat does, and I think this was a major reason for the creation of BayesElo. Using the Elostat averaging causes all the ratings to "contract" towards their average value, with the percentage contraction depending on the spread of the ratings of the players. This is what we observe in the present example.
I believe BayesElo handles this issue properly. However, due to the use of a "prior" assumed result, and perhaps also to the special treatment of draws, its ratings also contract towards the mean, for entirely different reasons. It just so happens that, given the spread of ratings in your field, the draw percentage, and the size of the sample, the two methods produce very similar ratings. If you played a million games, or if you included engines a thousand points weaker than most, the two methods might not be close at all.
I think the fact that you have all the engines play the same field makes the BayesElo calculations fair. The resulting ratings are somewhat "contracted" from what they would be if BayesElo did things like Elostat but without using the average of the field, but I regard that as a good thing, since it is clear that computer vs. computer ratings overstate rating differences anyway. The contraction is fair to all; it does not distort the rankings. I think the use of BayesElo causes more problems for those testing organizations that have widely varying sample sizes and opposition strength for different engines, but Elostat is worse.
Bottom line: don't change anything!

Larry
Is the PGN of all the games available? I could not see it on the website.
I can run my rating program (which does a global, iterative analysis) to see what we get.

Miguel
Ingo does not make the games available, which is his right. I assume a big part of the reason is that he only uses a small number of openings and does not want players tuning for them, but this is just a guess.

If you look on the site, you will see that he makes a clear statement that the games are NOT available for download.