Elostat Question

Michael Sherwin
Posts: 3040
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Elostat Question

Post by Michael Sherwin » Fri Mar 30, 2018 5:37 pm

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 RomiChess                      : 2481  132 126    26    69.2 %   2340   23.1 %
  2 Horizon_4_4                    : 2444  316 342     6    41.7 %   2503   16.7 %
  3 Bitfoot-1.0.65acfcb-win64      : 2432  279 300     5    40.0 %   2503   40.0 %
  4 Yace                           : 2414  415 497     4    37.5 %   2503   25.0 %
  5 OliThink532_x64                : 2382  266 295     6    33.3 %   2503   33.3 %
  6 Tcb0052                        : 1903    0   0     5     0.0 %   2503    0.0 %
The average CCRL rating of the test engines is 2410. In this sample, the average opponent is listed as 2340.

Would it be correct to add 70 points to the average opponent rating, and therefore also add 70 points to RomiChess's performance? Thanks.
I hate if statements. Pawns demand if statements. Therefore I hate pawns.

MikeB
Posts: 3269
Joined: Thu Mar 09, 2006 5:34 am
Location: Pen Argyl, Pennsylvania

Re: Elostat Question

Post by MikeB » Sat Mar 31, 2018 2:42 am

Michael Sherwin wrote:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 RomiChess                      : 2481  132 126    26    69.2 %   2340   23.1 %
  2 Horizon_4_4                    : 2444  316 342     6    41.7 %   2503   16.7 %
  3 Bitfoot-1.0.65acfcb-win64      : 2432  279 300     5    40.0 %   2503   40.0 %
  4 Yace                           : 2414  415 497     4    37.5 %   2503   25.0 %
  5 OliThink532_x64                : 2382  266 295     6    33.3 %   2503   33.3 %
  6 Tcb0052                        : 1903    0   0     5     0.0 %   2503    0.0 %
The average CCRL rating of the test engines is 2410. In this sample, the average opponent is listed as 2340.

Would it be correct to add 70 points to the average opponent rating, and therefore also add 70 points to RomiChess's performance? Thanks.
Not at all, since you are not presenting all of the participants. Look at your won/lost record in total: wins do not equal losses. Look at the Elo of the "others": they played against opponents with an Elo rating of 2503, but RomiChess's rating is 2481. It is an incomplete data set. Basically, it does not add up.

Dann Corbit
Posts: 9894
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Elostat Question

Post by Dann Corbit » Sat Mar 31, 2018 2:56 am

There is an important difference between a TPR (tournament performance rating) and an Elo rating.
https://www.chess.com/forum/view/genera ... atings-tpr
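A performance rating of the kind shown in these tables can be sketched as the average opponent rating plus the rating difference implied by the score. This sketch assumes the logistic Elo curve; FIDE's own TPR uses a lookup table instead, so its values differ slightly. The function names are illustrative.

```python
import math


def elo_difference(score: float) -> float:
    """Rating difference implied by a score fraction on the logistic Elo curve."""
    if not 0.0 < score < 1.0:
        # 0% or 100% would need an infinitely large difference
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def performance_rating(avg_opponent: float, score: float) -> float:
    """Average opponent rating plus the difference implied by the score."""
    return avg_opponent + elo_difference(score)


# RomiChess in the first table: 69.2% of 26 games (18 points), average opponent 2340
print(round(performance_rating(2340, 18 / 26)))
```

This gives 2481, the same figure as in the first table. A 0% score, like Tcb0052's 0/5 there, has no finite performance rating at all.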


Also, you have some very blunt rulers there. The error windows are 600 Elo wide, and some are even worse.

It's like building a desk with a ruler that has 17 inches snapped off and worrying whether you should add one millimeter to a dimension.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Michael Sherwin

Re: Elostat Question

Post by Michael Sherwin » Sat Mar 31, 2018 3:21 am

MikeB wrote:
Michael Sherwin wrote:

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 RomiChess                      : 2481  132 126    26    69.2 %   2340   23.1 %
  2 Horizon_4_4                    : 2444  316 342     6    41.7 %   2503   16.7 %
  3 Bitfoot-1.0.65acfcb-win64      : 2432  279 300     5    40.0 %   2503   40.0 %
  4 Yace                           : 2414  415 497     4    37.5 %   2503   25.0 %
  5 OliThink532_x64                : 2382  266 295     6    33.3 %   2503   33.3 %
  6 Tcb0052                        : 1903    0   0     5     0.0 %   2503    0.0 %
The average CCRL rating of the test engines is 2410. In this sample, the average opponent is listed as 2340.

Would it be correct to add 70 points to the average opponent rating, and therefore also add 70 points to RomiChess's performance? Thanks.
Not at all, since you are not presenting all of the participants. Look at your won/lost record in total: wins do not equal losses. Look at the Elo of the "others": they played against opponents with an Elo rating of 2503, but RomiChess's rating is 2481. It is an incomplete data set. Basically, it does not add up.
Don't blame me, lol. Blame Arena! However, all the games are there: 6+5+4+6+5 = 26.
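The "wins do not equal losses" check can be done mechanically: in a gauntlet every game hands out exactly one point, so Romi's points plus the opponents' points must equal the number of games. A minimal sketch, assuming the rounded percentages in the first table can be turned back into half points (the helper name is illustrative):

```python
# (engine, games, score %) copied from the first table in this thread
first_table = [
    ("RomiChess", 26, 69.2),
    ("Horizon_4_4", 6, 41.7),
    ("Bitfoot-1.0.65acfcb-win64", 5, 40.0),
    ("Yace", 4, 37.5),
    ("OliThink532_x64", 6, 33.3),
    ("Tcb0052", 5, 0.0),
]


def points(games: int, pct: float) -> float:
    """Turn games and a rounded score percentage back into points, to the nearest half point."""
    return round(games * pct / 100 * 2) / 2


gauntlet, *opponents = first_table
romi_points = points(gauntlet[1], gauntlet[2])
opponent_points = sum(points(g, pct) for _, g, pct in opponents)
opponent_games = sum(g for _, g, _ in opponents)

print(f"games: {gauntlet[1]} vs {opponent_games}")
print(f"points: {romi_points} + {opponent_points} = {romi_points + opponent_points}")
```

For this table both sides have 26 games, and 18 + 8 = 26 points, so the totals balance.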

Maybe Tcb being whitewashed is causing a problem. Let's look at the latest results.

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws
  1 RomiChess                      : 2466   42  42   226    68.1 %   2334   23.0 %
  2 Tcb0052                        : 2380   93  95    45    37.8 %   2467   22.2 %
  3 Yace                           : 2370   84  86    44    36.4 %   2467   36.4 %
  4 Bitfoot-1.0.65acfcb-win64      : 2320  100 105    45    30.0 %   2467   15.6 %
  5 OliThink532_x64                : 2314   93  97    46    29.3 %   2467   23.9 %
  6 Horizon_4_4                    : 2286  101 106    46    26.1 %   2467   17.4 %
Edit: Okay, I just noticed that Yace seems to be missing a result. I have no idea why that would be.
Last edited by Michael Sherwin on Sat Mar 31, 2018 3:34 am, edited 1 time in total.

Michael Sherwin

Re: Elostat Question

Post by Michael Sherwin » Sat Mar 31, 2018 3:30 am

Dann Corbit wrote: There is an important difference between a TPR Elo and an Elo.
https://www.chess.com/forum/view/genera ... atings-tpr


Also, you have some very blunt rulers there. The error windows are 600 Elo wide, and some are even worse.

It's like building a desk with a ruler that has 17 inches snapped off and worrying whether you should add one millimeter to a dimension.
Well, it was early in the test and there were hundreds of games left to play, so the margins were wide.

So after 500 games, a TPR is not an accurate representation?

Should I have the test engines play a round robin against each other?

But all I want to do is compare tests and get a ballpark idea of Romi's true Elo based on the CCRL ratings.

Michael Sherwin

Re: Elostat Question

Post by Michael Sherwin » Sat Mar 31, 2018 3:49 am

Michael Sherwin wrote: Edit: Okay I just noticed that Yace seems to be missing a result. I have no idea why that would be.
Solved. I had a false start and forgot to delete those games from the current set of games. This table is correct.

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 RomiChess                      : 2466   42  41   231    68.0 %   2335   22.5 %
  2 Tcb0052                        : 2390   92  93    46    39.1 %   2467   21.7 %
  3 Yace                           : 2374   83  85    46    37.0 %   2467   34.8 %
  4 Bitfoot-1.0.65acfcb-win64      : 2318   97 102    47    29.8 %   2467   17.0 %
  5 OliThink532_x64                : 2305   96 100    46    28.3 %   2467   21.7 %
  6 Horizon_4_4                    : 2286  101 106    46    26.1 %   2467   17.4 %
But my original question remains, slightly modified: why can't I shift the ratings of the test engines so that their average matches their average CCRL rating, and then add the same amount to Romi's rating?

Dann Corbit

Re: Elostat Question

Post by Dann Corbit » Sat Mar 31, 2018 6:44 am

Michael Sherwin wrote:
Dann Corbit wrote: There is an important difference between a TPR Elo and an Elo.
https://www.chess.com/forum/view/genera ... atings-tpr


Also, you have some very blunt rulers there. The error windows are 600 Elo wide, and some are even worse.

It's like building a desk with a ruler that has 17 inches snapped off and worrying whether you should add one millimeter to a dimension.
Well, it was early in the test and there were hundreds of games left to play, so the margins were wide.

So after 500 games, a TPR is not an accurate representation?

Should I have the test engines play a round robin against each other?

But all I want to do is compare tests and get a ballpark idea of Romi's true Elo based on the CCRL ratings.
It takes an absurd number of games to get an accurate answer. That is why testing frameworks play games at frightening velocity.
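To put a rough number on "absurd": a sketch using a normal approximation, assuming a 25% draw ratio (close to what these tables show), a score near 50%, and a 95% interval. The function name and defaults are illustrative, not taken from any testing framework.

```python
import math


def games_needed(margin_elo: float, score: float = 0.5,
                 draw_ratio: float = 0.25, z: float = 1.96) -> int:
    """Rough number of games for a +/- margin_elo confidence interval (normal approximation)."""
    win = score - draw_ratio / 2
    loss = 1.0 - score - draw_ratio / 2
    # per-game variance of the result (1, 0.5 or 0) around the expected score
    variance = win * (1 - score) ** 2 + draw_ratio * (0.5 - score) ** 2 + loss * score ** 2
    # slope of the logistic Elo curve at this score, in score units per Elo point
    slope = math.log(10) / 400 * score * (1 - score)
    score_margin = margin_elo * slope
    return math.ceil((z * math.sqrt(variance) / score_margin) ** 2)


for margin in (100, 50, 20, 10, 5):
    print(f"+/-{margin:3d} Elo: about {games_needed(margin)} games")
```

Under these assumptions +/-10 Elo already needs roughly 3,500 games, and halving the margin roughly quadruples the number of games.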

Sven
Posts: 3819
Joined: Thu May 15, 2008 7:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Elostat Question

Post by Sven » Sat Mar 31, 2018 8:36 am

Michael Sherwin wrote:
Michael Sherwin wrote: Edit: Okay I just noticed that Yace seems to be missing a result. I have no idea why that would be.
Solved. I had a false start and forgot to delete those games from the current set of games. This table is correct.

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 RomiChess                      : 2466   42  41   231    68.0 %   2335   22.5 %
  2 Tcb0052                        : 2390   92  93    46    39.1 %   2467   21.7 %
  3 Yace                           : 2374   83  85    46    37.0 %   2467   34.8 %
  4 Bitfoot-1.0.65acfcb-win64      : 2318   97 102    47    29.8 %   2467   17.0 %
  5 OliThink532_x64                : 2305   96 100    46    28.3 %   2467   21.7 %
  6 Horizon_4_4                    : 2286  101 106    46    26.1 %   2467   17.4 %
But my original question remains, slightly modified: why can't I shift the ratings of the test engines so that their average matches their average CCRL rating, and then add the same amount to Romi's rating?
Hi Mike, of course you can always shift the ratings of your test engines up or down. Ratings calculated by EloStat, or by the more modern (and better) programs BayesElo and Ordo, are always relative to the pool of engines you are dealing with. You can add 3000 to all engines and you will still get the same relative result, in this case Romi performing around 130 Elo better than its average opponent.
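The reason a constant shift changes nothing is that the Elo expected score depends only on the rating difference. A minimal sketch on the logistic curve, using ratings from the 231-game table (the function name is illustrative):

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score on the logistic Elo curve; depends only on the rating difference."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


# Ratings from the 231-game table in this thread
pool = {"RomiChess": 2466, "Tcb0052": 2390, "Yace": 2374, "Horizon_4_4": 2286}
shifted = {name: rating + 3000 for name, rating in pool.items()}

for name in ("Tcb0052", "Yace", "Horizon_4_4"):
    original = expected_score(pool["RomiChess"], pool[name])
    moved = expected_score(shifted["RomiChess"], shifted[name])
    print(f"Romi vs {name}: {original:.3f} before, {moved:.3f} after adding 3000")
```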

The main point, however, is that the error margins for such a small number of games are far too large to derive *anything* from them. All you can say is that Romi performs 130 +/- 100 Elo better than the others, so it might just as well be +30 or +230.
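A rough error margin can be computed from a win/draw/loss record. This is a normal-approximation sketch, not EloStat's exact method; the win/draw/loss split (131/52/48) is reconstructed from the 231-game table's 68.0% score and 22.5% draw rate, and the function name is illustrative.

```python
import math


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Approximate confidence interval for the Elo difference implied by a W/D/L record."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # per-game variance of the result (1, 0.5 or 0) around the mean score
    variance = (wins * (1 - score) ** 2 + draws * (0.5 - score) ** 2 + losses * score ** 2) / n
    margin = z * math.sqrt(variance / n)
    if score - margin <= 0.0 or score + margin >= 1.0:
        raise ValueError("interval reaches 0% or 100%; the approximation breaks down")

    def to_elo(p: float) -> float:
        return -400.0 * math.log10(1.0 / p - 1.0)

    return to_elo(score - margin), to_elo(score), to_elo(score + margin)


low, mid, high = elo_interval(131, 52, 48)
print(f"{mid:+.0f} Elo, 95% interval {low:+.0f} to {high:+.0f}")
```

For Romi's own 231 games this comes out at about 40 Elo either side, close to EloStat's 42/41. The opponents, with only about 46 games each, carry margins of roughly 83 to 106 in that table.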

Regarding TPR, I don't think it applies here at all, since engine ratings are not calculated incrementally as they are for human players, but always from scratch from the complete set of games of all players in the pool.
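Fitting from scratch can be illustrated with a small iterative fit over all games at once: every rating is nudged until each player's expected score equals its actual score. This is a hypothetical helper in the general spirit of a maximum-likelihood fit, not EloStat's, BayesElo's or Ordo's actual algorithm; draws enter only as half points. The input is the final 500-game table from this thread, converted to points.

```python
from collections import defaultdict


def fit_ratings(results, anchor=0.0, k=200.0, iterations=2000):
    """Fit pool ratings so every player's expected score matches its actual score.

    results: list of (player_a, player_b, points_for_a, games).
    """
    players = sorted({p for a, b, _, _ in results for p in (a, b)})
    ratings = {p: 0.0 for p in players}
    for _ in range(iterations):
        actual, expected, games = defaultdict(float), defaultdict(float), defaultdict(int)
        for a, b, pts, n in results:
            e = n / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))  # expected points for a
            actual[a] += pts
            actual[b] += n - pts
            expected[a] += e
            expected[b] += n - e
            games[a] += n
            games[b] += n
        # move each rating toward the value where expected score equals actual score
        for p in players:
            ratings[p] += k * (actual[p] - expected[p]) / games[p]
        # only differences are meaningful, so pin the pool average to the anchor
        mean = sum(ratings.values()) / len(players)
        for p in players:
            ratings[p] += anchor - mean
    return ratings


# Final 500-game gauntlet: points scored by RomiChess against each opponent
gauntlet = [
    ("RomiChess", "Yace", 54.5, 100),
    ("RomiChess", "Tcb0052", 61.5, 100),
    ("RomiChess", "Horizon_4_4", 63.0, 100),
    ("RomiChess", "OliThink532_x64", 64.5, 100),
    ("RomiChess", "Bitfoot-1.0.65acfcb-win64", 71.0, 100),
]
ratings = fit_ratings(gauntlet)
for name, r in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name:28s} {r - ratings['RomiChess']:+7.1f}")
```

In a pure gauntlet each opponent is tied to the pool only through its games against Romi, so the fitted gaps come out near the ones in the final table (about 31, 81, 92, 103 and 155 Elo). A player with a 0% or 100% score has no finite rating, and this loop would simply keep pushing it down or up.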

Michael Sherwin

Re: Elostat Question

Post by Michael Sherwin » Sat Mar 31, 2018 9:51 am

Thanks Sven, that helped a lot. :) All I was looking for was an indication, a ballpark value, and I think you confirmed that I can adjust to the CCRL scale.

Michael Sherwin

Re: Elostat Question

Post by Michael Sherwin » Sat Mar 31, 2018 6:45 pm

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 RomiChess                      : 2446   27  27   500    62.9 %   2354   23.8 %
  2 Yace                           : 2415   60  60   100    45.5 %   2447   25.0 %
  3 Tcb0052                        : 2365   60  61   100    38.5 %   2447   25.0 %
  4 Horizon_4_4                    : 2354   61  62   100    37.0 %   2447   24.0 %
  5 OliThink532_x64                : 2343   61  62   100    35.5 %   2447   25.0 %
  6 Bitfoot-1.0.65acfcb-win64      : 2291   65  67   100    29.0 %   2447   20.0 %
So this test is over. I can add 56 Elo to the average opponent (2354, versus the CCRL average of 2410) to scale the results to CCRL. That gives RomiChess an estimated CCRL rating of 2502 (2446 + 56). Even the worst case of the error margin still means an improvement for Romi. This was the second test. The first test was a bit better at 2510; however, the second test's results were far more consistent across all test engines and show an improvement against every engine.
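The rescaling is a single constant shift of the whole pool. A sketch of the arithmetic with the figures from this thread; the interval carries over only Romi's own +/-27 from the table and ignores the uncertainty in the test engines' CCRL ratings themselves.

```python
# Figures from this thread: CCRL average of the test engines (first post) and the final table
ccrl_average = 2410
average_opponent = 2354
romi_elo, plus, minus = 2446, 27, 27

offset = ccrl_average - average_opponent  # one constant shift for the whole pool
estimate = romi_elo + offset
print(f"offset {offset:+d}: estimate {estimate}, range {estimate - minus} to {estimate + plus}")
```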

And Sven, Romi is still a little miffed at Jumbo for leaving her behind in D7, so she cornered Jumbo 0.6.10-bb 64 into an "I'm going to show you" match. :lol: Of course, she is challenging only one Jumbo and not the entire herd. How does Jumbo handle 10 sec with a 1 sec increment? Currently Romi: 14W 9L 8D :D
