Discussion of anything and everything relating to chess playing software and machines.
Moderators: hgm, Harvey Williamson, bob
-
Michael Sherwin
- Posts: 2799
- Joined: Fri May 26, 2006 1:00 am
- Location: OH, USA
Post
by Michael Sherwin » Fri Mar 30, 2018 5:37 pm
Code: Select all
1 RomiChess : 2481 132 126 26 69.2 % 2340 23.1 %
2 Horizon_4_4 : 2444 316 342 6 41.7 % 2503 16.7 %
3 Bitfoot-1.0.65acfcb-win64 : 2432 279 300 5 40.0 % 2503 40.0 %
4 Yace : 2414 415 497 4 37.5 % 2503 25.0 %
5 OliThink532_x64 : 2382 266 295 6 33.3 % 2503 33.3 %
6 Tcb0052 : 1903 0 0 5 0.0 % 2503 0.0 %
The average CCRL rating of the test engines is 2410. In this sample it says that the average opponent is 2340.
Would it be correct to add 70 points to the average and therefore also add 70 points to the Romichess performance? Thanks.
Regards,
Mike
-
MikeB
- Posts: 2522
- Joined: Thu Mar 09, 2006 5:34 am
- Location: Pen Argyl, Pennsylvania
Post
by MikeB » Sat Mar 31, 2018 2:42 am
Michael Sherwin wrote:Code: Select all
1 RomiChess : 2481 132 126 26 69.2 % 2340 23.1 %
2 Horizon_4_4 : 2444 316 342 6 41.7 % 2503 16.7 %
3 Bitfoot-1.0.65acfcb-win64 : 2432 279 300 5 40.0 % 2503 40.0 %
4 Yace : 2414 415 497 4 37.5 % 2503 25.0 %
5 OliThink532_x64 : 2382 266 295 6 33.3 % 2503 33.3 %
6 Tcb0052 : 1903 0 0 5 0.0 % 2503 0.0 %
The average CCRL rating of the test engines is 2410. In this sample it says that the average opponent is 2340.
Would it be correct to add 70 points to the average and therefore also add 70 points to the Romichess performance? Thanks.
Not at all, since you are not presenting all of the participants. Look at your won/lost record in total: wins do not equal losses. Look at the Elo of the "others": they played against opponents with an Elo rating of 2503, but RomiChess's rating is 2481. It is an incomplete data set. Basically, it does not add up.
-
Dann Corbit
- Posts: 8662
- Joined: Wed Mar 08, 2006 7:57 pm
- Location: Redmond, WA USA
-
Contact:
Post
by Dann Corbit » Sat Mar 31, 2018 2:56 am
There is an important difference between a TPR Elo and an Elo.
https://www.chess.com/forum/view/genera ... atings-tpr
Also, you have some very blunt rulers there. The error windows are 600 Elo wide, or even wider in some cases.
Like building a desk with a ruler that has 17 inches snapped off, and worrying if you should add one millimeter to a dimension.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
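The TPR figures that tools like Arena print follow from the standard logistic Elo formula: performance = average opponent rating + 400·log10(p/(1−p)), where p is the score fraction. A minimal Python sketch, checked against the RomiChess row of the first table:

```python
import math

def performance_rating(avg_opp: float, score_pct: float) -> float:
    """Tournament performance rating (TPR) from average opponent
    rating and score percentage, via the logistic Elo curve."""
    p = score_pct / 100.0
    # Invert the Elo expectation: dp = 400 * log10(p / (1 - p))
    return avg_opp + 400.0 * math.log10(p / (1.0 - p))

# RomiChess in the first table: 69.2 % against an average opponent of 2340
print(round(performance_rating(2340, 69.2)))  # 2481, matching the table
```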
-
Michael Sherwin
- Posts: 2799
- Joined: Fri May 26, 2006 1:00 am
- Location: OH, USA
Post
by Michael Sherwin » Sat Mar 31, 2018 3:21 am
MikeB wrote:Michael Sherwin wrote:Code: Select all
1 RomiChess : 2481 132 126 26 69.2 % 2340 23.1 %
2 Horizon_4_4 : 2444 316 342 6 41.7 % 2503 16.7 %
3 Bitfoot-1.0.65acfcb-win64 : 2432 279 300 5 40.0 % 2503 40.0 %
4 Yace : 2414 415 497 4 37.5 % 2503 25.0 %
5 OliThink532_x64 : 2382 266 295 6 33.3 % 2503 33.3 %
6 Tcb0052 : 1903 0 0 5 0.0 % 2503 0.0 %
The average CCRL rating of the test engines is 2410. In this sample it says that the average opponent is 2340.
Would it be correct to add 70 points to the average and therefore also add 70 points to the Romichess performance? Thanks.
Not at all, since you are not presenting all of the participants. Look at your won/lost record in total: wins do not equal losses. Look at the Elo of the "others": they played against opponents with an Elo rating of 2503, but RomiChess's rating is 2481. It is an incomplete data set. Basically, it does not add up.
Don't blame me, lol. Blame Arena! However all the games are presented as 6+5+4+6+5 = 26.
Maybe Tcb being whitewashed is causing a problem. Let's look at the latest results.
Code: Select all
Program Elo + - Games Score Av.Op. Draws
1 RomiChess : 2466 42 42 226 68.1 % 2334 23.0 %
2 Tcb0052 : 2380 93 95 45 37.8 % 2467 22.2 %
3 Yace : 2370 84 86 44 36.4 % 2467 36.4 %
4 Bitfoot-1.0.65acfcb-win64 : 2320 100 105 45 30.0 % 2467 15.6 %
5 OliThink532_x64 : 2314 93 97 46 29.3 % 2467 23.9 %
6 Horizon_4_4 : 2286 101 106 46 26.1 % 2467 17.4 %
Edit: Okay I just noticed that Yace seems to be missing a result. I have no idea why that would be.
Last edited by
Michael Sherwin on Sat Mar 31, 2018 3:34 am, edited 1 time in total.
Regards,
Mike
-
Michael Sherwin
- Posts: 2799
- Joined: Fri May 26, 2006 1:00 am
- Location: OH, USA
Post
by Michael Sherwin » Sat Mar 31, 2018 3:30 am
Dann Corbit wrote:There is an important difference between a TPR Elo and an Elo.
https://www.chess.com/forum/view/genera ... atings-tpr
Also, you have some very blunt rulers there. The error windows are 600 Elo wide, or even wider in some cases.
Like building a desk with a ruler that has 17 inches snapped off, and worrying if you should add one millimeter to a dimension.
Well, it was early in the test and there were hundreds of games left to play, so the margins were wide.
So after 500 games a tpr is not an accurate representation?
Should I have the test engines play a round robin against each other?
But all I want is a ballpark idea of Romi's true Elo, based on the CCRL ratings.
Regards,
Mike
-
Michael Sherwin
- Posts: 2799
- Joined: Fri May 26, 2006 1:00 am
- Location: OH, USA
Post
by Michael Sherwin » Sat Mar 31, 2018 3:49 am
Michael Sherwin wrote:
Edit: Okay I just noticed that Yace seems to be missing a result. I have no idea why that would be.
Solved. I had a false start and forgot to delete those games from the current games. This is correct.
Code: Select all
Program Elo + - Games Score Av.Op. Draws
1 RomiChess : 2466 42 41 231 68.0 % 2335 22.5 %
2 Tcb0052 : 2390 92 93 46 39.1 % 2467 21.7 %
3 Yace : 2374 83 85 46 37.0 % 2467 34.8 %
4 Bitfoot-1.0.65acfcb-win64 : 2318 97 102 47 29.8 % 2467 17.0 %
5 OliThink532_x64 : 2305 96 100 46 28.3 % 2467 21.7 %
6 Horizon_4_4 : 2286 101 106 46 26.1 % 2467 17.4 %
But my original question remains, modified: why can't I adjust the ratings of the test engines to match their average CCRL rating, and then add the same offset to Romi's rating?
Regards,
Mike
-
Dann Corbit
- Posts: 8662
- Joined: Wed Mar 08, 2006 7:57 pm
- Location: Redmond, WA USA
-
Contact:
Post
by Dann Corbit » Sat Mar 31, 2018 6:44 am
Michael Sherwin wrote:Dann Corbit wrote:There is an important difference between a TPR Elo and an Elo.
https://www.chess.com/forum/view/genera ... atings-tpr
Also, you have some very blunt rulers there. The error windows are 600 Elo wide, or even wider in some cases.
Like building a desk with a ruler that has 17 inches snapped off, and worrying if you should add one millimeter to a dimension.
Well, it was early in the test and there were hundreds of games left to play, so the margins were wide.
So after 500 games a tpr is not an accurate representation?
Should I have the test engines play a round robin against each other?
But all I want is a ballpark idea of Romi's true Elo, based on the CCRL ratings.
It takes an absurd number of games to get an accurate answer. That is why testing frameworks play games at frightening velocity.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
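Dann's point can be made concrete: under the usual normal approximation, the 95 % Elo confidence half-width shrinks only with the square root of the number of games, so going from ±60 to ±6 takes roughly a hundred times as many games. A rough Python sketch (the even score and 25 % draw ratio are illustrative assumptions, not values from the tables):

```python
import math

def elo_error_margin(games: int, score: float = 0.5,
                     draw_ratio: float = 0.25, z: float = 1.96) -> float:
    """Approximate 95% Elo confidence half-width for a match.

    Computes the per-game result variance (win=1, draw=0.5, loss=0),
    then maps the score's standard error to Elo through the slope of
    the logistic rating curve elo(s) = 400 * log10(s / (1 - s)).
    """
    win = score - draw_ratio / 2.0
    loss = 1.0 - win - draw_ratio
    var = (win * (1.0 - score) ** 2
           + draw_ratio * (0.5 - score) ** 2
           + loss * score ** 2)
    sigma = math.sqrt(var / games)
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return z * sigma * slope

for n in (100, 500, 5000, 50000):
    print(n, round(elo_error_margin(n)))  # margins of roughly 59, 26, 8, 3
```

Even 500 games leaves a margin around ±26 Elo, which is why serious testing frameworks run tens of thousands of games.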
-
Sven
- Posts: 3576
- Joined: Thu May 15, 2008 7:57 pm
- Location: Berlin, Germany
Post
by Sven » Sat Mar 31, 2018 8:36 am
Michael Sherwin wrote:Michael Sherwin wrote:
Edit: Okay I just noticed that Yace seems to be missing a result. I have no idea why that would be.
Solved. I had a false start and forgot to delete those games from the current games. This is correct.
Code: Select all
Program Elo + - Games Score Av.Op. Draws
1 RomiChess : 2466 42 41 231 68.0 % 2335 22.5 %
2 Tcb0052 : 2390 92 93 46 39.1 % 2467 21.7 %
3 Yace : 2374 83 85 46 37.0 % 2467 34.8 %
4 Bitfoot-1.0.65acfcb-win64 : 2318 97 102 47 29.8 % 2467 17.0 %
5 OliThink532_x64 : 2305 96 100 46 28.3 % 2467 21.7 %
6 Horizon_4_4 : 2286 101 106 46 26.1 % 2467 17.4 %
But my original question remains, modified: why can't I adjust the ratings of the test engines to match their average CCRL rating, and then add the same offset to Romi's rating?
Hi Mike, of course you can always scale up or down the ratings of your test engines. Ratings calculated by EloStat or any of the more modern (and better) programs BayesElo and Ordo are always relative to the pool of engines you deal with. You can add +3000 to all engines, and still you will get the same relative result, in this case Romi performing around +130 better than its average opponent.
The main point, however, is that the error margins for such a small number of games are way too high to derive *anything* from it. All that you can say is that Romi performs 130 +/- 100 better than the others, so it might as well be +30 or +230.
Regarding the question of TPR, I don't think this applies here at all since engine ratings are not calculated incrementally, like for human players, but always from scratch from the complete set of games of all players in the pool.
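Sven's point that pool ratings are purely relative can be illustrated with the logistic expectation formula, which depends only on the rating difference, so adding the same constant to every engine changes nothing. A small Python sketch (the +3000 shift is his example; the ratings are from the corrected table):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Logistic Elo expectation for player A vs. player B.
    Depends only on the difference r_b - r_a, not the absolute values."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

romi, avg_opp = 2466, 2335  # RomiChess vs. its average opponent
shift = 3000                # Sven's example offset

# Shifting every rating by the same constant leaves expectations unchanged
assert expected_score(romi, avg_opp) == expected_score(romi + shift, avg_opp + shift)
print(round(expected_score(romi, avg_opp), 2))  # 0.68, matching Romi's 68.0 % score
```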
-
Michael Sherwin
- Posts: 2799
- Joined: Fri May 26, 2006 1:00 am
- Location: OH, USA
Post
by Michael Sherwin » Sat Mar 31, 2018 9:51 am
Thanks Sven, that helped a lot.

All I was looking for was an indication, a ballpark value, and I think you confirmed that I can adjust to the CCRL scale.
Regards,
Mike
-
Michael Sherwin
- Posts: 2799
- Joined: Fri May 26, 2006 1:00 am
- Location: OH, USA
Post
by Michael Sherwin » Sat Mar 31, 2018 6:45 pm
Code: Select all
Program Elo + - Games Score Av.Op. Draws
1 RomiChess : 2446 27 27 500 62.9 % 2354 23.8 %
2 Yace : 2415 60 60 100 45.5 % 2447 25.0 %
3 Tcb0052 : 2365 60 61 100 38.5 % 2447 25.0 %
4 Horizon_4_4 : 2354 61 62 100 37.0 % 2447 24.0 %
5 OliThink532_x64 : 2343 61 62 100 35.5 % 2447 25.0 %
6 Bitfoot-1.0.65acfcb-win64 : 2291 65 67 100 29.0 % 2447 20.0 %
So this test is over. I can add 56 Elo to the average opponent rating to scale them to CCRL. That gives RomiChess an estimated CCRL rating of 2502. And even the worst end of the error margin still means an improvement for Romi. This was the second test; the first test was a bit better at 2510, but the second test's results were far more consistent across all the test engines and show an improvement against every engine.
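The arithmetic behind that scaling, as a quick sanity check (the numbers are taken from the posts above: the 2410 CCRL average of the test engines from the first post, and Romi's final-table figures):

```python
ccrl_average = 2410   # average CCRL rating of the five test engines
pool_average = 2354   # Av.Op. for RomiChess in the final table
romi_pool_elo = 2446  # RomiChess's Elo inside the test pool

# Shift the whole pool so the opponents' average matches their CCRL average,
# then apply the same shift to Romi's rating.
offset = ccrl_average - pool_average
print(offset, romi_pool_elo + offset)  # 56 2502
```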
And Sven, Romi is still a little miffed at Jumbo for leaving her behind in D7, so she cornered Jumbo 0.6.10-bb 64 into an "I'm going to show you" match.

Of course she is challenging only one Jumbo and not the entire herd. How does Jumbo handle 10 sec with a 1 sec increment? Currently Romi leads 14W 9L 8D.

Regards,
Mike