The average CCRL rating of the test engines is 2410, but in this sample the average opponent is listed at 2340.
Would it be correct to add 70 points to the average and therefore also add 70 points to the RomiChess performance? Thanks.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Not at all, since you are not presenting all of the participants. Look at your won/lost record in total: wins do not equal losses. Look at the Elo of the "others": they played against opponents with an Elo rating of 2503, but RomiChess's rating is 2481. It is an incomplete data set. Basically, it does not add up.
Also, you have some very blunt rulers there. The windows are 600 Elo wide, and some are even worse.
It is like building a desk with a ruler that has 17 inches snapped off and worrying whether you should add one millimeter to a dimension.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Don't blame me, lol. Blame Arena! However, all the games are presented, as 6+5+4+6+5 = 26.
Maybe Tcb being whitewashed is causing a problem. Let's look at the latest results.
Edit: Okay, I just noticed that Yace seems to be missing a result. I have no idea why that would be.
Last edited by Michael Sherwin on Sat Mar 31, 2018 5:34 am, edited 1 time in total.
Well, it was early in the test and there were hundreds of games left to play, so the margins were wide.
So after 500 games a TPR is not an accurate representation?
Should I have the test engines play a round robin against each other?
But all I want to do is compare the test with a ballpark idea of Romi's true Elo based on the CCRL ratings.
But my original question remains, modified. Why can't I adjust the ratings of the test engines to reflect their average rating in CCRL and then add that amount to Romi's rating?
It takes an absurd number of games to get an accurate answer. That is why testing frameworks play games at frightening velocity.
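To put a rough number on "absurd", here is a minimal Python sketch of how many games it takes to reach a given 95% error margin. It rests on assumptions that are not from this thread: the two engines are close to equal strength, 30% of games are drawn, game results are independent, and the normal approximation applies. The function name `games_needed` is made up for illustration.

```python
import math

# Slope of the Elo curve at a 50% score: d(Elo)/d(score) = 400 / (ln(10) * s * (1 - s)).
ELO_PER_SCORE_AT_50 = 400.0 / (math.log(10) * 0.25)


def games_needed(margin_elo, draw_ratio=0.3, z=1.96):
    """Approximate games needed for a +/- margin_elo interval (95% when z = 1.96).

    Assumes near-equal opponents and independent games; draw_ratio is a guess.
    """
    # Per-game standard deviation of the result (1, 0.5 or 0) at a 50% score.
    sigma = math.sqrt(0.25 - draw_ratio / 4.0)
    return math.ceil((z * sigma * ELO_PER_SCORE_AT_50 / margin_elo) ** 2)


for margin in (100, 20, 10, 5):
    print(margin, games_needed(margin))
```

The count grows with the inverse square of the margin, so halving the error bar costs about four times as many games.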
Hi Mike, of course you can always scale the ratings of your test engines up or down. Ratings calculated by EloStat, or by either of the more modern (and better) programs BayesElo and Ordo, are always relative to the pool of engines you deal with. You can add 3000 to all engines and you will still get the same relative result, in this case Romi performing around +130 better than its average opponent.
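A small Python sketch of why a constant offset changes nothing. It assumes the standard logistic Elo expectation formula; the 2340 average opponent and the 70-point offset are the figures from the opening question, and the +130 gap is the one mentioned above. The names `expected_score` and `shift_pool` are invented for this illustration, and the resulting ratings are not measured results.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def shift_pool(ratings, offset):
    """Add the same offset to every rating in the pool."""
    return {name: rating + offset for name, rating in ratings.items()}


# Illustrative pool: Romi 130 above an average opponent rated 2340 locally.
local = {"RomiChess": 2470.0, "AverageOpponent": 2340.0}
scaled = shift_pool(local, 70.0)  # move the pool onto the CCRL average of 2410

gap_local = local["RomiChess"] - local["AverageOpponent"]
gap_scaled = scaled["RomiChess"] - scaled["AverageOpponent"]
print(gap_local, gap_scaled)
print(expected_score(local["RomiChess"], local["AverageOpponent"]))
print(expected_score(scaled["RomiChess"], scaled["AverageOpponent"]))
```

Only rating differences enter the formula, so both the gap and the expected score come out the same before and after the shift.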
The main point, however, is that the error margins for such a small number of games are way too high to derive *anything* from it. All that you can say is that Romi performs 130 +/- 100 better than the others, so it might as well be +30 or +230.
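For a feel of how such a margin comes about, here is a hedged Python sketch that turns a win/draw/loss record into an Elo difference with an approximate 95% interval. It uses a plain normal approximation on the score, which is not necessarily what EloStat, BayesElo or Ordo do internally. The function name is hypothetical, and the sample record is the 31-game 14W 9L 8D score reported further down this thread.

```python
import math


def elo_diff_with_margin(wins, draws, losses, z=1.96):
    """Return (elo_diff, lower, upper) from a W/D/L record.

    Normal approximation on the mean score; z = 1.96 gives roughly 95%.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of a single game result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)

    def to_elo(s):
        # Clamp to avoid infinite ratings at 0% or 100%.
        s = min(max(s, 1e-6), 1.0 - 1e-6)
        return -400.0 * math.log10(1.0 / s - 1.0)

    return to_elo(score), to_elo(score - z * se), to_elo(score + z * se)


diff, lower, upper = elo_diff_with_margin(wins=14, draws=8, losses=9)
print(round(diff), round(lower), round(upper))
```

With only a few dozen games the interval spans well over a hundred Elo and includes zero, which is the point being made above.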
Regarding the question of TPR, I don't think this applies here at all since engine ratings are not calculated incrementally, like for human players, but always from scratch from the complete set of games of all players in the pool.
Thanks Sven, that helped a lot. All I was looking for is an indication, a ballpark value, and I think you confirmed that I can adjust to the CCRL scale.
So this test is over. I can add 56 Elo to the average opponent to scale them to CCRL. That gives RomiChess an estimated CCRL rating of 2502, and even the worst case within the error margin still means an improvement for Romi. This was the second test. The first test was a bit better at 2510, but the second test's results were far more consistent across all test engines and show an improvement against all engines.
And Sven, Romi is still a little miffed at Jumbo for leaving her behind in D7, so she cornered Jumbo 0.6.10-bb 64 into an "I'm going to show you" match. Of course, she is challenging only one Jumbo and not the entire herd. How does Jumbo handle 10 sec with a 1 sec increment? Currently Romi is at 14W 9L 8D.