### Measuring an engine's Elo rating outside tournaments

Posted: **Tue Jul 04, 2017 12:58 pm**

This must be an old topic, but I cannot find the answer I am looking for on the forum or on Google.

I would like to know how to estimate the Elo rating of an engine from test sets rather than from tournament results.

In tournaments, Elo ratings are calculated from the initial ratings of the participating players. This has two drawbacks. First, if the initial ratings are inaccurate, the results are inaccurate too. Second, you need a lot of games, many of them between players of nearly equal strength, to get a really accurate result. A player who wins or loses all of its games contributes little to determining its own true rating or that of its opponents.
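To make the second drawback concrete, here is a minimal sketch using the standard logistic Elo formula. The function names are my own, chosen for illustration. It shows that a 100% (or 0%) score has no finite rating difference attached to it, which is why such results say little about the real rating.

```python
import math

def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def rating_diff_from_score(score):
    """Elo difference implied by a score fraction between 0 and 1.

    Returns None for 0% or 100%, because the implied difference is
    unbounded: the result only says "much stronger" or "much weaker".
    """
    if score <= 0.0 or score >= 1.0:
        return None
    return -400.0 * math.log10(1.0 / score - 1.0)

# Scores close to 50% pin the difference down; a clean sweep does not.
for s in (0.50, 0.75, 0.99, 1.00):
    print(s, rating_diff_from_score(s))
```

Near 50% a small change in score corresponds to a small change in rating, so games between evenly matched players carry the most information.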

To illustrate, my engine shows a difference of about 600 Elo between its lowest and its highest calculated rating. As a human player working through test positions, my estimated Elo is at most 100 points off my real Elo.

Are there any EPD test files for different Elo levels that give an estimated Elo based on the score? For example, with a thinking time of 60 seconds, a score of 30% means one estimated Elo, a score of 40% another, and so on.

Given the large spread in strength between engines, the ones that interest me most are those at sub-master level. The top programs have already been rated in tournaments. As an additional requirement, the engines need to support WB (WinBoard) or UCI so that we can automate the tests. Any suggestions for good Linux engines in the 1200-2200 range?

Does anyone have usable test sets?

If we have enough engines whose calculated rating is fairly accurate, we can use their test results to weight (calibrate) the test sets. Has anyone done this?
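As a rough sketch of what I mean by calibrating: take the test-set scores of engines with trusted ratings, fit a simple line through them, and read off an estimate for an unrated engine. The straight-line relation is purely an assumption on my part (the real relation may well be curved), and the reference numbers below are invented placeholders, not measurements.

```python
def fit_line(scores, ratings):
    """Ordinary least-squares fit of rating = intercept + slope * score."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, ratings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_rating(score, slope, intercept):
    """Estimated Elo for an engine that solved `score` (0..1) of the test set."""
    return intercept + slope * score

# Placeholder data: (fraction of positions solved, trusted Elo rating).
reference = [(0.20, 1200), (0.40, 1600), (0.60, 2000)]
slope, intercept = fit_line([s for s, _ in reference],
                            [r for _, r in reference])
print(estimate_rating(0.50, slope, intercept))
```

The estimate is only meaningful inside the score range covered by the reference engines, and only for the same test set and thinking time.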

If you have tried to answer the same question, how did you organise the testing?

Regards,
