Hi there,
So far I have been playing matches against the bots on Chess.com to determine my engine's playing strength. It is quite a time-consuming process. Is there a way to quickly (perhaps automatically) test the Elo strength of an engine?
Method for testing ELO strength automatically
-
- Posts: 84
- Joined: Wed Aug 04, 2021 12:42 am
- Full name: Ellie Moore
-
- Posts: 3233
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Method for testing ELO strength automatically
Test suites can give you a rough estimate, e.g.:
https://talkchess.com/forum3/viewtopic.php?t=56653&
I still use them with my engine (~2000 Elo on CCRL).
--
Srdja
-
- Posts: 48
- Joined: Wed Sep 22, 2021 9:20 pm
- Full name: Jeremy Wright
Re: Method for testing ELO strength automatically
The most common method I know of is to play a lot of fast games against an engine of known strength (usually 1000 or more, depending on the size of the Elo difference) to get an Elo estimate. You can use cutechess/cutechess-cli or other tools to automate this.
Once you know your own engine's strength, the opponent can be a reference version of your own engine. While it's a bit noisy as a measurement, since engines have differing strengths and weaknesses, this can usually give a surprisingly solid estimate. For example, I use this for my own estimates: Mantissa 3.7.2 had a performance of 150-160 Elo over 3.3.0 in selfplay at fast time controls. CCRL testing showed it jumped by about 170 Elo on the Blitz lists, which was pretty close to my selfplay estimate.
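To make the "Elo estimate" step concrete, here is a rough sketch of how a match result can be turned into an Elo difference with an approximate error margin. It assumes the usual logistic Elo model and a normal approximation for the margin; the function names are made up for illustration and this is not necessarily what cutechess or CCRL compute internally.

```python
import math


def score_to_elo(score):
    """Convert a score fraction in (0, 1) to an Elo difference (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_estimate(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.

    z=1.96 gives a roughly 95% interval under a normal approximation.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win=1, draw=0.5, loss=0) around the mean.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)
    # Clamp the interval so the conversion below stays finite.
    eps = 1e-9
    low = min(max(score - z * stderr, eps), 1.0 - eps)
    high = min(max(score + z * stderr, eps), 1.0 - eps)
    return score_to_elo(score), score_to_elo(low), score_to_elo(high)


if __name__ == "__main__":
    elo, low, high = elo_estimate(wins=600, draws=200, losses=200)
    print(f"Elo difference: {elo:+.1f} (approx. range {low:+.1f} to {high:+.1f})")
```

Add the result to the known rating of the opponent to get a ballpark rating for your engine. The interval narrows as the number of games grows, which is why small Elo differences need more games.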
If you want to go even more legit than that, you can download several engines of similar strength to your guess ("similar" can be pretty generous here), and set up a whole gauntlet in cutechess. Same idea, really, just with a variety of engines.
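As a rough sketch of what such a gauntlet could look like, the snippet below only builds a cutechess-cli command line; it does not run anything. The engine names, paths, opening file and default values are placeholders I made up, and you should check the options against the help output of your cutechess-cli version before relying on them.

```python
def build_gauntlet_command(candidate, opponents, rounds=500, tc="10+0.1",
                           concurrency=1, openings=None,
                           pgn_out="gauntlet.pgn"):
    """Build a cutechess-cli argument list for a gauntlet.

    candidate and each opponent are (name, path) tuples; all of them are
    hypothetical placeholders supplied by the caller.
    """
    cmd = ["cutechess-cli", "-tournament", "gauntlet"]
    # In a gauntlet the first engine listed plays against all the others.
    for name, path in [candidate, *opponents]:
        cmd += ["-engine", f"name={name}", f"cmd={path}"]
    # Settings shared by every engine: protocol and time control.
    cmd += ["-each", "proto=uci", f"tc={tc}"]
    # Two games per opening, colors swapped on the repeat.
    cmd += ["-rounds", str(rounds), "-games", "2", "-repeat"]
    cmd += ["-concurrency", str(concurrency), "-recover"]
    if openings is not None:
        # An opening book adds variety so the games do not repeat.
        cmd += ["-openings", f"file={openings}", "format=pgn", "order=random"]
    cmd += ["-pgnout", pgn_out]
    return cmd


if __name__ == "__main__":
    command = build_gauntlet_command(
        candidate=("MyEngine", "./myengine"),            # placeholder path
        opponents=[("OpponentA", "./engines/a"),         # placeholder paths
                   ("OpponentB", "./engines/b")],
        openings="openings.pgn",                         # placeholder file
    )
    print(" ".join(command))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run once the paths point at real engines.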
Granted, either way this still takes a while. If you only have one core, depending on time control, this could take more than a day for clear results. Still, it should be much faster than trying to do the same via chess.com/lichess.
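For a back-of-the-envelope idea of how long a run takes, something like the following can help. It is a pessimistic sketch: it assumes both engines use up their whole clock and that a game lasts about 60 moves per side, both of which are my own assumptions and not measured values.

```python
def estimate_hours(games, base_seconds, increment_seconds,
                   avg_moves=60, concurrency=1):
    """Rough upper estimate of wall-clock hours for a match.

    Assumes each side spends its full base time plus the increment for
    avg_moves moves, and that games are spread evenly over `concurrency`
    parallel slots.
    """
    seconds_per_game = 2 * (base_seconds + increment_seconds * avg_moves)
    return games * seconds_per_game / concurrency / 3600.0


if __name__ == "__main__":
    # 1000 games at 10s + 0.1s per move on one core.
    print(f"{estimate_hours(1000, 10, 0.1):.1f} hours")
    # The same match at 60s + 0.6s per move.
    print(f"{estimate_hours(1000, 60, 0.6):.1f} hours")
```

Under these assumptions the second case comes out at more than two days on a single core, so shorter time controls or more cores make a big difference.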
Mantissa: https://github.com/jtheardw/mantissa