Method for testing ELO strength automatically

RedBedHed
Posts: 84
Joined: Wed Aug 04, 2021 12:42 am
Full name: Ellie Moore

Post by RedBedHed »

Hi there,

So far I have been playing matches against the bots on Chess.com to determine my engine's playing strength. It is quite a time-consuming process. Is there a way to quickly (perhaps automatically) test the Elo strength of an engine?
>> Move Generator: Charon
>> Engine: Homura
void life() { playCapitalism(); return; live(); }
smatovic
Posts: 3233
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Method for testing ELO strength automatically

Post by smatovic »

Test suites can give you a rough estimate, e.g.:

https://talkchess.com/forum3/viewtopic.php?t=56653&

I still use them with my engine (~2000 Elo on CCRL).

--
Srdja
jtwright
Posts: 48
Joined: Wed Sep 22, 2021 9:20 pm
Full name: Jeremy Wright

Re: Method for testing ELO strength automatically

Post by jtwright »

The most common method I know of is to play a lot of fast games against an engine of known strength (usually 1000 or more, depending on the size of the Elo difference) to get an Elo estimate. You can use cutechess/cutechess-cli or other tools to automate this.
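To make the "games in, Elo out" step concrete, here is a minimal sketch that turns a match result into an Elo difference using the standard logistic Elo formula. The function names and the normal-approximation 95% interval are my own choices for illustration, not something cutechess prints in this exact form.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_estimate(wins, draws, losses):
    """Return (estimate, low, high) Elo difference versus the opponent.

    low/high come from a rough 95% interval on the score (normal
    approximation), which is an assumption of this sketch.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        raise ValueError("a 0% or 100% score gives no finite Elo estimate")
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    margin = 1.96 * math.sqrt(variance / games)
    eps = 1e-9  # keep the interval bounds inside (0, 1)
    low = max(score - margin, eps)
    high = min(score + margin, 1.0 - eps)
    return elo_diff(score), elo_diff(low), elo_diff(high)
```

Add the estimate to the known rating of the opponent to get a rough rating for your engine. The interval shrinks roughly with the square root of the number of games, which is why small Elo differences need so many of them.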

Once you know your own engine's strength, the known-strength opponent can be a reference version of your own engine. While this is a somewhat noisy measurement, since engines have differing strengths and weaknesses, it can usually give a surprisingly solid estimate. For example, I use this for my own estimates: Mantissa 3.7.2 had a performance of 150-160 Elo over 3.3.0 in selfplay at fast time controls. CCRL testing showed it jumped by about 170 Elo on the Blitz lists, which was pretty close to my selfplay estimate.

If you want to go even more legit than that, you can download several engines of similar strength to your guess ("similar" can be pretty generous here), and set up a whole gauntlet in cutechess. Same idea, really, just with a variety of engines.
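As a rough sketch of what such a gauntlet setup could look like, the helper below only builds a cutechess-cli argument list; it does not run anything. The engine paths, opponent names, default values, and the openings file are placeholders I made up, and it assumes every engine speaks UCI. In gauntlet mode the first engine listed plays against all the others.

```python
def gauntlet_command(candidate, opponents, rounds=100, tc="10+0.1",
                     concurrency=1, pgn_out="gauntlet.pgn", openings=None):
    """Build a cutechess-cli gauntlet command as a list of arguments.

    candidate and each opponent are (name, path) tuples; all paths and
    defaults here are hypothetical and should be adapted to your setup.
    """
    cmd = ["cutechess-cli", "-tournament", "gauntlet"]
    # The engine under test goes first so it meets every opponent.
    for name, path in [candidate] + list(opponents):
        cmd += ["-engine", f"name={name}", f"cmd={path}"]
    cmd += ["-each", "proto=uci", f"tc={tc}"]
    cmd += ["-rounds", str(rounds), "-games", "2"]
    if openings is not None:
        # Play each opening twice with colors reversed.
        cmd += ["-openings", f"file={openings}", "format=pgn",
                "order=random", "-repeat"]
    cmd += ["-concurrency", str(concurrency), "-recover",
            "-pgnout", pgn_out]
    return cmd

# Example with made-up opponent names and paths.
command = gauntlet_command(("Homura", "./homura"),
                           [("OpponentA", "./engine_a"),
                            ("OpponentB", "./engine_b")],
                           openings="openings.pgn")
print(" ".join(command))
```

The list can be passed to `subprocess.run` as-is. Using an opening book matters for deterministic engines, since otherwise they tend to replay the same game from the start position.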

Granted, either way this still takes a while. If you only have one core, then depending on the time control it could take more than a day to get clear results. Even so, it should be much faster than trying to do the same via chess.com/lichess.
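For a back-of-the-envelope feel for the wall-clock cost, here is a small sketch. It assumes a base+increment time control and that each side plays a fixed number of moves and uses up its whole clock, so it gives an upper bound rather than a prediction. The 60 moves per side is a guess of mine, not a measured figure.

```python
def max_match_hours(games, base_s, inc_s, moves_per_side=60, concurrency=1):
    """Upper bound on match duration in hours.

    Each side can spend at most base_s + inc_s * moves_per_side seconds,
    so one game lasts at most twice that. moves_per_side is an assumed
    average, not something taken from real game data.
    """
    per_game_s = 2.0 * (base_s + inc_s * moves_per_side)
    return games * per_game_s / concurrency / 3600.0

# 1000 games at 10+0.1 on one core, then at 60+0.6.
print(round(max_match_hours(1000, 10, 0.1), 1))
print(round(max_match_hours(1000, 60, 0.6), 1))
```

Real games often end early or leave time on the clock, so actual runs usually finish sooner than this bound. Running games concurrently on more cores divides the time accordingly.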