how to properly test the changes to the engine ?

MahmoudUthman
Posts: 234
Joined: Sat Jan 17, 2015 11:54 pm

how to properly test the changes to the engine ?

Post by MahmoudUthman »

1- How many opponents should I use?
2- How large should the rating difference between the weakest and the strongest opponents be?
3- How many different starting positions should I use?
4- What is the smallest sufficient time control? Should I test at different time controls, or is a single fixed time control enough?
5- How many games should I play?
7- Is it okay to run concurrent games?
6- Anything else to consider?
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: how to properly test the changes to the engine ?

Post by jdart »

MahmoudUthman wrote:1-how many opponents to use ?
2-how much should the rating difference between the weakest and the strongest opponents be ?
I think 4-6 opponents is a reasonable number, and they should not be too far from your current rating.
3-how many different starting positions should I use ?
I use a large collection of 8-move PGN openings. I think these came from the Stockfish team originally.
4-smallest sufficient time control ? and should I test at different time control or should a single fixed time control be enough ?
I use 0:04+0.1 (game in 4 sec. + 0.1 sec. increment). Note that many engines do not run reliably at this time control (they will lose on time). For this to work, the engine has to use a high-resolution timer.

Ideally you should also test at time controls that are closer to a realistic game speed, but that requires a lot of CPU time.
5-How many games to play ?
This depends on how much accuracy you need. But generally measuring the effects of a small change requires tens of thousands of games. My standard test run is 36,000 games.
7-Is it okay to use concurrent games ?
Yes, but IMO you should not have more concurrent CPU usage than you have physical cores.
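
To give a feel for why small changes need tens of thousands of games, here is a rough sketch of the usual normal-approximation error margin. It assumes the two sides are close to equal in strength, and `draw_ratio` is a parameter I made up for illustration (it is not something stated above). Treat the numbers as ballpark only.

```python
import math

# Slope of the logistic Elo curve at a 50% score: d(Elo)/d(score) = 1600 / ln(10).
ELO_PER_SCORE = 1600 / math.log(10)

def elo_margin(games, draw_ratio=0.0, z=1.96):
    """Approximate error margin in Elo (z=1.96 is about 95%) for near-equal engines."""
    # Per-game score variance at a 50% score is (1 - draw_ratio) / 4.
    std_err = math.sqrt((1.0 - draw_ratio) / (4.0 * games))
    return z * ELO_PER_SCORE * std_err

def games_needed(target_margin, draw_ratio=0.0, z=1.96):
    """Smallest number of games whose margin is at most target_margin Elo."""
    per_game = z * ELO_PER_SCORE * math.sqrt((1.0 - draw_ratio) / 4.0)
    return math.ceil((per_game / target_margin) ** 2)

if __name__ == "__main__":
    for n in (2000, 36000):
        print(n, "games -> about +/-", round(elo_margin(n), 1), "Elo")
    print("games for +/- 5 Elo:", games_needed(5.0))
```

The margin shrinks only with the square root of the number of games, so halving it takes four times as many games.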

--Jon
Daniel Anulliero
Posts: 759
Joined: Fri Jan 04, 2013 4:55 pm
Location: Nice

Re: how to properly test the changes to the engine ?

Post by Daniel Anulliero »

1-2: I use 10 opponents that are not too far from Isa in strength.
3: I use 100 different PGN openings, each played twice (once as White and once as Black), so 200 games against each opponent.
4: TC is 1 minute + 250 milliseconds, so quite a lot of time, but I think a very fast time control is relevant if your engine can reach great depth (like Arasan).
5: So 200 games × 10 opponents = 2000 games. I think you can only measure improvements of 20-30 Elo with 2000 games, but if your engine is still in development, you can gain 30-40 Elo at a time.
7: I play one game at a time.
6: Be patient, have a solid methodology, and make a lot of backups.
NOT LIKE ME
;)
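
The gauntlet arithmetic in this post (100 openings, both colors, 10 opponents) can be sketched as a schedule generator. This is only an illustration: the opponent and opening names are placeholders I invented, and a real run would read the openings from a PGN file.

```python
from itertools import product

def gauntlet_schedule(opponents, openings):
    """Yield (opponent, opening, engine_color), playing each opening from both sides."""
    for opponent, opening in product(opponents, openings):
        yield (opponent, opening, "white")
        yield (opponent, opening, "black")

# Hypothetical placeholder names, not real engines or opening lines.
opponents = [f"opponent_{i}" for i in range(1, 11)]
openings = [f"opening_{i:03d}" for i in range(1, 101)]

schedule = list(gauntlet_schedule(opponents, openings))
print(len(schedule), "games in total")
print(len(schedule) // len(opponents), "games against each opponent")
```

Playing every opening with both colors means an unbalanced opening favors each side once, which keeps the opening suite from biasing the result.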
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: how to properly test the changes to the engine ?

Post by ZirconiumX »

1: I use self-play, so the only opponent is the currently strongest version of my program.
2: See above. Though sometimes I will run a verification match against a reasonably stronger opponent.
3: I use the winboard default_book.bin limited to 4 moves.
4: I play my games at 5 seconds plus 0.05 seconds a move. I know I should probably test my games at long time controls, but I do not have the resources for it.
5: I use something called SPRT (sequential probability ratio test), a statistical stopping rule that allows cutechess to stop the match as soon as enough games have been played to reach a decision.
7: Concurrent games can and should be run, if only so you get a reasonably fast result. Though I'm not as strong as Arasan, my guideline is to run one fewer concurrent game than I have cores.
6: I think for something like this, you have to place your faith in the test - don't finish the match before the test has completed, for example.
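
For anyone curious what an SPRT stopping rule looks like, here is a minimal sketch. It uses a normal approximation of the log-likelihood ratio and is not necessarily what cutechess computes internally. The function names and the default `elo0`/`elo1`/`alpha`/`beta` values are my own choices for illustration.

```python
import math

def expected_score(elo):
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_llr(wins, draws, losses, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (elo1) versus H0 (elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean * mean  # per-game score variance
    if var <= 0.0:
        return 0.0  # not enough variety in the results to say anything yet
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)

def sprt_decision(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Return 'accept H1', 'accept H0', or 'continue'."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"

if __name__ == "__main__":
    print(sprt_decision(10, 10, 10))
    print(sprt_decision(6000, 8000, 5000))
```

The point made above still applies: the bounds are only valid if you let the test run until it crosses one of them, rather than stopping early because the score looks good.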
Some believe in the almighty dollar.

I believe in the almighty printf statement.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: how to properly test the changes to the engine ?

Post by Ferdy »

MahmoudUthman wrote:1-how many opponents to use ?
Collect as many engines as possible and select those with a different style from your engine.
MahmoudUthman wrote: 2-how much should the rating difference between the weakest and the strongest opponents be ?
20 rating points.
MahmoudUthman wrote:3-how many different starting positions should I use ?
Start with 1000 positions.
MahmoudUthman wrote: 4-smallest sufficient time control ?
Test the scoring reliability of your engine on the computer you are testing with. How?
Run a match between engineA and engineB, where engineA is actually the same as engineB. Test at TC 30s + 100ms inc/move (with the same opening suite). Do you get 50%, or close to it, after 1000 games?
If you cannot get 50% or close to it, test at TC 60s + 200ms inc/move and check the score. Keep increasing the time control until you get 50% or close to it.
MahmoudUthman wrote: and should I test at different time control or should a single fixed time control be enough ?
Test at two time controls to determine whether the change scales with time.
MahmoudUthman wrote: 5-How many games to play ?
Depends on what you are trying to test for. For a strength test, you would normally use a tool such as bayeselo or ordo and check the error margin. For a quick estimate, an engine that leads by 100 wins in a 1000-game test would be good.
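
As a quick sanity check before reaching for bayeselo or ordo, the Elo difference and a rough error margin can be estimated directly from the win/draw/loss counts. The sketch below uses a plain normal approximation, so it will not match those tools exactly. The function names and the example counts are made up for illustration.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high): point estimate and approximate 95% bounds."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - score * score  # per-game score variance
    half_width = z * math.sqrt(var / n)
    eps = 1e-9  # keep scores away from 0 and 1, where Elo is undefined
    clamp = lambda s: min(max(s, eps), 1.0 - eps)
    return (elo_from_score(clamp(score)),
            elo_from_score(clamp(score - half_width)),
            elo_from_score(clamp(score + half_width)))

if __name__ == "__main__":
    # Hypothetical result: a lead of 100 wins in 1000 games.
    print([round(x, 1) for x in match_elo(400, 300, 300)])
    # Hypothetical self-play check: identical engines should straddle 0 Elo.
    print([round(x, 1) for x in match_elo(300, 400, 300)])
```

If the lower bound stays above zero, the lead is unlikely to be noise. For the engineA-versus-engineA reliability check, the interval should comfortably contain zero.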
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: how to properly test the changes to the engine ?

Post by Sven »

jdart wrote:
7-Is it okay to use concurrent games ?
Yes, but IMO you should not have more concurrent CPU usage than you have physical cores.
I use an older version of cutechess-cli, and with it I get a lot of trouble when playing concurrent games (matches stop prematurely because of communication problems, while playing one game at a time works fine). Is there a version of cutechess-cli that reliably supports concurrent games?
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: how to properly test the changes to the engine ?

Post by Adam Hair »

Sven Schüle wrote:
jdart wrote:
7-Is it okay to use concurrent games ?
Yes, but IMO you should not have more concurrent CPU usage than you have physical cores.
I use cutechess-cli but an older version of it, and there I get a lot of trouble when playing concurrent games (matches are stopped prematurely due to arising communication problems - playing one game at a time works fine). Is there a version of cutechess-cli that has a reliable support of concurrent games?
Cutechess-cli 0.6.0 works well on my 8 core Linux server with concurrency=6 even for super fast games. The current version of Cutechess-cli works well on my Windows laptop.
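
For reference, here is a small sketch that assembles a cutechess-cli command line with a concurrency setting like the one above. The option names are as I remember them from recent cutechess-cli releases and may differ in older versions, so check `cutechess-cli -help` for yours. The engine paths, file names, and default values are placeholders, and `proto=uci` is an assumption about the engines.

```python
import shlex

def cutechess_command(engine_new, engine_base, openings_pgn,
                      tc="4+0.1", rounds=500, concurrency=6,
                      pgn_out="results.pgn"):
    """Build an argument list for a two-engine match with paired openings."""
    return [
        "cutechess-cli",
        "-engine", "name=new", f"cmd={engine_new}",
        "-engine", "name=base", f"cmd={engine_base}",
        "-each", "proto=uci", f"tc={tc}",       # assumes both engines speak UCI
        "-openings", f"file={openings_pgn}", "format=pgn",
        "order=random", "plies=16",             # 16 plies = 8-move openings
        "-games", "2", "-rounds", str(rounds),  # 2 games per round...
        "-repeat",                              # ...same opening, colors swapped
        "-concurrency", str(concurrency),
        "-recover",                             # restart an engine that crashes
        "-pgnout", pgn_out,
    ]

if __name__ == "__main__":
    # Placeholder paths; nothing is executed here, the command is only printed.
    cmd = cutechess_command("./engine_new", "./engine_base", "openings.pgn")
    print(shlex.join(cmd))
```

Keeping the concurrency value at or below the number of physical cores follows the advice quoted above. Python's `os.cpu_count()` reports logical cores, so the value is left as an explicit parameter here.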
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: how to properly test the changes to the engine ?

Post by PK »

For UCI engines, LittleBlitzer is a decent tool. No problems with using several threads.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: how to properly test the changes to the engine ?

Post by Sven »

Adam Hair wrote:
Sven Schüle wrote:
jdart wrote:
7-Is it okay to use concurrent games ?
Yes, but IMO you should not have more concurrent CPU usage than you have physical cores.
I use cutechess-cli but an older version of it, and there I get a lot of trouble when playing concurrent games (matches are stopped prematurely due to arising communication problems - playing one game at a time works fine). Is there a version of cutechess-cli that has a reliable support of concurrent games?
Cutechess-cli 0.6.0 works well on my 8 core Linux server with concurrency=6 even for super fast games. The current version of Cutechess-cli works well on my Windows laptop.
Thanks. I think the version I use is 0.8.x (0.8.2 probably), and there is already a 0.9.x. I can give 0.6.0 a try but hmmmmm should I really use such an old version ...

Did you also test WB engines under Windows with it?
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: how to properly test the changes to the engine ?

Post by jdart »

I don't actually use the concurrency feature. I just run a lot of processes.

--Jon