how to properly test the changes to the engine ?
1-How many opponents should I use?
2-How large should the rating difference between the weakest and the strongest opponents be?
3-How many different starting positions should I use?
4-What is the smallest sufficient time control? And should I test at different time controls, or is a single fixed time control enough?
5-How many games should I play?
7-Is it okay to use concurrent games?
6-Anything else to consider?
- Location: http://www.arasanchess.org
Re: how to properly test the changes to the engine ?
MahmoudUthman wrote:
"1-How many opponents should I use?
2-How large should the rating difference between the weakest and the strongest opponents be?"
I think 4-6 opponents is a reasonable number, and they should not be too far from your current rating.
MahmoudUthman wrote: "3-How many different starting positions should I use?"
I use a large collection of 8-move PGN openings. I think these originally came from the Stockfish team.
MahmoudUthman wrote: "4-What is the smallest sufficient time control? And should I test at different time controls, or is a single fixed time control enough?"
I use 0:04+0.1 (game in 4 sec. + 0.1 sec increment). Note that many engines do not run reliably at this time control (they lose on time). For this to work, the engine has to use a high-resolution timer.
Ideally you should also test at time controls that are closer to a realistic game speed, but that requires a lot of CPU time.
MahmoudUthman wrote: "5-How many games should I play?"
This depends on how much accuracy you need. Generally, though, measuring the effect of a small change requires tens of thousands of games. My standard test run is 36,000 games.
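Why tens of thousands of games? A rough answer comes from the standard error of the match score. The sketch below is a normal-approximation estimate, not anyone's actual tooling: it assumes the two versions are close in strength (score near 50%), games are independent, and a draw rate you supply (the 40% used in the demo is an assumption, not a measured figure). `games_needed` is a hypothetical helper name.

```python
import math

# Slope of the logistic Elo curve at a 50% score. With
# Elo(s) = 400 * log10(s / (1 - s)), dElo/ds = 400 / (ln(10) * s * (1 - s)).
ELO_PER_SCORE_AT_50 = 400.0 / (math.log(10) * 0.25)


def games_needed(margin_elo, draw_rate, z=1.96):
    """Approximate games needed for a +/- margin_elo confidence interval.

    Assumes a score near 50% (engines of similar strength), independent
    games and a normal approximation; z=1.96 gives roughly 95% confidence.
    """
    # At a 50% score, decisive games deviate by 0.5 from the mean and draws
    # by 0, so the per-game standard deviation is 0.5 * sqrt(1 - draw_rate).
    sigma = 0.5 * math.sqrt(1.0 - draw_rate)
    # margin = z * slope * sigma / sqrt(N)  =>  N = (z * slope * sigma / margin)^2
    n = (z * ELO_PER_SCORE_AT_50 * sigma / margin_elo) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    for margin in (10, 5, 3):
        print(f"+/-{margin} Elo at 40% draws: about {games_needed(margin, 0.4)} games")
```

Under these assumptions, a +/-5 Elo margin needs roughly 11,000 games and +/-3 Elo about 31,000, the same order of magnitude as a 36,000-game run. Halving the margin quadruples the number of games.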
MahmoudUthman wrote: "7-Is it okay to use concurrent games?"
Yes, but IMO you should not have more concurrent CPU usage than you have physical cores.
--Jon
- Location: Nice
Re: how to properly test the changes to the engine ?
1-2: I use 10 opponents not too far from Isa in strength.
3: I use 100 different PGN openings, each played twice (once with White and once with Black), so 200 games against each opponent.
4: TC 1 minute + 250 milliseconds. That takes a lot of time, which is OK for me, but I think a very fast time control is relevant if your engine can reach a high depth (like Arasan).
5: So with 200 games and 10 opponents, that is 2000 games. I think you can only measure improvements of 20-30 Elo with 2000 games, but while your engine is in development you can gain 30-40 Elo at one time.
7: I play one game at a time.
6: Be patient, have a solid methodology, and make a lot of backups.
NOT LIKE ME
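The scheme in answers 3 and 5 above (100 openings, each played with both colors, against 10 opponents) can be sketched as a game list. This is only an illustration: `build_schedule`, the placeholder engine and opening names, and the `(opening, white, black)` tuple layout are all hypothetical, not taken from any particular tournament tool.

```python
from itertools import product


def build_schedule(openings, opponents, engine="candidate"):
    """List every game as (opening, white, black).

    Each opening is played twice against each opponent, once with each
    color, so an opening that favors one side does not bias the result.
    """
    games = []
    for opponent, opening in product(opponents, openings):
        games.append((opening, engine, opponent))  # engine has White
        games.append((opening, opponent, engine))  # engine has Black
    return games


if __name__ == "__main__":
    openings = [f"opening_{i}" for i in range(100)]  # placeholder names
    opponents = [f"opponent_{i}" for i in range(10)]  # placeholder names
    schedule = build_schedule(openings, opponents)
    print(len(schedule))  # 100 openings x 2 colors x 10 opponents = 2000
```

Pairing the two colors per opening is the design choice that matters: if an opening is lost for White, both engines get to suffer it once.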
Isa download :
Re: how to properly test the changes to the engine ?
1: I use self-play, so the only opponent is the currently strongest version of my program.
2: See above. Though sometimes I will run a verification match against a reasonably stronger opponent.
3: I use the WinBoard default_book.bin limited to 4 moves.
4: I play my games at 5 seconds plus 0.05 seconds per move. I know I should probably test at longer time controls, but I do not have the resources for it.
5: I use SPRT (the sequential probability ratio test), a statistical test that lets cutechess stop the match as soon as enough games have been played to accept or reject the change.
7: Concurrent games can and should be run, if only to get a result reasonably fast. Though I'm not as strong as Arasan, my guideline is to run one fewer concurrent game than I have cores.
6: I think for something like this you have to place your faith in the test: don't stop the match before the test has completed, for example.
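To make answer 5 above more concrete, here is a minimal SPRT sketch. It uses a normal approximation to the log-likelihood ratio under the logistic Elo model, with Wald's standard stopping bounds; cutechess-cli's own implementation may use a different Elo or draw model, so treat this as an illustration of the technique rather than a reimplementation of that tool. The function name and the default hypotheses (elo0=0, elo1=5, alpha=beta=0.05) are assumptions chosen for the example.

```python
import math


def expected_score(elo):
    """Logistic expected score for an Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Return 'H1' (accept the change), 'H0' (reject it) or 'continue'."""
    n = wins + draws + losses
    if n == 0:
        return "continue"
    mean = (wins + 0.5 * draws) / n
    # Sample variance of the per-game score (results are 1, 0.5 or 0).
    var = (wins + 0.25 * draws) / n - mean ** 2
    if var <= 0:
        return "continue"  # every game had the same result; this sketch waits for more data
    s0, s1 = expected_score(elo0), expected_score(elo1)
    # Normal-approximation log-likelihood ratio of H1 (elo1) versus H0 (elo0).
    llr = n * (s1 - s0) * (2 * mean - s0 - s1) / (2 * var)
    lower = math.log(beta / (1 - alpha))   # crossing this rejects the change
    upper = math.log((1 - beta) / alpha)   # crossing this accepts the change
    if llr >= upper:
        return "H1"
    if llr <= lower:
        return "H0"
    return "continue"


if __name__ == "__main__":
    print(sprt(3000, 4000, 2000))
```

This is also why answer 6 says not to stop early: the error rates alpha and beta only hold if you let the test run until the log-likelihood ratio crosses one of the bounds.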
Some believe in the almighty dollar.
I believe in the almighty printf statement.
- Location: Philippines
Re: how to properly test the changes to the engine ?
MahmoudUthman wrote: "1-How many opponents should I use?"
Collect as many engines as possible and select those whose style differs from your engine's.
MahmoudUthman wrote: "2-How large should the rating difference between the weakest and the strongest opponents be?"
20 rating points.
MahmoudUthman wrote: "3-How many different starting positions should I use?"
Start with 1000 positions.
MahmoudUthman wrote: "4-What is the smallest sufficient time control?"
Test the scoring reliability of your engine on the computer you are testing with. How? Run a match between engineA and engineB, where engineA is actually the same as engineB. Test at TC 30s + 100ms increment per move (with the same opening suite): do you get 50% after 1000 games, or close to 50%?
If you cannot get 50% or close to it, test at TC 60s + 200ms increment per move and check the score. Keep increasing the time control until you get 50% or close to it.
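"Close to 50%" can be made precise by comparing the deviation from 50% with the standard error of the score. The sketch below is one way to do that, assuming independent games and a normal approximation; `self_play_check` is a hypothetical helper, and the 1.96 threshold (a two-sided 95% test) is an assumed choice.

```python
import math


def self_play_check(wins, draws, losses, z_limit=1.96):
    """Return True if an engine-vs-itself result is consistent with a 50% score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game score variance (results are 1, 0.5 or 0).
    var = max((wins + 0.25 * draws) / n - score ** 2, 0.0)
    stderr = math.sqrt(var / n)
    if stderr == 0:
        return score == 0.5  # every game had the same result
    z = (score - 0.5) / stderr
    return abs(z) <= z_limit


if __name__ == "__main__":
    print(self_play_check(300, 400, 300))  # 1000 games, exactly 50%
    print(self_play_check(360, 400, 240))  # 1000 games, 56%
```

Even a perfectly fair setup fails a 95% check about one time in twenty, so a single failure at one time control is a reason to rerun, not proof that the setup is broken.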
MahmoudUthman wrote: "And should I test at different time controls, or is a single fixed time control enough?"
Test at 2 TCs to determine whether the changes scale with time.
MahmoudUthman wrote: "5-How many games should I play?"
It depends on what you are trying to test for. For a strength test, you normally need to use tools such as BayesElo or Ordo and check the error margin. For a quick estimate, an engine that leads by +100 (100 more wins than losses) in a 1000-game test would be good.
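For a quick look at the error margin without BayesElo or Ordo, a W/D/L record can be converted to an Elo difference with an approximate 95% margin. This is a simple normal-approximation sketch under the logistic Elo model; BayesElo and Ordo use their own models and will give somewhat different numbers. `elo_with_margin` and `to_elo` are hypothetical names, and the draw count in the example is an assumption.

```python
import math


def to_elo(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (Elo difference, approximate 95% margin) for a W/D/L record."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = max((wins + 0.25 * draws) / n - score ** 2, 0.0)
    stderr = math.sqrt(var / n)
    # Confidence interval on the score, clamped so the Elo conversion stays finite.
    eps = 1e-6
    lo = min(max(score - z * stderr, eps), 1 - eps)
    hi = min(max(score + z * stderr, eps), 1 - eps)
    # Map the interval through the (non-linear) logistic curve; report its half-width.
    margin = (to_elo(hi) - to_elo(lo)) / 2
    return to_elo(score), margin


if __name__ == "__main__":
    elo, margin = elo_with_margin(350, 400, 250)
    print(f"{elo:+.1f} +/- {margin:.1f} Elo")
```

For example, a +100 lead in 1000 games split as 350 wins, 400 draws and 250 losses is a 55% score, about +35 Elo with a margin of roughly 17 Elo, so the improvement is clearly positive even though its size is still uncertain.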
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: how to properly test the changes to the engine ?
jdart wrote (answering "7-Is it okay to use concurrent games?"): "Yes, but IMO you should not have more concurrent CPU usage than you have physical cores."
I use cutechess-cli, but an older version of it, and I get a lot of trouble when playing concurrent games: matches are stopped prematurely because communication problems arise, while playing one game at a time works fine. Is there a version of cutechess-cli that reliably supports concurrent games?
- Location: Fuquay-Varina, North Carolina
Re: how to properly test the changes to the engine ?
Sven Schüle wrote: "I use cutechess-cli but an older version of it, and there I get a lot of trouble when playing concurrent games [...] Is there a version of cutechess-cli that has a reliable support of concurrent games?"
Cutechess-cli 0.6.0 works well on my 8-core Linux server with concurrency=6, even for super-fast games. The current version of cutechess-cli works well on my Windows laptop.
- Location: Warsza
Re: how to properly test the changes to the engine ?
For UCI engines, LittleBlitzer is a decent tool. There are no problems with using several threads.
Pawel Koziol
http://www.pkoziol.cal24.pl/rodent/rodent.htm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: how to properly test the changes to the engine ?
Adam Hair wrote: "Cutechess-cli 0.6.0 works well on my 8 core Linux server with concurrency=6 even for super fast games. The current version of Cutechess-cli works well on my Windows laptop."
Thanks. I think the version I use is 0.8.x (probably 0.8.2), and there is already a 0.9.x. I can give 0.6.0 a try, but hmmmmm, should I really use such an old version...
Did you also test WB engines under Windows with it?
- Location: http://www.arasanchess.org
Re: how to properly test the changes to the engine ?
I don't actually use the concurrency feature. I just run a lot of processes.
--Jon
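Running many separate match processes instead of a tool's built-in concurrency can be sketched as a worker pool capped at the physical core count, following the "no more concurrent CPU usage than physical cores" rule of thumb from earlier in the thread. This assumes each match keeps roughly one core busy (e.g. pondering off). `run_matches` is a hypothetical helper and the commands are placeholders: substitute whatever match-runner invocation you actually use.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_matches(commands, physical_cores):
    """Run independent match commands in parallel, at most one per physical core.

    Each command is a list of arguments for one match process. Returns the
    exit codes in the same order as the commands.
    """
    # os.cpu_count() reports logical CPUs (including hyperthreads), so the
    # physical core count is passed in explicitly instead of guessed.
    workers = max(1, physical_cores)

    def run_one(cmd):
        # The threads only wait on child processes; the CPU work happens
        # in the match processes themselves.
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder commands; replace with real match-runner invocations.
    placeholder = [sys.executable, "-c", "print('match finished')"]
    print(run_matches([placeholder] * 4, physical_cores=2))
```

A thread pool is enough here because the Python side only waits on child processes; the engines' CPU usage is what the `physical_cores` cap is meant to limit.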