
Poor mans testing process please

Posted: Wed Mar 18, 2020 11:41 pm
by lauriet
Hi all,
I have pretty much no facilities for testing changes in my program for gains or losses.
What is the best compromise that will get me in the ballpark? I can collect, and already have, a lot of stats, but is there a most logical/reliable way to determine whether a change is a plus or a minus?

Time to depth ?
Node counts ?
Cut offs ?
TT hits.
etc, etc.

Thanks
Laurie (LTchess2)

Re: Poor mans testing process please

Posted: Thu Mar 19, 2020 1:14 am
by brianr
Depending on the current strength of your engine, early changes that (hopefully) result in large improvements are relatively easy to test. As the engine gets stronger, it gets increasingly difficult to measure smaller improvements. Fast time control games are helpful for search-related changes initially, and fixed-node games are OK for evaluation changes. I suggest using the Ordo 1.2.6 release. The smaller the improvements become, the harder they are to measure; many more games will be required. Position and test sets can be useful once you get a feel for how your engine behaves, but the best thing will be match games. I also suggest testing your test methodology from time to time by matching a copy of the engine against itself, to make sure the result is still very close to 50/50. I cannot tell you how many times I have messed that up over the years. Finally, self-play is fine, but from time to time play against a pool of opponents. Self-play tends to find things that your engine can exploit, and those areas will be very different with a pool of engines.
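As a rough illustration of the 50/50 sanity check and of why small gains need many games, here is a minimal Python sketch that turns a win/draw/loss count into an Elo difference with an approximate 95% interval. It assumes the usual logistic Elo model and a normal approximation for the error margin; the function names are made up for this example and are not taken from Ordo or any other tool.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (strictly between 0 and 1) to an Elo
    difference, using the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def _clamp(score):
    """Keep the score away from 0 and 1 so the logarithm stays finite."""
    return min(max(score, 1e-6), 1.0 - 1e-6)

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, elo_low, elo_high) for a match result.

    z=1.96 gives a roughly 95% interval (normal approximation,
    an assumption of this sketch)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score from the observed W/D/L frequencies.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return (elo_from_score(_clamp(score)),
            elo_from_score(_clamp(score - margin)),
            elo_from_score(_clamp(score + margin)))

if __name__ == "__main__":
    # Engine against an identical copy of itself: should be close to 0 Elo.
    print(match_elo(300, 400, 300))
```

With 1000 games and 40% draws, the interval around an even result comes out at roughly plus or minus 17 Elo, so a self-match that lands well outside that range is a hint that something in the test setup is wrong.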

Re: Poor mans testing process please

Posted: Thu Mar 19, 2020 7:05 am
by lauriet
I can't really play 10,000 self-play games, or that many games against an opponent.
Are the test set positions useful? Do they work? Are they accurate?
I really need a simple, overarching, ballpark test that can help me eliminate dumb ideas and illuminate potentially good ideas.

My ideal would be that "time to depth" gives an indication of how fast I'm moving through nodes.

Re: Poor mans testing process please

Posted: Thu Mar 19, 2020 9:19 am
by hgm
Unfortunately there is no real alternative to playing games. Other methods will do more harm than good.

The upside is that initially you won't need nearly as many as 10,000 games to weed out dumb ideas. 1000 games can be enough.

It is also not clear to me why you cannot play 10,000 games. You could play 10,000 games in a few minutes, if you wanted.

Re: Poor mans testing process please

Posted: Thu Mar 19, 2020 10:40 am
by Ratosh
Playing games is the best way to validate improvements, even better if you use SPRT. I'm using OpenBench, and it is pretty easy to set up, start tests, and let them run while you do other stuff.

You can have different elo bounds:
You can try bigger bounds like [0,10], which should be safe for reaching a decent CCRL rating. With these bounds a +3 elo patch has at least a 50% chance of passing, and all tests should take fewer than 10k games. Of course small changes should fail, but you can be reasonably confident that all passing tests are improvements.

You can have different alpha and beta bounds:
The default SPRT alpha/beta is 5%, giving you 95% confidence that the test result is correct. You can set it to 10% and tests will finish faster, with 90% confidence. I used 20% until my engine was somewhat strong (both STC and LTC).

Notes:
- Changes have a bigger impact on weaker engines (you don't need many games to validate them).
- You don't really need to test at two different TCs to validate a change (especially on a weak engine).
- It is fine to have lower confidence that a change is an STC improvement, but you want decent confidence that it is an improvement at LTC.

Check this to have an estimation of how many games you need: http://chess-sprt-calc.azurewebsites.net/
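To make the elo bounds and the alpha/beta settings above a bit more concrete, here is a minimal Python sketch of an SPRT stopping rule. It assumes the logistic Elo model and a simple normal approximation of the log-likelihood ratio; it is not necessarily the exact formula OpenBench or any other framework uses, and the function names are invented for this example.

```python
import math

def expected_score(elo):
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt(wins, draws, losses, elo0=0.0, elo1=10.0, alpha=0.05, beta=0.05):
    """Return (llr, decision) for H0: gain == elo0 versus H1: gain == elo1.

    decision is "accept H1" (patch passes), "accept H0" (patch fails)
    or "continue" (keep playing games)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0, "continue"
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins + 0.25 * draws) / n - mean ** 2
    if var <= 0.0:
        return 0.0, "continue"
    s0, s1 = expected_score(elo0), expected_score(elo1)
    # Normal approximation of the log-likelihood ratio (assumption of this sketch).
    llr = (s1 - s0) * (2.0 * mean - s0 - s1) * n / (2.0 * var)
    lower = math.log(beta / (1.0 - alpha))   # crossing below: accept H0
    upper = math.log((1.0 - beta) / alpha)   # crossing above: accept H1
    if llr >= upper:
        return llr, "accept H1"
    if llr <= lower:
        return llr, "accept H0"
    return llr, "continue"

if __name__ == "__main__":
    print(sprt(400, 400, 200))  # clearly stronger: test stops and passes
    print(sprt(10, 10, 10))     # too few games: keep playing
```

Raising alpha and beta from 0.05 to 0.10 or 0.20 pulls the two thresholds closer to zero, which is why the tests finish sooner at the cost of confidence.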

Re: Poor mans testing process please

Posted: Fri Mar 20, 2020 7:22 am
by xr_a_y
lauriet wrote:
Thu Mar 19, 2020 7:05 am
I can't really play 10,000 self-play games, or that many games against an opponent.
May I ask what your constraints are?

Using only xboard/winboard on a single low-tech core, you can easily run an engine-versus-engine test of 1000 games at 10 seconds per game.
If you are looking for a +20/30 Elo gain, this will be OK, and you can run many of those tests each day (even with 20 sec games).
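A back-of-the-envelope Python sketch of why about 1000 games is enough for gains of this size: it estimates how many games are needed before the 95% error margin shrinks to the size of the gain you are hoping to see. The 30% draw ratio is purely an assumption for illustration, as is the normal approximation; the function name is hypothetical.

```python
import math

def games_needed(elo_gain, draw_ratio=0.3, z=1.96):
    """Approximate number of games for which the error margin of the
    match score equals the score shift caused by elo_gain.

    draw_ratio=0.3 is an assumed value; z=1.96 is roughly 95%."""
    score = 1.0 / (1.0 + 10.0 ** (-elo_gain / 400.0))
    # Per-game standard deviation of the score near 50%, given the draw ratio.
    sigma = math.sqrt((1.0 - draw_ratio) / 4.0)
    return math.ceil((z * sigma / (score - 0.5)) ** 2)

if __name__ == "__main__":
    for gain in (30, 20, 10, 5):
        print(gain, games_needed(gain))
```

Under these assumptions a +20 Elo gain needs on the order of 800 games, while a +5 Elo gain needs well over 10,000, which matches the numbers mentioned earlier in the thread.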

Re: Poor mans testing process please

Posted: Fri Mar 20, 2020 7:33 am
by lauriet
I'm afraid I have very limited knowledge and facilities to do any "hi-tech" testing.
I think the best I can do is to use test suites.
I have looked at 'STS'.
Can anyone explain how I can use this to give me a 'ballpark' idea?
I'm not looking for +/- 10 Elo resolution, but would be happy to know if my engine is 1800 or 2200.

Re: Poor mans testing process please

Posted: Fri Mar 20, 2020 9:40 am
by brianr
Read from here and follow the various links:
https://www.chessprogramming.org/Engine_Testing

Re: Poor mans testing process please

Posted: Fri Mar 20, 2020 11:01 am
by Henk
lauriet wrote:
Wed Mar 18, 2020 11:41 pm
Hi all,
I have pretty much no facilities for testing changes in my program for gains or losses.
What is the best compromise that will get me in the ballpark? I can collect, and already have, a lot of stats, but is there a most logical/reliable way to determine whether a change is a plus or a minus?

Time to depth ?
Node counts ?
Cut offs ?
TT hits.
etc, etc.

Thanks
Laurie (LTchess2)
They say I am a bad tester. Play one game and see what errors it made. Try to correct these errors in your software. Extract positions. Always make sure that the errors are reproducible.

Re: Poor mans testing process please

Posted: Fri Mar 20, 2020 11:19 am
by hgm
lauriet wrote:
Fri Mar 20, 2020 7:33 am
I'm afraid I have very limited knowledge and facilities to do any "hi-tech" testing.
I think the best I can do is to use test suites.
I have looked at 'STS'.
Can anyone explain how I can use this to give me a 'ballpark' idea?
I'm not looking for +/- 10 Elo resolution, but would be happy to know if my engine is 1800 or 2200.
Well, you won't ever get to know that from a test suite. You could know it, however, by playing a dozen games.