
A poor man's testing environment

Posted: Fri Jan 04, 2013 1:32 pm
by Rebel
I would like to present a page for beginners on how to test a chess engine with limited hardware. I am interested in feedback for further improvement.

http://www.top-5000.nl/tuning.htm

Re: A poor man's testing environment

Posted: Fri Jan 04, 2013 1:40 pm
by JuLieN
Thanks Ed! Very nice starters. I have started to work on my engine again, and I was planning to use cutechess-cli for the first time. Your guide will prove useful!

Re: A poor man's testing environment

Posted: Fri Jan 04, 2013 1:54 pm
by Houdini
Rebel wrote: I would like to present a page for beginners on how to test a chess engine with limited hardware. I am interested in feedback for further improvement.

http://www.top-5000.nl/tuning.htm
- I don't understand the need for making 8 copies of the engine in different folders.
- Why don't you use the "-concurrency" option of cutechess-cli?

Re: A poor man's testing environment

Posted: Fri Jan 04, 2013 2:09 pm
by Rebel
Houdini wrote:
Rebel wrote: I would like to present a page for beginners on how to test a chess engine with limited hardware. I am interested in feedback for further improvement.

http://www.top-5000.nl/tuning.htm
- I don't understand the need for making 8 copies of the engine in different folders.
- Why don't you use the "-concurrency" option of cutechess-cli?
Are the 2 questions related?

Code:

-concurrency <n>::
	Set the maximum number of concurrent games to <n>.
I never understood the meaning of this option.

The need to make 8 separate copies is related to file usage. I keep statistics, and 8 engines writing to the same file is a recipe for trouble, eventually a crash.
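For comparison, here is a rough Python sketch of one alternative to separate folders: each running instance derives its own statistics file name from its process ID, so no two instances share a file. This is not how Rebel's engine works; the function name, the `stats` folder and the `stats_<pid>.txt` pattern are hypothetical and only illustrate the idea.

```python
import os
from pathlib import Path


def stats_path_for_instance(base_dir, pid=None):
    """Return a statistics file path that is unique to one engine process.

    The file name pattern is an assumption made for this illustration.
    """
    if pid is None:
        pid = os.getpid()  # every running process has its own PID
    return Path(base_dir) / f"stats_{pid}.txt"


# Each instance would call this once at startup and write only to its own file.
print(stats_path_for_instance("stats"))
```

The statistics then still have to be collected from several files afterwards, so this only moves the bookkeeping; it does not remove it.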

Re: A poor man's testing environment.

Posted: Fri Jan 04, 2013 2:30 pm
by Ajedrecista
Hello Ed:
Rebel wrote:

Code:

-concurrency <n>::
	Set the maximum number of concurrent games to <n>.
I never understood the meaning of this option.
AFAIK, -concurrency means the number of games played in parallel. I briefly used cutechess-cli 0.5.1 at the end of last summer... I have an old dual core, so I could only choose between -concurrency 1 and -concurrency 2 with single-core engines such as Quazar 0.4 w32 (I chose -concurrency 2). If you have a quad, then you can go up to -concurrency 4 with single-core engines.
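The rule of thumb above can be written down as a small Python helper. It is only a sketch: the function name is hypothetical, and it assumes pondering is off, so that only one engine per game is searching at any moment.

```python
import os


def max_concurrency(cores, threads_per_engine=1):
    """Largest sensible -concurrency value for a given number of cores.

    Assumes pondering is off, so each game keeps one engine busy at a time.
    """
    if cores < 1 or threads_per_engine < 1:
        raise ValueError("cores and threads_per_engine must be at least 1")
    return max(1, cores // threads_per_engine)


print(max_concurrency(2))  # dual core, single-core engines
print(max_concurrency(4))  # quad core, single-core engines
print(max_concurrency(os.cpu_count() or 1))  # this machine
```

Note that `os.cpu_count()` reports logical cores, so on a hyper-threaded machine a lower value may be the safer choice.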

I am sure that more people will thank you for this 'tuning guide'.

Regards from Spain.

Ajedrecista.

Re: A poor man's testing environment

Posted: Fri Jan 04, 2013 2:33 pm
by Houdini
Rebel wrote:Are the 2 questions related?
Possibly, that depends on you...

The concurrency argument will tell cutechess-cli to play multiple simultaneous games. If you want to use 8 cores, set "-concurrency 8" and cutechess-cli will play 8 simultaneous games.

If I understand correctly, you create 8 engine files in 8 folders and run 8 separate testing processes (clicking 8 times on the batch file) creating 8 PGN output files that need to be combined.

I have 1 engine file and run a single testing process using "-concurrency 8" creating a single PGN output file. KISS!
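As a rough illustration of the single-process setup, the Python sketch below only builds the argument list for one cutechess-cli run with "-concurrency 8". The engine paths, the time control and the output file name are placeholders, and the option names are written as I understand them from the cutechess-cli help text; older releases used different engine options, so check the help of your own version before relying on them.

```python
def build_cutechess_command(engine_a, engine_b, games, concurrency, pgn_out):
    """Build the argument list for one cutechess-cli match.

    Option names are assumptions to be checked against your cutechess-cli
    version; the engine paths and time control are placeholders.
    """
    return [
        "cutechess-cli",
        "-engine", f"cmd={engine_a}",
        "-engine", f"cmd={engine_b}",
        "-each", "proto=uci", "tc=40/60",
        "-games", str(games),
        "-concurrency", str(concurrency),  # number of simultaneous games
        "-pgnout", pgn_out,                # one PGN file for all games
    ]


cmd = build_cutechess_command("engine_new.exe", "engine_old.exe", 1000, 8, "results.pgn")
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would start the match; the sketch stops at printing the command so it can be inspected first.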

Re: A poor man's testing environment

Posted: Fri Jan 04, 2013 2:50 pm
by mar
Houdini wrote:If I understand correctly, you create 8 engine files in 8 folders and run 8 separate testing processes (clicking 8 times on the batch file) creating 8 PGN output files that need to be combined.
Well, you can run one batch/script to handle this. Ed said he needs to log debug output for each engine separately.
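The combining step, for example, could be scripted along these lines. This is a minimal Python sketch, assuming each testing process writes its own PGN file; the function name and the file names in the demo are made up. PGN is plain text, so the files are simply concatenated with a blank line between them.

```python
import tempfile
from pathlib import Path


def merge_pgn_files(pgn_paths, out_path):
    """Concatenate PGN files into out_path; return how many were merged."""
    merged = 0
    with open(out_path, "wb") as out:
        for path in pgn_paths:
            data = Path(path).read_bytes().strip()
            if not data:
                continue  # skip empty files
            out.write(data + b"\n\n")  # blank line separates the games
            merged += 1
    return merged


# Demo with three tiny stand-in PGN files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(1, 4):
        p = Path(tmp) / f"games{i}.pgn"
        p.write_text(f'[Event "test {i}"]\n\n1. e4 e5 *\n', encoding="utf-8")
        paths.append(p)
    combined = Path(tmp) / "all.pgn"
    count = merge_pgn_files(paths, combined)
    merged_text = combined.read_text(encoding="utf-8")

print(count, merged_text.count("[Event "))
```

Working on bytes avoids guessing the encoding of player or engine names in the PGN headers.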
Houdini wrote: KISS!
So true, nothing can beat cut'n'paste :wink:

Re: A poor man's testing environment

Posted: Fri Jan 04, 2013 3:00 pm
by lucasart
Rebel wrote:
Houdini wrote:
Rebel wrote: I would like to present a page for beginners on how to test a chess engine with limited hardware. I am interested in feedback for further improvement.

http://www.top-5000.nl/tuning.htm
- I don't understand the need for making 8 copies of the engine in different folders.
- Why don't you use the "-concurrency" option of cutechess-cli?
Are the 2 questions related?

Code:

-concurrency <n>::
	Set the maximum number of concurrent games to <n>.
I never understood the meaning of this option.

The need to make 8 separate copies is related to file usage. I keep statistics, and 8 engines writing to the same file is a recipe for trouble, eventually a crash.
Cutechess-cli is perfectly capable of handling concurrent writing into the resulting PGN if that's your point.

An engine is spawned from a single file (the executable), and cutechess-cli will spawn 8 processes from it, each with its own ring-fenced address space. So you can perfectly well have 8 instances of the same engine running, all coming from the same executable file. If you prefer a concrete example, it is just like opening Notepad 8 times by clicking on it 8 times.

Using the concurrency option is the most efficient way. It means you only have one instance of cutechess-cli.
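The point about one executable and several processes can be checked with a small Python sketch. It uses the Python interpreter itself as a stand-in for the engine executable (an assumption made purely for the demo) and starts it several times; every instance reports a different process ID.

```python
import subprocess
import sys


def spawn_instances(n):
    """Start n processes from the same executable and return their PIDs."""
    code = "import os; print(os.getpid())"
    procs = [
        subprocess.Popen([sys.executable, "-c", code],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(n)
    ]
    # Every child prints its own PID; collect the output of each one.
    return [int(p.communicate()[0]) for p in procs]


pids = spawn_instances(4)
print(pids)
print(len(set(pids)) == len(pids))  # all PIDs are distinct
```

Each child is a separate process with its own memory, so nothing is shared between the instances unless they deliberately open the same file.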

Re: A poor man's testing environment

Posted: Fri Jan 04, 2013 3:44 pm
by Rebel
I will investigate whether -concurrency also works for me. Thanks for pointing it out.

Re: A poor man's testing environment

Posted: Fri Jan 04, 2013 3:46 pm
by mcostalba
Rebel wrote:I am interested in some feedback for further improvement.
Thanks!

The interesting part for me is testing from starting positions that are not out-of-the-book positions.

In your start-positions pool you have different kinds of late endgame positions. Normally I always test from opening books, possibly truncated at, say, 10 moves.

But your approach is interesting. On one side, you probably enhance the test's sensitivity to the patch you want to verify: for instance, if it is an endgame evaluation tweak, I guess starting from late-midgame positions enhances the signal-to-noise ratio and so would require fewer games for a reliable result (although current Elo estimators still do not take the noise level into account, but this is a different topic).
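To make the "fewer games" point concrete, here is a minimal Python sketch of how an Elo difference and a rough 95% error margin can be estimated from a win/loss/draw count. It uses the logistic Elo formula with a normal approximation of the score, a common textbook approach and not necessarily what any particular rating tool does; the function names are hypothetical.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an average score strictly between 0 and 1."""
    score = min(max(score, 1e-9), 1.0 - 1e-9)  # avoid division by zero
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo_difference, error_margin) using a normal approximation."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    low = elo_from_score(score - z * se)
    high = elo_from_score(score + z * se)
    return elo_from_score(score), (high - low) / 2.0


d1, m1 = elo_with_margin(100, 100, 50)    # 250 games
d2, m2 = elo_with_margin(400, 400, 200)   # 1000 games
print(round(m1, 1), round(m2, 1))
```

Four times as many games roughly halves the margin. A lower per-game variance, for instance from more draws or from positions where the patch matters in most games, shrinks it as well.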

On the other hand it is easier to introduce artifacts and unwanted bias if you apply it blindly, without understanding what you are doing.

As a general approach I'd still prefer to start out-of-the-book; it seems safer to me and, most importantly, better approximates the "real world" conditions you'll find in public list tests.