cutechess-cli question


lech
Posts: 1136
Joined: Sun Feb 14, 2010 10:02 pm

Re: cutechess-cli question

Post by lech »

I will try to explain why it is worth using only one process to test changes to an engine.
If you start a test between two identical engines under these conditions:
1 CPU, time per move (not too short!), hash cleared before each game, and e.g. 10 games (5 random positions x 2), you should get a 5:5 result. If you don't, it means the conditions for the two engines were not the same (hardware, software (OS) or cutechess-cli). Most likely the operating system? If you use the same engine in one process (the same process plays both WHITE and BLACK), there is a chance that you get a result closer to (or equal to) 5:5.
Maybe, I can't be friendly, but let me be useful.
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: cutechess-cli question

Post by ilari »

lech wrote: An engine is not a die (rather a Rubik's cube).
The second question is (if I may):
I am trying to test with the option st=x (time per move).
Why does cutechess-cli return <x loses on time>? I tested it with two original Stockfish-23-32-ja engines as opponents.
The engines lose on time because they use more than <x> time per move. It happens a lot with "time per move" time controls. You should use the "timemargin=N" option to allow engines to go N msec over the limit.
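A minimal sketch of how such a match could be launched from Python. The `st=` and `timemargin=` options are the ones discussed above; the `-engine`, `-each`, `proto=uci` and `-games` spellings are assumptions based on recent cutechess-cli releases (older releases used different flag names), and `./stockfish` is a hypothetical engine path. Check `cutechess-cli --help` for your version.

```python
import shlex


def build_match_command(engine_cmd, seconds_per_move, margin_ms, games=10):
    """Return an argument list for a self-match at a fixed time per move.

    Flag names follow recent cutechess-cli releases (an assumption);
    engine_cmd is whatever command starts your engine.
    """
    return [
        "cutechess-cli",
        "-engine", f"cmd={engine_cmd}",
        "-engine", f"cmd={engine_cmd}",
        # Options after -each apply to both engines.
        "-each", "proto=uci",
        f"st={seconds_per_move}",    # time per move, in seconds
        f"timemargin={margin_ms}",   # allowed overshoot, in milliseconds
        "-games", str(games),
    ]


if __name__ == "__main__":
    # "./stockfish" is a placeholder path, not a real installation.
    print(shlex.join(build_match_command("./stockfish", 5, 200)))
```

The list can be passed to `subprocess.run` unchanged. A margin of a few hundred milliseconds is only a starting guess; the right value depends on how far your engine overshoots.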
The third question is (if I may):
To get less random results it would be good to clear the hash before each game. Can I do that?
Unfortunately cutechess-cli doesn't yet support setting options between games, so you'll have to clear the hash yourself, inside the engine, when a new game starts.
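One way to do this inside the engine is to wipe the table when the GUI announces a new game. The sketch below is a toy, not any real engine's code: it assumes the engine speaks UCI and that the GUI sends `ucinewgame` before each game, as the UCI protocol describes. `ToyEngine` and `hash_table` are hypothetical names.

```python
class ToyEngine:
    """Toy command handler showing where a hash clear could go."""

    def __init__(self):
        # Stand-in for a real transposition table.
        self.hash_table = {}

    def handle(self, line):
        """Process one command line and return the reply lines."""
        command = line.strip()
        if command == "uci":
            return ["id name ToyEngine", "uciok"]
        if command == "isready":
            return ["readyok"]
        if command == "ucinewgame":
            # New game: drop everything learned in the previous one.
            self.hash_table.clear()
            return []
        # All other commands are ignored in this sketch.
        return []


if __name__ == "__main__":
    engine = ToyEngine()
    engine.hash_table[0x1234] = ("e2e4", 12)
    engine.handle("ucinewgame")
    print(len(engine.hash_table))
```

An xboard-protocol engine could do the same thing on the `new` command.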
I will try to explain why it is worth using only one process to test changes to an engine.
If you start a test between two identical engines under these conditions:
1 CPU, time per move (not too short!), hash cleared before each game, and e.g. 10 games (5 random positions x 2), you should get a 5:5 result.
It doesn't work quite like that. Even if you set a fixed amount of time per move, the amount of CPU cycles used per move will vary, even if you used only one engine process. If you want the results to be as reproducible as possible you should set a node and/or depth limit instead of a time limit.
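The difference can be illustrated with a toy "search" that does nothing but count nodes. This is only a sketch of the principle, not engine code, and both function names are hypothetical: a wall-clock limit stops at a different node count on each run, while a node limit always stops at the same one.

```python
import time


def count_with_time_limit(seconds):
    """Count 'nodes' until the wall-clock deadline passes.

    The result depends on CPU load and timer behaviour, so it
    usually differs from run to run.
    """
    deadline = time.perf_counter() + seconds
    nodes = 0
    while time.perf_counter() < deadline:
        nodes += 1
    return nodes


def count_with_node_limit(max_nodes):
    """Count 'nodes' up to a fixed budget; identical on every run."""
    nodes = 0
    while nodes < max_nodes:
        nodes += 1
    return nodes


if __name__ == "__main__":
    print([count_with_time_limit(0.01) for _ in range(3)])
    print([count_with_node_limit(100_000) for _ in range(3)])
```

Recent cutechess-cli releases appear to expose such limits as per-engine options (`nodes=N`, `depth=N`); treat those names as an assumption and check the help output of your version.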
lech
Posts: 1136
Joined: Sun Feb 14, 2010 10:02 pm

Re: cutechess-cli question

Post by lech »

Thanks for your explanation. :D
ilari wrote: It doesn't work quite like that. Even if you set a fixed amount of time per move, the amount of CPU cycles used per move will vary, even if you used only one engine process.
Maybe someone has experience with single-process testing? :D
hgm
Posts: 27819
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: cutechess-cli question

Post by hgm »

When I developed micro-Max I was testing it in self-play as a single process. The results were just as random as normal, of course. The chances that self-play would end in 5-5 are quite small with any engine I know. Even highly deterministic engines like micro-Max and Eden usually don't play the same game twice. Let alone engines that have hash tables and more advanced time management.
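A rough way to put a number on this, under a simplifying model that is an assumption of this sketch rather than something measured: treat the games as independent, with a fixed draw rate and equal win chances for both sides. The score is then 5-5 exactly when wins equal losses.

```python
from math import factorial


def prob_exact_tie(games, draw_rate):
    """Probability that two equal engines finish with an exactly even score.

    Assumes independent games; each game is a draw with probability
    draw_rate, otherwise either side wins with equal probability.
    """
    win_rate = (1.0 - draw_rate) / 2.0
    total = 0.0
    for wins in range(games // 2 + 1):
        draws = games - 2 * wins  # losses == wins for an even score
        ways = factorial(games) // (factorial(wins) ** 2 * factorial(draws))
        total += ways * win_rate ** (2 * wins) * draw_rate ** draws
    return total


if __name__ == "__main__":
    # With no draws this is C(10, 5) / 2**10.
    print(prob_exact_tie(10, 0.0))  # 0.24609375
```

With no draws at all, fewer than one 10-game match in four ends 5-5 under this model. Paired openings with colours reversed are not truly independent games, so real figures will differ.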
Tom Likens
Posts: 303
Joined: Sat Apr 28, 2012 6:18 pm
Location: Austin, TX

Re: cutechess-cli question

Post by Tom Likens »

lech wrote: If you start a test between two identical engines under these conditions:
1 CPU, time per move (not too short!), hash cleared before each game, and e.g. 10 games (5 random positions x 2), you should get a 5:5 result. If you don't, it means the conditions for the two engines were not the same (hardware, software (OS) or cutechess-cli).
I know it seems like this should be the case, but in reality it won't be. The timer resolution
simply isn't accurate enough and you will see time jitter between the game runs. Even if the
difference is one or two extra nodes in the search it will eventually result in one of the
engines playing a different move--and BANG! you're done.

To get the 50% result you're looking for, you will need to play *thousands* of games and
the result will asymptotically approach 50%, jittering slightly above and below it. In the
old days we used to play 100 games and declare victory. Remi's analysis and Bob's
cluster disabused us of that naive notion.

Unfortunately, it's the required sacrifice we all make to the Goddess of Statistics and Chaos.
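To see why 100 games is too few, here is a small sketch of the standard error of the match score between two equal engines. It assumes independent games with a fixed draw rate, which is a simplification, and the function name is hypothetical. Each game scores 1, 0.5 or 0, so the per-game variance is (1 - draw_rate) / 4.

```python
from math import sqrt


def score_standard_error(games, draw_rate):
    """Standard error of the average score per game for equal engines.

    Assumes independent games: a draw (0.5) with probability draw_rate,
    otherwise a win (1) or a loss (0) with equal probability.
    """
    per_game_variance = (1.0 - draw_rate) / 4.0
    return sqrt(per_game_variance / games)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, score_standard_error(n, 0.0))
```

With no draws, 100 games gives a standard error of about 0.05, so a 95% interval spans roughly 10 percentage points on either side of 50%. Shrinking that tenfold takes a hundred times as many games.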

regards,
--tom
lech
Posts: 1136
Joined: Sun Feb 14, 2010 10:02 pm

Re: cutechess-cli question

Post by lech »

I think it is only a problem of calibration.
If some programmers need more than 100 games to know which version is better, it means that they use engines and computers as dice, not as a Rubik's cube.