I will try to explain why it is worth using only one process to test changes to an engine.
If you start a test between two identical engines under these conditions:
1 CPU, time per move (not too short!), hash cleared before each game, and e.g. 10 games (5 random positions x 2), you should get the result 5:5. If not, it means that the conditions for the two engines were not the same (hardware, software (OS) or cutechess-cli). Most likely the operating system? If you use the same engine in one process (the same process plays both WHITE and BLACK), there is a chance that you get a result closer to (or exactly) 5:5.
cutechess-cli question
Moderators: hgm, Rebel, chrisw
-
- Posts: 1136
- Joined: Sun Feb 14, 2010 10:02 pm
Re: cutechess-cli question
Maybe, I can't be friendly, but let me be useful.
-
- Posts: 750
- Joined: Mon Mar 27, 2006 7:45 pm
- Location: Finland
Re: cutechess-cli question
lech wrote: An engine is not a die (rather a Rubik's cube).
The second question is (if I may):
I am trying to test with the option st=x (time per move).
Why does cutechess-cli return <x loses on time>? I tested it with two original Stockfish-23-32-ja engines as opponents.
The engines lose on time because they use more than <x> time per move. It happens a lot with "time per move" time controls. You should use the "timemargin=N" option to allow engines to go N msec over the limit.
lech wrote: The third question is (if I may):
To get less random results it would be good to clear the hash before each game. Can I do it?
Unfortunately cutechess-cli doesn't yet support setting options between games, so you'll have to clear the hash by yourself when a new game starts.
lech wrote: I will try to explain why it is worth using only one process to test changes to an engine.
If you start a test between two identical engines under these conditions:
1 CPU, time per move (not too short!), hash cleared before each game, and e.g. 10 games (5 random positions x 2), you should get the result 5:5.
It doesn't work quite like that. Even if you set a fixed amount of time per move, the amount of CPU cycles used per move will vary, even if you used only one engine process. If you want the results to be as reproducible as possible you should set a node and/or depth limit instead of a time limit.
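A minimal sketch of how the two suggestions above (a time margin for "time per move" games, or a node/depth limit instead of a time limit) could be turned into a cutechess-cli command line. The helper name `build_cutechess_args`, the engine command `stockfish`, and the numeric values are placeholders for illustration. The `tc=inf` entry is added when no time limit is given, on the assumption that the installed version wants some time control to be specified. The script only builds and prints the command; it does not run it.

```python
import shlex


def build_cutechess_args(engine_cmd, games=10, st=None, timemargin=None,
                         nodes=None, depth=None):
    """Return an argument list for a self-play match of one engine binary.

    engine_cmd is a placeholder for the path to your engine (assumed UCI).
    """
    each = ["proto=uci"]
    if st is not None:
        each.append(f"st={st}")                  # seconds per move
    else:
        each.append("tc=inf")                    # assumption: needed without st
    if timemargin is not None:
        each.append(f"timemargin={timemargin}")  # msec allowed over the limit
    if nodes is not None:
        each.append(f"nodes={nodes}")            # node limit per move
    if depth is not None:
        each.append(f"depth={depth}")            # depth limit per move
    return [
        "cutechess-cli",
        "-engine", f"cmd={engine_cmd}", "name=A",
        "-engine", f"cmd={engine_cmd}", "name=B",
        "-each", *each,
        "-games", str(games),
        "-repeat",                               # each opening with both colors
    ]


if __name__ == "__main__":
    # Time per move with a margin, as suggested for the "loses on time" problem.
    print(shlex.join(build_cutechess_args("stockfish", st=5, timemargin=200)))
    # Node limit instead of a time limit, for more reproducible games.
    print(shlex.join(build_cutechess_args("stockfish", nodes=100000)))
```

An opening file would still have to be added with the `-openings` option to get the "5 random positions x 2" setup; it is left out here because the file name would be a guess.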
-
- Posts: 1136
- Joined: Sun Feb 14, 2010 10:02 pm
Re: cutechess-cli question
ilari wrote: It doesn't work quite like that. Even if you set a fixed amount of time per move, the amount of CPU cycles used per move will vary, even if you used only one engine process.
Thanks for your explanation.
Maybe someone has experience with one-process testing?
Maybe, I can't be friendly, but let me be useful.
-
- Posts: 27819
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: cutechess-cli question
When I developed micro-Max I was testing it in self-play as a single process. The results were just as random as normal, of course. The chances that self-play would end in 5-5 are quite small with any engine I know. Even highly deterministic engines like micro-Max and Eden usually don't play the same game twice. Let alone engines that have hash tables and more advanced time management.
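To put a rough number on how often a 10-game self-play match ends level, here is a small sketch under a deliberately simple model that is not from the thread: every game is treated as an independent random outcome, both sides are equally strong, and `draw_rate` is an assumed parameter. Real paired-opening matches are not fully independent, so this is only an illustration.

```python
from math import comb


def prob_level_score(games, draw_rate):
    """Probability that wins == losses (e.g. 5:5 in 10 games).

    Assumes independent games between equal engines, with
    P(win) = P(loss) = (1 - draw_rate) / 2.
    """
    win = loss = (1.0 - draw_rate) / 2.0
    total = 0.0
    for wins in range(games // 2 + 1):
        draws = games - 2 * wins
        # Choose which games are wins, then which of the rest are losses.
        ways = comb(games, wins) * comb(games - wins, wins)
        total += ways * (win ** wins) * (loss ** wins) * (draw_rate ** draws)
    return total


if __name__ == "__main__":
    for d in (0.0, 0.3, 0.6):
        print(f"draw rate {d:.1f}: P(level score in 10 games) = "
              f"{prob_level_score(10, d):.3f}")
```

With no draws at all the model gives 252/1024, about 24.6%, so even two perfectly equal engines would miss 5:5 in roughly three matches out of four.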
-
- Posts: 303
- Joined: Sat Apr 28, 2012 6:18 pm
- Location: Austin, TX
Re: cutechess-cli question
lech wrote: If you start a test between two identical engines under these conditions:
1 CPU, time per move (not too short!), hash cleared before each game, and e.g. 10 games (5 random positions x 2), you should get the result 5:5. If not, it means that the conditions for the two engines were not the same (hardware, software (OS) or cutechess-cli).
I know it seems like this should be the case, but in reality it won't be. The timer resolution
simply isn't accurate enough and you will see time jitter between the game runs. Even if the
difference is one or two extra nodes in the search it will eventually result in one of the
engines playing a different move--and BANG! you're done.
To get the 50% result you're looking for, you will need to play *thousands* of games and
the result will asymptotically approach 50%, jittering slightly above and below it. In the
old days we used to play 100 games and declare victory. Remi's analysis and Bob's
cluster disabused us of that naive notion.
Unfortunately, it's the required sacrifice we all make to the Goddess of Statistics and Chaos.
regards,
--tom
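A rough sketch of why 100 games is usually too few, using the standard error of the match score. The assumptions are mine, not the poster's: games are independent, the engines are equal, `draw_rate` is a free parameter, and the usual logistic Elo formula converts a score into a rating difference.

```python
from math import log10, sqrt


def score_standard_error(games, draw_rate=0.0):
    """Standard error of the score fraction for two equal engines.

    Per-game variance is (1 - draw_rate) / 4 when win and loss are
    equally likely, so the error shrinks as 1 / sqrt(games).
    """
    return sqrt((1.0 - draw_rate) / (4.0 * games))


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * log10(1.0 / score - 1.0)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        margin = 1.96 * score_standard_error(n)   # ~95% interval half-width
        print(f"{n:6d} games: score 50% +/- {100 * margin:4.1f}%, "
              f"about +/- {elo_from_score(0.5 + margin):.0f} Elo")
```

Under these assumptions the standard error after 100 decisive games is 5 percentage points, so a version that is only slightly stronger cannot be told apart from noise.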
-
- Posts: 1136
- Joined: Sun Feb 14, 2010 10:02 pm
Re: cutechess-cli question
I think it is only a problem of calibration.
If some programmers need more than 100 games to know which version is better, it means that they use engines and computers as dice, not as a Rubik's cube.
Maybe, I can't be friendly, but let me be useful.