Cutechess-cli questions

Rebel · Post by **Rebel** » Fri May 11, 2012 1:08 pm

Found cutechess-cli yesterday, got it to work and like it. The readme file states version 6. Three issues:

-resign <n> <score>::
	Adjudicate the game as a loss if an engine's score is at least
	<score> centipawns below zero for at least <n> consecutive moves.

Say I use -resign 3 -800 then the interface will terminate a game when the score is below 8 pawns for 3 moves?

-draw <n> <score>::
	Adjudicate the game as a draw if the score of both engines is
	within <score> centipawns from zero after <n> full moves have been played.

Any experiences with this option?

Third and last question, can't find an option to terminate a game as a draw after xxx moves. I assume you are supposed to solve that with the above -draw option?

lucasart · Post by **lucasart** » Fri May 11, 2012 1:41 pm

Rebel wrote:Found cutechess-cli yesterday, got it to work and like it.

Me too. I'm a *huge fan* of cutechess-cli

Rebel wrote: Say I use -resign 3 -800 then the interface will terminate a game when the score is below 8 pawns for 3 moves?

Yes, without the "-", which is a typo I guess? Use "-resign 3 800". It should be 3 real moves (not half moves), and during these 6 half moves both engines should have a score > 800 or < -800 if it's been coded correctly (which I'm sure it is, although I haven't verified it).

Rebel wrote:

-draw <n> <score>::
	Adjudicate the game as a draw if the score of both engines is
	within <score> centipawns from zero after <n> full moves have been played.

Any experiences with this option?

I never use this. The very idea of stopping a game after X moves regardless of any score consideration seems totally crazy. At the very least there should be a precondition that checks that both engines have had more or less stalled scores for a long period beforehand, or something like that.

But generally I don't like using these options and don't use them anymore. The resign feature, even with a high threshold, gets the result wrong in crazy rook themes, for example. And in my experience this happens more often than people care to admit.

As for declaring the game a draw after X moves without any prior conditions, I think it's a pretty silly idea. But anyway, the feature does work. Maybe Ilari put some preconditions in his code and didn't document them or had it in his todo list somewhere.

Rebel wrote: Third and last question, can't find an option to terminate a game as a draw after xxx moves. I assume you are supposed to solve that with the above -draw option?

That's quite a strange feature. I don't see how it could make any sense. Do you want to terminate the game and, instead of declaring it a draw, simply remove it? That would create a selection bias in your sample. For example, if A is better in endgames than B, you'll see that the average game length when A wins is higher than when B wins (B wins with tactical shots; A, if it resists, outplays B in the endgame, let's say). Then by removing all games lasting more than, say, 100 moves, you remove more games where A would have won than games where B would have won, no?

Rebel · Post by **Rebel** » Fri May 11, 2012 3:36 pm

A couple of examples:

1. An endgame with opposite-colored bishops and a handful of pawns for each side. It goes like this: for 50 moves the king and bishop are moved around, then a pawn move is played and the circus starts again.

2. Queen ending. Same pattern as (1).

Experience and thorough testing have taught me that after 160 moves it's pretty safe to declare the game a draw, in combination with a -resign threshold of 5 pawns. If there are exceptions after all, they are equally divided between both engines most of the time, or are statistically insignificant relative to the total number of games you play.

I just did a comparison (reliability check) with Arena and the output is exactly (100%) the same. Arena with 160 moves=draw and resign=500 and cutechess-cli without those checks.

So I think I will use -draw 160 100 and -resign 5 500 for a good speed-up.

Thanks for the answers.

ilari · Post by **ilari** » Fri May 11, 2012 5:19 pm

lucasart wrote:
Rebel wrote: Say I use -resign 3 -800 then the interface will terminate a game when the score is below 8 pawns for 3 moves?
Yes, without the "-", which is a typo I guess? Use "-resign 3 800". It should be 3 real moves (not half moves), and during these 6 half moves both engines should have a score > 800 or < -800 if it's been coded correctly (which I'm sure it is, although I haven't verified it).

Actually the engines don't have to agree on the score, only the losing engine's score matters.
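
The rule as described in this thread could be sketched roughly as follows. This is not cutechess-cli's actual code: the function name `resign_loser`, the `(side, score)` list layout, and the use of `<=` for "at least <score> centipawns below zero" are assumptions made for illustration. Each engine's own reported scores are tracked separately, so the opponent's score never enters the decision.

```python
def resign_loser(scores, n, threshold):
    """Return the side to adjudicate as lost, or None.

    scores    -- list of (side, centipawns) pairs, one per half move, where
                 the score is reported by the engine that just moved, from
                 its own point of view (hypothetical layout)
    n         -- number of consecutive moves required (as in -resign <n> <score>)
    threshold -- positive centipawn value (as in -resign <n> <score>)
    """
    streak = {"white": 0, "black": 0}
    for side, cp in scores:
        if cp <= -threshold:
            streak[side] += 1   # this engine still thinks it is lost
        else:
            streak[side] = 0    # streak broken, start counting again
        if streak[side] >= n:
            return side
    return None


# White reports a hopeless score three times in a row; Black is far less
# optimistic than +800, but only the losing engine's score matters.
game = [("white", -850), ("black", 300),
        ("white", -900), ("black", 200),
        ("white", -1000), ("black", 150)]
print(resign_loser(game, 3, 800))
```

With `-resign 3 800` in mind, the call above adjudicates the game as a loss for White even though Black never reported a score above +800.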

lucasart wrote:
Rebel wrote:
-draw <n> <score>::
	Adjudicate the game as a draw if the score of both engines is
	within <score> centipawns from zero after <n> full moves have been played.
Any experiences with this option?
I never use this. The very idea of stopping a game after X moves regardless of any score consideration seems totally crazy. At the very least there should be a precondition that checks that both engines have had more or less stalled scores for a long period beforehand, or something like that.

It's on the agenda to change the "-draw" option so that the user can specify a number of consecutive scores within the threshold. Right now it's safest to use a large move number and a small threshold.
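
As a rough illustration of the currently documented behaviour (again a sketch, not the real implementation; the function name, the arguments, and the inclusive comparison for "within <score> centipawns" are assumptions):

```python
def draw_adjudicated(full_moves_played, white_cp, black_cp, n, threshold):
    """Documented -draw <n> <score> rule: once at least n full moves have
    been played, the game is a draw if both engines' latest scores are
    within threshold centipawns of zero. No history of scores is checked."""
    return (full_moves_played >= n
            and abs(white_cp) <= threshold
            and abs(black_cp) <= threshold)


# A large move number and a small threshold keep the rule conservative:
print(draw_adjudicated(40, 20, -15, n=160, threshold=100))    # too early
print(draw_adjudicated(160, 20, -15, n=160, threshold=100))   # both scores near zero
print(draw_adjudicated(160, 250, -15, n=160, threshold=100))  # White still sees an edge
```

Because only the scores at that moment are looked at, a small <n> could end a game in which the scores merely pass through zero, which is why the large move number matters.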

lucasart wrote: But generally I don't like using these options and don't use them anymore. The resign feature, even with a high threshold, gets the result wrong in crazy rook themes, for example. And in my experience this happens more often than people care to admit.

True, there's always a compromise involved when adjudications are used. When testing an engine one should always first make sure that the engine really is capable of winning almost all of those "won" games and only then start using adjudications. But if used correctly adjudications can save a lot of time.

jdart · Post by **jdart** » Fri May 11, 2012 6:34 pm

I have seen cases where the score gets above +5 or below -5 and is still drawn. So I use 700 as a threshold for declaring a win/loss.

--Jon

syzygy · Post by **syzygy** » Fri May 11, 2012 9:39 pm

jdart wrote:I have seen cases where the score gets above +5 or below -5 and is still drawn. So I use 700 as a threshold for declaring a win/loss.

But maybe in such cases the engine with +5 did play a much better game than the other engine, so that for the purpose of getting accurate statistical results, awarding it a full point through adjudication might be better than counting the game as a draw. (And then there's the time gained, which means more games to be played and better statistics to be gathered.)

Of course it depends on what you're testing.

lucasart · Post by **lucasart** » Sat May 12, 2012 3:37 am

jdart wrote:I have seen cases where the score gets above +5 or below -5 and is still drawn. So I use 700 as a threshold for declaring a win/loss.

--Jon

And I have seen cases where the score is over +10 and it's still a draw. They were all crazy rook themes: this pattern is a real nightmare, and I can't think of a way to fix this in my search w/o incurring any cost that would have a negative tradeoff.

One could argue that these cases are rare and that their occurrence would be equally split between the engines after all, as Ed mentioned, so it shouldn't create any measurable bias.

hgm · Post by **hgm** » Sat May 12, 2012 9:00 am

You could make the engine suspicious of branches that have a large number of (reversible) consecutive checks in them at the end. You could apply this at the last level where you still consider evasions (e.g. in a check-evasion node of QS). Normally you would do a repetition test by comparing the hash key with that of previous nodes, up to the last irreversible move. But you could also test the in-check variable of these nodes, and then multiply the evaluation by a factor < 1 depending on the number of checks. (E.g. 1 or 2 checks would probably not be significant, and still use factor 1, and for 3, 4, 5, ... checks you could use factors 0.6, 0.3, 0.1, ...)

This should hardly take any time, as check-evasion nodes will not be very common.
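
A minimal sketch of that scaling step, using the example factors above. The names `CHECK_FACTORS` and `scale_eval` are made up for illustration, and holding the factor at 0.1 beyond 5 checks is an assumption, since the sequence is left open-ended here.

```python
# Example factors from the post: 3, 4, 5 consecutive checks -> 0.6, 0.3, 0.1
CHECK_FACTORS = {3: 0.6, 4: 0.3, 5: 0.1}


def scale_eval(evaluation, consecutive_checks):
    """Damp the evaluation of a line that ends in a long run of checks."""
    if consecutive_checks <= 2:
        return evaluation  # 1 or 2 checks: not significant, factor 1
    # Assumption: for more than 5 checks keep using the smallest factor.
    factor = CHECK_FACTORS.get(consecutive_checks, 0.1)
    return evaluation * factor


for checks in range(0, 7):
    print(checks, scale_eval(1000, checks))
```

The effect is that a +10 score reached only through a string of checks is reported as something much closer to a draw score, which is what the crazy rook cases call for.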

To improve reliability of hashing you could also keep a running count of the number of consecutive checks (i.e. set the in-check variable to that of two ply earlier + 1 when you are in check, and reset to 0 when not), and add a spoiler key to the hash key whenever the in-check is above 2.
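
The bookkeeping could look roughly like this. It is only a sketch: the per-ply list `counts`, the constant `SPOILER_KEY`, and combining the keys with XOR (as is usual for Zobrist-style keys) are assumptions, not something taken from an existing engine.

```python
SPOILER_KEY = 0x9E3779B97F4A7C15  # arbitrary 64-bit constant (hypothetical)


def update_check_count(counts, ply, in_check):
    """Store and return the running number of consecutive checks at this ply.

    The count is taken from two ply earlier (the same side to move) plus
    one when the side to move is in check, and reset to 0 otherwise.
    """
    previous = counts[ply - 2] if ply >= 2 else 0
    counts[ply] = previous + 1 if in_check else 0
    return counts[ply]


def probe_key(hash_key, check_count):
    """Hash key to use for the table: spoiled once the check run exceeds 2."""
    return hash_key ^ SPOILER_KEY if check_count > 2 else hash_key


# One side is checked on every one of its moves (plies 0, 2, 4).
counts = [0] * 8
for ply, in_check in enumerate([True, False, True, False, True]):
    update_check_count(counts, ply, in_check)
print(counts[:5])
print(hex(probe_key(0x1234, counts[4])))
```

Spoiling the key keeps a position reached through a long check sequence from sharing a hash entry with the same position reached normally, so the damped score does not leak into other lines.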
