testing eval

outAtime · Post by **outAtime** » Sat Apr 23, 2011 8:57 pm

How should I go about testing that the output from a mobility score in eval is correct? How would I be able to see just this score while the engine is running?
Thanks!

bob · Post by **bob** » Sat Apr 23, 2011 10:28 pm

outAtime wrote:How should I go about testing that the output from a mobility score in eval is correct? How would I be able to see just this score while the engine is running?
Thanks!

I have a procedure "evtest" that just sucks in a set of FEN positions and calls Evaluate() after each new position. Doing that, inside Evaluate() you can print whatever you want. You only get output once per position (no search is done). You can find a couple of dozen positions and look at the resulting mobility scores to see if you are happy. And particularly, you can look for really big or small numbers to highlight errors such as subscripts out of range, uninitialized data, etc.
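A minimal sketch of the evtest idea, written in Python for brevity. The names `evaluate`, `evtest`, `run_file`, the piece values, and the `limit` threshold are all assumptions for illustration; the toy `evaluate` counts material only and stands in for the engine's own Evaluate(), which is where the mobility term would be printed.

```python
# Hypothetical piece values in centipawns (illustration only).
PIECE_VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}

def evaluate(fen):
    """Toy stand-in for the engine's Evaluate(): material from White's point of view."""
    placement = fen.split()[0]  # first FEN field holds the piece placement
    score = 0
    for ch in placement:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value
    return score

def evtest(fens, limit=3000):
    """Evaluate each position once (no search) and flag very large scores."""
    results = []
    for fen in fens:
        score = evaluate(fen)
        flag = "  <-- suspicious" if abs(score) > limit else ""
        print(f"{score:6d}  {fen}{flag}")
        results.append(score)
    return results

def run_file(path):
    """Read one FEN per line from a file, skipping blank lines."""
    with open(path) as handle:
        return evtest([line.strip() for line in handle if line.strip()])
```

Calling `run_file("positions.fen")` (a hypothetical file name) would print one score per position; a score far outside the expected range is the kind of outlier that points to a bad subscript or uninitialized data.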

My usage is simply "evtest filename." Another good idea, if you have a fully symmetric evaluation as I do, is to call Evaluate(). Then reflect the board (interchange 1st and 8th ranks and flip color of all pieces, then 2nd and 7th, etc.) and call Evaluate() again. Then mirror by doing the same thing with a and h files, then b and g, etc.

That gives you 4 positions that are identical except for perspective. Your eval should not vary except that when you reflect the sign should change since white pieces become black and vice versa. The four positions are normal, reflected, normal-mirrored, reflected-mirrored. Any change in number (other than the sign) shows an asymmetry in the evaluation that should be fixed.
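The four-way comparison can be sketched on the piece-placement field of a FEN string. This is only an illustration: `reflect`, `mirror`, `symmetry_report`, `is_symmetric`, and the toy `evaluate` (material plus a small pawn-advance bonus) are hypothetical names, and the score is assumed to be from White's point of view with no side-to-move term. An engine that scores relative to the side to move, and flips the side to move when reflecting, would expect equal rather than negated scores.

```python
VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}

def reflect(placement):
    """Swap ranks 1<->8, 2<->7, ... and flip the colour of every piece."""
    return "/".join(rank.swapcase() for rank in reversed(placement.split("/")))

def mirror(placement):
    """Swap files a<->h, b<->g, ... within each rank."""
    return "/".join(rank[::-1] for rank in placement.split("/"))

def squares(placement):
    """Yield (piece, file, rank), both 0..7, with rank 0 = White's first rank."""
    for row, rank_str in enumerate(placement.split("/")):
        file = 0
        for ch in rank_str:
            if ch.isdigit():
                file += int(ch)  # a digit is a run of empty squares
            else:
                yield ch, file, 7 - row
                file += 1

def evaluate(placement):
    """Toy eval: material plus 5 points per rank a pawn has advanced."""
    score = 0
    for piece, file, rank in squares(placement):
        value = VALUES[piece.lower()]
        if piece.lower() == "p":
            value += 5 * (rank - 1 if piece.isupper() else 6 - rank)
        score += value if piece.isupper() else -value
    return score

def symmetry_report(placement):
    """Scores for the normal, reflected, mirrored and reflected-mirrored boards."""
    return {
        "normal": evaluate(placement),
        "reflected": evaluate(reflect(placement)),
        "mirrored": evaluate(mirror(placement)),
        "reflected-mirrored": evaluate(mirror(reflect(placement))),
    }

def is_symmetric(placement):
    r = symmetry_report(placement)
    return (r["mirrored"] == r["normal"]
            and r["reflected"] == -r["normal"]
            and r["reflected-mirrored"] == -r["normal"])
```

Castling rights are deliberately left out of the sketch, in line with testing only positions where they no longer matter.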

sje · Post by **sje** » Sun Apr 24, 2011 12:38 am

I agree with Bob's suggestion to use a white/black reversal that includes a reflection of the position about the x-axis. This simple test can uncover a lot of bugs and I've used it in my code.

However, a reversal about the y-axis may not work in all cases. For example, an evaluation function can reward/penalize pawn advancement asymmetrically with respect to flanks and still be valid. The Northwestern Chess 4.x program did this for bishop pawns and perhaps other programs have similar code.

michiguel · Post by **michiguel** » Sun Apr 24, 2011 12:50 am

sje wrote:I agree with Bob's suggestion to use a white/black reversal that includes a reflection of the position about the x-axis. This simple test can uncover a lot of bugs and I've used it in my code.

I suggest running that check at every node in the search (in debug mode, of course). A few positions are not enough to flush out as many bugs as possible. That has helped me in the past, and I routinely turn it on whenever I modify eval.

sje wrote:However, a reversal about the y-axis may not work in all cases. For example, an evaluation function can reward/penalize pawn advancement asymmetrically with respect to flanks and still be valid. The Northwestern Chess 4.x program did this for bishop pawns and perhaps other programs have similar code.

I think this is philosophically flawed, but that is another issue.

Miguel
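The per-node check suggested above might look roughly like the following. This is a sketch under assumptions: `checked_evaluate`, the `DEBUG` flag, and both toy evaluators are hypothetical, and the score is again taken from White's point of view. The idea is that search calls the wrapper instead of the raw evaluator, so every position visited gets the reversal test when debugging is on.

```python
DEBUG = True  # hypothetical switch; turn off for normal play

VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}

def material(placement):
    """Toy symmetric eval: material from White's point of view."""
    return sum(VALUES[c.lower()] * (1 if c.isupper() else -1)
               for c in placement if c.isalpha())

def buggy(placement):
    """Toy eval with a deliberate bug: a bonus for White knights only."""
    return material(placement) + 10 * placement.count("N")

def checked_evaluate(placement, evaluate, debug=DEBUG):
    """Evaluate, and in debug mode also verify the colour-reversed score."""
    score = evaluate(placement)
    if debug:
        flipped = "/".join(r.swapcase() for r in reversed(placement.split("/")))
        other = evaluate(flipped)
        if other != -score:
            raise AssertionError(
                f"eval asymmetry: {score} vs {other} on {placement}")
    return score
```

With `buggy` plugged in, any position containing a White knight raises immediately, which is the kind of failure a handful of hand-picked test positions can miss.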

JVMerlino · Post by **JVMerlino** » Sun Apr 24, 2011 1:27 am

bob wrote:
outAtime wrote:How should I go about testing that the output from a mobility score in eval is correct? How would I be able to see just this score while the engine is running?
Thanks!
Another good idea, if you have a fully symmetric evaluation as I do, is to call Evaluate(). Then reflect the board (interchange 1st and 8th ranks and flip color of all pieces, then 2nd and 7th, etc.) and call Evaluate() again. Then mirror by doing the same thing with a and h files, then b and g, etc.

That gives you 4 positions that are identical except for perspective. Your eval should not vary except that when you reflect the sign should change since white pieces become black and vice versa. The four positions are normal, reflected, normal-mirrored, reflected-mirrored. Any change in number (other than the sign) shows an asymmetry in the evaluation that should be fixed.

Agreed. Doing this helped me find several symmetry bugs in my engine. The only asymmetrical evaluation that I decided to keep was regarding castling kingside vs. queenside. So reflecting a castled king will produce a slightly different score.

jm

bob · Post by **bob** » Sun Apr 24, 2011 2:29 am

sje wrote:I agree with Bob's suggestion to use a white/black reversal that includes a reflection of the position about the x-axis. This simple test can uncover a lot of bugs and I've used it in my code.

However, a reversal about the y-axis may not work in all cases. For example, an evaluation function can reward/penalize pawn advancement asymmetrically with respect to flanks and still be valid. The Northwestern Chess 4.x program did this for bishop pawns and perhaps other programs have similar code.

I'm not sure why one would be asymmetric. If you flip about the Y-axis (mirror) the positions are identical, assuming no castling rights of course. Castling is certainly asymmetric in nature, but I always test positions after castling to avoid that issue.

bob · Post by **bob** » Sun Apr 24, 2011 2:30 am

JVMerlino wrote:
bob wrote:
outAtime wrote:How should I go about testing that the output from a mobility score in eval is correct? How would I be able to see just this score while the engine is running?
Thanks!
Another good idea, if you have a fully symmetric evaluation as I do, is to call Evaluate(). Then reflect the board (interchange 1st and 8th ranks and flip color of all pieces, then 2nd and 7th, etc.) and call Evaluate() again. Then mirror by doing the same thing with a and h files, then b and g, etc.

That gives you 4 positions that are identical except for perspective. Your eval should not vary except that when you reflect the sign should change since white pieces become black and vice versa. The four positions are normal, reflected, normal-mirrored, reflected-mirrored. Any change in number (other than the sign) shows an asymmetry in the evaluation that should be fixed.
Agreed. Doing this helped me find several symmetry bugs in my engine. The only asymmetrical evaluation that I decided to keep was regarding castling kingside vs. queenside. So reflecting a castled king will produce a slightly different score.

jm

It almost has to in that case, but I test with no castling rights to avoid that. Then the mirrored positions really are identical.

sje · Post by **sje** » Sun Apr 24, 2011 3:26 am

bob wrote:I'm not sure why one would be asymmetric. If you flip about the Y-axis (mirror) the positions are identical, assuming no castling rights of course. Castling is certainly asymmetric in nature, but I always test positions after castling to avoid that issue.

My memory certainly isn't as good as it used to be, so I had to check my copy of Frey's Chess Skill in Man and Machine (1st edition). And lo and behold, on page 96 in the discussion of the pawn evaluation code of Chess 4.x we see the pawn advancement bonus multiplier vector, indexed by file:

[0 0 3.9 5.4 7.0 2.3 0 0]

And that's asymmetric and would have to be reflected for a y-axis rotation score test to work.
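To make the asymmetry concrete, here is a small sketch using the vector quoted above. Only the eight numbers come from the post; the names `ADVANCE_BY_FILE`, `is_file_symmetric`, and `mirrored` are hypothetical, index 0 is assumed to be the a-file, and how Chess 4.x actually applied the multiplier is not shown here.

```python
# Pawn advancement multipliers by file, as quoted in the post.
# Assumption: index 0 is the a-file and index 7 is the h-file.
ADVANCE_BY_FILE = [0, 0, 3.9, 5.4, 7.0, 2.3, 0, 0]

def is_file_symmetric(table):
    """True if the table reads the same from the a-file and from the h-file."""
    return table == table[::-1]

def mirrored(table):
    """Table to use when scoring the left-right mirrored position."""
    return table[::-1]

# A c-pawn and its mirror image on the f-file get different multipliers,
# so a plain mirror test would report a difference.
c_file, f_file = 2, 5
print(ADVANCE_BY_FILE[c_file], ADVANCE_BY_FILE[f_file])
print(is_file_symmetric(ADVANCE_BY_FILE))
```

A mirror test can still be run against such an evaluation if the mirrored position is scored with `mirrored(ADVANCE_BY_FILE)`, since then file `f` in the original and file `7 - f` in the mirror use the same entry.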

sje · Post by **sje** » Sun Apr 24, 2011 3:32 am

Also, for positions with no pawns it might be useful to reflect the board about the lines y = x and y = -x to see if the evaluation changes.
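A sketch of the two diagonal reflections for pawnless positions. The names `parse`, `reflect_main_diagonal`, `reflect_anti_diagonal`, `has_pawns`, and the toy centralization `evaluate` are assumptions for illustration; squares are (file, rank) pairs in 0..7, so y = x is the a1-h8 diagonal and y = -x (taken about the board centre) is the a8-h1 diagonal.

```python
def parse(placement):
    """Turn a FEN piece-placement field into {(file, rank): piece}."""
    board = {}
    for row, rank_str in enumerate(placement.split("/")):
        file = 0
        for ch in rank_str:
            if ch.isdigit():
                file += int(ch)
            else:
                board[(file, 7 - row)] = ch
                file += 1
    return board

def reflect_main_diagonal(board):
    """Reflect about y = x (the a1-h8 diagonal): swap file and rank."""
    return {(r, f): p for (f, r), p in board.items()}

def reflect_anti_diagonal(board):
    """Reflect about y = -x through the centre (the a8-h1 diagonal)."""
    return {(7 - r, 7 - f): p for (f, r), p in board.items()}

def has_pawns(board):
    return any(p in "Pp" for p in board.values())

def evaluate(board):
    """Toy eval: reward central pieces, White positive, Black negative."""
    score = 0
    for (f, r), p in board.items():
        closeness = 7 - max(abs(2 * f - 7), abs(2 * r - 7))
        score += closeness if p.isupper() else -closeness
    return score

def diagonal_check(placement):
    """Return True if the eval is unchanged by both diagonal reflections."""
    board = parse(placement)
    if has_pawns(board):
        raise ValueError("diagonal reflections only make sense without pawns")
    base = evaluate(board)
    return (evaluate(reflect_main_diagonal(board)) == base
            and evaluate(reflect_anti_diagonal(board)) == base)
```

Pawns are excluded because they move in one direction only, so a diagonal reflection of a position with pawns is not an equivalent position.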

bob · Post by **bob** » Sun Apr 24, 2011 5:02 pm

sje wrote:
bob wrote:I'm not sure why one would be asymmetric. If you flip about the Y-axis (mirror) the positions are identical, assuming no castling rights of course. Castling is certainly asymmetric in nature, but I always test positions after castling to avoid that issue.
My memory certainly isn't as good as it used to be, so I had to check my copy of Frey's Chess Skill in Man and Machine (1st edition). And lo and behold, on page 96 in the discussion of the pawn evaluation code of Chess 4.x we see the pawn advancement bonus multiplier vector, indexed by file:
[0 0 3.9 5.4 7.0 2.3 0 0]
And that's asymmetric and would have to be reflected for a y-axis rotation score test to work.

Won't disagree, although it is a bad idea to do things that way. If you were to play a game and actually mirror left-to-right, would _your_ evaluation of the position really change? Mine wouldn't. Not one point.
