testing eval

outAtime
Posts: 226
Joined: Sun Mar 08, 2009 3:08 pm
Location: Canada

testing eval

Post by outAtime »

How should I go about testing whether the output of the mobility score in eval is correct? How can I see just this score while the engine is running?
Thanks!
outAtime
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: testing eval

Post by bob »

outAtime wrote:How should I go about testing whether the output of the mobility score in eval is correct? How can I see just this score while the engine is running?
Thanks!
I have a procedure "evtest" that just sucks in a set of FEN positions and calls Evaluate() after each new position. Doing that, inside Evaluate() you can print whatever you want. You only get output once per position (no search is done). You can find a couple of dozen positions and look at the resulting mobility scores to see if you are happy. And particularly, you can look for really big or small numbers to highlight errors such as subscripts out of range, uninitialized data, etc.
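A minimal sketch of that kind of driver, in Python for brevity. Every name here (parse_board, knight_mobility, evtest) is a hypothetical stand-in and not the actual evtest code; the mobility term is a toy that only counts knight moves. The point is just the shape: one position in, one line of eval output out, no search.

```python
# Sketch of an "evtest"-style driver. All names are hypothetical stand-ins.

KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def parse_board(fen):
    """Return an 8x8 list of rows (rank 8 first) from a FEN string."""
    rows = []
    for rank in fen.split()[0].split("/"):
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend("." * int(ch))   # a digit means that many empty squares
            else:
                row.append(ch)
        rows.append(row)
    return rows

def knight_mobility(board):
    """Toy mobility term: knight moves to squares not held by own pieces,
    white minus black."""
    score = 0
    for r in range(8):
        for c in range(8):
            piece = board[r][c]
            if piece not in "Nn":
                continue
            for dr, dc in KNIGHT_STEPS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < 8 and 0 <= cc < 8:
                    target = board[rr][cc]
                    own = target != "." and target.isupper() == piece.isupper()
                    if not own:
                        score += 1 if piece == "N" else -1
    return score

def evtest(fens, term=knight_mobility):
    """Evaluate each position once (no search) and print the chosen term."""
    results = []
    for fen in fens:
        score = term(parse_board(fen))
        print(f"{score:+5d}  {fen}")
        results.append(score)
    return results
```

Feeding it a file is then just evtest(open(filename).read().splitlines()); scanning the printed column for absurdly large or small values is the quick sanity check.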

My usage is simply "evtest filename." Another good idea, if you have a fully symmetric evaluation as I do, is to call Evaluate(), then reflect the board (interchange the 1st and 8th ranks and flip the color of all pieces, then the 2nd and 7th, etc.) and call Evaluate() again. Then mirror by interchanging files the same way (a and h, then b and g, etc.), this time without changing colors.

That gives you 4 positions that are identical except for perspective. Your eval should not vary except that when you reflect the sign should change since white pieces become black and vice versa. The four positions are normal, reflected, normal-mirrored, reflected-mirrored. Any change in number (other than the sign) shows an asymmetry in the evaluation that should be fixed.
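The four-position check can be sketched as follows. This is an illustration under stated assumptions, not anyone's engine code: the board is a list of 8 strings (rank 8 first), evaluate() returns a score from White's point of view and ignores side to move, the position has no castling rights, and the evaluate() shown is a toy (material plus a small pawn-advancement bonus) with an optional planted bug.

```python
# Sketch of the reflect/mirror symmetry test. All names are hypothetical.

def parse_board(fen):
    """Return 8 strings (rank 8 first), '.' for empty squares."""
    return ["".join("." * int(ch) if ch.isdigit() else ch for ch in rank)
            for rank in fen.split()[0].split("/")]

def reflect(board):
    """Swap ranks 1<->8, 2<->7, ... and flip the color of every piece."""
    return [row.swapcase() for row in reversed(board)]

def mirror(board):
    """Swap files a<->h, b<->g, ... (colors unchanged)."""
    return [row[::-1] for row in board]

VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}

def evaluate(board, black_bonus=True):
    """Toy eval, white minus black. black_bonus=False plants a color bug."""
    score = 0
    for r, row in enumerate(board):
        for ch in row:
            if ch == ".":
                continue
            value = VALUES[ch.lower()]
            if ch == "P":
                value += 5 * (6 - r)          # rank 2 is row 6
            elif ch == "p" and black_bonus:
                value += 5 * (r - 1)          # rank 7 is row 1
            score += value if ch.isupper() else -value
    return score

def symmetry_report(board, evaluate):
    """Return the names of the variants whose score is not what symmetry predicts."""
    base = evaluate(board)
    variants = {
        "reflected": (reflect(board), -base),
        "mirrored": (mirror(board), base),
        "reflected-mirrored": (mirror(reflect(board)), -base),
    }
    return [name for name, (variant, expected) in variants.items()
            if evaluate(variant) != expected]
```

An empty list from symmetry_report() means all four views agree; any name in the list tells you which transformation exposed the asymmetry.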
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: testing eval

Post by sje »

I agree with Bob's suggestion to use a white/black reversal that includes a reflection of the position about the x-axis. This simple test can uncover a lot of bugs and I've used it in my code.

However, a reversal about the y-axis may not work in all cases. For example, an evaluation function can reward/penalize pawn advancement asymmetrically with respect to flanks and still be valid. The Northwestern Chess 4.x program did this for bishop pawns and perhaps other programs have similar code.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: testing eval

Post by michiguel »

sje wrote:I agree with Bob's suggestion to use a white/black reversal that includes a reflection of the position about the x-axis. This simple test can uncover a lot of bugs and I've used it in my code.
I suggest running that check at every node in search (in debug mode, of course). A handful of positions is not enough to flush out every bug. This has helped me in the past, and I routinely turn it on whenever I modify eval.
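One way to wire that in, sketched in Python under assumptions: DEBUG_EVAL is a hypothetical switch, the board is a list of 8 strings (rank 8 first), and the two evals are toys, one clean and one with a planted bug. The search itself is not shown; it would simply call the wrapped function wherever it calls eval.

```python
# Sketch: wrap the eval so every call also checks the color-reversed position.
# All names are hypothetical.

DEBUG_EVAL = True  # hypothetical debug switch; turn off for real play

def reflect(board):
    """Swap ranks top-to-bottom and flip the color of every piece."""
    return [row.swapcase() for row in reversed(board)]

def with_symmetry_check(evaluate):
    """Return an eval that fails loudly when the reflected score is not the negation."""
    def checked(board):
        score = evaluate(board)
        if DEBUG_EVAL:
            flipped = evaluate(reflect(board))
            if flipped != -score:
                raise AssertionError(
                    f"eval asymmetry: {score} vs {flipped} after reflection")
        return score
    return checked

def material(board):
    """Toy eval: piece count, white minus black."""
    text = "".join(board)
    return sum(ch.isupper() for ch in text) - sum(ch.islower() for ch in text)

def lopsided(board):
    """Toy eval with a planted bug: a bonus only White can earn."""
    return material(board) + (1 if "K" in board[7] else 0)
```

The check doubles the eval cost, which is why it belongs behind a debug switch, but it sees every position the search reaches rather than a couple of dozen hand-picked ones.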

However, a reversal about the y-axis may not work in all cases. For example, an evaluation function can reward/penalize pawn advancement asymmetrically with respect to flanks and still be valid. The Northwestern Chess 4.x program did this for bishop pawns and perhaps other programs have similar code.
I think this is philosophically flawed, but that is another issue.

Miguel
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: testing eval

Post by JVMerlino »

bob wrote:
outAtime wrote:How should I go about testing whether the output of the mobility score in eval is correct? How can I see just this score while the engine is running?
Thanks!
Another good idea, if you have a fully symmetric evaluation as I do, is to call Evaluate(), then reflect the board (interchange the 1st and 8th ranks and flip the color of all pieces, then the 2nd and 7th, etc.) and call Evaluate() again. Then mirror by interchanging files the same way (a and h, then b and g, etc.), this time without changing colors.

That gives you 4 positions that are identical except for perspective. Your eval should not vary except that when you reflect the sign should change since white pieces become black and vice versa. The four positions are normal, reflected, normal-mirrored, reflected-mirrored. Any change in number (other than the sign) shows an asymmetry in the evaluation that should be fixed.
Agreed. Doing this helped me find several symmetry bugs in my engine. The only asymmetrical evaluation that I decided to keep was regarding castling kingside vs. queenside. So reflecting a castled king will produce a slightly different score.

jm
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: testing eval

Post by bob »

sje wrote:I agree with Bob's suggestion to use a white/black reversal that includes a reflection of the position about the x-axis. This simple test can uncover a lot of bugs and I've used it in my code.

However, a reversal about the y-axis may not work in all cases. For example, an evaluation function can reward/penalize pawn advancement asymmetrically with respect to flanks and still be valid. The Northwestern Chess 4.x program did this for bishop pawns and perhaps other programs have similar code.
I'm not sure why one would be asymmetric. If you flip about the Y-axis (mirror) the positions are identical, assuming no castling rights of course. Castling is certainly asymmetric in nature, but I always test positions after castling to avoid that issue.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: testing eval

Post by bob »

JVMerlino wrote:
bob wrote:
outAtime wrote:How should I go about testing whether the output of the mobility score in eval is correct? How can I see just this score while the engine is running?
Thanks!
Another good idea, if you have a fully symmetric evaluation as I do, is to call Evaluate(), then reflect the board (interchange the 1st and 8th ranks and flip the color of all pieces, then the 2nd and 7th, etc.) and call Evaluate() again. Then mirror by interchanging files the same way (a and h, then b and g, etc.), this time without changing colors.

That gives you 4 positions that are identical except for perspective. Your eval should not vary except that when you reflect the sign should change since white pieces become black and vice versa. The four positions are normal, reflected, normal-mirrored, reflected-mirrored. Any change in number (other than the sign) shows an asymmetry in the evaluation that should be fixed.
Agreed. Doing this helped me find several symmetry bugs in my engine. The only asymmetrical evaluation that I decided to keep was regarding castling kingside vs. queenside. So reflecting a castled king will produce a slightly different score.

jm
It almost has to differ there, but I test with no castling rights to avoid that. Then the mirrored positions really are identical.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: testing eval

Post by sje »

bob wrote:I'm not sure why one would be asymmetric. If you flip about the Y-axis (mirror) the positions are identical, assuming no castling rights of course. Castling is certainly asymmetric in nature, but I always test positions after castling to avoid that issue.
My memory certainly isn't as good as it used to be, so I had to check my copy of Frey's Chess Skill in Man and Machine (1st edition). And lo and behold, on page 96 in the discussion of the pawn evaluation code of Chess 4.x we see the pawn advancement bonus multiplier vector, indexed by file:

[0 0 3.9 5.4 7.0 2.3 0 0]
And that's asymmetric, so the table would itself have to be reflected for a y-axis reflection score test to work.
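A small Python sketch of what that means for the test. The multipliers are the ones quoted above; everything else is an assumption for illustration, in particular the idea that the bonus is simply multiplier times ranks advanced (the book's exact formula is not reproduced here) and the helper names.

```python
# Sketch: why a file-indexed table like this breaks a naive left-right mirror test.

# Per-file multipliers as quoted above (files a..h).
PAWN_ADVANCE_BY_FILE = [0, 0, 3.9, 5.4, 7.0, 2.3, 0, 0]

def is_file_symmetric(table):
    """True if the table reads the same from the a-file and from the h-file."""
    return table == table[::-1]

def mirror_file(file_index):
    """Map a<->h, b<->g, ... (0-based file index)."""
    return 7 - file_index

def advance_bonus(file_index, ranks_advanced, table=PAWN_ADVANCE_BY_FILE):
    # Assumption for illustration only: bonus = multiplier * ranks advanced.
    return table[file_index] * ranks_advanced

c_pawn = advance_bonus(2, 2)                               # pawn on the c-file
f_pawn = advance_bonus(mirror_file(2), 2)                  # same pawn after mirroring
f_pawn_fixed = advance_bonus(mirror_file(2), 2,
                             PAWN_ADVANCE_BY_FILE[::-1])   # table mirrored too
print(is_file_symmetric(PAWN_ADVANCE_BY_FILE), c_pawn == f_pawn, c_pawn == f_pawn_fixed)
```

So a mirror test against an eval like this has to either mirror the table along with the board or skip the terms that are deliberately flank-dependent.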
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: testing eval

Post by sje »

Also, for positions with no pawns it might be useful to reflect the board about the lines y = x and y = -x to see if the evaluation changes.
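A rough Python sketch of the two diagonal flips, assuming a board stored as 8 strings with rank 8 first and files a..h left to right. The function names and both toy evals (one king-centralization term that is symmetric, one that only looks at ranks) are made up for the example.

```python
# Sketch: diagonal reflections for pawnless positions. All names are hypothetical.

def flip_a1h8(board):
    """Reflect about the a1-h8 diagonal (the line y = x): b1 <-> a2, etc."""
    return ["".join(board[7 - c][7 - r] for c in range(8)) for r in range(8)]

def flip_a8h1(board):
    """Reflect about the a8-h1 diagonal (the line y = -x): b1 <-> h7, etc."""
    return ["".join(board[c][r] for c in range(8)) for r in range(8)]

def has_pawns(board):
    return any(ch in "Pp" for row in board for ch in row)

def king_center_eval(board, ranks_only=False):
    """Toy eval: black king's distance from the center minus white king's.
    ranks_only=True plants a bug that ignores files."""
    score = 0
    for r, row in enumerate(board):
        for c, ch in enumerate(row):
            if ch in "Kk":
                rank_dist = abs(2 * r - 7)
                file_dist = 0 if ranks_only else abs(2 * c - 7)
                dist = max(rank_dist, file_dist)
                score += dist if ch == "k" else -dist
    return score

def diagonal_report(board, evaluate):
    """Names of diagonal flips that change the score (pawnless positions only)."""
    if has_pawns(board):
        raise ValueError("diagonal flips only make sense without pawns")
    base = evaluate(board)
    flips = {"a1-h8": flip_a1h8(board), "a8-h1": flip_a8h1(board)}
    return [name for name, flipped in flips.items() if evaluate(flipped) != base]
```

Neither flip swaps colors, so the score should come back unchanged, sign included.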
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: testing eval

Post by bob »

sje wrote:
bob wrote:I'm not sure why one would be asymmetric. If you flip about the Y-axis (mirror) the positions are identical, assuming no castling rights of course. Castling is certainly asymmetric in nature, but I always test positions after castling to avoid that issue.
My memory certainly isn't as good as it used to be, so I had to check my copy of Frey's Chess Skill in Man and Machine (1st edition). And lo and behold, on page 96 in the discussion of the pawn evaluation code of Chess 4.x we see the pawn advancement bonus multiplier vector, indexed by file:

[0 0 3.9 5.4 7.0 2.3 0 0]
And that's asymmetric, so the table would itself have to be reflected for a y-axis reflection score test to work.
I won't disagree, although it is a bad idea to do things that way. If you were to play a game and actually mirror it left-to-right, would _your_ evaluation of the position really change? Mine wouldn't. Not one point.