bob wrote:
For Crafty, it is pretty easy to see where the score comes from. For the position from the second post in this thread, you can discover via the "score" command that some of it comes from development (knights on the edge, unconnected rooks, uncastled, etc.). Not much of that comes from mobility in our case; it is mainly the special-case "uncastled development scoring"...
Remember, it is not the score that counts, it is the move. I suppose everyone could just add a -50 constant to their scores and make them appear more conservative, but since the same constant would be added to every root move's score, the ranking of the moves, and hence the move played, would not change at all...
lkaufman wrote:
Changing the scores by a constant would solve nothing, because they are interpreted relative to material and to static factors. The issue is the relative weighting of static vs. dynamic factors (leaving out king safety, as it has elements of both). Perhaps I am mistaken about Crafty overweighting dynamics; I have spent far more time with Stockfish, which displays similar behavior in the opening. For me (and surely many others), what I want most from an engine is an accurate evaluation of an opening line (which may extend all the way to the endgame!). I put the scores in an IDeA tree using Aquarium and research openings this way. If the evals systematically overrate positions where White has more mobility, the engine will be "recommending" the wrong lines. So for me, a correct eval of the end node is more important than the rating of the engine.
bob wrote:
I spent literally months of time on "centralizing" the evaluation, and outside of the development issues, most scores are pretty well centered around zero. This may have changed during testing, since anything is possible there, but we always wanted "equal" positions to score somewhere near zero. Development is different and trickier, however. If you score it completely symmetrically, then developing one of your own pieces and preventing your opponent from developing one of his count the same, and that can backfire given how important tempo is.
There are certainly issues in distinguishing "real" from "imagined" positional advantages that we do not handle very well (nor does any other program I have seen so far).
Here's an idea I might be able to make happen (a rough sketch in C follows the list):
1. Set up a bunch of buckets, say -10.0, -9.5, ..., 0.0, +0.5, ..., +10.0. Those represent evaluation scores (specifically Crafty's, of course).
2. Write a program that reads Crafty log files, first noting the game result (win, loss, or draw) and then looking at the last evaluation displayed before each move was made, to see what the average outcome for each evaluation works out to over millions of games.
3. From that, I could assign a probability of winning or losing to each evaluation bucket. Then, when I display an evaluation, I could map it through that function to convert it into a probability of winning from 0.0 to 1.0, and perhaps double that and subtract 1.0 so that the scores come out in the range -1.0 ... 0.0 ... +1.0, with -1.0 being a certain loss, +1.0 being a certain win, and everything else distributed in between.
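Here is a minimal sketch of steps 1-3 in C. Everything in it is hypothetical: the names (add_observation, win_probability, remapped_score), the result encoding (1.0 win / 0.5 draw / 0.0 loss from the point of view of the side whose evaluation was recorded), and the assumption that the (evaluation, result) pairs have already been parsed out of the log files, since the log-reading step is omitted.

/*
 * Sketch of the eval -> win-probability remapping idea.
 * Buckets run from -10.0 to +10.0 pawns in 0.5-pawn steps.
 */
#include <math.h>

#define NBUCKETS 41                /* -10.0 ... +10.0 in 0.5 steps */

static double sum[NBUCKETS];       /* summed game results per bucket */
static long   count[NBUCKETS];     /* number of samples per bucket   */

/* map an evaluation (in pawns) to the nearest bucket, clamped */
static int bucket_index(double eval) {
  int i = (int) floor((eval + 10.0) / 0.5 + 0.5);
  if (i < 0)
    i = 0;
  if (i >= NBUCKETS)
    i = NBUCKETS - 1;
  return i;
}

/* step 2: accumulate one (evaluation, game result) observation */
void add_observation(double eval, double result) {
  int i = bucket_index(eval);
  sum[i] += result;
  count[i]++;
}

/* step 3a: estimated probability of winning for this evaluation */
double win_probability(double eval) {
  int i = bucket_index(eval);
  if (count[i] == 0)
    return 0.5;                    /* no data yet: call it even */
  return sum[i] / count[i];
}

/* step 3b: remap into -1.0 (certain loss) ... +1.0 (certain win) */
double remapped_score(double eval) {
  return 2.0 * win_probability(eval) - 1.0;
}

One caveat with raw buckets: the extreme ones will see few games, so the empirical probabilities there will be noisy; smoothing the curve (a logistic fit is a common choice) would likely help, though that refinement is an addition here, not part of the proposal above.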
That is doable, for Crafty only...