Crafty and Stockfish question

Discussion of chess software programming and technical issues.

Moderator: Ras

Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Crafty and Stockfish question

Post by Tord Romstad »

jdart wrote:You could try tapering off or even removing the bonus for above-average mobility while keeping penalties for low mobility - if you are not doing that already.
That's pretty much what we do (except for the queen, which has very low mobility scores) in Stockfish. The penalties for zero mobility are quite big, but the bonus for an extra available move slowly drops off as the mobility of the piece increases. This table summarizes our middle game mobility evaluation bonuses:

Code: Select all

Mobility  0     1     2     3     4    5    6    7    8    9    10   11   12   13   14  >=15
--------------------------------------------------------------------------------------------
Knight  -0.19 -0.13 -0.06  0.00  0.06 0.13 0.16 0.19 0.19
Bishop  -0.13 -0.06  0.02  0.09  0.16 0.23 0.29 0.33 0.36 0.37 0.38 0.39 0.40 0.40 0.41 0.41
Rook    -0.10 -0.07 -0.04 -0.01  0.02 0.05 0.07 0.10 0.12 0.13 0.14 0.14 0.15 0.15 0.16 0.16
Queen   -0.05 -0.04 -0.03 -0.02 -0.01 0.01 0.02 0.03 0.04 0.05 0.06 0.08 0.08 0.09 0.09 0.10
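
In code, a table like this is usually just a clamped lookup indexed by the piece's move count. Here is a minimal sketch with the values converted to centipawns; the knight numbers are from the table above, but the function name and the way mobility is counted are illustrative assumptions, not Stockfish's actual code:

Code: Select all

#include <algorithm>

// Knight middle game mobility bonuses in centipawns (from the table above),
// indexed by the number of available moves; a knight has at most 8.
static const int KnightMobilityMg[9] = {
    -19, -13, -6, 0, 6, 13, 16, 19, 19
};

// 'moves' would come from counting the knight's attacked squares that are
// not occupied by friendly pieces (e.g. a popcount over an attack bitboard).
int knight_mobility_bonus(int moves) {
    return KnightMobilityMg[std::min(moves, 8)];  // clamp to the table range
}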
An idea I have used before, which I will probably try again some day, is to introduce an additional penalty for multiple passive pieces. For each piece, determine whether it is passive (by comparing its mobility to some constant, which should probably be specific for each piece type), and use the number of passive pieces as the index to a lookup table of penalties.
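
In rough code, that idea might look something like this; the thresholds and penalty values below are made up for illustration and do not come from any actual engine:

Code: Select all

#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical per-piece-type mobility thresholds below which a piece
// counts as passive (indexed as 0 = knight, 1 = bishop, 2 = rook, 3 = queen).
static const int PassiveThreshold[4] = { 3, 4, 5, 6 };

// Hypothetical penalty in centipawns, indexed by the number of passive
// pieces: one cramped piece is tolerable, several at once is much worse.
static const int PassivePenalty[9] = { 0, -5, -15, -30, -50, -75, -75, -75, -75 };

// 'pieces' holds (piece type, mobility) pairs for one side's pieces.
int passive_pieces_penalty(const std::vector<std::pair<int, int>>& pieces) {
    int passive = 0;
    for (const auto& [type, mobility] : pieces)
        if (mobility < PassiveThreshold[type])
            ++passive;
    return PassivePenalty[std::min(passive, 8)];
}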

Now to the question of why such high mobility scores work: I don't know, and honestly I wasn't even aware that our mobility scores were unreasonably high (I'm not a chess player). If they are, here's a possible explanation of why these high bonuses work so well in practice:

Except in positions with a single long, completely forced line, the quality of the last few moves of a long PV usually isn't high. The position at the end of the PV will never appear on the board. When the program has an advantage in mobility at the position at the end of the PV, however, it probably also has an advantage in mobility at most positions close to the end position in the search tree. This means that in the last few nodes along the PV, where the PV moves are probably not good, the program will probably have many reasonable alternative moves and the opponent considerably fewer. High mobility scores therefore steer the search towards areas of the tree where there is a good chance to find unexpected resources for the program, and not for the opponent.

Maximizing the chance of pleasant surprises towards the end of the PV while minimizing the chance of unpleasant surprises seems like a good idea, in general.
Ralph Stoesser
Posts: 408
Joined: Sat Mar 06, 2010 9:28 am

Re: Crafty and Stockfish question

Post by Ralph Stoesser »

lkaufman wrote:We do that now for each piece separately, but we never tried doing it for total mobility. Worth a test. But the problem also applies to piece location tables.
Maybe you could try it for mobility and piece locations as a unit. The mobility of a piece and its piece-square table values are in general related.

In the end we evaluate the difference between White's mobility/piece locations and Black's mobility/piece locations. If White has very high values and Black has default or slightly below-default values, the difference could probably be scaled down compared to the case where White is above average by the same amount as Black is below average.
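
A toy sketch of such scaling, shown for White's side only; the notion of a single "default" score, the 2x test and the 75% damping factor are all arbitrary assumptions, just to make the idea concrete:

Code: Select all

#include <algorithm>

// 'white' and 'black' are the combined mobility/piece-location scores for
// each side; 'defaultScore' is the assumed average value for the game phase.
int scaled_mobility_diff(int white, int black, int defaultScore) {
    int diff = white - black;
    int whiteExcess  = std::max(white - defaultScore, 0);
    int blackDeficit = std::max(defaultScore - black, 0);
    // If the advantage comes mostly from White soaring above average while
    // Black sits near the default, trust it less than a symmetric imbalance.
    if (whiteExcess > 2 * blackDeficit)
        diff = diff * 3 / 4;  // damp to 75%
    return diff;
}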

But what exactly is an "average" or good-enough mobility value in a given position? I would assume that it should depend mainly on game phase (number of pieces) and pawn structure.
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Crafty and Stockfish question

Post by lkaufman »

Tord Romstad wrote:
Now to the question of why such high mobility scores work: I don't know, and honestly I wasn't even aware that our mobility scores were unreasonably high (I'm not a chess player). If they are, here's a possible explanation of why these high bonuses work so well in practice:

Except in positions with a single long, completely forced line, the quality of the last few moves of a long PV usually isn't high. The position at the end of the PV will never appear on the board. When the program has an advantage in mobility at the position at the end of the PV, however, it probably also has an advantage in mobility at most positions close to the end position in the search tree. This means that in the last few nodes along the PV, where the PV moves are probably not good, the program will probably have many reasonable alternative moves and the opponent considerably fewer. High mobility scores therefore steer the search towards areas of the tree where there is a good chance to find unexpected resources for the program, and not for the opponent.

Maximizing the chance of pleasant surprises towards the end of the PV while minimizing the chance of unpleasant surprises seems like a good idea, in general.
That seems like a good theory. The question now arises: Can I have my cake and eat it too? In other words, is there a way to steer the search towards such promising areas without resorting to artificially high mobility scores? The problem is that programs which use such unrealistic scores are not very useful for opening analysis by humans, because the evaluations are just way out of line with results in actual play (whether human or engine play) from positions where one side has much more mobility but worse structure. Humans tend to prefer (and score better from) the positions with the better structure but worse mobility.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Crafty and Stockfish question

Post by Michael Sherwin »

I will try to be helpful once more. Is a position even NOW, or is it even LATER? The Sicilian at one time in human history was considered inferior. It was decided to be even not through evaluation, but through a couple hundred years of human search. So why is the computer evaluation wrong for seeing White's position as better? It is better, but through good play it can be equalized. Computers and humans have different strengths and weaknesses. Computers can use mobility better than humans because they see everything that is close. Humans miss a lot of close things; however, they know that certain structures are safer than others, and therefore they rely more on structure.
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Crafty and Stockfish question

Post by lkaufman »

I don't disagree that White is better after 1. e4 c5, but to evaluate this position, especially after a 25-ply search or so, as +0.60 or +0.75 is ridiculous. White's edge should be something like a quarter pawn or so, without requiring really deep analysis to prove this.
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: Crafty and Stockfish question

Post by Karlo Bala »

lkaufman wrote: Humans tend to prefer (and score better from) the positions with the better structure but worse mobility.
Because humans like safety (or a good prediction of the outcome), and structure is a more stable element than mobility. Also, humans have better skills in using good/bad structure than good/bad mobility.
Best Regards,
Karlo Balla Jr.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty and Stockfish question

Post by bob »

lkaufman wrote:Both Crafty and Stockfish (and many other programs) use very large scores for dynamic factors (especially mobility, I think), which have been chosen because they produce the best results in autotesting at fast time controls. These scores produce evaluations that look obviously wrong, even ridiculous, in normal opening positions. For example, I recall that some versions of Crafty evaluate the position after 1. e4 c5 as something like 3/4 of a pawn better for White! The question is why do such apparently unreasonably large dynamic values test so well? Would the optimum values be lower at longer time limits, or is this just a difference between computer chess and human chess? Rybka and Komodo also overvalue dynamic factors compared to humans, but to a lesser extent. I wonder if any strong program evaluates these factors in a way that is consistent with human results.
Since I've known you for at least 15 years now, dating back to *Socrates and such, the first question I have to ask is simply "What makes you so sure that your 'human evaluation' is the 'right one'"??? :)

In any case, for Crafty, the early big scores are all about castling and development. Once castling is over, things become more sane. Early development is a real trick, since I don't want to depend on my opening book to get me past the castling point. It took a ton of tuning, and for the early stages I was much less worried about the actual score returned than about the move actually played. :)

You are right that those numbers were vetted quite heavily through testing, but I really need to give 'em another tuning, and for that I need some different starting positions that are way earlier in the game...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty and Stockfish question

Post by bob »

lkaufman wrote:The point is though that search leads to positions where White has more mobility and "better" piece location, which Stockfish evaluates way too optimistically. I am rather sure that if Stockfish played both sides of this position thousands of times (using the randomizer for variety) White would get only a modest plus, nothing commensurate with a +0.60 score. This is what I am talking about. Why are the scores so unrealistic, even in terms of expected engine vs. engine results, and yet they test so well at blitz?

Please leave king safety out of this discussion, although I admit it is a "dynamic" term. I'm talking about things like mobility and piece location tables, which are on average valid but in many situations are misleading.
For Crafty, it is pretty easy to grasp the score. For the position from the second post in this thread, you can discover via the "score" command that some of this is from development (knights on the edge, unconnected rooks, uncastled, etc.). Not much of that comes from mobility in our case; it is mainly the special-case "uncastled development scoring"...

Remember, it is not the score that counts, it is the move. I suppose everyone could just add in a -50 constant to their scores and make them appear more conservative, but it would not change the move at all...
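
Indeed, the move choice is just an argmax over the root scores, and an argmax is unchanged when every candidate is shifted by the same constant. A toy illustration:

Code: Select all

#include <cstddef>
#include <vector>

// Shifting every score by the same offset never changes which move is
// chosen, only the evaluation that gets displayed.
std::size_t best_move_index(const std::vector<int>& scores, int offset) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i)
        if (scores[i] + offset > scores[best] + offset)  // offset cancels out
            best = i;
    return best;  // the same index for any value of 'offset'
}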
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Crafty and Stockfish question

Post by lkaufman »

My "proof" that my human evaluation (and that of all GMs) is the right one is that even in engine databases, the openings score more or less similarly to the results in human play (in most cases). Thus you would never find 75% scores for White in any database of Sicilian games, which might be expected from the huge evals in Crafty and Stockfish. I also try randomized playouts with Rybka from major openings and usually get similar results to human databases, i.e. scores around 55% for White.

As for earlier starting positions, what do you think about randomized moves for the first N ply, filtering out all positions unbalanced by more than X?
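
Sketched in code, against placeholder stubs rather than any real engine API (Position, legal_moves, make_move, and evaluate are all stand-ins):

Code: Select all

#include <cstdlib>
#include <vector>

// Placeholder types and stubs standing in for a real engine's interface.
struct Position { /* board state would live here */ };
std::vector<int> legal_moves(const Position&) { return {1, 2, 3}; }  // stub
void make_move(Position&, int /*move*/) {}                           // stub
int evaluate(const Position&) { return 0; }  // stub, centipawns for White

// Play N random plies from 'pos', then keep the resulting position only if
// a quick evaluation says it is within X centipawns of balanced.
bool random_opening(Position pos, int N, int X, Position& out) {
    for (int ply = 0; ply < N; ++ply) {
        std::vector<int> moves = legal_moves(pos);
        if (moves.empty())
            return false;                      // game ended early: discard
        make_move(pos, moves[std::rand() % moves.size()]);
    }
    if (std::abs(evaluate(pos)) > X)           // filter out unbalanced positions
        return false;
    out = pos;
    return true;
}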
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Crafty and Stockfish question

Post by lkaufman »

bob wrote:
For Crafty, it is pretty easy to grasp the score. For the position from the second post in this thread, you can discover via the "score" command that some of this is from development (knights on the edge, unconnected rooks, uncastled, etc.). Not much of that comes from mobility in our case; it is mainly the special-case "uncastled development scoring"...

Remember, it is not the score that counts, it is the move. I suppose everyone could just add in a -50 constant to their scores and make them appear more conservative, but it would not change the move at all...
Changing the scores by a constant would solve nothing, because they are interpreted relative to material and to static factors. The issue is about the relative weighting of static vs. dynamic factors (leaving out king safety, as it has elements of both). Perhaps I am mistaken about Crafty overweighting dynamics; I have spent far more time with Stockfish, which displays similar behavior in the opening. For me (and surely many others) what I want most from an engine is an accurate evaluation of an opening line (which may extend all the way to the endgame!). I put the scores into an IDeA tree using Aquarium and research openings this way. If the evals systematically overrate positions where White has more mobility, the engine will be "recommending" the wrong lines. So for me, a correct eval of the end node is more important than the rating of the engine.