Crafty and Stockfish question

lkaufman · Post by **lkaufman** » Fri Jul 16, 2010 7:02 pm

Both Crafty and Stockfish (and many other programs) use very large scores for dynamic factors (especially mobility, I think), which have been chosen because they produce the best results in autotesting at fast time controls. These scores produce evaluations that look obviously wrong, even ridiculous, in normal opening positions. For example, I recall that some versions of Crafty evaluate the position after 1.e4 c5 as something like 3/4 of a pawn better for White! The question is: why do such apparently unreasonably large dynamic values test so well? Would the optimum values be lower at longer time limits, or is this just a difference between computer chess and human chess? Rybka and Komodo also overvalue dynamic factors compared to humans, but to a lesser extent. I wonder if any strong program evaluates these factors in a way that is consistent with human results.

jdart · Post by **jdart** » Fri Jul 16, 2010 8:07 pm

It is only a general impression, but I have observed that both Crafty and Rybka have modest-sized scores for king safety. Many other programs will give large scores - up to several pawns - for having an exposed king or having attackers around the king. Scorpio is one example, so is Junior.

The good thing about large king safety scores is that you play attacks well in most cases and sometimes you find spectacular sacs. The bad thing is that you can overvalue attacking potential and wind up just down material or in a worse position after the attack fizzles.

Large scoring terms in the eval also make aggressive pruning more difficult, especially if they can't be estimated or computed rapidly.

So given Rybka and Crafty are extensively tested I can surmise that they found large scores in this area counter-productive. But they may be over-emphasizing other factors like mobility, still.

By the way, I mention Rybka here, not Stockfish, because I don't have much experience with Stockfish's eval.

Michael Sherwin · Post by **Michael Sherwin** » Fri Jul 16, 2010 8:20 pm

Take this position for example:

[d]rnbqk2r/pp3pbp/3p1np1/2pP4/4P3/2N2N2/PP3PPP/R1BQKB1R w KQkq - 2 8

Many engines give an almost winning value to White in this position, while humans would evaluate it as being about equal with chances for both sides. Both are correct. Maybe the computer more so, though, as it is very easy for Black to lose. However, with good play Black can create a truly dynamic position with chances for both sides that is just as easy for White to lose. Having good engines on both sides tends to bring the score towards zero, as it should be.

Ralph Stoesser · Post by **Ralph Stoesser** » Fri Jul 16, 2010 9:28 pm

Another example

[d]r1bqkbnr/pp1ppppp/2n5/1Bp5/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3

Stockfish scores this position much too high for a long time, around +0.60.
It's not the static eval that scores this position too high; the high values come from search.

lkaufman · Post by **lkaufman** » Fri Jul 16, 2010 10:28 pm

The point is though that search leads to positions where White has more mobility and "better" piece location, which Stockfish evaluates way too optimistically. I am rather sure that if Stockfish played both sides of this position thousands of times (using the randomizer for variety) White would get only a modest plus, nothing commensurate with a +0.60 score. This is what I am talking about. Why are the scores so unrealistic, even in terms of expected engine vs. engine results, and yet they test so well at blitz?

Please leave king safety out of this discussion, although I admit it is a "dynamic" term. I'm talking about things like mobility and piece location tables, which are on average valid but in many situations are misleading.
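To make the comparison between a +0.60 score and actual game results concrete, here is a minimal sketch that maps an evaluation to an expected score and back. It assumes a logistic curve with a 400-centipawn scale, which is one common convention and not something taken from Stockfish or Crafty; the function names and the scale constant are illustrative only.

```python
import math

# Assumed scale: one common logistic convention, not any engine's real mapping.
SCALE_CP = 400.0


def expected_score(eval_cp: float) -> float:
    """Expected score (0..1) for the side to which eval_cp refers."""
    return 1.0 / (1.0 + 10.0 ** (-eval_cp / SCALE_CP))


def implied_eval_cp(score: float) -> float:
    """Inverse mapping: the evaluation that a measured score would imply."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -SCALE_CP * math.log10(1.0 / score - 1.0)


def match_score(wins: int, draws: int, losses: int) -> float:
    """Score fraction from self-play results (a draw counts as half a point)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games
```

Under that assumption, one could play the position many times, pass the result of `match_score` to `implied_eval_cp`, and compare the figure with what the engine reports.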

Graham Banks · Post by **Graham Banks** » Fri Jul 16, 2010 10:36 pm

There are certainly some openings that computers play extremely poorly and with little success. The Black side of the Sicilian Pelikan is one that springs to mind.

jdart · Post by **jdart** » Fri Jul 16, 2010 10:37 pm

> This is what I am talking about. Why are the scores so unrealistic, even in terms of expected engine vs. engine results, and yet they test so well at blitz?

I think mobility scoring may help in a negative way - it helps you avoid positions where you have a persistent long term cramped position that can eventually turn into a fatal weakness or loss of material. Players at Master level and above can be very skilled at inflicting this on their opponents. I am not sure how often it occurs but I've certainly seen games lost because the program drifted into a horrible position, with a locked-in bishop or buried rook. That's like playing a piece down.

But in ordinary situations mobility scoring may be just "noise" - it doesn't harm you to aim for higher mobility or better piece placement, but it doesn't help you much either, absent other factors.

lkaufman · Post by **lkaufman** » Fri Jul 16, 2010 11:00 pm

So how can we avoid the "noise" while still avoiding the really bad positions? I suppose we could score the square of the mobility difference between the two sides or something like that but this doesn't feel right to me.
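As a rough illustration of the "square of the mobility difference" idea, the sketch below keeps the sign of the difference so that a deficit stays negative. The function name, the move counts, and the `scale` value are hypothetical and not taken from any engine.

```python
def squared_mobility_term(own_moves: int, opp_moves: int, scale: float = 0.5) -> float:
    """Sign-preserving square of the mobility difference, in centipawns.

    Small differences contribute almost nothing; large ones grow quickly.
    `scale` is an arbitrary illustrative weight.
    """
    diff = own_moves - opp_moves
    return scale * diff * abs(diff)


# A 2-move edge is nearly ignored, while a 10-move deficit is penalized heavily.
small_edge = squared_mobility_term(22, 20)
big_deficit = squared_mobility_term(10, 20)
```

With these numbers a two-move edge is worth 2 centipawns and a ten-move deficit costs 50, so the term mostly stays quiet in ordinary positions.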

jdart · Post by **jdart** » Fri Jul 16, 2010 11:09 pm

You could try tapering off or even removing the bonus for above-average mobility while keeping penalties for low mobility - if you are not doing that already. But only testing can tell if that is a good change or not.
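A minimal sketch of that asymmetric scheme, assuming mobility is a simple move count compared against an average: below-average counts get the full penalty, while the bonus above average is small and capped. All names and constants here are made up for illustration and would need tuning by testing.

```python
def tapered_mobility_score(
    moves: int,
    average: int = 8,           # assumed typical move count for the piece
    penalty_per_move: int = 6,  # centipawns lost per move below average
    bonus_per_move: int = 2,    # centipawns gained per move above average
    bonus_cap: int = 4,         # moves above average that still earn a bonus
) -> int:
    """Full penalty for cramped pieces, small capped bonus for active ones."""
    delta = moves - average
    if delta < 0:
        return delta * penalty_per_move
    return min(delta, bonus_cap) * bonus_per_move
```

Setting `bonus_per_move` to 0 gives the "remove the bonus entirely" variant.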

lkaufman · Post by **lkaufman** » Fri Jul 16, 2010 11:15 pm

We do that now for each piece separately, but we never tried doing it for total mobility. Worth a test. But the problem also applies to piece location tables.
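The difference between tapering per piece and tapering the total can be sketched as follows. This is a toy model with invented constants, not Komodo's actual evaluation: it only shows that the two variants can disagree on the same position.

```python
from typing import Sequence


def taper(moves: int, average: int, penalty: int = 6, bonus: int = 2, cap: int = 4) -> int:
    """Full penalty below average, capped bonus above it (illustrative values)."""
    delta = moves - average
    return delta * penalty if delta < 0 else min(delta, cap) * bonus


def per_piece_score(piece_moves: Sequence[int], piece_average: int = 6) -> int:
    """Taper each piece's mobility separately, then add the results."""
    return sum(taper(m, piece_average) for m in piece_moves)


def total_score(piece_moves: Sequence[int], piece_average: int = 6) -> int:
    """Add up the raw move counts first, then taper the total once."""
    return taper(sum(piece_moves), piece_average * len(piece_moves))


# One nearly buried piece (1 move) alongside one very active piece (13 moves).
position = [1, 13, 6, 4]
```

For `position`, the per-piece version still penalizes the buried piece, while the total version sees average mobility overall and scores it as neutral. Which behavior tests better is an open question.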
