Crafty and Stockfish question

Discussion of chess software programming and technical issues.


lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Crafty and Stockfish question

Post by lkaufman »

Both Crafty and Stockfish (and many other programs) use very large scores for dynamic factors (especially mobility, I think), which have been chosen because they produce the best results in autotesting at fast time controls. These scores produce evaluations that look obviously wrong, even ridiculous, in normal opening positions. For example, I recall that some versions of Crafty evaluate the position after 1.e4 c5 as something like 3/4 of a pawn better for White! The question is: why do such apparently unreasonably large dynamic values test so well? Would the optimum values be lower at longer time limits, or is this just a difference between computer chess and human chess? Rybka and Komodo also overvalue dynamic factors compared to humans, but to a lesser extent. I wonder if any strong program evaluates these factors in a way that is consistent with human results.
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Crafty and Stockfish question

Post by jdart »

It is only a general impression, but I have observed that both Crafty and Rybka have modest-size scores for king safety. Many other programs will give large scores, up to several pawns, for having an exposed king or having attackers around the king. Scorpio is one example; Junior is another.

The good thing about large king safety scores is that you play attacks well in most cases and sometimes you find spectacular sacs. The bad thing is that you can overvalue attacking potential and wind up just down material or in a worse position after the attack fizzles.

Large scoring terms in the eval also make aggressive pruning more difficult, especially if they can't be estimated or computed rapidly.
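To illustrate why large eval terms get in the way of pruning, here is a minimal sketch using futility pruning at a frontier node as the example of "aggressive pruning" (an assumption; the post does not name a specific technique). A quiet move is skipped only if the static eval plus a safety margin still cannot reach alpha, and that margin has to cover the largest positional swing one move can cause. All function names and centipawn values below are hypothetical.

```python
# Hypothetical sketch of a futility-pruning test at a frontier node.
# All margins are illustrative assumptions, in centipawns.

def futility_margin(max_positional_swing_cp: int, base_cp: int = 100) -> int:
    """Assumed base allowance plus the largest eval swing a quiet move can cause."""
    return base_cp + max_positional_swing_cp


def can_prune_quiet_move(static_eval: int, alpha: int, margin: int) -> bool:
    """Skip a quiet move if even an optimistic gain cannot lift the score to alpha."""
    return static_eval + margin <= alpha


# Modest king-safety terms: a small margin lets more moves be pruned.
small = futility_margin(max_positional_swing_cp=50)    # 150
# King-safety terms worth "several pawns": the margin must grow to stay safe.
large = futility_margin(max_positional_swing_cp=300)   # 400

static_eval, alpha = -20, 200
print(can_prune_quiet_move(static_eval, alpha, small))  # True:  -20 + 150 = 130 <= 200
print(can_prune_quiet_move(static_eval, alpha, large))  # False: -20 + 400 = 380 > 200
```

The same node is prunable with the small margin but not with the large one, so bigger positional terms mean fewer pruned moves for the same safety.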

So, given that Rybka and Crafty are extensively tested, I surmise that they found large scores in this area counter-productive. But they may still be over-emphasizing other factors, like mobility.

By the way, I mention Rybka here and not Stockfish because I don't have much experience with Stockfish's eval.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Crafty and Stockfish question

Post by Michael Sherwin »

Take this position for example:

[D]rnbqk2r/pp3pbp/3p1np1/2pP4/4P3/2N2N2/PP3PPP/R1BQKB1R w KQkq - 2 8

Many engines give White an almost winning score in this position, while humans would evaluate it as about equal, with chances for both sides. Both are correct. Perhaps the computer is more so, though, as it is very easy for Black to lose. However, with good play Black can create a truly dynamic position with chances for both sides, one that is just as easy for White to lose. Having good engines on both sides tends to bring the score towards zero, as it should.
Ralph Stoesser
Posts: 408
Joined: Sat Mar 06, 2010 9:28 am

Re: Crafty and Stockfish question

Post by Ralph Stoesser »

Another example

[D]r1bqkbnr/pp1ppppp/2n5/1Bp5/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3

Stockfish evaluates this position much too high, around +0.60, for a long time.
It's not the static eval that scores this position too high; the high values come from the search.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Crafty and Stockfish question

Post by lkaufman »

The point is though that search leads to positions where White has more mobility and "better" piece location, which Stockfish evaluates way too optimistically. I am rather sure that if Stockfish played both sides of this position thousands of times (using the randomizer for variety) White would get only a modest plus, nothing commensurate with a +0.60 score. This is what I am talking about. Why are the scores so unrealistic, even in terms of expected engine vs. engine results, and yet they test so well at blitz?
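One way to make "nothing commensurate with a +0.60 score" testable is to map an engine score to an expected game result and compare it with the score White actually achieves in self-play. The sketch below assumes a logistic mapping with a scale constant `scale_cp`; that constant is an assumption and would have to be fitted to real engine-vs-engine games. The function names are hypothetical.

```python
def predicted_score(eval_cp: float, scale_cp: float = 400.0) -> float:
    """Logistic map from an engine score (centipawns, White's view) to
    White's expected game score in [0, 1]. scale_cp is an assumed constant."""
    return 1.0 / (1.0 + 10.0 ** (-eval_cp / scale_cp))


def observed_score(wins: int, draws: int, losses: int) -> float:
    """White's actual score from a self-play match: a win counts 1, a draw 1/2."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


# Under the assumed scale, +0.60 predicts roughly 0.59 for White.
print(round(predicted_score(60), 2))
```

If thousands of randomized self-play games from the position gave White a score well below `predicted_score(60)`, that would quantify how far the +0.60 overstates White's chances.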

Please leave king safety out of this discussion, although I admit it is a "dynamic" term. I'm talking about things like mobility and piece location tables, which are on average valid but in many situations are misleading.
Graham Banks
Posts: 41468
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Crafty and Stockfish question

Post by Graham Banks »

There are certainly some openings that computers play extremely poorly and with little success. The Black side of the Sicilian Pelikan is one that springs to mind.
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Crafty and Stockfish question

Post by jdart »

> This is what I am talking about. Why are the scores so unrealistic, even in terms of expected engine vs. engine results, and yet they test so well at blitz?

I think mobility scoring may help in a negative way: it helps you avoid positions where you have a persistent, long-term cramped position that can eventually turn into a fatal weakness or loss of material. Players at master level and above can be very skilled at inflicting this on their opponents. I am not sure how often it occurs, but I've certainly seen games lost because the program drifted into a horrible position, with a locked-in bishop or a buried rook. That's like playing a piece down.

But in ordinary situations mobility scoring may be just "noise": it doesn't harm you to aim for higher mobility or better piece placement, but it doesn't help you much either, absent other factors.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Crafty and Stockfish question

Post by lkaufman »

So how can we avoid the "noise" while still avoiding the really bad positions? I suppose we could score the square of the mobility difference between the two sides or something like that but this doesn't feel right to me.
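The squared-difference idea can be sketched as follows. Squaring alone would lose the sign, so this version keeps it (`diff * abs(diff)`). Small imbalances then contribute almost nothing, which suppresses the "noise", while large imbalances such as a badly cramped side dominate. The function name and `weight_cp` are hypothetical.

```python
def mobility_term_squared(white_mobility: int, black_mobility: int,
                          weight_cp: float = 0.25) -> float:
    """Hypothetical mobility term: grows with the square of the mobility
    difference, sign preserved (positive favours White). weight_cp is assumed."""
    diff = white_mobility - black_mobility
    return weight_cp * diff * abs(diff)


for w, b in [(30, 28), (30, 20), (30, 10)]:
    print(w - b, mobility_term_squared(w, b))
# prints:
# 2 1.0
# 10 25.0
# 20 100.0
```

A difference of 2 moves is worth 1 cp here, while a difference of 20 is worth a full pawn, so the term stays quiet in balanced positions and only speaks up when one side is genuinely restricted.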
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Crafty and Stockfish question

Post by jdart »

You could try tapering off, or even removing, the bonus for above-average mobility while keeping penalties for low mobility, if you are not doing that already. But only testing can tell whether that is a good change or not.
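The suggested asymmetric shape for one piece might look like the sketch below: a full penalty for each move below an assumed "average" mobility, and a small, capped bonus above it. Setting `bonus_cp=0` removes the bonus entirely. The function name and all weights are hypothetical values for illustration, not tuned numbers from any engine.

```python
def mobility_score(moves: int, average: int, penalty_cp: float = 6.0,
                   bonus_cp: float = 2.0, bonus_cap: int = 4) -> float:
    """Hypothetical per-piece mobility term: full penalty for each move below
    the assumed average, small bonus above it, capped at bonus_cap moves."""
    if moves < average:
        return -penalty_cp * (average - moves)
    return bonus_cp * min(moves - average, bonus_cap)


print(mobility_score(1, 7))               # -36.0: nearly trapped bishop
print(mobility_score(7, 7))               # 0.0: average mobility
print(mobility_score(13, 7))              # 8.0: bonus capped at 4 extra moves
print(mobility_score(13, 7, bonus_cp=0))  # 0: bonus removed entirely
```

The penalty side still flags a locked-in bishop strongly, while extra moves beyond the cap stop adding score.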
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Crafty and Stockfish question

Post by lkaufman »

We do that now for each piece separately, but we never tried doing it for total mobility. Worth a test. But the problem also applies to piece location tables.
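The difference between tapering per piece and tapering total mobility can be sketched by applying the same hypothetical asymmetric function either to each piece or once to the side's total. The weights, the assumed average of 7 moves per piece, and the move counts are illustrative assumptions only.

```python
from typing import Sequence


def tapered(diff: int, penalty_cp: float = 6.0, bonus_cp: float = 2.0,
            bonus_cap: int = 4) -> float:
    """Full penalty below the reference, small capped bonus above it."""
    if diff < 0:
        return penalty_cp * diff  # diff is negative, so this is a penalty
    return bonus_cp * min(diff, bonus_cap)


def per_piece_mobility(moves: Sequence[int], averages: Sequence[int]) -> float:
    """Taper applied to each piece separately, then summed."""
    return sum(tapered(m - a) for m, a in zip(moves, averages))


def total_mobility(moves: Sequence[int], averages: Sequence[int]) -> float:
    """Taper applied once, to the side's total mobility."""
    return tapered(sum(moves) - sum(averages))


averages = [7, 7, 7]   # assumed typical mobility for three pieces
active = [13, 12, 11]
buried = [1, 12, 12]   # one piece almost trapped
print(per_piece_mobility(active, averages), total_mobility(active, averages))  # 24.0 8.0
print(per_piece_mobility(buried, averages), total_mobility(buried, averages))  # -20.0 8.0
```

Tapering the total caps the combined bonus (8 instead of 24 for the active pieces), but a pure total term also lets active pieces offset a buried one, which is one reason to keep the per-piece penalties alongside it.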