Stockfish - material balance/imbalance evaluation

Sven · Post by **Sven** » Fri May 07, 2010 11:48 am

mcostalba wrote:
Ralph Stoesser wrote:The bad bishop issue is not only a mobility issue, but a long term mobility issue.
In chess engines "long term" is called "search depth"
Ralph Stoesser wrote:By evaluating only the bishop's mobility, the engine has no knowledge about how deep behind the search horizon the mobility issue would be valid.
Are you sure ?

When we talk of "horizon" we are talking of about 20 moves ahead and counting...

For human chess players the existence of "bad" bishops vs. either "good" bishops or knights in the middlegame can be an important reason to decide about trading pieces and going into an endgame or not. This can really be a "long term" issue, especially if the pawn structure is quite stable due to blocked pawns. For an engine, even a 30-ply search in such middlegames may often end up in doing static evaluation of an early endgame position, where an estimate about "bad" bishops can be helpful.

Therefore I agree with Ralph stating that a bad bishop is not defined via current mobility only, i.e. the square color of pawns may be relevant, too, even if those pawns are currently not attacked or defended by the bishop.

Sven

Sven · Post by **Sven** » Fri May 07, 2010 12:31 pm

Sven Schüle wrote:
mcostalba wrote:
Ralph Stoesser wrote:The bad bishop issue is not only a mobility issue, but a long term mobility issue.
In chess engines "long term" is called "search depth"
Ralph Stoesser wrote:By evaluating only the bishop's mobility, the engine has no knowledge about how deep behind the search horizon the mobility issue would be valid.
Are you sure ?

When we talk of "horizon" we are talking of about 20 moves ahead and counting...
For human chess players the existence of "bad" bishops vs. either "good" bishops or knights in the middlegame can be an important reason to decide about trading pieces and going into an endgame or not. This can really be a "long term" issue, especially if the pawn structure is quite stable due to blocked pawns. For an engine, even a 30-ply search in such middlegames may often end up in doing static evaluation of an early endgame position, where an estimate about "bad" bishops can be helpful.

Therefore I agree with Ralph stating that a bad bishop is not defined via current mobility only, i.e. the square color of pawns may be relevant, too, even if those pawns are currently not attacked or defended by the bishop.

Consider this example position:
[Diagram: position not preserved in the source]
Even a shallow search (e.g. 12 plies) finds that White can win easily; anyone can figure this out. But strong human chess players, and not only the strongest ones, "know" from looking at the position, or can visualize during analysis, that Black has the bad bishop and will lose by some kind of zugzwang. So strong humans will evaluate this position as "won" without further search, as soon as they see that the white king is active enough to reach the fourth rank and oppose the black king. (You may consider this to be a kind of "search", of course ...)

Stockfish 1.7.1 analysis, for instance:

 12	+2.06	18747	0:00.08	Bd5 Be8 Kd4 Kd6 Ba2 Kc6 Ke5 Kc7 Kd5 Kd8 Kc5 Kd7 Kxb5 Kd6+ Kb6 Ke5 Bc4 Kf5

where a human analysis would probably include 6...Kc7 7.Bd5 (zugzwang) Kd8 8.Bc6 instead of 6...Kd7 7.Kxb5.

But in this case there is no close relation to the current mobility of either bishop. The black bishop has 7 squares available, so who would call it "bad" based on that alone? What makes it look very bad is that all black pawns are blocked on squares of the bishop's color, while the bishop cannot attack a single white pawn.
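To make the pawn-color idea concrete, here is a minimal bitboard sketch. It is not Stockfish code: the square numbering (a1 = 0 ... h8 = 63), the helper names and the function `blockedOwnPawnsOnBishopColor` are all assumptions made for illustration. It only counts own pawns that are blocked on the bishop's square color; it does not check whether the bishop can attack enemy pawns.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Assumed square numbering: a1 = 0, b1 = 1, ..., h8 = 63, so a1 is a dark square.
constexpr Bitboard DarkSquares = 0xAA55AA55AA55AA55ULL;

constexpr Bitboard bit(int sq) { return Bitboard(1) << sq; }

// Recursive popcount so that everything stays constexpr in C++11.
constexpr int popcount(Bitboard b) { return b == 0 ? 0 : 1 + popcount(b & (b - 1)); }

// All squares that have the same color as the bishop's square.
constexpr Bitboard sameColorSquares(int bishopSq) {
  return (DarkSquares & bit(bishopSq)) ? DarkSquares : ~DarkSquares;
}

// Hypothetical helper: number of own pawns that stand on the bishop's square
// color and have any piece directly in front of them. "In front" is one rank
// up for White and one rank down for Black.
constexpr int blockedOwnPawnsOnBishopColor(int bishopSq, Bitboard ownPawns,
                                           Bitboard occupied, bool isWhite) {
  return popcount(ownPawns & sameColorSquares(bishopSq)
                  & (isWhite ? occupied >> 8 : occupied << 8));
}
```

For a black bishop on d7 with black pawns on b5, e6 and a7 and white pawns on b4 and e5, this returns 2: b5 and e6 are blocked on the bishop's color, while a7 is on the other color. An engine would still have to decide how much weight such a count deserves.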

Sven

Ralph Stoesser · Post by **Ralph Stoesser** » Fri May 07, 2010 12:54 pm

According to a great post from Tord about the evaluation in general, we could face the following problem.

When we penalize positions with blocked pawns of the same square color as the bishop's square color, we slightly lower the average bishop's value over all chess positions. This could be a problem especially in case the blockade of such pawns is not stable enough.

But because of the complicated material imbalance calculation with its "magic" coefficients, it would not be that easy to adjust the bishop's general material value, since that could have unforeseen side effects on the other piece values.
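The averaging effect can be put into numbers with a tiny sketch. All values here are hypothetical (centipawns, and a penalty that fires in 1 out of 10 positions); the function name is made up for this example.

```cpp
// Average bishop value when a penalty of `penalty` centipawns applies in
// `hits` out of `positions` positions. Integer arithmetic, hypothetical numbers.
constexpr int averageBishopValue(int base, int penalty, int hits, int positions) {
  return base - penalty * hits / positions;
}

// With a base value of 330 and a 20 cp penalty in 1 of 10 positions, the
// average drops to 328. Keeping the average at 330 would mean raising the
// base to 332, which is exactly the adjustment that is hard to make safely.
static_assert(averageBishopValue(330, 20, 1, 10) == 328, "average drops by 2 cp");
static_assert(averageBishopValue(332, 20, 1, 10) == 330, "compensated base value");
```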

Ralph Stoesser · Post by **Ralph Stoesser** » Fri May 07, 2010 11:34 pm

Eelco de Groot wrote:
Stockfish may be difficult to improve, but it is always possible!

Regards, Eelco

SF is a stylish "open source" black box full of cryptic coefficients. No way to improve it manually. Only automated parameter tuning may raise the Elo bar. But the code for the parameter tuning is hidden source. So the SF team has an everlasting monopoly on making it better. We stay spectators forever, condemned to applaud when the next release with +154 Elo arrives.

@Sven,
Regarding your endgame example: I would say it is mainly about attacking possibilities and overloading the defender. Mobility is not the main aspect in this position, and neither is "long term" mobility. But I certainly also had such positions in mind when I tried to evaluate bad bishops. So after some thinking I would say a bad bishop means much more than low mobility, and also much more than "long term" low mobility. I think that's the reason why the blocked pawn eval doesn't work.

michiguel · Post by **michiguel** » Sat May 08, 2010 12:31 am

Sven Schüle wrote:
mcostalba wrote:
Ralph Stoesser wrote:The bad bishop issue is not only a mobility issue, but a long term mobility issue.
In chess engines "long term" is called "search depth"
Ralph Stoesser wrote:By evaluating only the bishop's mobility, the engine has no knowledge about how deep behind the search horizon the mobility issue would be valid.
Are you sure ?

When we talk of "horizon" we are talking of about 20 moves ahead and counting...
For human chess players the existence of "bad" bishops vs. either "good" bishops or knights in the middlegame can be an important reason to decide about trading pieces and going into an endgame or not. This can really be a "long term" issue, especially if the pawn structure is quite stable due to blocked pawns. For an engine, even a 30-ply search in such middlegames may often end up in doing static evaluation of an early endgame position, where an estimate about "bad" bishops can be helpful.

Therefore I agree with Ralph stating that a bad bishop is not defined via current mobility only, i.e. the square color of pawns may be relevant, too, even if those pawns are currently not attacked or defended by the bishop.

Sven

The direct concept of a good or bad bishop has little to do with mobility, current or future (it often correlates indirectly with long-term mobility, but correlation is not causation). The definition is difficult, even for human players, although they can generally recognize the pattern. A bad bishop could be defined as a bishop that is less compatible with the current structure of its own pawns than the bishop of the other color (I know, not very useful...).

To make things more difficult, sometimes it is a good thing to have a bad bishop, particularly in certain defensive positions (because it can protect a key pawn). For instance, in many Sicilian positions with Be7 and pawns on d6 and e5. But this is a very modern and dynamic concept.

Miguel

mcostalba · Post by **mcostalba** » Sat May 08, 2010 2:48 pm

Ralph Stoesser wrote: SF is a stylish "open source" black box full of cryptic coefficients. No way to improve it manually. Only automated parameter tuning may raise the Elo bar. But the code for the parameter tuning is hidden source. So the SF team has an everlasting monopoly on making it better. We stay spectators forever, condemned to applaud when the next release with +154 Elo arrives.

All the gain from 1.6 to 1.7 was achieved without any automatic tuning; all the search tweaks are done without any automatic tuning.

Automatic tuning was used just for evaluation and is now hardly used anymore (we have tuned everything possible), so if you don't want to be a spectator there is plenty of room for possibilities... Of course it doesn't mean that it is easy to improve, just that not having access to the evaluation tuning framework cannot be used as an excuse for not being able to do it.

Ralph Stoesser · Post by **Ralph Stoesser** » Sat May 08, 2010 3:33 pm

Yes, thanks Miguel. A simple definition of a bad bishop like mine is a rule of thumb for beginners, but it is not valid for engines at the 3000+ Elo level.

@Marco, at least your hidden automatic tuning machine cannot fix bugs, but we humans can.

evaluate.cpp, line 145

  const Score ThreatBonus[8][8] = {
      { Z, Z, Z, Z, Z, Z, Z, Z }, // not used
      { Z, S(18,37),       Z, S(37,47), S(55,97), S(55,97), Z, Z }, // KNIGHT attacks
      { Z, S(18,37), S(37,47),       Z, S(55,97), S(55,97), Z, Z }, // BISHOP attacks
      { Z, S( 9,27), S(27,47), S(27,47),       Z, S(37,47), Z, Z }, // ROOK attacks
      { Z, S(27,37), S(27,37), S(27,37), S(27,37),       Z, Z, Z }, // QUEEN attacks
      { Z, Z, Z, Z, Z, Z, Z, Z }, // not used
      { Z, Z, Z, Z, Z, Z, Z, Z }, // not used
      { Z, Z, Z, Z, Z, Z, Z, Z }  // not used
  };

should be

  const Score ThreatBonus[8][8] = {
      { Z, Z, Z, Z, Z, Z, Z, Z }, // not used
      { Z, Z, Z, Z, Z, Z, Z, Z }, // not used
      { Z, S(18,37),       Z, S(37,47), S(55,97), S(55,97), Z, Z }, // KNIGHT attacks
      { Z, S(18,37), S(37,47),       Z, S(55,97), S(55,97), Z, Z }, // BISHOP attacks
      { Z, S( 9,27), S(27,47), S(27,47),       Z, S(37,47), Z, Z }, // ROOK attacks
      { Z, S(27,37), S(27,37), S(27,37), S(27,37),       Z, Z, Z }, // QUEEN attacks
      { Z, Z, Z, Z, Z, Z, Z, Z }, // not used
      { Z, Z, Z, Z, Z, Z, Z, Z }  // not used
  };

Eelco de Groot · Post by **Eelco de Groot** » Sat May 08, 2010 3:47 pm

mcostalba wrote:
Ralph Stoesser wrote: SF is a stylish "open source" black box full of cryptic coefficients. No way to improve it manually. Only automated parameter tuning may raise the Elo bar. But the code for the parameter tuning is hidden source. So the SF team has an everlasting monopoly on making it better. We stay spectators forever, condemned to applaud when the next release with +154 Elo arrives.
All the gain from 1.6 to 1.7 was achieved without any automatic tuning; all the search tweaks are done without any automatic tuning.

Automatic tuning was used just for evaluation and is now hardly used anymore (we have tuned everything possible), so if you don't want to be a spectator there is plenty of room for possibilities... Of course it doesn't mean that it is easy to improve, just that not having access to the evaluation tuning framework cannot be used as an excuse for not being able to do it.

My observation would be that I would not put so much emphasis on all the coefficients. They depend on the rules in place; if you change them you (may) need, or may introduce, new coefficients, that is your "reward".

If it works well the material imbalance table for instance is just a table that provides a lower bound for this material distribution if you work with bonuses, or an average if you have both bonuses and penalties. If you improve the rules you can introduce higher bonuses and penalties, but the average should stay the same because the material distribution incorporates no rules, at least not at present. That is one of the strong points of Tord's table, it keeps it simple.

So if it is done well you need little retuning, I think for instance you are also ignoring the game phase calculation at this point -I would have to check that- but these two systems do much the same work, just in slightly different, complementary ways. As long as they both give conservative estimates most of the real evaluation difference can be given by the rules on top of it, which have more "amplitude" but these rules then hopefully are also more specific, they are for more extreme cases.

I don't worry so much as Tord about the negative side-effects a rule may have, in some cases I just blindly believe in the rule and the negative effect may be not the fault of the rule itself but just caused by disturbing the equilibrium that you had, or you need additional rules to cover the holes in your first rule, which would be added complexity but that is okay I think, the Standard model in physics is more complex than the rules from Newton and it still does not cover gravity well. The really complex rules are only needed in extremity, -is this a proper use of this term ?- but that is where it all gets interesting.

Regards, Eelco

marcelk · Post by **marcelk** » Sat May 08, 2010 4:01 pm

mcostalba wrote:
Ralph Stoesser wrote: SF is a stylish "open source" black box full of cryptic coefficients. No way to improve it manually. Only automated parameter tuning may raise the Elo bar. But the code for the parameter tuning is hidden source. So the SF team has an everlasting monopoly on making it better. We stay spectators forever, condemned to applaud when the next release with +154 Elo arrives. :( :lol:
All the gain from 1.6 to 1.7 was achieved without any automatic tuning; all the search tweaks are done without any automatic tuning.

Automatic tuning was used just for evaluation and is now hardly used anymore (we have tuned everything possible ;-) ), so if you don't want to be a spectator there is plenty of room for possibilities... Of course it doesn't mean that it is easy to improve, just that not having access to the evaluation tuning framework cannot be used as an excuse for not being able to do it :-)

The GPL is a difficult thing, but I think it can be read to mean that the tuning framework must be made available under the same conditions as the program.

The point is that these auto-tuned constants are either 1. not "source" (meaning the "preferred form to modify" them, which in this case is the secret tuner framework), but "object code", or 2. the tuner is "Corresponding Source", meaning

The "Corresponding Source" for a work in object code form means all the source code needed [...] to modify the work, including scripts to control those activities.

This requirement of opening up of tooling derives from the freedom that the GPL is looking after: Nobody should have a monopoly on improving the work.

If my reading is indeed right (and I have to say I'm not 100% sure of that, but I consider it likely given other readings of the GPL), then you can't release SF under the GPL without revealing the tuner framework as well. And if you don't want that, then don't release SF under the GPL, but pick another license, such as a BSD-style license (or don't release it at all, of course).

mcostalba · Post by **mcostalba** » Sat May 08, 2010 4:29 pm

Ralph Stoesser wrote: @Marco, at least your hidden automatic tuning machine cannot fix bugs, but we humans can.
evaluate.cpp, line 145

Wow! This is a bug, but to fix it properly we need to retune everything!

If we rewrite the labels as they should be:

  const Score ThreatBonus[8][8] = {
      { Z, Z, Z, Z, Z, Z, Z, Z },
      { Z, S(18,37),       Z, S(37,47), S(55,97), S(55,97), Z, Z }, // not used
      { Z, S(18,37), S(37,47),       Z, S(55,97), S(55,97), Z, Z }, // KNIGHT attacks
      { Z, S( 9,27), S(27,47), S(27,47),       Z, S(37,47), Z, Z }, // BISHOP attacks
      { Z, S(27,37), S(27,37), S(27,37), S(27,37),       Z, Z, Z }, // ROOK attacks
      { Z, Z, Z, Z, Z, Z, Z, Z }, // QUEEN attacks
      { Z, Z, Z, Z, Z, Z, Z, Z }, // not used
      { Z, Z, Z, Z, Z, Z, Z, Z }  // not used
  };

we see that the evaluation for QUEEN attacks is completely missing, and that the evaluations for a KNIGHT attacking a BISHOP, a BISHOP attacking a ROOK, and a ROOK attacking a QUEEN are all missing as well.

So we need to retune everything!

Thanks. Yes, this is a case where we will need to use automatic tuning.
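The off-by-one can be checked with a small stand-alone sketch. The `Score` struct, `S`, `Z` and the two lookup helpers are simplified stand-ins rather than the real Stockfish definitions, and the `PieceType` values (PAWN = 1, KNIGHT = 2, ...) are an assumption that matches the relabelled table above. `ThreatBonusAsPosted` holds the rows as they stand at evaluate.cpp line 145; `ThreatBonusShifted` holds Ralph's proposed layout with the same, not yet retuned, values.

```cpp
// Assumed piece type numbering; the table is indexed [attacker][victim].
enum PieceType { NO_PIECE_TYPE = 0, PAWN = 1, KNIGHT = 2, BISHOP = 3,
                 ROOK = 4, QUEEN = 5, KING = 6 };

struct Score { int mg; int eg; };  // simplified midgame/endgame pair

constexpr Score S(int mg, int eg) { return Score{mg, eg}; }
constexpr Score Z = S(0, 0);

// Rows as posted: the data starts at index 1, which is PAWN, not KNIGHT.
constexpr Score ThreatBonusAsPosted[8][8] = {
    { Z, Z, Z, Z, Z, Z, Z, Z },
    { Z, S(18,37),       Z, S(37,47), S(55,97), S(55,97), Z, Z },
    { Z, S(18,37), S(37,47),       Z, S(55,97), S(55,97), Z, Z },
    { Z, S( 9,27), S(27,47), S(27,47),       Z, S(37,47), Z, Z },
    { Z, S(27,37), S(27,37), S(27,37), S(27,37),       Z, Z, Z },
    { Z, Z, Z, Z, Z, Z, Z, Z },
    { Z, Z, Z, Z, Z, Z, Z, Z },
    { Z, Z, Z, Z, Z, Z, Z, Z }
};

// Proposed layout: one extra empty row so the data starts at KNIGHT.
constexpr Score ThreatBonusShifted[8][8] = {
    { Z, Z, Z, Z, Z, Z, Z, Z },
    { Z, Z, Z, Z, Z, Z, Z, Z },
    { Z, S(18,37),       Z, S(37,47), S(55,97), S(55,97), Z, Z },
    { Z, S(18,37), S(37,47),       Z, S(55,97), S(55,97), Z, Z },
    { Z, S( 9,27), S(27,47), S(27,47),       Z, S(37,47), Z, Z },
    { Z, S(27,37), S(27,37), S(27,37), S(27,37),       Z, Z, Z },
    { Z, Z, Z, Z, Z, Z, Z, Z },
    { Z, Z, Z, Z, Z, Z, Z, Z }
};

constexpr Score threatAsPosted(PieceType attacker, PieceType victim) {
  return ThreatBonusAsPosted[attacker][victim];
}

constexpr Score threatShifted(PieceType attacker, PieceType victim) {
  return ThreatBonusShifted[attacker][victim];
}

// The cases Marco lists all read zero from the table as posted ...
static_assert(threatAsPosted(KNIGHT, BISHOP).mg == 0, "knight attacks bishop");
static_assert(threatAsPosted(BISHOP, ROOK).mg == 0, "bishop attacks rook");
static_assert(threatAsPosted(ROOK, QUEEN).mg == 0, "rook attacks queen");
static_assert(threatAsPosted(QUEEN, PAWN).mg == 0, "queen attacks anything");
// ... and pick up the intended entries once the rows are shifted.
static_assert(threatShifted(KNIGHT, BISHOP).mg == 37, "knight attacks bishop");
static_assert(threatShifted(QUEEN, PAWN).mg == 27, "queen attacks pawn");
```

Under these assumptions every lookup in the posted table uses the row of the next heavier attacker, which is why the diagonal zeros land on the wrong pairs and the QUEEN row is empty.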
