large king safety scores

lucasart · Post by **lucasart** » Wed Jul 25, 2012 6:21 am

I realized that my king safety could sometimes return very large values. Here's an example:
[d]r3k1nr/pppb1ppp/8/4p1K1/8/8/P1P2PPP/q1BQ1BNR w kq - 0 1
Here my engine gives White a king safety penalty of 4.65 pawns, which is quite big. A value like this should lead to a quick mate; otherwise, if the attack is refutable, it can have a negative effect on the search. The score is justified by the fact that the king is very badly placed, has no pawn shield, and appears to be attacked by an army of pawns.
However, the king can retreat: after Kh4, for example, the safety score drops to 0.72 pawns, as the king gets "closer to safety" and the pawn attacks suddenly "disappear".
[d]r3k1nr/pppb1ppp/8/4p3/7K/8/P1P2PPP/q1BQ1BNR b kq - 0 1
Intuitively, I would think that this kind of large score is dangerous, because: 1/ if it's refutable, it may lead to unsound sacrifices, which pollute the search tree even if they are never played in the game; 2/ it behaves discontinuously (as shown above).

So I tried the following workaround:
* if the safety score (penalty) is > value(knight)
* and the king can retreat (i.e. a move south, south-east or south-west is not blocked by own pieces and does not land on an enemy-attacked square)
* then halve the excess: penalty -= (penalty - value(knight)) / 2
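
The rule above can be sketched as follows. This only illustrates the damping arithmetic, in Python rather than the engine's own language; `KNIGHT_VALUE = 325` centipawns and the `can_retreat` flag are assumptions, not values taken from the engine.

```python
KNIGHT_VALUE = 325  # centipawns; assumed value, not from the original post


def damp_king_penalty(penalty: int, can_retreat: bool, cap: int = KNIGHT_VALUE) -> int:
    """Halve the part of a king safety penalty that exceeds `cap`.

    `penalty` is a positive number of centipawns. `can_retreat` stands in for
    the engine's own test that a backward king move is available.
    """
    if penalty > cap and can_retreat:
        penalty -= (penalty - cap) // 2
    return penalty


# The 4.65-pawn penalty from the example, with the assumed knight value:
print(damp_king_penalty(465, can_retreat=True))   # 395
print(damp_king_penalty(465, can_retreat=False))  # 465
print(damp_king_penalty(72, can_retreat=True))    # 72
```

Note that the rule only shrinks the excess over the cap; penalties below the knight value pass through unchanged.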

And the results are highly unclear:
* T1: 55% at 100k nodes/move after 1200 games
* T2: 54% at 200k nodes/move after 1200 games
* T3: 50.5% at 12sec+0.2sec after 1200 games

Formally, an experiment produces an estimator score_hat(N,T) where N is the number of games and T the time control (T can be assumed one dimensional by rescaling time and increment or nodes appropriately). And of course score_hat(N,T) = score(T) + O(1/sqrt(N)).
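
To put a number on the O(1/sqrt(N)) term: if every game were a win or a loss, the standard error of the score would be sqrt(p(1-p)/N). Draws only make it smaller, so this is an upper bound. The sketch below applies that bound to the three runs above. The function name and the 1.96 factor for a two-sided 95% interval are choices made for this illustration, not part of the original experiment.

```python
import math


def score_margin_95(score: float, games: int) -> float:
    """Upper bound on the 95% margin of error of a match score.

    Treats every game as win/loss (no draws), which overstates the variance,
    so the real margin is somewhat smaller.
    """
    std_err = math.sqrt(score * (1.0 - score) / games)
    return 1.96 * std_err


for label, score in (("T1", 0.55), ("T2", 0.54), ("T3", 0.505)):
    margin = score_margin_95(score, 1200)
    verdict = "above 50%" if score - margin > 0.5 else "not distinguishable from 50%"
    print(f"{label}: {score:.1%} +/- {margin:.1%} -> {verdict}")
```

At 1200 games the margin is roughly 2.8 percentage points, so T1 and T2 sit above 50% even under this pessimistic bound, while T3 is well inside the noise.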

Now if we forget the O(1/sqrt(N)) term for the sake of argument, and assume that my observations at T1, T2 and T3 are asymptotic, we have three values score(T1), score(T2), score(T3) that decrease as T grows. But that still doesn't tell me whether score(T) ends up above or below 50% for large values of T.

So I end up having to take a leap of faith and just assume that this feature is good, which I hate doing because I've been wrong so many times doing that in the past (so many things that intuitively make sense turn out to be regressions in testing...)

I remember Don Dailey saying once that he did a lot of experiments with king safety and noticed that it actually had a negative effect in super fast games and only started to pay off as the time control increased. So perhaps what I'm looking at in T1 and T2 is the reduction of the negative effect of king safety at such fast time controls, and T3 is well below the 95% confidence level anyway, so there's nothing to look at...

I wonder if others have tried taking measures to trim off excessive safety scores. And if so, how and with what success.

kbhearn · Post by **kbhearn** » Wed Jul 25, 2012 8:27 am

Honestly it doesn't seem particularly out of whack to consider that king position as worth almost a full rook. If anything, the lack of development of black's remaining pieces should be the mitigating factor, not the retreat square (if black could immediately do a rook raise for instance, then white's king is in big trouble).

zamar · Post by **zamar** » Wed Jul 25, 2012 9:31 am

If White's king is on the fifth rank in a middlegame position and queens are still on the board, I'd say that in >99% of cases the game is lost. So a penalty of one rook seems fully justified to me. Search should be able to handle the remaining <1% of cases.

jdart · Post by **jdart** » Wed Jul 25, 2012 4:34 pm

I don't give this position as large a score.

Arasan currently has 3 components to king safety:

1. A "king cover" score based on pawn protection of the King.
2. An "attack" score based on how many enemy pieces attack.
3. A "boost" factor applied when the king is both exposed and attacked.

All are scaled by the opponent's material level.

Here is what I get for this position:

Black piece attacks on opposing king:
cover= -75
pawn proximity=4
attack_count=11
pin_count=0
boost factor= 1.40952
king attack score (Black) : 148 (pre-scaling), 138 (scaled)

(values in centipawns). The king attack score is in addition to the cover score so the total king safety is a little over -2 pawns here.
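
The arithmetic behind "a little over -2 pawns" can be checked directly from the numbers in the dump. This only adds up the posted values; how Arasan combines the terms internally is not shown here, and the sign convention (both terms counted against White) is an assumption.

```python
# Values copied from the output above, in centipawns.
cover = -75            # pawn cover score for White's king
attack_raw = 148       # Black's king attack score before material scaling
attack_scaled = 138    # the same score after scaling

# Assumption: the attack score counts against White, on top of the cover score.
total_king_safety = cover - attack_scaled
implied_scale = attack_scaled / attack_raw

print(total_king_safety)        # -213
print(round(implied_scale, 3))  # 0.932
```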

(No claim this is optimal but this algorithm is somewhat tuned by a lot of testing. However this is an unusual position).

diep · Post by **diep** » Wed Jul 25, 2012 4:56 pm

lucasart wrote:I realized that my king safety could sometimes return very large values. Here's an example:
[d]r3k1nr/pppb1ppp/8/4p1K1/8/8/P1P2PPP/q1BQ1BNR w kq - 0 1
Here my engine gives White a king safety penalty of 4.65 pawns, which is quite big. A value like this should lead to a quick mate; otherwise, if the attack is refutable, it can have a negative effect on the search. The score is justified by the fact that the king is very badly placed, has no pawn shield, and appears to be attacked by an army of pawns.
However, the king can retreat: after Kh4, for example, the safety score drops to 0.72 pawns, as the king gets "closer to safety" and the pawn attacks suddenly "disappear".

Interesting position for evaluation.
There is however 1 problem.
Black is up a rook or something (exchange).
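
The material balance can be read straight off the FEN. Here is a quick count, assuming the classical 1/3/3/5/9 piece values (Diep's own values are not given in this thread):

```python
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}  # assumed classical values


def material_balance(fen: str) -> int:
    """Return White's material minus Black's material, in pawns."""
    placement = fen.split()[0]
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits, '/' separators and kings carry no material value
        balance += value if ch.isupper() else -value
    return balance


fen = "r3k1nr/pppb1ppp/8/4p1K1/8/8/P1P2PPP/q1BQ1BNR w kq - 0 1"
print(material_balance(fen))  # -4
```

On these values Black is up four pawns: a rook for a bishop, plus two extra pawns.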

Diep's static evaluation is black up 5.908 pawns.
King safety consists of many terms in Diep, not just one unit.
One of the aggregate king safety scores, summing all of its terms, gives the white king on g5 a penalty of 1.085 pawns.

That's not counting the piece square table penalty of course.

lucasart wrote:
[d]r3k1nr/pppb1ppp/8/4p3/7K/8/P1P2PPP/q1BQ1BNR b kq - 0 1
Intuitively, I would think that this kind of large score is dangerous, because: 1/ if it's refutable, it may lead to unsound sacrifices, which pollute the search tree even if they are never played in the game; 2/ it behaves discontinuously (as shown above).

Well, in this case it doesn't matter: Black is winning.
After Kh4 Diep evaluates the position as Black up 5.528 pawns.

The king on h4 gets penalized another 0.990 pawns on top of the PSQ score.

lucasart wrote:So I tried the following workaround:
* if the safety score (penalty) is > value(knight)
* and the king can retreat (i.e. a move south, south-east or south-west is not blocked by own pieces and does not land on an enemy-attacked square)
* then halve the excess: penalty -= (penalty - value(knight)) / 2

And the results are highly unclear:
* T1: 55% at 100k nodes/move after 1200 games
* T2: 54% at 200k nodes/move after 1200 games
* T3: 50.5% at 12sec+0.2sec after 1200 games

Formally, an experiment produces an estimator score_hat(N,T) where N is the number of games and T the time control (T can be assumed one dimensional by rescaling time and increment or nodes appropriately). And of course score_hat(N,T) = score(T) + O(1/sqrt(N)).

Now if we forget the O(1/sqrt(N)) term for the sake of argument, and assume that my observations at T1, T2 and T3 are asymptotic, we have three values score(T1), score(T2), score(T3) that decrease as T grows. But that still doesn't tell me whether score(T) ends up above or below 50% for large values of T.

So I end up having to take a leap of faith and just assume that this feature is good, which I hate doing because I've been wrong so many times doing that in the past (so many things that intuitively make sense turn out to be regressions in testing...)

I remember Don Dailey saying once that he did a lot of experiments with king safety and noticed that it actually had a negative effect in super fast games and only started to pay off as the time control increased. So perhaps what I'm looking at in T1 and T2 is the reduction of the negative effect of king safety at such fast time controls, and T3 is well below the 95% confidence level anyway, so there's nothing to look at...

I wonder if others have tried taking measures to trim off excessive safety scores. And if so, how and with what success.

Interesting thoughts. Yet superbullet games never worked for measuring anything in Diep, except that you sometimes notice a huge bug if you watch a game. But it's bad for the eyes to watch such superbullet games.

I doubt other programmers would learn that much from superbullet games if they watched the screen. In a split second I see something that looks suspicious; how many others would see that?

diep · Post by **diep** » Wed Jul 25, 2012 5:11 pm

zamar wrote:If White's king is on the fifth rank in a middlegame position and queens are still on the board, I'd say that in >99% of cases the game is lost. So a penalty of one rook seems fully justified to me. Search should be able to handle the remaining <1% of cases.

We already realized you never built your own program.

diep · Post by **diep** » Wed Jul 25, 2012 5:15 pm

jdart wrote:I don't give this position as large a score.

Arasan currently has 3 components to king safety:

1. A "king cover" score based on pawn protection of the King.
2. An "attack" score based on how many enemy pieces attack.
3. A "boost" factor applied when the king is both exposed and attacked.

All are scaled by the opponent's material level.

Here is what I get for this position:

Black piece attacks on opposing king:
cover= -75
pawn proximity=4
attack_count=11
pin_count=0
boost factor= 1.40952
king attack score (Black) : 148 (pre-scaling), 138 (scaled)

(values in centipawns). The king attack score is in addition to the cover score so the total king safety is a little over -2 pawns here.

(No claim this is optimal but this algorithm is somewhat tuned by a lot of testing. However this is an unusual position).

The resulting value you post is more in line with what I'm doing in Diep as well, Jon. Please note that it's possible that I give a much higher penalty in the PSQs for the king there than you do in Arasan. Maybe worth looking at.

In Diep's PSQ a king on g5 gets a penalty of 0.65 pawns.
How much do you give there?

jdart · Post by **jdart** » Wed Jul 25, 2012 5:42 pm

I currently only use PSQ for the King in the endgame, so that doesn't apply here.

--Jon

Don · Post by **Don** » Wed Jul 25, 2012 5:43 pm

I had a similar issue with Komodo where I was trying to debug lazy evaluation - I was looking for scores from the evaluation that were out of line with the material situation. I found a lot of positions where the evaluation score was many pawns (up to 7 or 8) different than the material score - but there was not a single case I could find where these big scores seemed unreasonably high. Most of them where the difference was 3 or 4 pawns were probably too low because a mate was surely around the corner. In every case I saw where the score was over 2 or 3 pawns, it was pretty convincingly won for the side with the advantage.
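
A check of this kind could look something like the sketch below. It is not Komodo's code; the `(fen, eval_cp, material_cp)` tuples, the sample values and the 300-centipawn threshold are all made up for illustration.

```python
from typing import Iterable, List, Tuple

Position = Tuple[str, int, int]  # (FEN, full evaluation, material-only score), centipawns


def find_outliers(positions: Iterable[Position], threshold_cp: int = 300) -> List[Position]:
    """Return positions whose full evaluation differs from material by more than the threshold."""
    return [p for p in positions if abs(p[1] - p[2]) > threshold_cp]


# Hypothetical sample data, not real engine output.
sample = [
    ("position A", 120, 100),
    ("position B", -865, -400),
    ("position C", 310, 0),
]
for fen, eval_cp, material_cp in find_outliers(sample):
    print(fen, eval_cp - material_cp)
```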

Probably the trickiest part is those cases where there is a king safety issue but only a moderate one, where it really takes a certain amount of human judgement to weigh the issues. It's easy to tell if you have an overwhelming king safety advantage, but not so easy to determine whether it's worth 1/4 or 1/2 a pawn but no more.

Adam Hair · Post by **Adam Hair** » Wed Jul 25, 2012 6:23 pm

Don, I hope you don't mind this inane and off-topic response:

"Your superior intellect is no match for our puny weapons." - Kang and Kudos
