Stockfish has included WDL stats in engine output

kinderchocolate · Post by **kinderchocolate** » Fri Jul 17, 2020 12:07 am

> Fishtest has 400cp adjudication, so any game that reaches it for a few plies gets marked as a win, though in some instances playing on would end in a draw.

No. The graph in Fishtest approaches 1 as the x-axis goes to 400. This is not the same as the C++ code.

> You got the (cp, ply) -> wdl function very wrong, because with Stockfish's actual WDL formula the draw probability increases as the ply count increases, instead of the winning probability increasing as in your graph.

That makes no sense...
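To make the disagreement about ply dependence concrete, here is a minimal Python sketch of the logistic win-rate model. The polynomial coefficients are assumed to be those of the Stockfish 12 release (the same ones python-chess is assumed to use for its `sf12` model); the development version discussed in July 2020 may have used different numbers. The input is taken to be already in centipawns, and W/D/L are in per mille. The loop prints the W/D/L triple for a fixed +1.00 at several game plies, so the ply dependence can be inspected directly.

```python
import math


def win_rate_model(cp: float, ply: int) -> int:
    """Win rate, in per mille, for the side whose evaluation is `cp` centipawns.

    Coefficients assumed from the Stockfish 12 release (see lead-in).
    """
    # The fit only covers plies 0..240; the ply is rescaled by 64.
    m = min(240, max(ply, 0)) / 64.0
    # Cubic polynomials in m give the logistic midpoint (a) and scale (b).
    a = ((-8.24404295 * m + 64.23892342) * m - 95.73056462) * m + 153.86478679
    b = ((-3.37154371 * m + 28.44489198) * m - 56.67657741) * m + 72.05858751
    # Clamp the evaluation to +/-1000 cp, as the C++ code does.
    x = min(1000.0, max(-1000.0, cp))
    return int(0.5 + 1000 / (1 + math.exp((a - x) / b)))


def wdl(cp: float, ply: int) -> tuple[int, int, int]:
    """(win, draw, loss) in per mille; the draw share is whatever is left over."""
    w = win_rate_model(cp, ply)
    l = win_rate_model(-cp, ply)
    return w, 1000 - w - l, l


for ply in (10, 60, 120, 200):
    print(ply, wdl(100, ply))
```

The draw share is computed as the remainder, as in the C++ code, so W + D + L is always 1000.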

Alayan · Post by **Alayan** » Fri Jul 17, 2020 12:42 am

You are free to misunderstand.

Pio · Post by **Pio** » Fri Jul 17, 2020 12:44 am

Alayan wrote (Thu Jul 16, 2020 11:59 pm):
> You got several things wrong.
>
> Fishtest has 400cp adjudication, so any game that reaches it for a few plies gets marked as a win, though in some instances playing on would end in a draw.
>
> Stockfish internal units aren't the same as centipawns. 600 or so would be the value of a knight in internal units, not cp. Of course, a position down a knight usually snowballs into something much worse quickly.
>
> You got the (cp, ply) -> wdl function very wrong, because with Stockfish's actual WDL formula the draw probability increases as the ply count increases, instead of the winning probability increasing as in your graph.

I just want to say that (cp, ply) -> wdl does not seem to be the best function you could get, since ply is not something the function should depend on heavily. Very little about a chess position is a function of the moves leading up to it. Only the 50-move rule, threefold repetition and maybe en passant and castling rights (depending on how you look at it) are history- or ply-dependent.

I realise, however, that it is hard to replace ply with something better in (cp, ply) -> wdl, because if it were easy we would already have great evaluation functions.

I guess it is possible to get a better predictor than (cp, ply), and one way is to look at what type of moves were played before the position and what moves are in the PV. My guess is that shuffling moves (which could be identified from many moves made by the same piece and from the number of pawn moves) and the material left are better predictors; that is, (cp, shuffling moves(recent_history, PV), material left(recent_history, PV)) -> wdl might be better.

/Pio

zullil · Post by **zullil** » Fri Jul 17, 2020 1:44 am

kinderchocolate wrote (Thu Jul 16, 2020 11:44 pm):
> Thanks. I think probability of winning is a good measure for chess reporting, and is in fact better than "cp":
>
> - cp is a programming concept, not one for chess analysis
> - cp is heavily implementation-dependent
>
> Reporting a probability from a fitted sigmoid curve is a nice way to normalize the conflicts. I attach a plot of Stockfish's WDL code.
>
> https://github.com/glinscott/fishtest/w ... n-fishtest saturates around 400, but the SF code saturates around 600. Not sure why the author of the patch reported "The model fits rather accurately the LTC fishtest statistics". The saturation point is critically important in the model, so if I'm not mistaken the patch was very badly programmed.
>
> 600 is a little less than a knight in SF.
>
> At cp==0, the winning chance in the Fishtest link is a little less than 1 (hard to see). In the SF code it is 0.076 (vertical line in the plot).
> Basically, the code tells us that if we have an advantage somewhere between a pawn and a knight, it's an almost certain win. Being up a pawn is approximately a 25% winning chance, not including draws.

I believe the attached graph correctly depicts (a continuous approximation to) Stockfish's current model of win rate as a function of game ply, assuming the current evaluation is 0.00.

syzygy · Post by **syzygy** » Fri Jul 17, 2020 3:03 am

kinderchocolate wrote (Thu Jul 16, 2020 10:04 pm):
> Probably asked somewhere else, but I can't find it. What's the impact of using ply in the calculation? The problem here is we don't have such information if we start from a non-initial position.

When starting from a complete FEN, we do have that information. A complete FEN includes the move number.
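As a sketch of this point, the game ply can be derived from the side-to-move and fullmove-number fields of a FEN. The formula below is assumed to mirror the convention Stockfish's `Position::set` uses (two plies per full move, plus one if Black is to move); the helper name is hypothetical, and treating a missing fullmove field as 1 is an assumption of this sketch.

```python
def game_ply_from_fen(fen: str) -> int:
    """Game ply implied by a FEN's side-to-move and fullmove-number fields.

    Assumed Stockfish-style convention: 2 * (fullmove - 1), plus 1 if Black is to move.
    A missing fullmove field is treated as 1 (an assumption of this sketch).
    """
    fields = fen.split()
    side_to_move = fields[1] if len(fields) > 1 else "w"
    fullmove = int(fields[5]) if len(fields) > 5 else 1
    return max(2 * (fullmove - 1), 0) + (1 if side_to_move == "b" else 0)


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
print(game_ply_from_fen(start), game_ply_from_fen(after_e4))  # 0 1
```

A position given without the move counters (as in EPD) loses this information, which is the case the question above was worried about.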

syzygy · Post by **syzygy** » Fri Jul 17, 2020 3:07 am

kinderchocolate wrote (Thu Jul 16, 2020 11:44 pm):
> 600 is a little less than a knight in SF

If you are talking about cp as reported by SF, then a knight in SF is much less than 600cp.

When SF reports a score, it scales its internal score so that a pawn is about 100cp (as it should).
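A minimal sketch of that scaling, assuming the reported score is the internal value times 100 divided by `PawnValueEg` with C++ integer (truncating) division, and assuming `PawnValueEg` = 208 (the figure mentioned later in this thread; it has changed between Stockfish versions). Under those assumptions, the roughly 600 internal units mentioned earlier correspond to about 288 cp.

```python
PAWN_VALUE_EG = 208  # assumption: varies between Stockfish versions


def internal_to_reported_cp(v: int) -> int:
    """Scale an internal evaluation to the centipawns printed after 'score cp'.

    Mimics C++ integer division, which truncates toward zero
    (Python's // floors, so the sign is handled separately).
    """
    q = abs(100 * v) // PAWN_VALUE_EG
    return q if v >= 0 else -q


print(internal_to_reported_cp(208), internal_to_reported_cp(600), internal_to_reported_cp(-600))
# 100 288 -288
```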

syzygy · Post by **syzygy** » Fri Jul 17, 2020 3:14 am

kinderchocolate wrote (Thu Jul 16, 2020 11:57 pm):
> If I were to add it to analysis, I might just drop the ply parameter and hard-code it to 10. It looks like at 10, a knight advantage is about 75% winning. I like it to be 75% winning for a piece up.

Your "I like it to be 75% winning for a piece up" seems to be another good reason to just stick to reporting cp. A cp score is an objective score (for conventional engines like Stockfish) and everybody can subjectively interpret a cp score however they like.

MikeB · Post by **MikeB** » Fri Jul 17, 2020 5:12 am

kinderchocolate wrote (Thu Jul 16, 2020 11:57 pm):
> If I were to add it to analysis, I might just drop the ply parameter and hard-code it to 10. It looks like at 10, a knight advantage is about 75% winning. I like it to be 75% winning for a piece up.

That might be true in some human games, but with computers, it is above 90%.

zullil · Post by **zullil** » Fri Jul 17, 2020 3:23 pm

kinderchocolate wrote (Thu Jul 16, 2020 11:44 pm):
> [...] I attach a plot of Stockfish's WDL code.

I think your graphs are using (cp * PawnValueEg) rather than cp as horizontal units. I believe the graph below is a correct rendering of Stockfish's WDL model, assuming a game ply of 10.

Kunokunzi · Post by **Kunokunzi** » Sun Apr 11, 2021 8:27 am

Can the WDL values from Stockfish be reproduced with the win rate model formula, given an evaluation as a decimal number and a game ply?

First of all, it would have to be clarified which value is to be used for the variable PawnValueEg in the line
`double x = std::clamp(double(100 * v) / PawnValueEg, -1000.0, 1000.0)`.
In the 'Source code for chess.engine' (https://python-chess.readthedocs.io/en/ ... ngine.html), this divisor is simply removed, i.e. set to 1. The formula then produces values that match the original Stockfish values fairly closely, but not always exactly. Is this slight discrepancy due to the mysterious variable PawnValueEg? It appears to be 208 in the Stockfish code, but using 208 or 2.08 as a divisor leads to absurd results.

How must the formula be modified outside of Stockfish to get exactly the WDL values provided by Stockfish?
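A hedged sketch of one likely explanation. In the C++ line quoted above, `v` is assumed to be in Stockfish's internal units, so `100 * v / PawnValueEg` is the evaluation in centipawns before any rounding. The `cp` value in the UCI output is assumed to have already been divided by `PawnValueEg` and truncated to an integer. If so, dividing the reported cp by 208 again scales twice, and using the truncated cp directly can occasionally shift a rounded per-mille value. The coefficients are assumed from the Stockfish 12 release, `PAWN_VALUE_EG` = 208 is taken from the question, and the internal value `v = 150` is a hypothetical example.

```python
import math

PAWN_VALUE_EG = 208  # assumption: value quoted in the question; differs between versions


def win_rate_from_cp(x: float, ply: int) -> int:
    """Logistic win-rate model (per mille); x is an unrounded centipawn value."""
    m = min(240, max(ply, 0)) / 64.0
    # Coefficients assumed from the Stockfish 12 release.
    a = ((-8.24404295 * m + 64.23892342) * m - 95.73056462) * m + 153.86478679
    b = ((-3.37154371 * m + 28.44489198) * m - 56.67657741) * m + 72.05858751
    x = min(1000.0, max(-1000.0, x))
    return int(0.5 + 1000 / (1 + math.exp((a - x) / b)))


def wdl(x: float, ply: int) -> tuple[int, int, int]:
    """(win, draw, loss) in per mille for an unrounded centipawn value x."""
    w = win_rate_from_cp(x, ply)
    l = win_rate_from_cp(-x, ply)
    return w, 1000 - w - l, l


def reported_cp(v: int) -> int:
    """cp as assumed to be printed: 100 * v / PawnValueEg, truncated toward zero."""
    q = abs(100 * v) // PAWN_VALUE_EG
    return q if v >= 0 else -q


v, ply = 150, 30  # hypothetical internal evaluation and game ply
inside_engine = wdl(100 * v / PAWN_VALUE_EG, ply)  # unrounded cp, as in the C++ line
from_output = wdl(reported_cp(v), ply)  # best an external script can do
double_scaled = wdl(reported_cp(v) / PAWN_VALUE_EG, ply)  # divides by 208 twice
print(reported_cp(v), inside_engine, from_output, double_scaled)
```

Because the fractional part of the centipawn value is lost in the UCI output, an exact reproduction from outside the engine is not always possible under these assumptions, though the remaining mismatch should be small.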
