Stockfish has included WDL stats in engine output

kinderchocolate
Posts: 454
Joined: Mon Nov 01, 2010 6:55 am
Full name: Ted Wong

Re: Stockfish has included WDL stats in engine output

Post by kinderchocolate »

> Fishtest has 400cp adjudication, so any game reaching it for a few plies gets marked as a win, though in some instances playing on would end in a draw.

No. The graph in Fishtest approaches 1 as the x-axis goes to 400. This is not the same as the C++ code.

> You got the (cp, ply) -> wdl function very wrong, because with Stockfish's actual WDL formula the draw probability increases as the ply count increases, instead of the winning probability increasing as in your graph.

That makes no sense...
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: Stockfish has included WDL stats in engine output

Post by Alayan »

You are free to misunderstand.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Stockfish has included WDL stats in engine output

Post by Pio »

Alayan wrote: Thu Jul 16, 2020 11:59 pm
> You got several things wrong.
>
> Fishtest has 400cp adjudication, so any game reaching it for a few plies gets marked as a win, though in some instances playing on would end in a draw.
>
> Stockfish internal units aren't the same as centipawns. 600 or so would be the value of a knight in internal units, not cp. Of course usually a position down a knight snowballs into much worse quickly.
>
> You got the (cp, ply) -> wdl function very wrong, because with Stockfish's actual WDL formula the draw probability increases as the ply count increases, instead of the winning probability increasing as in your graph.

I just want to say that (cp, ply) -> wdl does not seem to be the best function you could get, since ply is not something the function should depend on heavily. Very little of a chess position is a function of the moves leading up to it. Only the 50-move draw rule, threefold repetition and maybe en passant and castling (depending on how you look at it) are history-dependent or ply-dependent.

I realise however that it is hard to substitute the ply for something else/better in the (cp, ply) -> wdl because if it was easy we would have great evaluation-functions.

I guess it is possible to get a better predictor than (cp, ply). One option is to look at what types of moves were played in the history prior to the position and what moves are in the PV. My guess is that shuffling moves (which could be identified by many moves made by the same piece and by the number of pawn moves) and material left are better predictors. That is, (cp, shuffling moves(recent_history, PV), material left(recent_history, PV)) -> wdl might be better.

/Pio
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Stockfish has included WDL stats in engine output

Post by zullil »

kinderchocolate wrote: Thu Jul 16, 2020 11:44 pm
> Thanks. I think probability of winning is a good measure for chess reporting, and is in fact better than "cp":
>   • cp is a programming concept, not for chess analysis
>   • cp is heavily implementation dependent
> Reporting probability from fitting a sigmoid curve is a nice way to normalize the conflicts. I attach a plot of Stockfish's WDL code.
>   • https://github.com/glinscott/fishtest/w ... n-fishtest saturates around 400, but the SF code saturates around 600. Not sure why the author of the patch reported "The model fits rather accurately the LTC fishtest statistics". The saturation point is critically important in the model, so if I'm not mistaken the patch was horribly badly programmed.
>   • 600 is a little less than a knight in SF
>   • At cp==0, the winning chance in the Fishtest link is a little less than 1 (hard to see). The SF code gives 0.076 (vertical line in the plot).
> Basically, the code tells us that if we have an advantage somewhere between a pawn and a knight, it's an almost certain win. Up by a pawn is approximately a 25% winning chance, not including draws.

I believe the attached graph correctly depicts (a continuous approximation to) Stockfish's current model of win rate as a function of game ply, assuming the current evaluation is 0.00.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish has included WDL stats in engine output

Post by syzygy »

kinderchocolate wrote: Thu Jul 16, 2020 10:04 pm
> Probably asked somewhere else, but I can't find it. What's the impact of using ply in the calculation? The problem here is we don't have such information if we start from a non-initial position.

When starting from a complete FEN, we do have that information. A complete FEN includes the move number.
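
To make that concrete, here is a minimal Python sketch of recovering a ply count from a complete FEN. It assumes the standard six-field FEN layout and counts plies from 0 at the initial position; the function name `game_ply_from_fen` and that counting convention are illustrative assumptions, not something taken from Stockfish's code.

```python
def game_ply_from_fen(fen: str) -> int:
    """Return the game ply implied by a complete six-field FEN string.

    Assumption: ply 0 is the initial position with White to move.
    """
    fields = fen.split()
    if len(fields) != 6:
        raise ValueError("expected a complete FEN with six fields")

    side_to_move = fields[1]          # "w" or "b"
    fullmove_number = int(fields[5])  # starts at 1, incremented after Black moves

    # Every completed full move contributes two plies.
    ply = max(2 * (fullmove_number - 1), 0)
    # If Black is to move, White has already played one more ply.
    if side_to_move == "b":
        ply += 1
    return ply


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
print(game_ply_from_fen(start), game_ply_from_fen(after_e4))
```

A FEN that omits the last two fields carries no move number, and in that case the ply really is unknown, which is the situation the quoted question describes.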
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish has included WDL stats in engine output

Post by syzygy »

kinderchocolate wrote: Thu Jul 16, 2020 11:44 pm
>   • 600 is a little less than a knight in SF

If you are talking about cp as reported by SF, then a knight in SF is much less than 600cp.

When SF reports a score, it scales its internal score so that a pawn is about 100cp (as it should).
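
A small Python sketch of that scaling, for illustration only. It assumes the conversion is `100 * v / PawnValueEg`, as in the C++ line quoted elsewhere in this thread, and takes 208 for `PawnValueEg`, the value another poster reports reading from the Stockfish code; both are assumptions here and may differ between Stockfish versions.

```python
# Assumed constant: the value of PawnValueEg mentioned in this thread.
PAWN_VALUE_EG = 208


def internal_to_cp(v: float, pawn_value_eg: int = PAWN_VALUE_EG) -> float:
    """Convert an internal score to centipawns (a pawn maps to 100cp)."""
    return 100 * v / pawn_value_eg


def cp_to_internal(cp: float, pawn_value_eg: int = PAWN_VALUE_EG) -> float:
    """Inverse conversion: centipawns back to internal units."""
    return cp * pawn_value_eg / 100


# One pawn in internal units is 100cp by construction.
print(internal_to_cp(208))
# 600 internal units come out well below 600cp.
print(round(internal_to_cp(600), 2))
```

Under these assumptions, a plot whose horizontal axis is labelled cp but actually holds internal units would be stretched by a factor of about 2.08.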
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish has included WDL stats in engine output

Post by syzygy »

kinderchocolate wrote: Thu Jul 16, 2020 11:57 pm
> If I were to add it to analysis, I may just drop the ply parameter, and just hard-code it to 10. It looks like at 10, a knight advantage is about 75% winning. I like it to be 75% winning for a piece up.

Your "I like it to be 75% winning for a piece up" seems to be another good reason to just stick to reporting cp. A cp score is an objective score (for conventional engines like Stockfish) and everybody can subjectively interpret a cp score however they like.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Stockfish has included WDL stats in engine output

Post by MikeB »

kinderchocolate wrote: Thu Jul 16, 2020 11:57 pm
> If I were to add it to analysis, I may just drop the ply parameter, and just hard-code it to 10. It looks like at 10, a knight advantage is about 75% winning. I like it to be 75% winning for a piece up.

That might be true in some human games, but with computers, it is above 90%.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Stockfish has included WDL stats in engine output

Post by zullil »

kinderchocolate wrote: Thu Jul 16, 2020 11:44 pm
> Thanks. I think probability of winning is a good measure for chess reporting, and is in fact better than "cp":
>   • cp is a programming concept, not for chess analysis
>   • cp is heavily implementation dependent
> Reporting probability from fitting a sigmoid curve is a nice way to normalize the conflicts. I attach a plot of Stockfish's WDL code.
>   • https://github.com/glinscott/fishtest/w ... n-fishtest saturates around 400, but the SF code saturates around 600. Not sure why the author of the patch reported "The model fits rather accurately the LTC fishtest statistics". The saturation point is critically important in the model, so if I'm not mistaken the patch was horribly badly programmed.
>   • 600 is a little less than a knight in SF
>   • At cp==0, the winning chance in the Fishtest link is a little less than 1 (hard to see). The SF code gives 0.076 (vertical line in the plot).
> Basically, the code tells us that if we have an advantage somewhere between a pawn and a knight, it's an almost certain win. Up by a pawn is approximately a 25% winning chance, not including draws.

I think your graphs are using (cp * PawnValueEg) rather than cp as horizontal units. I believe the graph below is a correct rendering of Stockfish's WDL model, assuming a game ply of 10.
Kunokunzi
Posts: 1
Joined: Mon Apr 05, 2021 6:34 am
Full name: Konrad Franz Hüttner

Re: Stockfish has included WDL stats in engine output

Post by Kunokunzi »

Can the WDL values from Stockfish be reproduced with the formula in the win rate model with a given evaluation as a decimal number and a given game-ply?

First of all, it would have to be clarified which value is to be used for the variable PawnValueEg in the line
'double x = std::clamp(double(100 * v) / PawnValueEg, -1000.0, 1000.0)'.
In the 'Source code for chess.engine' (https://python-chess.readthedocs.io/en/ ... ngine.html) this divisor is simply removed or equated with 1. The formula then produces values that match the original Stockfish values fairly closely, but not always exactly. Is this slight discrepancy due to the ominous variable PawnValueEg? This appears to be 208 within the Stockfish code. But using 208 or 2.08 as a divisor leads to absurd results.

How must the formula be modified outside of Stockfish to get exactly the WDL values provided by Stockfish?
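
One way to reason about the divisor is sketched below in Python, under stated assumptions. It assumes the win rate is a logistic function of the clamped score with two parameters, called `a` and `b` here, and that the result is rounded to per mille. The values of `A` and `B` are placeholders picked for illustration; they are not Stockfish's fitted, ply-dependent coefficients, which are not given in this thread. Going by the earlier remark that the score Stockfish reports is already scaled so that a pawn is about 100cp, the division by `PawnValueEg` would apply to the internal value `v`, not to a cp value read from engine output.

```python
import math


def win_rate_per_mille(x_cp: float, a: float, b: float) -> int:
    """Assumed logistic win-rate model, returned in per mille."""
    # Same clamp range as the quoted C++ line.
    x = max(-1000.0, min(1000.0, x_cp))
    # Adding 0.5 before truncating rounds the positive value to nearest.
    return int(0.5 + 1000 / (1 + math.exp((a - x) / b)))


def wdl(x_cp: float, a: float, b: float) -> tuple:
    """Win, draw and loss in per mille; loss is the win rate of -x."""
    win = win_rate_per_mille(x_cp, a, b)
    loss = win_rate_per_mille(-x_cp, a, b)
    return win, 1000 - win - loss, loss


# Placeholder parameters, NOT Stockfish's coefficients.
A, B = 150.0, 70.0

reported_cp = 144  # a score as it would appear in engine output
print(wdl(reported_cp, A, B))        # cp used directly
print(wdl(reported_cp / 208, A, B))  # divided again: input collapses toward 0
```

If this reading is right, using the reported cp without a divisor is consistent with the formula, and dividing it by 208 again pushes every input close to zero. Small remaining mismatches could plausibly come from the reported cp being rounded to an integer while the engine works from the unrounded internal value, but that is a guess rather than something confirmed here.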