It's fine if you don't care what happens to other people, but don't try to discourage me from trying to spread the information about the damaging aspects of WDL.
If WDL fit the data then King vs King would correctly show white has 0% chance of winning
I'm not interested in discouraging you. It just seemed to me, based on the last sentence of your initial post, that you didn't understand how to interpret WDL output. Maybe I misunderstood that post.
UCI_ShowWDL
If enabled, show approximate WDL statistics as part of the engine output. These WDL numbers model expected game outcomes for a given evaluation and game ply for engine self-play at fishtest LTC conditions (60+0.6s per game).
Ovyron wrote: ↑Fri Jul 03, 2020 8:00 am
I still can't wrap my head around something like "White has a 50% chance to win this", because for that to be useful I also need to know black's winning chances. If black has 0% chance to win this then this position is a great one to aim for. If black also has 50% chance to win this then the expected performance is 50%, and I'd rather play into one where White's chances are only 30% but black's are only 10%.
But WDL has no way to show that difference, so it falls flat on its face.
Stockfish's WDL output is always shown from the perspective of the side having the move, as indicated in the FEN. So let's assume that it's White's move and the output is 500 500 0. That would correspond to your first case, of Black having 0% chance of winning. If the output were 500 0 500, that would correspond to your second case, of each side having a 50% chance of winning (with no possibility of a draw).
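To make the arithmetic in this exchange concrete, here is a minimal Python sketch that reads a W/D/L triple and turns it into an expected score (win = 1, draw = 0.5, loss = 0). It assumes the engine's info line carries a `wdl` token followed by three per-mille integers from the side to move's point of view; the sample lines and the helper names `parse_wdl` and `expected_score` are made up for illustration.

```python
def parse_wdl(info_line):
    """Return (win, draw, loss) as fractions from an info line.

    Assumes the line contains the token 'wdl' followed by three
    per-mille integers (an assumption about the output format).
    """
    tokens = info_line.split()
    i = tokens.index("wdl")
    w, d, l = (int(t) for t in tokens[i + 1:i + 4])
    return w / 1000, d / 1000, l / 1000


def expected_score(w, d, l):
    """Expected score for the side to move: a win counts 1, a draw 0.5."""
    return w + 0.5 * d


# Made-up sample lines covering the cases discussed in the thread.
case_a = parse_wdl("info depth 20 score cp 80 wdl 500 500 0")    # no losing chances
case_b = parse_wdl("info depth 20 score cp 0 wdl 500 0 500")     # no draws at all
case_c = parse_wdl("info depth 20 score cp 40 wdl 300 600 100")  # 30% win, 10% loss

print(expected_score(*case_a))  # 0.75
print(expected_score(*case_b))  # 0.5
print(expected_score(*case_c))  # 0.6
```

The three numbers do distinguish the cases: the same 50% winning chance gives an expected score of 0.75 when the rest is draws and 0.5 when the rest is losses, and the 30%/10% position lands in between at 0.6.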
zullil wrote: ↑Fri Jul 03, 2020 1:19 pm
If WDL shows 200 500 300, for example, then the interpretation is that the side to move has a 20% chance of winning and a 30% chance of losing.
That predicts that 50% of the games will be drawn. The rest are decided games. How accurate is that? Where's the data that shows when 200 500 300 is shown, that half the games are actually drawn?
Because if they are, then I stand corrected and tip my hat to WDL: I have no way of knowing when a position has a 50% chance of being decided either way, so WDL would be providing incredibly useful information just like that.
But if in reality when that position is played out the decided games are nowhere near 50%, then WDL is just smoke and mirrors. People are making wrong decisions because of faulty information WDL shows them.
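The check being asked for here is a calibration test: collect games started from positions where a given triple was shown, then compare the predicted fractions with what actually happened. A minimal sketch in Python, assuming you already have such records; the function name `calibration` is hypothetical and the ten sample records are invented toy data, not real results.

```python
from collections import defaultdict


def calibration(records):
    """Compare predicted and observed W/D/L fractions.

    records: iterable of ((w, d, l) in per-mille, outcome), where
    outcome is 'W', 'D' or 'L' from the side to move's point of view.
    Returns {wdl: (predicted, observed, games)}.
    """
    counts = defaultdict(lambda: {"W": 0, "D": 0, "L": 0})
    for wdl, outcome in records:
        counts[wdl][outcome] += 1

    report = {}
    for wdl, c in counts.items():
        n = sum(c.values())
        predicted = tuple(x / 1000 for x in wdl)
        observed = (c["W"] / n, c["D"] / n, c["L"] / n)
        report[wdl] = (predicted, observed, n)
    return report


# Invented toy data: ten games from positions that showed 200 500 300.
toy = [((200, 500, 300), r) for r in "WWDDDDDLLL"]
for wdl, (pred, obs, n) in calibration(toy).items():
    print(wdl, "predicted", pred, "observed", obs, "games", n)
```

With real data you would want many games per triple (or per bucket of similar triples) before reading anything into a gap between the predicted and observed draw rates.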
The real information is reported score and depth. The WDL printed by SF is just a re-interpretation of the reported score and depth.
Most people are very happy with getting such numbers even if they don't mean anything. I have seen this before a couple of times in varying contexts. There is probably a good business idea here (write an app that predicts something and sell it, the quality of your model is of no importance).
syzygy wrote: ↑Sat Jul 04, 2020 1:55 pm
The real information is reported score and depth. The WDL printed by SF is just a re-interpretation of the reported score and depth.
The WDL printed by SF is just a re-interpretation of the reported score and current move number.
Right. Although the reported score likely varies from iteration to iteration of the search, the current depth of the search is not directly used in the production of the WDL output.
syzygy wrote: ↑Sat Jul 04, 2020 1:55 pm
The real information is reported score and depth. The WDL printed by SF is just a re-interpretation of the reported score and depth.
The WDL printed by SF is just a re-interpretation of the reported score and current move number.
You are right, thanks for the correction. I thought "ply" meant depth, but it means "game_ply()".
Maybe it could be improved by taking into account game phase instead. (And contempt would seem to play a role as well.)
Apparently the default value of UCI_ShowWDL has now been changed to "false".
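To picture what "a re-interpretation of the reported score and game ply" could look like, here is a toy Python sketch. It assumes a logistic win-rate curve whose midpoint and slope drift with the ply; that shape is an assumption for illustration, and the function `toy_params` and every number in it are placeholders, not Stockfish's fitted coefficients.

```python
import math


def toy_params(ply):
    """Placeholder curve parameters (NOT Stockfish's fitted values).

    a: score in centipawns at which the win chance is 50%
    b: how gradually the win chance changes with the score
    """
    m = min(ply, 240) / 64.0
    return 100.0 + 20.0 * m, 60.0 + 5.0 * m


def win_permille(score_cp, ply):
    """Per-mille win chance for the side to move under the toy model."""
    a, b = toy_params(ply)
    return round(1000 / (1 + math.exp((a - score_cp) / b)))


def wdl(score_cp, ply):
    """Return (win, draw, loss) in per-mille; draw is whatever is left."""
    w = win_permille(score_cp, ply)
    l = win_permille(-score_cp, ply)  # opponent's win chance
    return w, 1000 - w - l, l


print(wdl(100, 0))    # same score, early in the game
print(wdl(100, 128))  # same score, later ply: a different triple
```

Note that search depth does not appear anywhere in the inputs: under a model of this kind, the same score at the same ply always maps to the same triple.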
syzygy wrote: ↑Sat Jul 04, 2020 1:55 pm
Most people are very happy with getting such numbers even if they don't mean anything.
No, they're happy because they think those numbers mean something, and they think they mean something very useful.
But I'll just shrug and hope I face someone who makes a mistake because they relied on WDL. Selfishly, something like this being around can only benefit me (and letting people know the numbers don't mean anything is against my interests).
DOUBLE POST NOTE - If WDL worked as advertised I'd be the first to switch, accurate predictive information of game outcomes would make scores obsolete and WDL would be the best thing that could be used to rank moves and variations.
This thing is called WDL without actually being WDL.
Ovyron wrote: ↑Sun Jul 05, 2020 12:36 am
DOUBLE POST NOTE - If WDL worked as advertised I'd be the first to switch, accurate predictive information of game outcomes would make scores obsolete and WDL would be the best thing that could be used to rank moves and variations.
This thing is called WDL without actually being WDL.
This is all it is, as stated by the authors:
#### UCI_ShowWDL
If enabled, show approximate WDL statistics as part of the engine output.
These WDL numbers model expected game outcomes for a given evaluation and
game ply for engine self-play at fishtest LTC conditions (60+0.6s per game).
No one says you have to like it or use it. I think it's interesting data.