Error margins via resampling (jackknifing)
- Posts: 27
- Joined: Sat Dec 03, 2016 2:20 pm
Re: Error margins via resampling (jackknifing)
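Pardon my ignorance here, even after having read this thread and related threads. I am advised:
"It's essential to keep results per side (if playing without a book) or results of game pairs with the same start position (if using a large set of start positions). Not doing that results in overestimation of statistical error (see http://talkchess.com/forum/viewtopic.ph ... ight=gsprt ) and a waste of testing resources. Why doesn't mainline Stockfish testing do that? Well, they found enough dupes to contribute their resources."
At the same time, I am aware that there exists a win expectancy estimation model and that Lichess uses a similar model. Rather than relying on historical games, couldn't one run both engines from the start position and use the average evaluation to calculate a win expectancy?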
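To illustrate the advice about keeping results per game pair, here is a minimal sketch in Python. The data in `pairs` is made up for illustration (each tuple is one engine's score with White and then with Black from the same start position), and the helper names `mean_and_se` and `jackknife_se` are my own, not taken from any testing framework. It compares the error margin obtained by treating all games as independent with the one obtained by resampling whole pairs.

```python
import math

# Hypothetical results: (score as White, score as Black) per start position.
pairs = [(1, 0), (1, 0), (0.5, 0.5), (1, 0.5),
         (0, 1), (0.5, 0.5), (1, 0), (0.5, 0)]

def mean_and_se(values):
    """Sample mean and its standard error, assuming independent values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)

def jackknife_se(values):
    """Delete-one jackknife standard error of the mean."""
    n = len(values)
    total = sum(values)
    # Mean of the sample with each value left out in turn.
    loo = [(total - v) / (n - 1) for v in values]
    loo_mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((m - loo_mean) ** 2 for m in loo))

# Naive view: every game is an independent observation.
games = [score for pair in pairs for score in pair]
game_mean, game_se = mean_and_se(games)

# Paired view: one observation per start position (average of both colours).
pair_scores = [(w + b) / 2 for w, b in pairs]
pair_mean, pair_se = mean_and_se(pair_scores)

print("per-game SE:     ", game_se)
print("per-pair SE:     ", pair_se)
print("jackknife (pair):", jackknife_se(pair_scores))
```

In this made-up sample the two games of a pair tend to cancel out (an opening that favours White gives a win and then a loss), so the per-pair error margin comes out smaller than the per-game one. For a plain mean the delete-one jackknife reproduces the usual standard error; it becomes useful when the statistic is something less simple, such as an Elo estimate.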
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Error margins via resampling (jackknifing)
Toadofsky wrote:
Pardon my ignorance here, even after having read this thread and related threads. I am advised:
"It's essential to keep results per side (if playing without a book) or results of game pairs with the same start position (if using a large set of start positions). Not doing that results in overestimation of statistical error (see http://talkchess.com/forum/viewtopic.ph ... ight=gsprt ) and a waste of testing resources. Why doesn't mainline Stockfish testing do that? Well, they found enough dupes to contribute their resources."
At the same time, I am aware that there exists a win expectancy estimation model and that Lichess uses a similar model. Rather than relying on historical games, couldn't one run both engines from the start position and use the average evaluation to calculate a win expectancy?

The win expectancy of an engine should be based on its own evaluation of positions and on the results of its own games, not on other players' games. Every engine has its own evaluation scale, and therefore its own mapping from evaluation to win expectancy.
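As a rough sketch of what an engine-specific win expectancy could look like, the Python below assumes a logistic curve with a single `scale` parameter that is fitted to one engine's own (evaluation, result) data. The logistic form, the function names `win_expectancy` and `fit_scale`, the grid of candidate scales, and the sample data are all assumptions for illustration; this is not the model used by Lichess or by any particular engine.

```python
import math

def win_expectancy(cp, scale):
    """Expected score (0..1) for an evaluation of `cp` centipawns.

    Assumes a logistic curve; `scale` is the engine-specific parameter.
    """
    return 1.0 / (1.0 + math.exp(-cp / scale))

def fit_scale(samples, candidates=None):
    """Pick the scale that minimises squared error on (cp, score) samples.

    A plain grid search, chosen for clarity rather than efficiency.
    """
    if candidates is None:
        candidates = [float(s) for s in range(50, 1001, 10)]

    def sq_error(scale):
        return sum((win_expectancy(cp, scale) - score) ** 2
                   for cp, score in samples)

    return min(candidates, key=sq_error)

# Hypothetical data from one engine's own games:
# (evaluation in centipawns, game result from that side's point of view).
own_games = [(-250, 0.0), (-120, 0.5), (-40, 0.5), (0, 0.5),
             (35, 0.5), (90, 1.0), (180, 1.0), (300, 1.0)]

scale = fit_scale(own_games)
print("fitted scale:", scale)
print("expectancy at +100 cp:", win_expectancy(100, scale))
```

Fitting the same curve to a second engine's games would generally give a different `scale`, which is the point made above: an evaluation of +100 does not mean the same thing for every engine.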