Error margins via resampling (jackknifing)

Toadofsky
Posts: 27
Joined: Sat Dec 03, 2016 1:20 pm

Re: Error margins via resampling (jackknifing)

Post by Toadofsky » Wed Feb 01, 2017 12:31 pm

Pardon my ignorance here, even after having read this thread and related threads. I am advised:
It's essential to keep results per side (if playing without a book) or results of game pairs with the same start position (if using a large set of start positions). Not doing that results in overestimation of the statistical error (see http://talkchess.com/forum/viewtopic.ph ... ight=gsprt ) and a waste of testing resources. Why doesn't mainline Stockfish testing do that? Well, they found enough dupes to contribute their resources.
At the same time, I am aware that there exists a win expectancy estimation model and that Lichess uses a similar model. Rather than relying on historical games, couldn't one run both engines from the start position and use the average evaluation to calculate a win expectancy?
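To make the quoted advice about game pairs concrete, here is a minimal sketch comparing the standard error of the mean score when every game is treated as independent with the standard error when each same-position pair is treated as one sample. The function names and the result list are made up for illustration, and the layout (two consecutive entries form one pair with colors reversed) is an assumption, not how any particular testing framework stores results.

```python
import math

def stderr_per_game(scores):
    """Standard error of the mean score, treating every game as independent."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return math.sqrt(var / n)

def stderr_per_pair(scores):
    """Standard error of the mean score, using game pairs as the sampling unit.

    Assumes scores[2*i] and scores[2*i + 1] are the two games (colors
    reversed) played from the same start position.
    """
    pair_means = [(scores[i] + scores[i + 1]) / 2 for i in range(0, len(scores), 2)]
    m = len(pair_means)
    mean = sum(pair_means) / m
    var = sum((p - mean) ** 2 for p in pair_means) / (m - 1)
    return math.sqrt(var / m)

# Made-up results from engine A's point of view (1 = win, 0.5 = draw, 0 = loss).
# Most start positions here are lopsided, so A wins one game of the pair and
# loses the other; the two results within a pair are negatively correlated.
scores = [1, 0, 1, 0, 0.5, 0.5, 1, 0, 1, 0.5, 1, 0]

print("per-game standard error:", stderr_per_game(scores))
print("per-pair standard error:", stderr_per_pair(scores))
```

With data like this, the per-pair figure comes out smaller, because the unbalanced positions cancel within each pair. If results within pairs were uncorrelated, the two estimates would be roughly the same.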

Ferdy
Posts: 4111
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: Error margins via resampling (jackknifing)

Post by Ferdy » Thu Feb 02, 2017 3:54 am

Toadofsky wrote:Pardon my ignorance here, even after having read this thread and related threads. I am advised:
It's essential to keep results per side (if playing without a book) or results of game pairs with the same start position (if using a large set of start positions). Not doing that results in overestimation of the statistical error (see http://talkchess.com/forum/viewtopic.ph ... ight=gsprt ) and a waste of testing resources. Why doesn't mainline Stockfish testing do that? Well, they found enough dupes to contribute their resources.
At the same time, I am aware that there exists a win expectancy estimation model and that Lichess uses a similar model. Rather than relying on historical games, couldn't one run both engines from the start position and use the average evaluation to calculate a win expectancy?
The win expectancy of an engine should be based on that engine's own evaluations of positions and the results of its own games, not on other players' games. Every engine has its own evaluation scale, and therefore its own mapping from evaluation to win expectancy.
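As a rough illustration of a per-engine mapping, the sketch below fits a single slope parameter of a logistic curve to (evaluation, result) samples taken from one engine's own games. The logistic form, the parameter k, the grid search, and the sample data are all assumptions made for this example; they are not taken from any engine or from Lichess.

```python
import math

def win_expectancy(eval_cp, k):
    """Logistic expected score for an evaluation in centipawns with slope k."""
    return 1.0 / (1.0 + math.exp(-k * eval_cp))

def fit_slope(samples, k_grid):
    """Pick the k from k_grid that minimizes squared error on the samples.

    samples is a list of (eval_cp, result) tuples, where result is the final
    game score (1, 0.5 or 0) for the side the evaluation refers to.
    """
    def loss(k):
        return sum((win_expectancy(e, k) - r) ** 2 for e, r in samples)
    return min(k_grid, key=loss)

# Hypothetical samples from one engine's own games.
samples = [(-300, 0), (-150, 0), (-80, 0.5), (-20, 0.5), (0, 0.5),
           (30, 0.5), (90, 1), (120, 0.5), (200, 1), (350, 1)]

# Candidate slopes from 0.0001 to 0.0200 (an arbitrary search range).
k_grid = [i / 10000 for i in range(1, 201)]
k = fit_slope(samples, k_grid)

print("fitted slope:", k)
print("expected score at +100 cp:", win_expectancy(100, k))
```

A second engine with a differently scaled evaluation would get a different k from its own games, which is why one curve cannot simply be reused across engines.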
