Re: Error margins via resampling (jackknifing)
Posted: Fri Aug 19, 2016 3:25 pm
Usually not the case: most databases are drawn from a number of available openings much larger than the number of games.
Michel wrote:
This is weird. There is nothing special about the jackknife. It is just a (non-parametric) estimator for the variance, so it should give the same value as any other estimator.
I played a bit with the "naive" 5-nomial variance. The unbalanced variance is indeed smaller than the balanced one, but not to the extent given by jackknifing, where a factor of 2 in variance is common.
As I said, the 5-nomial model does not take into account correlations between repeated opening positions. But in that case the jackknife is also wrong, since it assumes independent, identically distributed trials.
In the case of completely balanced openings and equal engines (W1=W2=L1=L2) there is no difference between trinomial and 5-nomial, but by jackknifing I found a systematic and stable compression of 12-15% in variance for balanced openings (both my "ultrabalanced" openings and "2moves_v1" from the Stockfish testing framework) compared to naive trinomial. Naive 5-nomial doesn't account for this compression of variance in the completely balanced and equal case. Also, the common factor of 2 in variance for unbalanced openings of order 90cp given by jackknifing is hard to come by with naive 5-nomial. I seem to have a modified empirical form of 5-nomial with just 1 empirical parameter, which fits well the jackknifed data for all sorts of openings and strength differences I fed it. But I guess jackknifing the data on the test run is even more efficient.
Not in the 3-nomial case, no. But with the 5-nomial model it is perfectly possible, since you have 5 empirical frequencies to play with. For example, in the case of 100% correlation the outcome would be 1 (out of 2) with 100% probability.
Also, in the case of balanced openings, there is no way to input correlation between the games.
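To make the comparison concrete, here is a minimal sketch (not anyone's actual test code) of the three estimators for the variance of the mean score: naive trinomial over single games, naive 5-nomial over game pairs, and a leave-one-pair-out jackknife. The function names and the outcome probabilities are made up for illustration, and pairs are assumed independent of each other. For a plain mean, the jackknife over pairs reduces algebraically to the 5-nomial sample variance times n/(n-1), so any larger gap between them has to come from something else, such as resampling on a different unit or correlations between pairs.

```python
import random

def simulate_pairs(n_pairs, p_white_win, p_draw, seed=0):
    """Scores of engine A per game pair: A has White in game 1, Black in game 2.
    The outcome probabilities are invented for illustration only."""
    rng = random.Random(seed)

    def white_score():
        r = rng.random()
        if r < p_white_win:
            return 1.0
        if r < p_white_win + p_draw:
            return 0.5
        return 0.0

    return [(white_score(), 1.0 - white_score()) for _ in range(n_pairs)]

def variance_of_mean(samples):
    """Plug-in (1/n) sample variance divided by n."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / n / n

def trinomial_variance(pairs):
    """Treat every single game as an i.i.d. draw from {0, 0.5, 1}."""
    return variance_of_mean([g for pair in pairs for g in pair])

def pentanomial_variance(pairs):
    """Treat each game pair as one i.i.d. draw (5 possible pair scores)."""
    return variance_of_mean([(a + b) / 2 for a, b in pairs])

def jackknife_variance(pairs):
    """Leave-one-pair-out jackknife variance of the mean per-game score."""
    n = len(pairs)
    scores = [(a + b) / 2 for a, b in pairs]
    total = sum(scores)
    loo = [(total - s) / (n - 1) for s in scores]  # leave-one-out means
    loo_mean = sum(loo) / n
    return (n - 1) / n * sum((m - loo_mean) ** 2 for m in loo)

if __name__ == "__main__":
    for label, p_win, p_draw in [("balanced", 0.3, 0.4), ("unbalanced", 0.8, 0.15)]:
        pairs = simulate_pairs(2000, p_win, p_draw, seed=1)
        print(label,
              trinomial_variance(pairs),
              pentanomial_variance(pairs),
              jackknife_variance(pairs))
```

In this toy model the two games of a pair are independent given the opening, so for the balanced case the trinomial and 5-nomial values agree up to noise, while for the unbalanced case the 5-nomial (and jackknife) variance is clearly smaller than the trinomial one.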
Thanks. With hindsight the formula can be easily explained heuristically. The formula is exact in the case where we are computing the LLR for the mean of a normal distribution with known variance (and then it is standard).
Anyway, I am still amazed by your master formula, this must be exploited.
If the variance is unknown then the formula says you are allowed to estimate it from the sample. This can be explained by observing that for a normal distribution the maximum likelihood estimators for the mean and the variance are uncorrelated.
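For reference, the standard known-variance case can be written out as follows. This is only the textbook normal LLR, not a reproduction of the formula from earlier in the thread; the symbols (hypothesized means \mu_0 and \mu_1, sample mean \bar{x}, n observations, variance \sigma^2) are notation chosen here for illustration.

```latex
\mathrm{LLR}
  = \sum_{i=1}^{n} \frac{(x_i - \mu_0)^2 - (x_i - \mu_1)^2}{2\sigma^2}
  = \frac{n\,(\mu_1 - \mu_0)}{\sigma^2}
    \left( \bar{x} - \frac{\mu_0 + \mu_1}{2} \right),
\qquad
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 .
```

When the variance is unknown, the heuristic above amounts to substituting \hat{\sigma}^2 for \sigma^2 in the first expression. For a normal sample, \bar{x} and \hat{\sigma}^2 are in fact independent, which is what makes that substitution harmless to first order.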
Finally, by the central limit theorem, pretty much any sample mean can be approximated by a normal distribution.