Getting SPRT right
Posted: Thu Apr 23, 2015 12:02 am
I was working to add SPRT to my evaluation framework and noticed a strange difference between how LLR is computed in cutechess-cli and fishtest. I'll paste the relevant piece of code:
In cutechess-cli: https://github.com/cutechess/cutechess/ ... c/sprt.cpp
In fishtest: https://raw.githubusercontent.com/glins ... at_util.py
So fishtest omits the scaling step, i.e. it uses a fixed draw elo. Indeed the results are different. For
print SPRT({'wins': 716, 'losses': 591, 'draws': 2163}, 0, 0.05, 6, 0.05, 200)
fishtest prints LLR as 2.9948445563125237
while cutechess prints LLR as 4.373536
Which one is correct? Does cutechess's test allow one to run fewer tests?
The relevant cutechess-cli code:
// Probability laws under H0 and H1
const double s = b.scale();
const BayesElo b0(m_elo0 / s, b.drawElo());
const BayesElo b1(m_elo1 / s, b.drawElo());
const SprtProbability p0(b0), p1(b1);
The corresponding fishtest code:
# Probability laws under H0 and H1
P0 = bayeselo_to_proba(elo0, drawelo)
P1 = bayeselo_to_proba(elo1, drawelo)
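To make the difference concrete, here is a minimal Python sketch that computes the LLR both ways. It is not the actual code of either project: it assumes the usual BayesElo win/loss/draw model, it assumes that the draw elo is estimated from the sample in both cases, and it assumes that the scale factor is 4x/(1+x)^2 with x = 10^(-drawelo/400). The names proba_to_bayeselo, bayeselo_scale and llr are made up for illustration; only bayeselo_to_proba appears in the fishtest snippet.

```python
import math


def bayeselo_to_proba(elo, drawelo):
    """Win/loss/draw probabilities under the BayesElo model (assumed form)."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return {'win': p_win, 'loss': p_loss, 'draw': 1.0 - p_win - p_loss}


def proba_to_bayeselo(p):
    """Inverse of bayeselo_to_proba: returns (elo, drawelo)."""
    odds_win = p['win'] / (1.0 - p['win'])
    odds_loss = p['loss'] / (1.0 - p['loss'])
    elo = 200.0 * math.log10(odds_win / odds_loss)
    drawelo = 200.0 * math.log10(1.0 / (odds_win * odds_loss))
    return elo, drawelo


def bayeselo_scale(drawelo):
    """Assumed conversion factor between BayesElo and ordinary elo near 0."""
    x = 10.0 ** (-drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2


def llr(wins, losses, draws, elo0, elo1, rescale):
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0).

    rescale=True divides elo0/elo1 by the scale factor first (the
    cutechess-cli variant); rescale=False uses them as given (fishtest).
    """
    n = float(wins + losses + draws)
    sample = {'win': wins / n, 'loss': losses / n, 'draw': draws / n}
    _, drawelo = proba_to_bayeselo(sample)  # draw elo estimated from the sample
    s = bayeselo_scale(drawelo) if rescale else 1.0
    p0 = bayeselo_to_proba(elo0 / s, drawelo)
    p1 = bayeselo_to_proba(elo1 / s, drawelo)
    return (wins * math.log(p1['win'] / p0['win'])
            + losses * math.log(p1['loss'] / p0['loss'])
            + draws * math.log(p1['draw'] / p0['draw']))


if __name__ == '__main__':
    print('unscaled:', llr(716, 591, 2163, 0, 6, rescale=False))
    print('rescaled:', llr(716, 591, 2163, 0, 6, rescale=True))
```

If those assumptions hold, the two printed values should land close to the two LLR figures quoted in this post, which would mean the whole gap comes from dividing elo0 and elo1 by the scale factor.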