SPRT and Engine testing

Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: SPRT and Engine testing

Post by Adam Hair »

I am going to move on to incorporating draws, although at this time
I am not sure this is going to develop into something that will be
extremely helpful to engine authors. I will have to play around
with the idea some more.

One way to deal with draws is to assume that the draw rate stays the
same. This is approximately true if the original engine and the new version
are close in strength. From examining data from Kirill Kryukov's KCEC,
the draw rate in his games changes by only 0.00021 when the two engines
are 10 Elo apart, compared to two engines of equal strength. By computing the
conditional probability of the trinomial distribution when the draw rate is
constant, it can be shown that:

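S_n = W ln(p_1/p_0) + ((N-D)-W) ln((1-d-p_1)/(1-d-p_0))

which, after replacing the number of decisive games N-D by its expected
value (1-d)N, can be rearranged as

S_n ≈ W ln(p_1(1-d-p_0)/(p_0(1-d-p_1))) - (1-d)N ln((1-d-p_0)/(1-d-p_1))

where d = draw rate, N = number of games, W = wins, D = draws,
and p_0, p_1 = win probabilities under the two hypotheses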
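
A minimal Python sketch of the constant-draw-rate log-likelihood ratio, assuming W, D and N are the running counts of wins, draws and games, and p_0, p_1 are the win probabilities under the two hypotheses. The function names and the match numbers are hypothetical; the example only checks that the two groupings of the formula agree when the actual N-D is used instead of (1-d)N.

```python
import math

def llr_constant_draw(wins, draws, games, p0, p1, d):
    """S_n for H1 (win prob p1) vs H0 (win prob p0), both with draw prob d."""
    losses = games - draws - wins
    # Draw terms cancel because d is the same under H0 and H1.
    return (wins * math.log(p1 / p0)
            + losses * math.log((1 - d - p1) / (1 - d - p0)))

def llr_rearranged(wins, draws, games, p0, p1, d):
    """Same S_n, written as a coefficient on W minus a term in (N - D)."""
    k = math.log(p1 * (1 - d - p0) / (p0 * (1 - d - p1)))
    c = math.log((1 - d - p0) / (1 - d - p1))
    return wins * k - (games - draws) * c

# Hypothetical match: 1000 games, 300 draws, 360 wins, 340 losses.
args = dict(wins=360, draws=300, games=1000, p0=0.35, p1=0.37, d=0.30)
s1 = llr_constant_draw(**args)
s2 = llr_rearranged(**args)
print(math.isclose(s1, s2, abs_tol=1e-9))  # True
```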

The upper and lower bounds of S_n = Sum(K_i) would still be +-ln((1-a)/b).
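
The stopping rule can be sketched as a classical Wald SPRT, assuming a is the Type I error rate and b the Type II error rate. The sketch uses Wald's thresholds ln(b/(1-a)) and ln((1-b)/a), which reduce to -ln((1-a)/b) and +ln((1-a)/b) when a = b. The function name and the example values are hypothetical.

```python
import math

def sprt_decision(wins, draws, games, p0, p1, d, a, b):
    """One SPRT check with a constant draw rate d.
    Assumes a = Type I error rate, b = Type II error rate."""
    losses = games - draws - wins
    s_n = (wins * math.log(p1 / p0)
           + losses * math.log((1 - d - p1) / (1 - d - p0)))
    lower = math.log(b / (1 - a))   # accept H0 at or below this
    upper = math.log((1 - b) / a)   # accept H1 at or above this
    if s_n >= upper:
        return "accept H1"
    if s_n <= lower:
        return "accept H0"
    return "continue"

# Hypothetical matches of 1000 games with 300 draws, a = b = 0.05.
print(sprt_decision(400, 300, 1000, 0.35, 0.37, 0.30, 0.05, 0.05))  # accept H1
print(sprt_decision(360, 300, 1000, 0.35, 0.37, 0.30, 0.05, 0.05))  # continue
print(sprt_decision(300, 300, 1000, 0.35, 0.37, 0.30, 0.05, 0.05))  # accept H0
```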

However, the draw rate actually observed in a match is going to fluctuate
around d, which will render the sequential upper and lower bounds on the
number of wins inaccurate. I don't think this is the way to go.
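
Because the approximate form S_n ≈ W k - (1-d)N c is linear in W, the two thresholds can be solved for W to give the win counts at which the test stops after N games. A hedged sketch under the same assumptions (p_1 > p_0, so the coefficient k on W is positive; a and b taken as the Type I and Type II error rates; hypothetical names and numbers):

```python
import math

def win_bounds(n_games, p0, p1, d, a, b):
    """Win counts at which the SPRT stops after n_games, using the
    approximation N - D ~ (1 - d) * N (expected decisive games)."""
    k = math.log(p1 * (1 - d - p0) / (p0 * (1 - d - p1)))  # coefficient on W
    c = math.log((1 - d - p0) / (1 - d - p1))
    drift = (1 - d) * n_games * c
    lower_w = (math.log(b / (1 - a)) + drift) / k   # accept H0 at or below
    upper_w = (math.log((1 - b) / a) + drift) / k   # accept H1 at or above
    return lower_w, upper_w

lo, hi = win_bounds(1000, p0=0.35, p1=0.37, d=0.30, a=0.05, b=0.05)
print(f"after 1000 games: accept H0 if W <= {lo:.1f}, accept H1 if W >= {hi:.1f}")
```

With these hypothetical numbers c = ln(0.35/0.33) ≈ 0.059, so a match that produces 250 draws instead of the expected 300 changes the exact S_n by 50c ≈ 2.9, roughly the distance from zero to either threshold (ln 19 ≈ 2.94 for a = b = 0.05). That is the inaccuracy described above.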

My next post will be about another way to deal with the draws that should
work.

Adam Hair

Re: SPRT and Engine testing

Post by Adam Hair »

I am in the middle of accumulating a large number of games
that may help me determine the best way to apply SPRT to
engine testing. The biggest problem is adapting SPRT,
which tests simple hypotheses, into something that can
test composite hypotheses, i.e. H(elo(engine A) = elo(engine B))
versus H(elo(A) < elo(B)), while still being able to predetermine the
strength of the test (the Type I and Type II error rates).