brtzsnr wrote:
Hi, Kai!
I think most people here are interested in knowing how to implement the stopping rule. Most of the statistics is way beyond me, despite my having taken a few introductory statistics courses. Would it be possible to provide some code?
I'm looking forward to implementing this in my testing framework and reducing the number of games by 50%. I'm already using your other idea for SPRT (alpha = 3%, beta = 15%), which saves a lot of time on bad patches.
On my end, I could generate a set of many unbalanced openings (e.g. 1 to 5 random moves from the start + 1m eval). Would that help you in any way?
With SPRT, Michel has a very nice simplification posted here:
http://www.talkchess.com/forum/viewtopi ... 5&start=19
It still uses the same ordinary (w,d,l) variance and is equivalent to the SPRT used in Cutechess-Cli and SF testing, but it is much cleaner and simpler. To use unbalanced positions, one has to use the 5-nomial variance instead; the SPRT is then applied identically. To get it, do the following:
--- Play the games in pairs on each opening position, with the engines switching sides in the second game of the pair.
--- Score each pair of games as one of the five 5-nomial outcomes: 2, 3/2, 1, 1/2 or 0 points.
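--- Compute the 5-nomial variance. Say that for N pairs of games we have M1 counts of the 2-point outcome, M2 counts of 3/2, M3 of 1, M4 of 1/2 and M5 of 0, building a 5-nomial. The average score per pair (between 0 and 2) is:
s = (2*M1 + 3/2*M2 + 1*M3 + 1/2*M4 + 0*M5)/N.
Then the 5-nomial variance is:
(M1*(2-s)^2 + M2*(3/2-s)^2 + M3*(1-s)^2 + M4*(1/2-s)^2 + M5*(0-s)^2) / (2*N).
Dividing by 2*N rather than N makes this half the variance of the pair score, i.e. a per-game variance (the variance of the average score over all 2*N games is this value divided by 2*N), so it is directly comparable with the (w,d,l) variance; the matching average score per game is s/2.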
Plug this variance into the LLR given by Michel to get the stopping rule. I should stress that this will significantly shorten matches (or increase resolution) only when the draw rate with balanced openings is above 60%. For the very high draw rates expected in some 10 years, it will be extremely beneficial. I don't know what draw rates you get in self-play games from balanced positions at Zurichess's current stage of development, but it seems likely that as it gets stronger, you will hit a high draw rate too.
I have also built an EPD file of about 11,000 unbalanced opening positions, with an imbalance in the range 80cp-120cp according to SF eval at 100ms. That seems an adequate imbalance to use. I uploaded the EPD file here:
http://s000.tinyupload.com/?file_id=129 ... 1867711000