Don wrote: lucasart wrote:
I have not given cutechess-cli a try yet since I had long ago developed my own autotesting system. But I should try it - I understand it works quite well.
I assume it will work from Linux too, right?
Yes, it's available on GitHub. You need to install Qt to compile it, though.
It has basically 3 projects:
- a library with all common functions
- a cli tournament manager. very powerful and feature rich.
- a GUI. very nice, but in the relatively early stages of development. what is there is really neat, but a lot of features are still missing.
You can also program the SPRT algorithm yourself, it's not complicated. Have a look at sprt.h and sprt.cpp in cutechess (library). The only major thing that needs improving on it is the dynamic calibration of the drawelo value. This is important, especially in self testing, where the drawelo value is typically much higher than the 97.3 default value (which came from BayesElo and was calibrated a long time ago).
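For anyone who wants to roll their own, here is a minimal Python sketch of an SPRT on win/draw/loss counts under the BayesElo model. It is not a transcription of cutechess's sprt.cpp: the function names, the elo0/elo1 hypotheses and the alpha/beta defaults are assumptions made for illustration, and drawelo is simply passed in as a fixed value.

```python
import math

def bayeselo_probs(elo, drawelo):
    """Return (p_win, p_draw, p_loss) under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss

def sprt_llr(wins, draws, losses, elo0, elo1, drawelo):
    """Log-likelihood ratio of H1 (elo = elo1) against H0 (elo = elo0)."""
    w0, d0, l0 = bayeselo_probs(elo0, drawelo)
    w1, d1, l1 = bayeselo_probs(elo1, drawelo)
    return (wins * math.log(w1 / w0)
            + draws * math.log(d1 / d0)
            + losses * math.log(l1 / l0))

def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Wald's stopping rule: compare the LLR against the two bounds."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"
```

You would recompute the LLR after each game (or batch of games) and stop as soon as sprt_decision returns something other than "continue". How well this behaves depends directly on the drawelo you feed it, which is why the calibration matters.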
The system I created with simulation also simulates the draw percentages we experience in our self testing. Can the drawelo be calculated from that somehow?
Yes, although one has to be careful not to draw early conclusions from a drawelo calculated over a small sample.
Perhaps a weighted average between the initial value (analytically derived from drawelo = 100 and elodiff = 0) and the estimated value (analytically derived from the results of the sample). A reasonable weighting could be something like this:
- x = 1/log2(nb_draws): weight for the initial value
- 1-x for the estimated value, so as log2(nb_draws) grows we trust the sample more and more
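The weighting above can be sketched in a few lines of Python. This assumes the quantity being blended is the drawelo itself, with 100 as the initial value; the function name and the guard for nb_draws < 2 (where log2 is zero or undefined and the weight would not make sense) are additions for illustration only.

```python
import math

def blended_drawelo(nb_draws, estimated, initial=100.0):
    """Blend the initial drawelo with the sample estimate."""
    # Assumption: with fewer than 2 draws the weight 1/log2(nb_draws)
    # is undefined or out of range, so we just keep the initial value.
    if nb_draws < 2:
        return initial
    x = 1.0 / math.log2(nb_draws)  # weight for the initial value
    return x * initial + (1.0 - x) * estimated
```

With 1600 draws the weight on the initial value is about 0.09, so an estimate of 247 is pulled down to about 233.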
Let's look at a practical example. You play K0 (Komodo0) against K1 (Komodo1), and get the following results: 520-500-1600 [N=2620 games, 520 wins, 500 losses, 1600 draws]
Your sample estimated values are:
p(win) = 520/2620 = 19.85%
p(loss) = 500/2620 = 19.08%
p(draw) = 1600/2620 = 61.07%
Now here's the "drawelo equation" (in its simplified form, as we don't care about color bias; I suppose you alternate W/B):
- let L(x) = 1/(1+10^(x/400))
- p(win) = L(-elo+drawelo)
- p(loss) = L(+elo+drawelo)
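Now you can solve these equations to get (elo, drawelo) from the sample. elo represents the estimate of elo(K1)-elo(K0).
elo = 4.23 (on the BayesElo scale of these equations; the plain logistic formula applied to the 50.38% score gives 2.65)
drawelo = 247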
and as the sample grows you can trust this value more and more rather than the initial value 100.
anyway, to be verified & tested, but that's the idea
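To make the solving step concrete, here is a small Python sketch that inverts the two drawelo equations in closed form. The function name is made up for illustration, and it assumes both wins and losses are non-zero (otherwise the logarithm blows up). Note that the elo it returns lives on the BayesElo scale of these equations, which is not the same thing as the logistic Elo computed from the score percentage.

```python
import math

def solve_elo_drawelo(wins, losses, draws):
    """Solve p(win) = L(-elo+drawelo), p(loss) = L(+elo+drawelo)."""
    n = wins + losses + draws
    p_win = wins / n
    p_loss = losses / n
    # Inverting L(x) = 1/(1+10^(x/400)) gives x = 400*log10(1/p - 1)
    a = 400.0 * math.log10(1.0 / p_win - 1.0)   # equals -elo + drawelo
    b = 400.0 * math.log10(1.0 / p_loss - 1.0)  # equals +elo + drawelo
    elo = (b - a) / 2.0
    drawelo = (a + b) / 2.0
    return elo, drawelo

elo, drawelo = solve_elo_drawelo(520, 500, 1600)
print(f"elo = {elo:.2f}, drawelo = {drawelo:.1f}")
```

For the 520-500-1600 sample this gives a drawelo close to 247.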
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.