SPRT questions

Uri Blass
Posts: 8368
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

SPRT questions

Post by Uri Blass » Fri May 15, 2015 5:06 am

One result from the Stockfish framework:

LLR: 0.72 (-2.94,2.94) [0.00,4.50]
Total: 233707 W: 38339 L: 37483 D: 157885

1) Is there a simple calculator to check whether this result is enough to pass
[0.00, b] for some bound b smaller than 4.50?

2) What happens if people run a test that simply stops with a pass as soon as SPRT with bounds [0, b] passes for some positive b <= 6, and stops with a fail as soon as SPRT with bounds [0, b] fails for some positive b <= 6?

Can people calculate the theoretical probability that a 1 Elo regression passes the test, and the probability that a 0 Elo change passes it?

The same goes for the expected number of games, and which case is the worst.

People claim that SPRT is the best, but it is not clear best for what, and in the example of Lucas's patch it is not clear to me whether it would simply be better to accept the patch and stop the test.

What important additional information do we get if the test passes or fails?

We may not know with 95% confidence that the patch is positive, but I think that, whatever the final result, we can know with 95% confidence that there is not even a 0.5 Elo regression, and at the same time a 1.5 Elo improvement remains possible whatever the final result.

3) Did somebody test whether the worst case really behaves the way SPRT theory expects?

One possible way is to test the program against itself (so we know 0 Elo is the right result) with SPRT(-b, b) many times, to see whether the distribution of test lengths is really the distribution that theory predicts.

I suspect that the worst case is worse in practice, at least if both programs play every opening with both White and Black. I see no reason not to do that, because not doing it increases the variance of the result, and it is better to reduce the variance (although then, of course, SPRT is not the correct way to proceed, because the assumption of independent results no longer holds).

Zenmastur
Posts: 272
Joined: Sat May 31, 2014 6:28 am

Re: SPRT questions

Post by Zenmastur » Fri May 15, 2015 8:59 am

Kai Laskos is working on these problems. See the thread labeled "Maximum ELO gain per test game played?" I would link it but I don't know how.

Apparently the model set-up is more tedious than he anticipated, but I believe he has the equations in hand.

See his last 4 or 5 posts.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.

Ajedrecista
Posts: 1376
Joined: Wed Jul 13, 2011 7:04 pm
Location: Madrid, Spain.

Re: SPRT questions.

Post by Ajedrecista » Fri May 15, 2015 11:02 am

Hello Uri:

I will try to answer some of your questions in a random order:

------------------------
Uri Blass wrote: Can people calculate the theoretical probability that a 1 Elo regression passes the test, and the probability that a 0 Elo change passes it?

The same goes for the expected number of games, and which case is the worst.

Yes, it is possible thanks to Michel's Python script sprta.py (he also wrote a C version that you can compile into an executable). Please take a look here for more details. It returns the probability of passing and the expected number of games.

The worst case (the largest expected number of games) occurs when Bayeselo = (Bayeselo_0 + Bayeselo_1)/2 in an SPRT(Bayeselo_0, Bayeselo_1) test (always with Bayeselo_0 < Bayeselo_1). Michel's script has a special output: Elo gain [logistic Elo, i.e. ordinary Elo] and SPRT bounds [Bayeselo].
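
For a rough cross-check of such numbers, here is a minimal sketch. It is not Michel's sprta.py; it uses the textbook Brownian-motion approximation of the SPRT random walk. It assumes a fixed, known drawelo, measures the true strength difference in Bayeselo (not logistic Elo), and ignores overshoot at the bounds. The function names are hypothetical.

```python
import math

def bayeselo_to_proba(elo, drawelo):
    """Win, loss and draw probabilities under the BayesElo model."""
    win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return win, loss, 1.0 - win - loss

def sprt_characteristics(true_elo, elo0, elo1, drawelo, alpha=0.05, beta=0.05):
    """Approximate (pass probability, expected games) for SPRT(elo0, elo1)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    p = bayeselo_to_proba(true_elo, drawelo)
    p0 = bayeselo_to_proba(elo0, drawelo)
    p1 = bayeselo_to_proba(elo1, drawelo)
    # LLR increment contributed by a single win, loss or draw.
    x = [math.log(b / a) for a, b in zip(p0, p1)]
    mu = sum(pi * xi for pi, xi in zip(p, x))                  # drift per game
    var = sum(pi * xi * xi for pi, xi in zip(p, x)) - mu * mu  # variance per game
    if abs(mu) < 1e-12:
        # Driftless case: this is the midpoint of a symmetric test.
        pass_prob = -lower / (upper - lower)
        games = -lower * upper / var
    else:
        h = -2.0 * mu / var
        e_lo, e_hi = math.exp(h * lower), math.exp(h * upper)
        pass_prob = (1.0 - e_lo) / (e_hi - e_lo)
        games = (pass_prob * upper + (1.0 - pass_prob) * lower) / mu
    return pass_prob, games

# Scan true Bayeselo values for SPRT(0, 4.5) with an assumed drawelo of 285.
for elo in (-1.0, 0.0, 2.25, 4.5):
    prob, games = sprt_characteristics(elo, 0.0, 4.5, 285.0)
    print(f"bayeselo {elo:5.2f}: pass {prob:.4f}, expected games {games:.0f}")
```

Scanning true_elo between the two bounds gives an idea of where the expected length peaks. Real runs are a little longer than this approximation says, because the LLR overshoots the bound on the last game.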

------------------------
Uri Blass wrote: One result from the Stockfish framework:

LLR: 0.72 (-2.94,2.94) [0.00,4.50]
Total: 233707 W: 38339 L: 37483 D: 157885

1) Is there a simple calculator to check whether this result is enough to pass
[0.00, b] for some bound b smaller than 4.50?

I wrote an LLR calculator a long time ago. But you must keep in mind that SPRT is sequential, so the history of the games counts (WWWLLL is not the same as WLWLWL). The answer to your question is yes, such a calculator exists.
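
A minimal sketch of such a calculator follows. It is not the original tool; it assumes the BayesElo trinomial model with drawelo estimated from the sample itself, and the function names are hypothetical. Note that the final LLR depends only on the win/loss/draw counts. The order of the games matters only for whether a bound was already crossed earlier in the run.

```python
import math

def bayeselo_to_proba(elo, drawelo):
    """Win, loss and draw probabilities under the BayesElo model."""
    win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return win, loss, 1.0 - win - loss

def proba_to_bayeselo(win, loss):
    """Invert the model: (bayeselo, drawelo) from win and loss frequencies."""
    elo = 200.0 * math.log10(win / loss * (1.0 - loss) / (1.0 - win))
    drawelo = 200.0 * math.log10((1.0 - loss) / loss * (1.0 - win) / win)
    return elo, drawelo

def llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0), both in Bayeselo."""
    n = wins + losses + draws
    _, drawelo = proba_to_bayeselo(wins / n, losses / n)  # sample drawelo
    w0, l0, d0 = bayeselo_to_proba(elo0, drawelo)
    w1, l1, d1 = bayeselo_to_proba(elo1, drawelo)
    return (wins * math.log(w1 / w0)
            + losses * math.log(l1 / l0)
            + draws * math.log(d1 / d0))

def llr_bounds(alpha=0.05, beta=0.05):
    """Lower and upper stopping bounds for the LLR."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

W, L, D = 38339, 37483, 157885
print("bounds:", llr_bounds())
print("bayeselo, drawelo:", proba_to_bayeselo(W / (W + L + D), L / (W + L + D)))
for b in (4.5, 4.0, 3.5):
    print(f"LLR[SPRT(0, {b})] ~ {llr(W, L, D, 0.0, b):.4f}")
```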

In the SF testing framework, alpha = beta = 0.05 in SPRT. So, changing only Bayeselo_1:

Code:

Lower bound for LLR: -2.9444
Upper bound for LLR:  2.9444

----------------------------

Games:     233707

Wins:       38339 (16.40 %).
Loses:      37483 (16.04 %).
Draws:     157885 (67.56 %).

bayeselo:     2.3410
drawelo:    285.2261

----------------------------

LLR[SPRT(0, 4.5)] ~ 0.7222
LLR[SPRT(0, 4.4)] ~ 1.0941
LLR[SPRT(0, 4.3)] ~ 1.4484
LLR[SPRT(0, 4.2)] ~ 1.7851
LLR[SPRT(0, 4.1)] ~ 2.1041
LLR[SPRT(0, 4)]   ~ 2.4055
LLR[SPRT(0, 3.9)] ~ 2.6892
LLR[SPRT(0, 3.8)] ~ 2.9553
LLR[SPRT(0, 3.7)] ~ 3.2038
LLR[SPRT(0, 3.6)] ~ 3.4346
LLR[SPRT(0, 3.5)] ~ 3.6478

Interpolating b with the Regula-Falsi method, then letting the LLR calculator compute LLR:
Regula-Falsi method: http://en.wikipedia.org/wiki/False_position_method
(And rounding all the inputs and outputs to 1e-4):

LLR[SPRT(0, 3.8041)] ~ 2.9447
LLR[SPRT(0, 3.8042)] ~ 2.9445
LLR[SPRT(0, 3.8043)] ~ 2.9442
In this case: 3.8042 < b_critical < 3.8043.
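
The interpolation can be sketched as follows. This is a plain false-position iteration on LLR(b) minus the upper bound, assuming the BayesElo trinomial LLR with drawelo estimated from the sample; the helper names and the bracket [3.5, 4.5] are choices made here for illustration. It only does the arithmetic. Picking b after seeing the data is not a valid test, as lucasart points out later in the thread.

```python
import math

def bayeselo_to_proba(elo, drawelo):
    """Win, loss and draw probabilities under the BayesElo model."""
    win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return win, loss, 1.0 - win - loss

def llr(wins, losses, draws, elo0, elo1):
    """LLR of H1 against H0, with drawelo estimated from the sample."""
    n = wins + losses + draws
    pw, pl = wins / n, losses / n
    drawelo = 200.0 * math.log10((1.0 - pl) / pl * (1.0 - pw) / pw)
    p0 = bayeselo_to_proba(elo0, drawelo)
    p1 = bayeselo_to_proba(elo1, drawelo)
    counts = (wins, losses, draws)
    return sum(c * math.log(b / a) for c, a, b in zip(counts, p0, p1))

def false_position(f, a, b, tol=1e-9, max_iter=200):
    """Regula falsi: find a root of f inside the bracket [a, b]."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("root is not bracketed")
    c = a
    for _ in range(max_iter):
        # Root of the secant line through (a, fa) and (b, fb).
        c = b - fb * (b - a) / (fb - fa)
        fc = f(c)
        if abs(fc) < tol:
            break
        # Keep the sub-interval that still brackets the root.
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return c

def critical_bound(wins, losses, draws, lo=3.5, hi=4.5, alpha=0.05, beta=0.05):
    """Value of b at which LLR[SPRT(0, b)] equals the upper stopping bound."""
    upper = math.log((1.0 - beta) / alpha)
    return false_position(lambda b: llr(wins, losses, draws, 0.0, b) - upper, lo, hi)

print(f"b_critical ~ {critical_bound(38339, 37483, 157885):.4f}")
```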

Assuming a fixed gain of 2.341 Bayeselo and a drawelo parameter of 285.2261 (computed from the sample of 233707 games), I ran 10000 simulations with a modified version of my SPRT simulator, starting from {wins, losses, draws} = {38339, 37483, 157885} instead of {0, 0, 0}. Here is a summary of my results with SPRT(0, 4.5):

Code:

[...]
 9996/ 10000    Passes:   6556    Fails:   3440    <Games>/simulation:  286641
 9997/ 10000    Passes:   6556    Fails:   3441    <Games>/simulation:  286643
 9998/ 10000    Passes:   6556    Fails:   3442    <Games>/simulation:  286647
 9999/ 10000    Passes:   6557    Fails:   3442    <Games>/simulation:  286656
10000/ 10000    Passes:   6558    Fails:   3442    <Games>/simulation:  286658

Shortest simulation: 235136 games (simulation 6661): +38630 -37666 =158840.
Longest simulation:  679262 games (simulation 5685): +111204 -108951 =459107.

Average number of games per simulation: 286658
Median of the distribution:             272932 (+44827 -43730 =184375).

There are 3442 simulations with score > 50% that failed SPRT.
There are    0 simulations with score = 50% that failed SPRT.

Distribution of the length of simulations:

From  235000 to  235999 games:      3 simulations (  0.03 %); accumulated:   0.03 %.
From  236000 to  236999 games:     14 simulations (  0.14 %); accumulated:   0.17 %.
From  237000 to  237999 games:     48 simulations (  0.48 %); accumulated:   0.65 %.
[...]
From  243000 to  243999 games:    176 simulations (  1.76 %); accumulated:   8.47 %.
From  244000 to  244999 games:    176 simulations (  1.76 %); accumulated:  10.23 %.
[...]
From  249000 to  249999 games:    160 simulations (  1.60 %); accumulated:  18.65 %.
From  250000 to  250999 games:    179 simulations (  1.79 %); accumulated:  20.44 %.
[...]
From  256000 to  256999 games:    157 simulations (  1.57 %); accumulated:  29.87 %.
From  257000 to  257999 games:    126 simulations (  1.26 %); accumulated:  31.13 %.
[...]
From  263000 to  263999 games:    133 simulations (  1.33 %); accumulated:  39.57 %.
From  264000 to  264999 games:    132 simulations (  1.32 %); accumulated:  40.89 %.
[...]
From  271000 to  271999 games:    133 simulations (  1.33 %); accumulated:  48.94 %.
From  272000 to  272999 games:    113 simulations (  1.13 %); accumulated:  50.07 %.
[...]
From  282000 to  282999 games:     76 simulations (  0.76 %); accumulated:  59.62 %.
From  283000 to  283999 games:     93 simulations (  0.93 %); accumulated:  60.55 %.
[...]
From  296000 to  296999 games:     55 simulations (  0.55 %); accumulated:  69.93 %.
From  297000 to  297999 games:     66 simulations (  0.66 %); accumulated:  70.59 %.
[...]
From  314000 to  314999 games:     37 simulations (  0.37 %); accumulated:  79.85 %.
From  315000 to  315999 games:     45 simulations (  0.45 %); accumulated:  80.30 %.
[...]
From  344000 to  344999 games:     18 simulations (  0.18 %); accumulated:  89.97 %.
From  345000 to  345999 games:     22 simulations (  0.22 %); accumulated:  90.19 %.
[...]
From  618000 to  618999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  619000 to  619999 games:      1 simulation  (  0.01 %); accumulated:  99.99 %.
From  620000 to  620999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
[...]
From  678000 to  678999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  679000 to  679999 games:      1 simulation  (  0.01 %); accumulated: 100.00 %.
------------------------
Uri Blass wrote: 3) Did somebody test whether the worst case really behaves the way SPRT theory expects?

One possible way is to test the program against itself (so we know 0 Elo is the right result) with SPRT(-b, b) many times, to see whether the distribution of test lengths is really the distribution that theory predicts.

I ran an example SPRT(-3, 3) (a Bayeselo span of 3 - (-3) = 6 Bayeselo) with an expected gain of 0 Elo. I arbitrarily chose an a priori drawelo parameter of 240. With alpha = beta = 0.05 and 10000 simulations again:

Code:

In theory: passes = fails = 50%.

My results:

10000/ 10000    Passes:   5010    Fails:   4990    <Games>/simulation:   28899

Shortest simulation:    1823 games (simulation 4959): +325 -433 =1065.
Longest simulation:   241322 games (simulation 1943): +48511 -48618 =144193.

Average number of games per simulation:   28899
Median of the distribution:               21708 (+4505 -4397 =12806).

Distribution of the length of simulations:

From    1000 to    1999 games:      3 simulations (  0.03 %); accumulated:   0.03 %.
From    2000 to    2999 games:     34 simulations (  0.34 %); accumulated:   0.37 %.
From    3000 to    3999 games:    111 simulations (  1.11 %); accumulated:   1.48 %.
From    4000 to    4999 games:     84 simulations (  0.84 %); accumulated:   2.32 %.
From    5000 to    5999 games:    300 simulations (  3.00 %); accumulated:   5.32 %.
From    6000 to    6999 games:    135 simulations (  1.35 %); accumulated:   6.67 %.
From    7000 to    7999 games:    345 simulations (  3.45 %); accumulated:  10.12 %.
From    8000 to    8999 games:    383 simulations (  3.83 %); accumulated:  13.95 %.
From    9000 to    9999 games:    354 simulations (  3.54 %); accumulated:  17.49 %.
From   10000 to   10999 games:    322 simulations (  3.22 %); accumulated:  20.71 %.
From   11000 to   11999 games:    286 simulations (  2.86 %); accumulated:  23.57 %.
From   12000 to   12999 games:    307 simulations (  3.07 %); accumulated:  26.64 %.
From   13000 to   13999 games:    306 simulations (  3.06 %); accumulated:  29.70 %.
From   14000 to   14999 games:    277 simulations (  2.77 %); accumulated:  32.47 %.
From   15000 to   15999 games:    312 simulations (  3.12 %); accumulated:  35.59 %.
From   16000 to   16999 games:    279 simulations (  2.79 %); accumulated:  38.38 %.
From   17000 to   17999 games:    281 simulations (  2.81 %); accumulated:  41.19 %.
From   18000 to   18999 games:    244 simulations (  2.44 %); accumulated:  43.63 %.
From   19000 to   19999 games:    219 simulations (  2.19 %); accumulated:  45.82 %.
From   20000 to   20999 games:    243 simulations (  2.43 %); accumulated:  48.25 %.
From   21000 to   21999 games:    242 simulations (  2.42 %); accumulated:  50.67 %.
From   22000 to   22999 games:    187 simulations (  1.87 %); accumulated:  52.54 %.
From   23000 to   23999 games:    213 simulations (  2.13 %); accumulated:  54.67 %.
From   24000 to   24999 games:    203 simulations (  2.03 %); accumulated:  56.70 %.
From   25000 to   25999 games:    199 simulations (  1.99 %); accumulated:  58.69 %.
From   26000 to   26999 games:    176 simulations (  1.76 %); accumulated:  60.45 %.
From   27000 to   27999 games:    131 simulations (  1.31 %); accumulated:  61.76 %.
From   28000 to   28999 games:    177 simulations (  1.77 %); accumulated:  63.53 %.
From   29000 to   29999 games:    154 simulations (  1.54 %); accumulated:  65.07 %.
From   30000 to   30999 games:    153 simulations (  1.53 %); accumulated:  66.60 %.
From   31000 to   31999 games:    134 simulations (  1.34 %); accumulated:  67.94 %.
From   32000 to   32999 games:    137 simulations (  1.37 %); accumulated:  69.31 %.
From   33000 to   33999 games:    131 simulations (  1.31 %); accumulated:  70.62 %.
From   34000 to   34999 games:    139 simulations (  1.39 %); accumulated:  72.01 %.
From   35000 to   35999 games:    111 simulations (  1.11 %); accumulated:  73.12 %.
From   36000 to   36999 games:    122 simulations (  1.22 %); accumulated:  74.34 %.
From   37000 to   37999 games:     94 simulations (  0.94 %); accumulated:  75.28 %.
From   38000 to   38999 games:     85 simulations (  0.85 %); accumulated:  76.13 %.
From   39000 to   39999 games:    104 simulations (  1.04 %); accumulated:  77.17 %.
From   40000 to   40999 games:     94 simulations (  0.94 %); accumulated:  78.11 %.
From   41000 to   41999 games:     84 simulations (  0.84 %); accumulated:  78.95 %.
From   42000 to   42999 games:     95 simulations (  0.95 %); accumulated:  79.90 %.
From   43000 to   43999 games:     84 simulations (  0.84 %); accumulated:  80.74 %.
From   44000 to   44999 games:     88 simulations (  0.88 %); accumulated:  81.62 %.
From   45000 to   45999 games:     74 simulations (  0.74 %); accumulated:  82.36 %.
From   46000 to   46999 games:     82 simulations (  0.82 %); accumulated:  83.18 %.
From   47000 to   47999 games:     62 simulations (  0.62 %); accumulated:  83.80 %.
From   48000 to   48999 games:     70 simulations (  0.70 %); accumulated:  84.50 %.
From   49000 to   49999 games:     78 simulations (  0.78 %); accumulated:  85.28 %.
From   50000 to   50999 games:     49 simulations (  0.49 %); accumulated:  85.77 %.
From   51000 to   51999 games:     66 simulations (  0.66 %); accumulated:  86.43 %.
From   52000 to   52999 games:     69 simulations (  0.69 %); accumulated:  87.12 %.
From   53000 to   53999 games:     46 simulations (  0.46 %); accumulated:  87.58 %.
From   54000 to   54999 games:     57 simulations (  0.57 %); accumulated:  88.15 %.
From   55000 to   55999 games:     50 simulations (  0.50 %); accumulated:  88.65 %.
From   56000 to   56999 games:     38 simulations (  0.38 %); accumulated:  89.03 %.
From   57000 to   57999 games:     49 simulations (  0.49 %); accumulated:  89.52 %.
From   58000 to   58999 games:     42 simulations (  0.42 %); accumulated:  89.94 %.
From   59000 to   59999 games:     38 simulations (  0.38 %); accumulated:  90.32 %.
From   60000 to   60999 games:     46 simulations (  0.46 %); accumulated:  90.78 %.
From   61000 to   61999 games:     38 simulations (  0.38 %); accumulated:  91.16 %.
From   62000 to   62999 games:     38 simulations (  0.38 %); accumulated:  91.54 %.
From   63000 to   63999 games:     29 simulations (  0.29 %); accumulated:  91.83 %.
From   64000 to   64999 games:     33 simulations (  0.33 %); accumulated:  92.16 %.
From   65000 to   65999 games:     32 simulations (  0.32 %); accumulated:  92.48 %.
From   66000 to   66999 games:     21 simulations (  0.21 %); accumulated:  92.69 %.
From   67000 to   67999 games:     27 simulations (  0.27 %); accumulated:  92.96 %.
From   68000 to   68999 games:     34 simulations (  0.34 %); accumulated:  93.30 %.
From   69000 to   69999 games:     24 simulations (  0.24 %); accumulated:  93.54 %.
From   70000 to   70999 games:     34 simulations (  0.34 %); accumulated:  93.88 %.
From   71000 to   71999 games:     21 simulations (  0.21 %); accumulated:  94.09 %.
From   72000 to   72999 games:     22 simulations (  0.22 %); accumulated:  94.31 %.
From   73000 to   73999 games:     18 simulations (  0.18 %); accumulated:  94.49 %.
From   74000 to   74999 games:     14 simulations (  0.14 %); accumulated:  94.63 %.
From   75000 to   75999 games:     27 simulations (  0.27 %); accumulated:  94.90 %.
From   76000 to   76999 games:     19 simulations (  0.19 %); accumulated:  95.09 %.
From   77000 to   77999 games:     16 simulations (  0.16 %); accumulated:  95.25 %.
From   78000 to   78999 games:     23 simulations (  0.23 %); accumulated:  95.48 %.
From   79000 to   79999 games:     17 simulations (  0.17 %); accumulated:  95.65 %.
From   80000 to   80999 games:     20 simulations (  0.20 %); accumulated:  95.85 %.
From   81000 to   81999 games:     22 simulations (  0.22 %); accumulated:  96.07 %.
From   82000 to   82999 games:     13 simulations (  0.13 %); accumulated:  96.20 %.
From   83000 to   83999 games:     22 simulations (  0.22 %); accumulated:  96.42 %.
From   84000 to   84999 games:     10 simulations (  0.10 %); accumulated:  96.52 %.
From   85000 to   85999 games:     10 simulations (  0.10 %); accumulated:  96.62 %.
From   86000 to   86999 games:     16 simulations (  0.16 %); accumulated:  96.78 %.
From   87000 to   87999 games:     16 simulations (  0.16 %); accumulated:  96.94 %.
From   88000 to   88999 games:     14 simulations (  0.14 %); accumulated:  97.08 %.
From   89000 to   89999 games:     15 simulations (  0.15 %); accumulated:  97.23 %.
From   90000 to   90999 games:     13 simulations (  0.13 %); accumulated:  97.36 %.
From   91000 to   91999 games:      9 simulations (  0.09 %); accumulated:  97.45 %.
From   92000 to   92999 games:     12 simulations (  0.12 %); accumulated:  97.57 %.
From   93000 to   93999 games:     10 simulations (  0.10 %); accumulated:  97.67 %.
From   94000 to   94999 games:      8 simulations (  0.08 %); accumulated:  97.75 %.
From   95000 to   95999 games:     11 simulations (  0.11 %); accumulated:  97.86 %.
From   96000 to   96999 games:      6 simulations (  0.06 %); accumulated:  97.92 %.
From   97000 to   97999 games:      5 simulations (  0.05 %); accumulated:  97.97 %.
From   98000 to   98999 games:      9 simulations (  0.09 %); accumulated:  98.06 %.
From   99000 to   99999 games:      8 simulations (  0.08 %); accumulated:  98.14 %.
From  100000 to  100999 games:      4 simulations (  0.04 %); accumulated:  98.18 %.
From  101000 to  101999 games:     12 simulations (  0.12 %); accumulated:  98.30 %.
From  102000 to  102999 games:      9 simulations (  0.09 %); accumulated:  98.39 %.
From  103000 to  103999 games:      8 simulations (  0.08 %); accumulated:  98.47 %.
From  104000 to  104999 games:      9 simulations (  0.09 %); accumulated:  98.56 %.
From  105000 to  105999 games:      8 simulations (  0.08 %); accumulated:  98.64 %.
From  106000 to  106999 games:      6 simulations (  0.06 %); accumulated:  98.70 %.
From  107000 to  107999 games:      6 simulations (  0.06 %); accumulated:  98.76 %.
From  108000 to  108999 games:      5 simulations (  0.05 %); accumulated:  98.81 %.
From  109000 to  109999 games:      3 simulations (  0.03 %); accumulated:  98.84 %.
From  110000 to  110999 games:      4 simulations (  0.04 %); accumulated:  98.88 %.
From  111000 to  111999 games:      4 simulations (  0.04 %); accumulated:  98.92 %.
From  112000 to  112999 games:      2 simulations (  0.02 %); accumulated:  98.94 %.
From  113000 to  113999 games:      4 simulations (  0.04 %); accumulated:  98.98 %.
From  114000 to  114999 games:      4 simulations (  0.04 %); accumulated:  99.02 %.
From  115000 to  115999 games:      7 simulations (  0.07 %); accumulated:  99.09 %.
From  116000 to  116999 games:      3 simulations (  0.03 %); accumulated:  99.12 %.
From  117000 to  117999 games:      5 simulations (  0.05 %); accumulated:  99.17 %.
From  118000 to  118999 games:      3 simulations (  0.03 %); accumulated:  99.20 %.
From  119000 to  119999 games:      1 simulation  (  0.01 %); accumulated:  99.21 %.
From  120000 to  120999 games:      5 simulations (  0.05 %); accumulated:  99.26 %.
From  121000 to  121999 games:      4 simulations (  0.04 %); accumulated:  99.30 %.
From  122000 to  122999 games:      2 simulations (  0.02 %); accumulated:  99.32 %.
From  123000 to  123999 games:      1 simulation  (  0.01 %); accumulated:  99.33 %.
From  124000 to  124999 games:      3 simulations (  0.03 %); accumulated:  99.36 %.
From  125000 to  125999 games:      0 simulations (  0.00 %); accumulated:  99.36 %.
From  126000 to  126999 games:      2 simulations (  0.02 %); accumulated:  99.38 %.
From  127000 to  127999 games:      4 simulations (  0.04 %); accumulated:  99.42 %.
From  128000 to  128999 games:      1 simulation  (  0.01 %); accumulated:  99.43 %.
From  129000 to  129999 games:      1 simulation  (  0.01 %); accumulated:  99.44 %.
From  130000 to  130999 games:      0 simulations (  0.00 %); accumulated:  99.44 %.
From  131000 to  131999 games:      1 simulation  (  0.01 %); accumulated:  99.45 %.
From  132000 to  132999 games:      2 simulations (  0.02 %); accumulated:  99.47 %.
From  133000 to  133999 games:      1 simulation  (  0.01 %); accumulated:  99.48 %.
From  134000 to  134999 games:      3 simulations (  0.03 %); accumulated:  99.51 %.
From  135000 to  135999 games:      0 simulations (  0.00 %); accumulated:  99.51 %.
From  136000 to  136999 games:      1 simulation  (  0.01 %); accumulated:  99.52 %.
From  137000 to  137999 games:      3 simulations (  0.03 %); accumulated:  99.55 %.
From  138000 to  138999 games:      0 simulations (  0.00 %); accumulated:  99.55 %.
From  139000 to  139999 games:      1 simulation  (  0.01 %); accumulated:  99.56 %.
From  140000 to  140999 games:      3 simulations (  0.03 %); accumulated:  99.59 %.
From  141000 to  141999 games:      0 simulations (  0.00 %); accumulated:  99.59 %.
From  142000 to  142999 games:      0 simulations (  0.00 %); accumulated:  99.59 %.
From  143000 to  143999 games:      1 simulation  (  0.01 %); accumulated:  99.60 %.
From  144000 to  144999 games:      2 simulations (  0.02 %); accumulated:  99.62 %.
From  145000 to  145999 games:      0 simulations (  0.00 %); accumulated:  99.62 %.
From  146000 to  146999 games:      0 simulations (  0.00 %); accumulated:  99.62 %.
From  147000 to  147999 games:      1 simulation  (  0.01 %); accumulated:  99.63 %.
From  148000 to  148999 games:      3 simulations (  0.03 %); accumulated:  99.66 %.
From  149000 to  149999 games:      2 simulations (  0.02 %); accumulated:  99.68 %.
From  150000 to  150999 games:      1 simulation  (  0.01 %); accumulated:  99.69 %.
From  151000 to  151999 games:      2 simulations (  0.02 %); accumulated:  99.71 %.
From  152000 to  152999 games:      2 simulations (  0.02 %); accumulated:  99.73 %.
From  153000 to  153999 games:      1 simulation  (  0.01 %); accumulated:  99.74 %.
From  154000 to  154999 games:      1 simulation  (  0.01 %); accumulated:  99.75 %.
From  155000 to  155999 games:      0 simulations (  0.00 %); accumulated:  99.75 %.
From  156000 to  156999 games:      3 simulations (  0.03 %); accumulated:  99.78 %.
From  157000 to  157999 games:      1 simulation  (  0.01 %); accumulated:  99.79 %.
From  158000 to  158999 games:      2 simulations (  0.02 %); accumulated:  99.81 %.
From  159000 to  159999 games:      2 simulations (  0.02 %); accumulated:  99.83 %.
From  160000 to  160999 games:      1 simulation  (  0.01 %); accumulated:  99.84 %.
From  161000 to  161999 games:      1 simulation  (  0.01 %); accumulated:  99.85 %.
From  162000 to  162999 games:      0 simulations (  0.00 %); accumulated:  99.85 %.
From  163000 to  163999 games:      1 simulation  (  0.01 %); accumulated:  99.86 %.
From  164000 to  164999 games:      1 simulation  (  0.01 %); accumulated:  99.87 %.
From  165000 to  165999 games:      0 simulations (  0.00 %); accumulated:  99.87 %.
From  166000 to  166999 games:      2 simulations (  0.02 %); accumulated:  99.89 %.
From  167000 to  167999 games:      1 simulation  (  0.01 %); accumulated:  99.90 %.
From  168000 to  168999 games:      0 simulations (  0.00 %); accumulated:  99.90 %.
From  169000 to  169999 games:      0 simulations (  0.00 %); accumulated:  99.90 %.
From  170000 to  170999 games:      0 simulations (  0.00 %); accumulated:  99.90 %.
From  171000 to  171999 games:      0 simulations (  0.00 %); accumulated:  99.90 %.
From  172000 to  172999 games:      0 simulations (  0.00 %); accumulated:  99.90 %.
From  173000 to  173999 games:      1 simulation  (  0.01 %); accumulated:  99.91 %.
From  174000 to  174999 games:      0 simulations (  0.00 %); accumulated:  99.91 %.
From  175000 to  175999 games:      0 simulations (  0.00 %); accumulated:  99.91 %.
From  176000 to  176999 games:      0 simulations (  0.00 %); accumulated:  99.91 %.
From  177000 to  177999 games:      0 simulations (  0.00 %); accumulated:  99.91 %.
From  178000 to  178999 games:      0 simulations (  0.00 %); accumulated:  99.91 %.
From  179000 to  179999 games:      0 simulations (  0.00 %); accumulated:  99.91 %.
From  180000 to  180999 games:      1 simulation  (  0.01 %); accumulated:  99.92 %.
From  181000 to  181999 games:      1 simulation  (  0.01 %); accumulated:  99.93 %.
From  182000 to  182999 games:      0 simulations (  0.00 %); accumulated:  99.93 %.
From  183000 to  183999 games:      0 simulations (  0.00 %); accumulated:  99.93 %.
From  184000 to  184999 games:      0 simulations (  0.00 %); accumulated:  99.93 %.
From  185000 to  185999 games:      0 simulations (  0.00 %); accumulated:  99.93 %.
From  186000 to  186999 games:      0 simulations (  0.00 %); accumulated:  99.93 %.
From  187000 to  187999 games:      1 simulation  (  0.01 %); accumulated:  99.94 %.
From  188000 to  188999 games:      1 simulation  (  0.01 %); accumulated:  99.95 %.
From  189000 to  189999 games:      0 simulations (  0.00 %); accumulated:  99.95 %.
From  190000 to  190999 games:      0 simulations (  0.00 %); accumulated:  99.95 %.
From  191000 to  191999 games:      0 simulations (  0.00 %); accumulated:  99.95 %.
From  192000 to  192999 games:      0 simulations (  0.00 %); accumulated:  99.95 %.
From  193000 to  193999 games:      2 simulations (  0.02 %); accumulated:  99.97 %.
From  194000 to  194999 games:      0 simulations (  0.00 %); accumulated:  99.97 %.
From  195000 to  195999 games:      1 simulation  (  0.01 %); accumulated:  99.98 %.
From  196000 to  196999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  197000 to  197999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  198000 to  198999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  199000 to  199999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  200000 to  200999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  201000 to  201999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  202000 to  202999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  203000 to  203999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  204000 to  204999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  205000 to  205999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  206000 to  206999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  207000 to  207999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  208000 to  208999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  209000 to  209999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  210000 to  210999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  211000 to  211999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  212000 to  212999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  213000 to  213999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  214000 to  214999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  215000 to  215999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  216000 to  216999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  217000 to  217999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  218000 to  218999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  219000 to  219999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  220000 to  220999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  221000 to  221999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  222000 to  222999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  223000 to  223999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  224000 to  224999 games:      0 simulations (  0.00 %); accumulated:  99.98 %.
From  225000 to  225999 games:      1 simulation  (  0.01 %); accumulated:  99.99 %.
From  226000 to  226999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  227000 to  227999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  228000 to  228999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  229000 to  229999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  230000 to  230999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  231000 to  231999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  232000 to  232999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  233000 to  233999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  234000 to  234999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  235000 to  235999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  236000 to  236999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  237000 to  237999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  238000 to  238999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  239000 to  239999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  240000 to  240999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  241000 to  241999 games:      1 simulation  (  0.01 %); accumulated: 100.00 %.
I'll let you decide whether the distribution is what you expect or not.
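For anyone who wants to tabulate their own run lengths in the same layout, here is a minimal Python sketch. It is not the program that produced the listing above; the function names `length_histogram` and `format_row` and the 1000-game bucket width are assumptions made for illustration.

```python
def length_histogram(lengths, width=1000):
    """Bucket run lengths into [k*width, (k+1)*width - 1] and accumulate percentages.

    Returns a list of (low, high, count, percent, accumulated_percent) tuples,
    including empty buckets between the shortest and the longest run.
    """
    total = len(lengths)
    counts = {}
    for n in lengths:
        bucket = n // width
        counts[bucket] = counts.get(bucket, 0) + 1
    rows = []
    accumulated = 0
    for bucket in range(min(counts), max(counts) + 1):
        count = counts.get(bucket, 0)
        accumulated += count
        rows.append((bucket * width, bucket * width + width - 1, count,
                     100.0 * count / total, 100.0 * accumulated / total))
    return rows


def format_row(row):
    """Render one bucket in a layout similar to the listing (hypothetical formatting)."""
    low, high, count, pct, acc = row
    noun = "simulation " if count == 1 else "simulations"
    return (f"From {low:7d} to {high:7d} games: {count:6d} {noun} "
            f"({pct:6.2f} %); accumulated: {acc:6.2f} %.")


if __name__ == "__main__":
    # Made-up run lengths, only to show the call.
    for row in length_histogram([2330, 2500, 3100, 5999]):
        print(format_row(row))
```

The accumulated column is what lets you read off quantiles: the median is in the first bucket whose accumulated value reaches 50 %.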

------------------------

Sorry for the length of this post. I hope no typos.

Regards from Spain.

Ajedrecista.

lucasart
Posts: 3031
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: SPRT questions

Post by lucasart » Fri May 15, 2015 11:45 am

You can't change the conditions of a test after looking at the results. Arguing that the test would have passed an SPRT(0,X), where X is conveniently chosen ex post, is not serious.

The real problem is that there is only a finite (small) number of openings, and eventually we are just repeating the same games over and over, in shuffled order. I'm not sure playing another 250k games really adds any information. Perhaps it makes sense to stop the test and either (i) commit the patch, (ii) toss a coin to decide, or (iii) not commit.

I'll let Joona decide.

Here are the numbers for SPRT(0,4.5):

Code: Select all

$ ./sprt 0 3 0.25 50000 290 0 4.5
     Elo  BayesElo     %Pass   Avg run      Q50%      Q90%      Q95%      Q99%
    0.00      0.00    0.0499     35203     28057     68311     85363    125770
    0.25      0.47    0.0896     40543     31742     80313    101045    149184
    0.50      0.94    0.1525     46648     36131     93453    118239    175859
    0.75      1.41    0.2494     52208     39939    106370    135075    201357
    1.00      1.87    0.3793     56441     42757    116390    147578    221303
    1.25      2.34    0.5290     57315     43273    118091    150719    224661
    1.50      2.81    0.6744     55121     41618    113910    144474    215451
    1.75      3.28    0.7919     50263     38675    101996    129430    194491
    2.00      3.75    0.8738     44203     34187     88400    111375    167143
    2.25      4.22    0.9276     38438     30177     75382     94763    139902
    2.50      4.69    0.9593     33373     26715     64270     80319    118185
    2.75      5.15    0.9777     29056     23711     54670     67578     99416
    3.00      5.62    0.9881     25580     21264     47230     58010     82981
You can see that we're already slightly above the worst 99% quantile in the table. For example, if the true Elo value (which we don't know) is 1.25, then there is a 1% chance that the run lasts 224,661 games or more...

So the Gods of Randomness really are against me :x
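For readers who want to see what such a simulation does, here is a minimal Python sketch of one way to do it. It is not the code of the simulator linked below. It assumes the BayesElo win/loss/draw model with a known, fixed drawelo, that the bounds and the true strength are given in BayesElo units, and alpha = beta = 0.05; the names `bayeselo_probs`, `simulate_sprt` and `summarize` are made up for this example.

```python
import math
import random


def bayeselo_probs(elo, drawelo):
    """Win/loss/draw probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss


def simulate_sprt(true_elo, elo0, elo1, drawelo, rng, alpha=0.05, beta=0.05):
    """Play simulated games until the LLR leaves (lower, upper).

    Returns (passed, games). Assumption: drawelo is known and fixed.
    """
    lower = math.log(beta / (1.0 - alpha))   # about -2.94
    upper = math.log((1.0 - beta) / alpha)   # about +2.94
    p_win, p_loss, _ = bayeselo_probs(true_elo, drawelo)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    # LLR increment contributed by one win, one loss, one draw.
    step_win, step_loss, step_draw = (math.log(b / a) for a, b in zip(p0, p1))
    llr, games = 0.0, 0
    while lower < llr < upper:
        r = rng.random()
        if r < p_win:
            llr += step_win
        elif r < p_win + p_loss:
            llr += step_loss
        else:
            llr += step_draw
        games += 1
    return llr >= upper, games


def summarize(true_elo, elo0, elo1, drawelo, runs, seed=1):
    """Return (pass rate, average run, median run, ~99% quantile of run length)."""
    rng = random.Random(seed)
    results = [simulate_sprt(true_elo, elo0, elo1, drawelo, rng) for _ in range(runs)]
    lengths = sorted(games for _, games in results)
    pass_rate = sum(1 for passed, _ in results if passed) / runs
    return pass_rate, sum(lengths) / runs, lengths[runs // 2], lengths[int(0.99 * (runs - 1))]


if __name__ == "__main__":
    # Deliberately wide bounds (0, 45) so that pure Python finishes quickly;
    # SPRT(0, 4.5) needs roughly a hundred times more games per run.
    print(summarize(0.0, 0.0, 45.0, 290.0, runs=400))
```

The loop is the whole test: each game adds one of three fixed increments to the LLR, and the run stops at the first crossing of either bound. The table columns above correspond to the pass rate, the mean and the quantiles of the stopping time.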

SPRT simulator (multi-threaded, very fast)
https://github.com/lucasart/sprt
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

Ajedrecista
Posts: 1376
Joined: Wed Jul 13, 2011 7:04 pm
Location: Madrid, Spain.

Re: SPRT questions.

Post by Ajedrecista » Fri May 15, 2015 12:56 pm

Hello Lucas:
I ran SPRT(0, 4.5) with Elo = 0 just to compare our results. After 50000 simulations:

Code: Select all

[...]
 50000/ 50000    Passes:   2516    Fails:  47484    <Games>/simulation:   35718

Shortest simulation:    2330 games (simulation  27400).
Longest simulation:   313906 games (simulation  30444).

Average number of games per simulation:   35718
Median of the distribution:               28462

There are  15276 simulations with score > 50% that failed SPRT.
There are    171 simulations with score = 50% that failed SPRT.

Estimated elapsed time:  1270.26 seconds.
Speed: 1405947 games/second.
Distribution of the length of simulations:

Code: Select all

[...]
From   27000 to   27999 games:    990 simulations (  1.98 %); accumulated:  49.12 %.
From   28000 to   28999 games:    948 simulations (  1.90 %); accumulated:  51.01 %.
[...]
From   34000 to   34999 games:    784 simulations (  1.57 %); accumulated:  61.43 %.
From   35000 to   35999 games:    752 simulations (  1.50 %); accumulated:  62.94 %.
[...]
From   68000 to   68999 games:    198 simulations (  0.40 %); accumulated:  89.93 %.
From   69000 to   69999 games:    210 simulations (  0.42 %); accumulated:  90.35 %.
[...]
From   85000 to   85999 games:    101 simulations (  0.20 %); accumulated:  94.87 %.
From   86000 to   86999 games:     94 simulations (  0.19 %); accumulated:  95.06 %.
[...]
From  126000 to  126999 games:     27 simulations (  0.05 %); accumulated:  98.96 %.
From  127000 to  127999 games:     18 simulations (  0.04 %); accumulated:  99.00 %.
[...]
Our results do not contradict each other. :) There are small differences, as expected.

I must compile your sources to benefit from multi-threading.

Comparing your results with Michel's script:

Code: Select all

SPRT(0, 4.5). Results with Michel's script (drawelo = 290):

     Elo  BayesElo     Pass    Avg run
    0.00      0.00    0.0500     35185
    0.25      0.47    0.0886     40631
    0.50      0.94    0.1521     46616
    0.75      1.41    0.2488     52337
    1.00      1.87    0.3795     56423
    1.25      2.34    0.5303     57484
    1.50      2.81    0.6758     55097
    1.75      3.28    0.7938     50177
    2.00      3.75    0.8766     44215
    2.25      4.22    0.9292     38381
    2.50      4.69    0.9604     33248
    2.75      5.15    0.9781     28960
    3.00      5.62    0.9880     25453
Your results are very, very similar to Michel's... even closer than mine. Does anyone still doubt it? :P
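Michel's script itself is not shown in this thread. As a rough cross-check of the tables, here is a sketch of the classical Wald (Brownian-motion) approximation for the pass probability and the expected number of games. It assumes the BayesElo model with a fixed drawelo, bounds and true strength in BayesElo units, and alpha = beta = 0.05; the function names are made up for this example, and it is not claimed to be what the script does.

```python
import math


def bayeselo_probs(elo, drawelo):
    """Win/loss/draw probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss


def wald_approximation(true_elo, elo0, elo1, drawelo, alpha=0.05, beta=0.05):
    """Approximate (pass probability, expected games) of SPRT(elo0, elo1).

    The LLR is treated as a Brownian motion with the per-game mean and
    variance of the LLR increment; overshoot at the bounds is ignored.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    p = bayeselo_probs(true_elo, drawelo)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    steps = [math.log(b / a) for a, b in zip(p0, p1)]
    mean = sum(pi * s for pi, s in zip(p, steps))
    var = sum(pi * s * s for pi, s in zip(p, steps)) - mean ** 2
    h = 2.0 * mean / var
    if abs(h) < 1e-6:
        # Driftless limit: symmetric random walk between the two bounds.
        p_pass = -lower / (upper - lower)
        avg_games = -lower * upper / var
    else:
        p_pass = (1.0 - math.exp(-h * lower)) / (math.exp(-h * upper) - math.exp(-h * lower))
        # Wald's identity: E[games] * mean = E[LLR at stopping].
        avg_games = (p_pass * upper + (1.0 - p_pass) * lower) / mean
    return p_pass, avg_games


if __name__ == "__main__":
    for bayeselo in (0.0, 0.94, 1.87, 2.81, 3.75, 4.69, 5.62):
        p_pass, avg = wald_approximation(bayeselo, 0.0, 4.5, 290.0)
        print(f"{bayeselo:6.2f} {p_pass:8.4f} {avg:9.0f}")
```

With drawelo = 290 and SPRT(0, 4.5) this should give values in the same range as the Pass and Avg run columns above. Because overshoot is ignored, small differences from simulated results are to be expected.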

Regards from Spain.

Ajedrecista.

Ajedrecista
Posts: 1376
Joined: Wed Jul 13, 2011 7:04 pm
Location: Madrid, Spain.

Re: SPRT questions.

Post by Ajedrecista » Sun May 17, 2015 10:06 am

Hello:

psq test is now in a pending status. At this moment:

Code: Select all

LLR: 1.79 (-2.94,2.94) [0.00,4.50]
Total: 279297 W: 45731 L: 44667 D: 188899
sprt @ 60+0.05 th 1
From this sample of 279297 games I estimate drawelo ~ 285.7191 and a gain of ~ 2.4396 Bayeselo, if I am not wrong. I used alpha = beta = 0.05 and 50000 simulations, each starting at +45731 -44667 =188899 instead of +0 -0 =0. Here are the results of my simulation of SPRT(0, 4.5):

Code: Select all

[...]
 49996/ 50000    Passes:  42119    Fails:   7877    <Games>/simulation:  313407
 49997/ 50000    Passes:  42120    Fails:   7877    <Games>/simulation:  313407
 49998/ 50000    Passes:  42121    Fails:   7877    <Games>/simulation:  313406
 49999/ 50000    Passes:  42121    Fails:   7878    <Games>/simulation:  313406
 50000/ 50000    Passes:  42122    Fails:   7878    <Games>/simulation:  313406

Shortest simulation:  279744 games (simulation  49024).
Longest simulation:   775708 games (simulation  38781).

Average number of games per simulation:  313406
Median of the distribution:              296424

There are   7878 simulations with score > 50% that failed SPRT.
There are      0 simulations with score = 50% that failed SPRT.
So right now this test could have a probability of passing of about 84.24% (42122 of 50000 simulations), if my input parameters are valid enough. Some extra info:

Code: Select all

Shortest simulation:    PASS after 279744 games ( +45841  -44722 =189181).

Medians of simulations: PASS after 296424 games ( +48518  -47341 =200565).
                        PASS after 296424 games ( +48545  -47368 =200511).

Longest simulation:     FAIL after 775708 games (+126582 -123997 =525129).
Summary of the distribution of the length of simulations:

Code: Select all

From  279000 to  279999 games:     14 simulations (  0.03 %); accumulated:   0.03 %.
From  280000 to  280999 games:   1197 simulations (  2.39 %); accumulated:   2.42 %.
From  281000 to  281999 games:   2517 simulations (  5.03 %); accumulated:   7.46 %.
From  282000 to  282999 games:   2706 simulations (  5.41 %); accumulated:  12.87 %.
From  283000 to  283999 games:   2428 simulations (  4.86 %); accumulated:  17.72 %.
From  284000 to  284999 games:   2285 simulations (  4.57 %); accumulated:  22.29 %.
From  285000 to  285999 games:   1951 simulations (  3.90 %); accumulated:  26.20 %.
From  286000 to  286999 games:   1827 simulations (  3.65 %); accumulated:  29.85 %.
From  287000 to  287999 games:   1619 simulations (  3.24 %); accumulated:  33.09 %.
From  288000 to  288999 games:   1370 simulations (  2.74 %); accumulated:  35.83 %.
From  289000 to  289999 games:   1283 simulations (  2.57 %); accumulated:  38.39 %.
From  290000 to  290999 games:   1115 simulations (  2.23 %); accumulated:  40.62 %.
From  291000 to  291999 games:    979 simulations (  1.96 %); accumulated:  42.58 %.
From  292000 to  292999 games:    918 simulations (  1.84 %); accumulated:  44.42 %.
From  293000 to  293999 games:    855 simulations (  1.71 %); accumulated:  46.13 %.
From  294000 to  294999 games:    881 simulations (  1.76 %); accumulated:  47.89 %.
From  295000 to  295999 games:    732 simulations (  1.46 %); accumulated:  49.35 %.
From  296000 to  296999 games:    730 simulations (  1.46 %); accumulated:  50.81 %.
[...]
From  312000 to  312999 games:    398 simulations (  0.80 %); accumulated:  66.70 %.
From  313000 to  313999 games:    372 simulations (  0.74 %); accumulated:  67.44 %.
From  314000 to  314999 games:    341 simulations (  0.68 %); accumulated:  68.12 %.
[...]
From  366000 to  366999 games:    106 simulations (  0.21 %); accumulated:  89.83 %.
From  367000 to  367999 games:    105 simulations (  0.21 %); accumulated:  90.04 %.
[...]
From  398000 to  398999 games:     58 simulations (  0.12 %); accumulated:  94.95 %.
From  399000 to  399999 games:     55 simulations (  0.11 %); accumulated:  95.06 %.
[...]
From  471000 to  471999 games:     11 simulations (  0.02 %); accumulated:  98.99 %.
From  472000 to  472999 games:     10 simulations (  0.02 %); accumulated:  99.01 %.
[...]
From  502000 to  502999 games:      8 simulations (  0.02 %); accumulated:  99.49 %.
From  503000 to  503999 games:      8 simulations (  0.02 %); accumulated:  99.50 %.
From  504000 to  504999 games:      4 simulations (  0.01 %); accumulated:  99.51 %.
[...]
From  511000 to  511999 games:      7 simulations (  0.01 %); accumulated:  99.59 %.
[...]
From  568000 to  568999 games:      2 simulations (  0.00 %); accumulated:  99.89 %.
From  569000 to  569999 games:      1 simulation  (  0.00 %); accumulated:  99.90 %.
From  570000 to  570999 games:      4 simulations (  0.01 %); accumulated:  99.90 %.
[...]
From  732000 to  732999 games:      0 simulations (  0.00 %); accumulated:  99.99 %.
From  733000 to  733999 games:      1 simulation  (  0.00 %); accumulated: 100.00 %.
From  734000 to  734999 games:      1 simulation  (  0.00 %); accumulated: 100.00 %.
From  735000 to  735999 games:      0 simulations (  0.00 %); accumulated: 100.00 %.
[...]
From  773000 to  773999 games:      0 simulations (  0.00 %); accumulated: 100.00 %.
From  774000 to  774999 games:      0 simulations (  0.00 %); accumulated: 100.00 %.
From  775000 to  775999 games:      1 simulation  (  0.00 %); accumulated: 100.00 %.
With my input parameters, median - starting point = 296424 - 279297 = 17127 games. The end is probably near.
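The drawelo, Bayeselo and LLR figures quoted in this post can be rechecked from the W/L/D counts alone. The sketch below inverts the BayesElo model to get both estimates, then evaluates the log-likelihood ratio of elo1 against elo0 with drawelo fixed at its sample estimate. That last choice is an assumption about how the framework computes its LLR, not something confirmed in this thread, and the function names are made up for this example.

```python
import math


def bayeselo_probs(elo, drawelo):
    """Win/loss/draw probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss


def estimate_bayeselo(wins, losses, draws):
    """Invert the model: return (bayeselo, drawelo) matching the observed frequencies."""
    n = wins + losses + draws
    p_win, p_loss = wins / n, losses / n
    # (1 - p_win) / p_win = 10^((drawelo - elo) / 400), and similarly for losses.
    elo = 200.0 * math.log10(p_win / p_loss * (1.0 - p_loss) / (1.0 - p_win))
    drawelo = 200.0 * math.log10((1.0 - p_loss) / p_loss * (1.0 - p_win) / p_win)
    return elo, drawelo


def llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of elo1 against elo0, drawelo estimated from the sample."""
    _, drawelo = estimate_bayeselo(wins, losses, draws)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    return sum(count * math.log(b / a)
               for count, a, b in zip((wins, losses, draws), p0, p1))


if __name__ == "__main__":
    w, l, d = 45731, 44667, 188899
    elo, drawelo = estimate_bayeselo(w, l, d)
    print(f"bayeselo ~ {elo:.4f}, drawelo ~ {drawelo:.4f}")
    print(f"LLR(0, 4.5) ~ {llr(w, l, d, 0.0, 4.5):.2f}")
```

The test passes once this LLR reaches 2.94 and fails once it drops to -2.94, which is why the remaining distance to either bound drives the simulated run lengths above.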

Regards from Spain.

Ajedrecista.

Uri Blass
Posts: 8368
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: SPRT questions.

Post by Uri Blass » Sun May 17, 2015 12:41 pm

The expected number of games rests on assumptions that I think do not hold for these games.

The results of the games are not independent variables if both programs play White and Black from the same position.

I also think that nobody has answered question 2 that I asked.
Note that b is not constant in question 2: the idea is that you stop the test if it passes SPRT(0,b) for some 0<b<=6
or if it fails SPRT(0,b) for some 0<b<=6.

After every game you can check whether there is some b<=6 for which the test passes SPRT(0,b), and whether there is some b<=6 for which it fails SPRT(0,b).

This is clearly a different test from a normal SPRT, and its expected number of games is clearly lower than for SPRT(0,6), because after many games SPRT(0,6) may still be undecided while the test has already passed SPRT(0,5).

The question is what price you pay for it.
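To make the rule concrete, here is a minimal Python sketch of it, so that the price can at least be measured by simulation. It assumes the BayesElo model with a known drawelo, alpha = beta = 0.05, and a finite grid of candidate bounds instead of every real 0<b<=6; the names `run_rule` and `pass_rate` are made up for this example.

```python
import math
import random


def bayeselo_probs(elo, drawelo):
    """Win/loss/draw probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss


def llr_steps(elo0, elo1, drawelo):
    """LLR increments for one win, one loss and one draw."""
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    return tuple(math.log(b / a) for a, b in zip(p0, p1))


def run_rule(true_elo, bounds, drawelo, rng, alpha=0.05, beta=0.05, max_games=10**6):
    """Stop as soon as SPRT(0, b) passes or fails for ANY b in `bounds`.

    Returns (outcome, games) with outcome in {"pass", "fail", "undecided"}.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    p_win, p_loss, _ = bayeselo_probs(true_elo, drawelo)
    steps = [llr_steps(0.0, b, drawelo) for b in bounds]
    counts = [0, 0, 0]  # wins, losses, draws
    for games in range(1, max_games + 1):
        r = rng.random()
        counts[0 if r < p_win else 1 if r < p_win + p_loss else 2] += 1
        llrs = [sum(c * s for c, s in zip(counts, st)) for st in steps]
        if max(llrs) >= upper:
            return "pass", games
        if min(llrs) <= lower:
            return "fail", games
    return "undecided", max_games


def pass_rate(true_elo, bounds, drawelo, runs, seed=1):
    """Fraction of simulated runs that end in "pass"."""
    rng = random.Random(seed)
    outcomes = [run_rule(true_elo, bounds, drawelo, rng)[0] for _ in range(runs)]
    return outcomes.count("pass") / runs


if __name__ == "__main__":
    # Bounds scaled up tenfold (10..60 instead of 1..6) so pure Python runs fast.
    grid = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
    print("multi-bound pass rate at 0 elo:", pass_rate(0.0, grid, 290.0, runs=300))
    print("single SPRT pass rate at 0 elo:", pass_rate(0.0, [60.0], 290.0, runs=300))
```

On any fixed sequence of games the multi-bound rule stops no later than SPRT(0, b_max), because the single test's stopping condition is one of the conditions it checks. Comparing the two pass rates at 0 elo and at a small regression gives an estimate of the price in error rate.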

Zenmastur
Posts: 272
Joined: Sat May 31, 2014 6:28 am

Re: SPRT questions.

Post by Zenmastur » Tue May 19, 2015 2:56 am

You should direct this question to Kai Laskos, as he has analytically solved the equations that govern this behavior. He is working on a model that can be used to predict various aspects, including the effect of changing the Elo bounds.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
