SPRT questions
-
- Posts: 10301
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
SPRT questions
One result from the Stockfish testing framework:
LLR: 0.72 (-2.94,2.94) [0.00,4.50]
Total: 233707 W: 38339 L: 37483 D: 157885
1) Is there a simple calculator to check whether this result is enough to pass [0.00, b] for some bound b smaller than 4.50?
2) What happens if people run a test that simply stops as soon as it passes SPRT with bounds [0, b] for some positive b <= 6, and fails as soon as it fails SPRT with bounds [0, b] for some positive b <= 6?
Can people calculate the theoretical probability that a regression of 1 Elo passes the test, and the probability that a 0 Elo change passes it?
The same question applies to the expected number of games and to the worst case.
People claim that SPRT is the best, but it is not clear best at what, and in the example of Lucas's patch it is not clear to me whether it would not simply be better to accept the patch and stop the test.
What important additional information do we get if the test passes or fails?
We may not know with 95% confidence that the patch is positive, but I think that for every possible result we can know with 95% confidence that there is not even a 0.5 Elo regression, while a 1.5 Elo improvement remains possible for every possible result.
3) Did somebody test whether the worst case really behaves the way SPRT theory expects?
A possible way is to test the program against itself (so we know that 0 Elo is the right result) with SPRT(-b, b) many times, to see whether the distribution of the number of games is really the distribution that theory expects.
I suspect that the worst case is worse in practice, at least if both programs play the same opening with white and black every time.
I see no reason not to do that, because not doing it increases the variance of the result, and it is better to reduce the variance (although then, of course, SPRT is not the correct way to proceed, because the assumption of independent results no longer holds).
-
- Posts: 919
- Joined: Sat May 31, 2014 8:28 am
Re: SPRT questions
Kai Laskos is working on these problems. See the thread labeled "Maximum ELO gain per test game played?" I would link it but I don't know how.
Apparently the model set-up is more tedious than he anticipated, but I believe he has the equations in hand.
See his last 4 or 5 posts.
Regards,
Zen
-
- Posts: 1971
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: SPRT questions.
Hello Uri:
In short, for your first question: 3.8042 < b_critical < 3.8043 (the LLR values are worked out below). As for the simulated distributions of test length further down, I let you decide whether they are what you expect.
I will try to answer some of your questions in a random order:
------------------------
Uri Blass wrote: Can people calculate what is the theoretical probability for a regression of 1 elo to pass the test and what is the probability of 0 elo to pass the test?
Same also for the expected number of games and what is the worst case.
Yes, it is possible thanks to Michel's Python script sprta.py (he also wrote a C version that you can compile into an executable). Please take a look here for more details. It returns the probability of passing and the expected number of games.
The worst case (the largest expected number of games) is when Bayeselo = (Bayeselo_0 + Bayeselo_1)/2 in an SPRT(Bayeselo_0, Bayeselo_1) test (always with Bayeselo_0 < Bayeselo_1). Michel's script has a special output: Elo gain [logistic Elo, i.e. Elo] and SPRT bounds [Bayeselo].
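For readers who want a rough feel for these numbers without running the script, here is a minimal Python sketch based on Wald's classical approximations. It is not Michel's sprta.py: the function names are made up for illustration, drawelo is assumed fixed and known, and the overshoot of the LLR past the bounds is neglected, so the values are approximate.

```python
import math

def bayeselo_probs(elo, drawelo):
    """Win, draw and loss probabilities in the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss

def wald_sprt(elo_true, elo0, elo1, drawelo, alpha=0.05, beta=0.05):
    """Approximate (pass probability, expected games) for SPRT(elo0, elo1).

    All Elo values are in BayesElo units. Overshoot is neglected.
    """
    p = bayeselo_probs(elo_true, drawelo)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    z = [math.log(b / a) for a, b in zip(p0, p1)]  # LLR increment per outcome
    log_a = math.log((1.0 - beta) / alpha)         # upper bound (+2.9444)
    log_b = math.log(beta / (1.0 - alpha))         # lower bound (-2.9444)
    mean = sum(pi * zi for pi, zi in zip(p, z))
    second = sum(pi * zi * zi for pi, zi in zip(p, z))

    h = -2.0 * mean / second                       # first guess for Wald's h
    if abs(h) < 1e-9:                              # driftless limit (worst case)
        return -log_b / (log_a - log_b), -log_a * log_b / second

    def f(x):
        # E[exp(x * Z)] - 1, written with expm1 to avoid cancellation
        return sum(pi * math.expm1(x * zi) for pi, zi in zip(p, z))

    while f(h) <= 0.0:                             # move outside the non-zero root
        h *= 2.0
    for _ in range(100):                           # Newton iteration on a convex f
        slope = sum(pi * zi * math.exp(h * zi) for pi, zi in zip(p, z))
        step = f(h) / slope
        h -= step
        if abs(step) <= 1e-12 * abs(h):
            break

    p_pass = -math.expm1(h * log_b) / (math.expm1(h * log_a) - math.expm1(h * log_b))
    games = (p_pass * log_a + (1.0 - p_pass) * log_b) / mean
    return p_pass, games
```

For example, `wald_sprt(0.0, 0.0, 4.5, 290.0)` gives the pass probability and expected length at the lower bound of SPRT(0, 4.5), and a true value close to the midpoint 2.25 gives the longest expected run.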
------------------------
Uri Blass wrote: one result from the stockfish framework
LLR: 0.72 (-2.94,2.94) [0.00,4.50]
Total: 233707 W: 38339 L: 37483 D: 157885
1)Is there a simple calculator to check if this result is enough to pass
[0.00 b] for some smaller bound than 4.50?
I wrote an LLR calculator a long time ago. But you must keep in mind that SPRT is sequential, so the history of the games counts (WWWLLL is not the same as WLWLWL, because the test may already have stopped at an earlier point). The answer to your question is yes, such a calculator exists.
In the SF testing framework, alpha = 0.05 = beta in SPRT. So, changing only Bayeselo_1:
Lower bound for LLR: -2.9444
Upper bound for LLR: 2.9444
----------------------------
Games: 233707
Wins: 38339 (16.40 %).
Loses: 37483 (16.04 %).
Draws: 157885 (67.56 %).
bayeselo: 2.3410
drawelo: 285.2261
----------------------------
LLR[SPRT(0, 4.5)] ~ 0.7222
LLR[SPRT(0, 4.4)] ~ 1.0941
LLR[SPRT(0, 4.3)] ~ 1.4484
LLR[SPRT(0, 4.2)] ~ 1.7851
LLR[SPRT(0, 4.1)] ~ 2.1041
LLR[SPRT(0, 4)] ~ 2.4055
LLR[SPRT(0, 3.9)] ~ 2.6892
LLR[SPRT(0, 3.8)] ~ 2.9553
LLR[SPRT(0, 3.7)] ~ 3.2038
LLR[SPRT(0, 3.6)] ~ 3.4346
LLR[SPRT(0, 3.5)] ~ 3.6478
Interpolating b with the Regula-Falsi method (http://en.wikipedia.org/wiki/False_position_method), then letting the LLR calculator compute the LLR (with all inputs and outputs rounded to 1e-4):
LLR[SPRT(0, 3.8041)] ~ 2.9447
LLR[SPRT(0, 3.8042)] ~ 2.9445
LLR[SPRT(0, 3.8043)] ~ 2.9442
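The calculation above can be sketched in Python as follows. This is not the calculator used in this post, only an illustration under stated assumptions: the BayesElo model with drawelo estimated from the sample, and hypothetical function names. Since the inputs here are not rounded to 1e-4, the last digits may differ slightly from the values listed above.

```python
import math

def bayeselo_probs(elo, drawelo):
    """Win, draw and loss probabilities in the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss

def estimate_drawelo(wins, losses, draws):
    """drawelo that reproduces the observed win and loss frequencies."""
    n = wins + losses + draws
    pw, pl = wins / n, losses / n
    return 200.0 * math.log10((1.0 - pl) / pl * (1.0 - pw) / pw)

def llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0), in BayesElo."""
    drawelo = estimate_drawelo(wins, losses, draws)
    w0, d0, l0 = bayeselo_probs(elo0, drawelo)
    w1, d1, l1 = bayeselo_probs(elo1, drawelo)
    return (wins * math.log(w1 / w0)
            + draws * math.log(d1 / d0)
            + losses * math.log(l1 / l0))

def critical_bound(wins, losses, draws, lo, hi, alpha=0.05, beta=0.05, tol=1e-9):
    """Regula-Falsi search for b where LLR[SPRT(0, b)] equals the upper bound."""
    upper = math.log((1.0 - beta) / alpha)

    def f(b):
        return llr(wins, losses, draws, 0.0, b) - upper

    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0.0:
        raise ValueError("the root is not bracketed by [lo, hi]")
    for _ in range(200):
        mid = (lo * f_hi - hi * f_lo) / (f_hi - f_lo)  # secant through the bracket
        f_mid = f(mid)
        if abs(f_mid) < tol:
            break
        if f_lo * f_mid < 0.0:
            hi, f_hi = mid, f_mid
        else:
            lo, f_lo = mid, f_mid
    return mid

print(round(llr(38339, 37483, 157885, 0.0, 4.5), 4))
print(round(critical_bound(38339, 37483, 157885, 3.5, 4.5), 4))
```

Note that this only evaluates the LLR at the final counts. It says nothing about whether a test with the smaller bound would already have stopped earlier.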
Assuming a fixed gain of 2.341 Bayeselo and a drawelo parameter of 285.2261 (computed from the sample of 233707 games), I ran 10000 simulations with a modified version of my SPRT simulator, starting from {wins, losses, draws} = {38339, 37483, 157885} instead of {0, 0, 0}. Here is a summary of my results with SPRT(0, 4.5):
[...]
9996/ 10000 Passes: 6556 Fails: 3440 <Games>/simulation: 286641
9997/ 10000 Passes: 6556 Fails: 3441 <Games>/simulation: 286643
9998/ 10000 Passes: 6556 Fails: 3442 <Games>/simulation: 286647
9999/ 10000 Passes: 6557 Fails: 3442 <Games>/simulation: 286656
10000/ 10000 Passes: 6558 Fails: 3442 <Games>/simulation: 286658
Shortest simulation: 235136 games (simulation 6661): +38630 -37666 =158840.
Longest simulation: 679262 games (simulation 5685): +111204 -108951 =459107.
Average number of games per simulation: 286658
Median of the distribution: 272932 (+44827 -43730 =184375).
There are 3442 simulations with score > 50% that failed SPRT.
There are 0 simulations with score = 50% that failed SPRT.
Distribution of the length of simulations:
From 235000 to 235999 games: 3 simulations ( 0.03 %); accumulated: 0.03 %.
From 236000 to 236999 games: 14 simulations ( 0.14 %); accumulated: 0.17 %.
From 237000 to 237999 games: 48 simulations ( 0.48 %); accumulated: 0.65 %.
[...]
From 243000 to 243999 games: 176 simulations ( 1.76 %); accumulated: 8.47 %.
From 244000 to 244999 games: 176 simulations ( 1.76 %); accumulated: 10.23 %.
[...]
From 249000 to 249999 games: 160 simulations ( 1.60 %); accumulated: 18.65 %.
From 250000 to 250999 games: 179 simulations ( 1.79 %); accumulated: 20.44 %.
[...]
From 256000 to 256999 games: 157 simulations ( 1.57 %); accumulated: 29.87 %.
From 257000 to 257999 games: 126 simulations ( 1.26 %); accumulated: 31.13 %.
[...]
From 263000 to 263999 games: 133 simulations ( 1.33 %); accumulated: 39.57 %.
From 264000 to 264999 games: 132 simulations ( 1.32 %); accumulated: 40.89 %.
[...]
From 271000 to 271999 games: 133 simulations ( 1.33 %); accumulated: 48.94 %.
From 272000 to 272999 games: 113 simulations ( 1.13 %); accumulated: 50.07 %.
[...]
From 282000 to 282999 games: 76 simulations ( 0.76 %); accumulated: 59.62 %.
From 283000 to 283999 games: 93 simulations ( 0.93 %); accumulated: 60.55 %.
[...]
From 296000 to 296999 games: 55 simulations ( 0.55 %); accumulated: 69.93 %.
From 297000 to 297999 games: 66 simulations ( 0.66 %); accumulated: 70.59 %.
[...]
From 314000 to 314999 games: 37 simulations ( 0.37 %); accumulated: 79.85 %.
From 315000 to 315999 games: 45 simulations ( 0.45 %); accumulated: 80.30 %.
[...]
From 344000 to 344999 games: 18 simulations ( 0.18 %); accumulated: 89.97 %.
From 345000 to 345999 games: 22 simulations ( 0.22 %); accumulated: 90.19 %.
[...]
From 618000 to 618999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 619000 to 619999 games: 1 simulation ( 0.01 %); accumulated: 99.99 %.
From 620000 to 620999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
[...]
From 678000 to 678999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 679000 to 679999 games: 1 simulation ( 0.01 %); accumulated: 100.00 %.
Uri Blass wrote: 3)Did somebody test if the worst case really behave in the way that SPRT expects?
A possible way is to test the program against itself(so we know 0 elo is the right result) with SPRT(-b,b) many times to see if the distribution of the length of games is really the distribution that theory expect.
I ran an example SPRT(-3, 3) (a Bayeselo span of 3 - (-3) = 6 Bayeselo) with an expected gain of 0 Elo. I randomly chose an a priori drawelo parameter of 240. With alpha = 0.05 = beta and 10000 simulations again:
In theory: passes = fails = 50%.
My results:
10000/ 10000 Passes: 5010 Fails: 4990 <Games>/simulation: 28899
Shortest simulation: 1823 games (simulation 4959): +325 -433 =1065.
Longest simulation: 241322 games (simulation 1943): +48511 -48618 =144193.
Average number of games per simulation: 28899
Median of the distribution: 21708 (+4505 -4397 =12806).
Distribution of the length of simulations:
From 1000 to 1999 games: 3 simulations ( 0.03 %); accumulated: 0.03 %.
From 2000 to 2999 games: 34 simulations ( 0.34 %); accumulated: 0.37 %.
From 3000 to 3999 games: 111 simulations ( 1.11 %); accumulated: 1.48 %.
From 4000 to 4999 games: 84 simulations ( 0.84 %); accumulated: 2.32 %.
From 5000 to 5999 games: 300 simulations ( 3.00 %); accumulated: 5.32 %.
From 6000 to 6999 games: 135 simulations ( 1.35 %); accumulated: 6.67 %.
From 7000 to 7999 games: 345 simulations ( 3.45 %); accumulated: 10.12 %.
From 8000 to 8999 games: 383 simulations ( 3.83 %); accumulated: 13.95 %.
From 9000 to 9999 games: 354 simulations ( 3.54 %); accumulated: 17.49 %.
From 10000 to 10999 games: 322 simulations ( 3.22 %); accumulated: 20.71 %.
From 11000 to 11999 games: 286 simulations ( 2.86 %); accumulated: 23.57 %.
From 12000 to 12999 games: 307 simulations ( 3.07 %); accumulated: 26.64 %.
From 13000 to 13999 games: 306 simulations ( 3.06 %); accumulated: 29.70 %.
From 14000 to 14999 games: 277 simulations ( 2.77 %); accumulated: 32.47 %.
From 15000 to 15999 games: 312 simulations ( 3.12 %); accumulated: 35.59 %.
From 16000 to 16999 games: 279 simulations ( 2.79 %); accumulated: 38.38 %.
From 17000 to 17999 games: 281 simulations ( 2.81 %); accumulated: 41.19 %.
From 18000 to 18999 games: 244 simulations ( 2.44 %); accumulated: 43.63 %.
From 19000 to 19999 games: 219 simulations ( 2.19 %); accumulated: 45.82 %.
From 20000 to 20999 games: 243 simulations ( 2.43 %); accumulated: 48.25 %.
From 21000 to 21999 games: 242 simulations ( 2.42 %); accumulated: 50.67 %.
From 22000 to 22999 games: 187 simulations ( 1.87 %); accumulated: 52.54 %.
From 23000 to 23999 games: 213 simulations ( 2.13 %); accumulated: 54.67 %.
From 24000 to 24999 games: 203 simulations ( 2.03 %); accumulated: 56.70 %.
From 25000 to 25999 games: 199 simulations ( 1.99 %); accumulated: 58.69 %.
From 26000 to 26999 games: 176 simulations ( 1.76 %); accumulated: 60.45 %.
From 27000 to 27999 games: 131 simulations ( 1.31 %); accumulated: 61.76 %.
From 28000 to 28999 games: 177 simulations ( 1.77 %); accumulated: 63.53 %.
From 29000 to 29999 games: 154 simulations ( 1.54 %); accumulated: 65.07 %.
From 30000 to 30999 games: 153 simulations ( 1.53 %); accumulated: 66.60 %.
From 31000 to 31999 games: 134 simulations ( 1.34 %); accumulated: 67.94 %.
From 32000 to 32999 games: 137 simulations ( 1.37 %); accumulated: 69.31 %.
From 33000 to 33999 games: 131 simulations ( 1.31 %); accumulated: 70.62 %.
From 34000 to 34999 games: 139 simulations ( 1.39 %); accumulated: 72.01 %.
From 35000 to 35999 games: 111 simulations ( 1.11 %); accumulated: 73.12 %.
From 36000 to 36999 games: 122 simulations ( 1.22 %); accumulated: 74.34 %.
From 37000 to 37999 games: 94 simulations ( 0.94 %); accumulated: 75.28 %.
From 38000 to 38999 games: 85 simulations ( 0.85 %); accumulated: 76.13 %.
From 39000 to 39999 games: 104 simulations ( 1.04 %); accumulated: 77.17 %.
From 40000 to 40999 games: 94 simulations ( 0.94 %); accumulated: 78.11 %.
From 41000 to 41999 games: 84 simulations ( 0.84 %); accumulated: 78.95 %.
From 42000 to 42999 games: 95 simulations ( 0.95 %); accumulated: 79.90 %.
From 43000 to 43999 games: 84 simulations ( 0.84 %); accumulated: 80.74 %.
From 44000 to 44999 games: 88 simulations ( 0.88 %); accumulated: 81.62 %.
From 45000 to 45999 games: 74 simulations ( 0.74 %); accumulated: 82.36 %.
From 46000 to 46999 games: 82 simulations ( 0.82 %); accumulated: 83.18 %.
From 47000 to 47999 games: 62 simulations ( 0.62 %); accumulated: 83.80 %.
From 48000 to 48999 games: 70 simulations ( 0.70 %); accumulated: 84.50 %.
From 49000 to 49999 games: 78 simulations ( 0.78 %); accumulated: 85.28 %.
From 50000 to 50999 games: 49 simulations ( 0.49 %); accumulated: 85.77 %.
From 51000 to 51999 games: 66 simulations ( 0.66 %); accumulated: 86.43 %.
From 52000 to 52999 games: 69 simulations ( 0.69 %); accumulated: 87.12 %.
From 53000 to 53999 games: 46 simulations ( 0.46 %); accumulated: 87.58 %.
From 54000 to 54999 games: 57 simulations ( 0.57 %); accumulated: 88.15 %.
From 55000 to 55999 games: 50 simulations ( 0.50 %); accumulated: 88.65 %.
From 56000 to 56999 games: 38 simulations ( 0.38 %); accumulated: 89.03 %.
From 57000 to 57999 games: 49 simulations ( 0.49 %); accumulated: 89.52 %.
From 58000 to 58999 games: 42 simulations ( 0.42 %); accumulated: 89.94 %.
From 59000 to 59999 games: 38 simulations ( 0.38 %); accumulated: 90.32 %.
From 60000 to 60999 games: 46 simulations ( 0.46 %); accumulated: 90.78 %.
From 61000 to 61999 games: 38 simulations ( 0.38 %); accumulated: 91.16 %.
From 62000 to 62999 games: 38 simulations ( 0.38 %); accumulated: 91.54 %.
From 63000 to 63999 games: 29 simulations ( 0.29 %); accumulated: 91.83 %.
From 64000 to 64999 games: 33 simulations ( 0.33 %); accumulated: 92.16 %.
From 65000 to 65999 games: 32 simulations ( 0.32 %); accumulated: 92.48 %.
From 66000 to 66999 games: 21 simulations ( 0.21 %); accumulated: 92.69 %.
From 67000 to 67999 games: 27 simulations ( 0.27 %); accumulated: 92.96 %.
From 68000 to 68999 games: 34 simulations ( 0.34 %); accumulated: 93.30 %.
From 69000 to 69999 games: 24 simulations ( 0.24 %); accumulated: 93.54 %.
From 70000 to 70999 games: 34 simulations ( 0.34 %); accumulated: 93.88 %.
From 71000 to 71999 games: 21 simulations ( 0.21 %); accumulated: 94.09 %.
From 72000 to 72999 games: 22 simulations ( 0.22 %); accumulated: 94.31 %.
From 73000 to 73999 games: 18 simulations ( 0.18 %); accumulated: 94.49 %.
From 74000 to 74999 games: 14 simulations ( 0.14 %); accumulated: 94.63 %.
From 75000 to 75999 games: 27 simulations ( 0.27 %); accumulated: 94.90 %.
From 76000 to 76999 games: 19 simulations ( 0.19 %); accumulated: 95.09 %.
From 77000 to 77999 games: 16 simulations ( 0.16 %); accumulated: 95.25 %.
From 78000 to 78999 games: 23 simulations ( 0.23 %); accumulated: 95.48 %.
From 79000 to 79999 games: 17 simulations ( 0.17 %); accumulated: 95.65 %.
From 80000 to 80999 games: 20 simulations ( 0.20 %); accumulated: 95.85 %.
From 81000 to 81999 games: 22 simulations ( 0.22 %); accumulated: 96.07 %.
From 82000 to 82999 games: 13 simulations ( 0.13 %); accumulated: 96.20 %.
From 83000 to 83999 games: 22 simulations ( 0.22 %); accumulated: 96.42 %.
From 84000 to 84999 games: 10 simulations ( 0.10 %); accumulated: 96.52 %.
From 85000 to 85999 games: 10 simulations ( 0.10 %); accumulated: 96.62 %.
From 86000 to 86999 games: 16 simulations ( 0.16 %); accumulated: 96.78 %.
From 87000 to 87999 games: 16 simulations ( 0.16 %); accumulated: 96.94 %.
From 88000 to 88999 games: 14 simulations ( 0.14 %); accumulated: 97.08 %.
From 89000 to 89999 games: 15 simulations ( 0.15 %); accumulated: 97.23 %.
From 90000 to 90999 games: 13 simulations ( 0.13 %); accumulated: 97.36 %.
From 91000 to 91999 games: 9 simulations ( 0.09 %); accumulated: 97.45 %.
From 92000 to 92999 games: 12 simulations ( 0.12 %); accumulated: 97.57 %.
From 93000 to 93999 games: 10 simulations ( 0.10 %); accumulated: 97.67 %.
From 94000 to 94999 games: 8 simulations ( 0.08 %); accumulated: 97.75 %.
From 95000 to 95999 games: 11 simulations ( 0.11 %); accumulated: 97.86 %.
From 96000 to 96999 games: 6 simulations ( 0.06 %); accumulated: 97.92 %.
From 97000 to 97999 games: 5 simulations ( 0.05 %); accumulated: 97.97 %.
From 98000 to 98999 games: 9 simulations ( 0.09 %); accumulated: 98.06 %.
From 99000 to 99999 games: 8 simulations ( 0.08 %); accumulated: 98.14 %.
From 100000 to 100999 games: 4 simulations ( 0.04 %); accumulated: 98.18 %.
From 101000 to 101999 games: 12 simulations ( 0.12 %); accumulated: 98.30 %.
From 102000 to 102999 games: 9 simulations ( 0.09 %); accumulated: 98.39 %.
From 103000 to 103999 games: 8 simulations ( 0.08 %); accumulated: 98.47 %.
From 104000 to 104999 games: 9 simulations ( 0.09 %); accumulated: 98.56 %.
From 105000 to 105999 games: 8 simulations ( 0.08 %); accumulated: 98.64 %.
From 106000 to 106999 games: 6 simulations ( 0.06 %); accumulated: 98.70 %.
From 107000 to 107999 games: 6 simulations ( 0.06 %); accumulated: 98.76 %.
From 108000 to 108999 games: 5 simulations ( 0.05 %); accumulated: 98.81 %.
From 109000 to 109999 games: 3 simulations ( 0.03 %); accumulated: 98.84 %.
From 110000 to 110999 games: 4 simulations ( 0.04 %); accumulated: 98.88 %.
From 111000 to 111999 games: 4 simulations ( 0.04 %); accumulated: 98.92 %.
From 112000 to 112999 games: 2 simulations ( 0.02 %); accumulated: 98.94 %.
From 113000 to 113999 games: 4 simulations ( 0.04 %); accumulated: 98.98 %.
From 114000 to 114999 games: 4 simulations ( 0.04 %); accumulated: 99.02 %.
From 115000 to 115999 games: 7 simulations ( 0.07 %); accumulated: 99.09 %.
From 116000 to 116999 games: 3 simulations ( 0.03 %); accumulated: 99.12 %.
From 117000 to 117999 games: 5 simulations ( 0.05 %); accumulated: 99.17 %.
From 118000 to 118999 games: 3 simulations ( 0.03 %); accumulated: 99.20 %.
From 119000 to 119999 games: 1 simulation ( 0.01 %); accumulated: 99.21 %.
From 120000 to 120999 games: 5 simulations ( 0.05 %); accumulated: 99.26 %.
From 121000 to 121999 games: 4 simulations ( 0.04 %); accumulated: 99.30 %.
From 122000 to 122999 games: 2 simulations ( 0.02 %); accumulated: 99.32 %.
From 123000 to 123999 games: 1 simulation ( 0.01 %); accumulated: 99.33 %.
From 124000 to 124999 games: 3 simulations ( 0.03 %); accumulated: 99.36 %.
From 125000 to 125999 games: 0 simulations ( 0.00 %); accumulated: 99.36 %.
From 126000 to 126999 games: 2 simulations ( 0.02 %); accumulated: 99.38 %.
From 127000 to 127999 games: 4 simulations ( 0.04 %); accumulated: 99.42 %.
From 128000 to 128999 games: 1 simulation ( 0.01 %); accumulated: 99.43 %.
From 129000 to 129999 games: 1 simulation ( 0.01 %); accumulated: 99.44 %.
From 130000 to 130999 games: 0 simulations ( 0.00 %); accumulated: 99.44 %.
From 131000 to 131999 games: 1 simulation ( 0.01 %); accumulated: 99.45 %.
From 132000 to 132999 games: 2 simulations ( 0.02 %); accumulated: 99.47 %.
From 133000 to 133999 games: 1 simulation ( 0.01 %); accumulated: 99.48 %.
From 134000 to 134999 games: 3 simulations ( 0.03 %); accumulated: 99.51 %.
From 135000 to 135999 games: 0 simulations ( 0.00 %); accumulated: 99.51 %.
From 136000 to 136999 games: 1 simulation ( 0.01 %); accumulated: 99.52 %.
From 137000 to 137999 games: 3 simulations ( 0.03 %); accumulated: 99.55 %.
From 138000 to 138999 games: 0 simulations ( 0.00 %); accumulated: 99.55 %.
From 139000 to 139999 games: 1 simulation ( 0.01 %); accumulated: 99.56 %.
From 140000 to 140999 games: 3 simulations ( 0.03 %); accumulated: 99.59 %.
From 141000 to 141999 games: 0 simulations ( 0.00 %); accumulated: 99.59 %.
From 142000 to 142999 games: 0 simulations ( 0.00 %); accumulated: 99.59 %.
From 143000 to 143999 games: 1 simulation ( 0.01 %); accumulated: 99.60 %.
From 144000 to 144999 games: 2 simulations ( 0.02 %); accumulated: 99.62 %.
From 145000 to 145999 games: 0 simulations ( 0.00 %); accumulated: 99.62 %.
From 146000 to 146999 games: 0 simulations ( 0.00 %); accumulated: 99.62 %.
From 147000 to 147999 games: 1 simulation ( 0.01 %); accumulated: 99.63 %.
From 148000 to 148999 games: 3 simulations ( 0.03 %); accumulated: 99.66 %.
From 149000 to 149999 games: 2 simulations ( 0.02 %); accumulated: 99.68 %.
From 150000 to 150999 games: 1 simulation ( 0.01 %); accumulated: 99.69 %.
From 151000 to 151999 games: 2 simulations ( 0.02 %); accumulated: 99.71 %.
From 152000 to 152999 games: 2 simulations ( 0.02 %); accumulated: 99.73 %.
From 153000 to 153999 games: 1 simulation ( 0.01 %); accumulated: 99.74 %.
From 154000 to 154999 games: 1 simulation ( 0.01 %); accumulated: 99.75 %.
From 155000 to 155999 games: 0 simulations ( 0.00 %); accumulated: 99.75 %.
From 156000 to 156999 games: 3 simulations ( 0.03 %); accumulated: 99.78 %.
From 157000 to 157999 games: 1 simulation ( 0.01 %); accumulated: 99.79 %.
From 158000 to 158999 games: 2 simulations ( 0.02 %); accumulated: 99.81 %.
From 159000 to 159999 games: 2 simulations ( 0.02 %); accumulated: 99.83 %.
From 160000 to 160999 games: 1 simulation ( 0.01 %); accumulated: 99.84 %.
From 161000 to 161999 games: 1 simulation ( 0.01 %); accumulated: 99.85 %.
From 162000 to 162999 games: 0 simulations ( 0.00 %); accumulated: 99.85 %.
From 163000 to 163999 games: 1 simulation ( 0.01 %); accumulated: 99.86 %.
From 164000 to 164999 games: 1 simulation ( 0.01 %); accumulated: 99.87 %.
From 165000 to 165999 games: 0 simulations ( 0.00 %); accumulated: 99.87 %.
From 166000 to 166999 games: 2 simulations ( 0.02 %); accumulated: 99.89 %.
From 167000 to 167999 games: 1 simulation ( 0.01 %); accumulated: 99.90 %.
From 168000 to 168999 games: 0 simulations ( 0.00 %); accumulated: 99.90 %.
From 169000 to 169999 games: 0 simulations ( 0.00 %); accumulated: 99.90 %.
From 170000 to 170999 games: 0 simulations ( 0.00 %); accumulated: 99.90 %.
From 171000 to 171999 games: 0 simulations ( 0.00 %); accumulated: 99.90 %.
From 172000 to 172999 games: 0 simulations ( 0.00 %); accumulated: 99.90 %.
From 173000 to 173999 games: 1 simulation ( 0.01 %); accumulated: 99.91 %.
From 174000 to 174999 games: 0 simulations ( 0.00 %); accumulated: 99.91 %.
From 175000 to 175999 games: 0 simulations ( 0.00 %); accumulated: 99.91 %.
From 176000 to 176999 games: 0 simulations ( 0.00 %); accumulated: 99.91 %.
From 177000 to 177999 games: 0 simulations ( 0.00 %); accumulated: 99.91 %.
From 178000 to 178999 games: 0 simulations ( 0.00 %); accumulated: 99.91 %.
From 179000 to 179999 games: 0 simulations ( 0.00 %); accumulated: 99.91 %.
From 180000 to 180999 games: 1 simulation ( 0.01 %); accumulated: 99.92 %.
From 181000 to 181999 games: 1 simulation ( 0.01 %); accumulated: 99.93 %.
From 182000 to 182999 games: 0 simulations ( 0.00 %); accumulated: 99.93 %.
From 183000 to 183999 games: 0 simulations ( 0.00 %); accumulated: 99.93 %.
From 184000 to 184999 games: 0 simulations ( 0.00 %); accumulated: 99.93 %.
From 185000 to 185999 games: 0 simulations ( 0.00 %); accumulated: 99.93 %.
From 186000 to 186999 games: 0 simulations ( 0.00 %); accumulated: 99.93 %.
From 187000 to 187999 games: 1 simulation ( 0.01 %); accumulated: 99.94 %.
From 188000 to 188999 games: 1 simulation ( 0.01 %); accumulated: 99.95 %.
From 189000 to 189999 games: 0 simulations ( 0.00 %); accumulated: 99.95 %.
From 190000 to 190999 games: 0 simulations ( 0.00 %); accumulated: 99.95 %.
From 191000 to 191999 games: 0 simulations ( 0.00 %); accumulated: 99.95 %.
From 192000 to 192999 games: 0 simulations ( 0.00 %); accumulated: 99.95 %.
From 193000 to 193999 games: 2 simulations ( 0.02 %); accumulated: 99.97 %.
From 194000 to 194999 games: 0 simulations ( 0.00 %); accumulated: 99.97 %.
From 195000 to 195999 games: 1 simulation ( 0.01 %); accumulated: 99.98 %.
From 196000 to 196999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 197000 to 197999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 198000 to 198999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 199000 to 199999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 200000 to 200999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 201000 to 201999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 202000 to 202999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 203000 to 203999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 204000 to 204999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 205000 to 205999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 206000 to 206999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 207000 to 207999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 208000 to 208999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 209000 to 209999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 210000 to 210999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 211000 to 211999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 212000 to 212999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 213000 to 213999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 214000 to 214999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 215000 to 215999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 216000 to 216999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 217000 to 217999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 218000 to 218999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 219000 to 219999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 220000 to 220999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 221000 to 221999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 222000 to 222999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 223000 to 223999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 224000 to 224999 games: 0 simulations ( 0.00 %); accumulated: 99.98 %.
From 225000 to 225999 games: 1 simulation ( 0.01 %); accumulated: 99.99 %.
From 226000 to 226999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 227000 to 227999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 228000 to 228999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 229000 to 229999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 230000 to 230999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 231000 to 231999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 232000 to 232999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 233000 to 233999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 234000 to 234999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 235000 to 235999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 236000 to 236999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 237000 to 237999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 238000 to 238999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 239000 to 239999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 240000 to 240999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 241000 to 241999 games: 1 simulation ( 0.01 %); accumulated: 100.00 %.
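An experiment of this kind can be reproduced in a few lines. The following is a minimal Python sketch, not the simulator used for the results above: it assumes a fixed, known drawelo (so each win, draw or loss adds a constant amount to the LLR) and independent games, and the function names are hypothetical.

```python
import math
import random

def bayeselo_probs(elo, drawelo):
    """Win, draw and loss probabilities in the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss

def simulate_sprt(elo_true, elo0, elo1, drawelo, rng, alpha=0.05, beta=0.05):
    """Play independent games until the LLR leaves (lower, upper).

    Returns (passed, number_of_games). All Elo values are in BayesElo.
    """
    pw, pd, pl = bayeselo_probs(elo_true, drawelo)
    w0, d0, l0 = bayeselo_probs(elo0, drawelo)
    w1, d1, l1 = bayeselo_probs(elo1, drawelo)
    z_win, z_draw, z_loss = math.log(w1 / w0), math.log(d1 / d0), math.log(l1 / l0)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr, games = 0.0, 0
    while lower < llr < upper:
        u = rng.random()
        if u < pw:
            llr += z_win
        elif u < pw + pl:
            llr += z_loss
        else:
            llr += z_draw
        games += 1
    return llr >= upper, games

def run_many(n_sims, elo_true, elo0, elo1, drawelo, seed=1):
    """Return (pass rate, average games, sorted list of test lengths)."""
    rng = random.Random(seed)
    results = [simulate_sprt(elo_true, elo0, elo1, drawelo, rng) for _ in range(n_sims)]
    lengths = sorted(games for _, games in results)
    passes = sum(1 for passed, _ in results if passed)
    return passes / n_sims, sum(lengths) / n_sims, lengths
```

Calling `run_many(100, 0.0, -3.0, 3.0, 240.0)` runs a small version of the SPRT(-3, 3) self-play experiment. With so few runs the pass rate and average length are noisy, so a comparison with the table above needs many more simulations. Pure Python is slow for that, which is why a compiled simulator is the better tool for serious runs.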
------------------------
Sorry for the length of this post. I hope there are no typos.
Regards from Spain.
Ajedrecista.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: SPRT questions
You can't change the conditions of the test after looking at the results. Arguing that the test would have passed an SPRT(0,X) where X is conveniently chosen ex post is not serious.
The real problem is that there is only a finite (small) number of openings, and eventually we are just repeating the same games over and over, in shuffled order. I'm not sure playing another 250k games really adds any information. Perhaps it makes sense to stop the test and (i) commit the patch, (ii) toss a coin to decide, or (iii) not commit.
I'll let Joona decide.
Here are the numbers for SPRT(0,4.5). You can see that we're already slightly above the worst 99% quantile: for example, if the true elo value (which we don't know) is 1.25, then there is a 1% chance that the run length is 224,661 games or more:
$ ./sprt 0 3 0.25 50000 290 0 4.5
Elo BayesElo %Pass Avg run Q50% Q90% Q95% Q99%
0.00 0.00 0.0499 35203 28057 68311 85363 125770
0.25 0.47 0.0896 40543 31742 80313 101045 149184
0.50 0.94 0.1525 46648 36131 93453 118239 175859
0.75 1.41 0.2494 52208 39939 106370 135075 201357
1.00 1.87 0.3793 56441 42757 116390 147578 221303
1.25 2.34 0.5290 57315 43273 118091 150719 224661
1.50 2.81 0.6744 55121 41618 113910 144474 215451
1.75 3.28 0.7919 50263 38675 101996 129430 194491
2.00 3.75 0.8738 44203 34187 88400 111375 167143
2.25 4.22 0.9276 38438 30177 75382 94763 139902
2.50 4.69 0.9593 33373 26715 64270 80319 118185
2.75 5.15 0.9777 29056 23711 54670 67578 99416
3.00 5.62 0.9881 25580 21264 47230 58010 82981
So the Gods of Randomness really are against me
SPRT simulator (multi-threaded, very fast)
https://github.com/lucasart/sprt
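For readers who want to see what such a simulation does, here is a minimal single-threaded sketch in Python. It is not the code from the repository above: the function names are hypothetical, all Elo values are taken in BayesElo units, and drawelo is assumed fixed (290 in the run above) rather than estimated from the games.

```python
import math
import random

def bayeselo_probs(elo, drawelo):
    """Win, loss and draw probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss

def simulate_sprt(true_elo, elo0, elo1, drawelo, alpha=0.05, beta=0.05, rng=random):
    """Play simulated games until the LLR leaves (lower, upper).

    Returns (passed, number_of_games)."""
    lower = math.log(beta / (1.0 - alpha))   # about -2.94 for alpha = beta = 0.05
    upper = math.log((1.0 - beta) / alpha)   # about +2.94
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    # With a fixed drawelo, each outcome adds a constant amount to the LLR.
    step = [math.log(b / a) for a, b in zip(p0, p1)]
    p_win, p_loss, _ = bayeselo_probs(true_elo, drawelo)
    llr, games = 0.0, 0
    while lower < llr < upper:
        u = rng.random()
        outcome = 0 if u < p_win else (1 if u < p_win + p_loss else 2)
        llr += step[outcome]
        games += 1
    return llr >= upper, games

def pass_rate(true_elo, elo0, elo1, drawelo, sims, seed=1):
    """Fraction of simulated tests that pass, and the average run length."""
    rng = random.Random(seed)
    runs = [simulate_sprt(true_elo, elo0, elo1, drawelo, rng=rng) for _ in range(sims)]
    passes = sum(1 for passed, _ in runs if passed)
    return passes / sims, sum(n for _, n in runs) / sims
```

A call such as `pass_rate(0.0, 0.0, 4.5, 290.0, sims=200)` corresponds to the first row of the table, but pure Python is slow here, so expect a noisy estimate unless you are willing to wait for many more simulations.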
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 1971
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: SPRT questions.
Hello Lucas:
lucasart wrote:
Here are the numbers for SPRT(0,4.5):
Code: Select all
$ ./sprt 0 3 0.25 50000 290 0 4.5
 Elo  BayesElo   %Pass  Avg run    Q50%    Q90%    Q95%    Q99%
0.00      0.00  0.0499    35203   28057   68311   85363  125770
0.25      0.47  0.0896    40543   31742   80313  101045  149184
0.50      0.94  0.1525    46648   36131   93453  118239  175859
0.75      1.41  0.2494    52208   39939  106370  135075  201357
1.00      1.87  0.3793    56441   42757  116390  147578  221303
1.25      2.34  0.5290    57315   43273  118091  150719  224661
1.50      2.81  0.6744    55121   41618  113910  144474  215451
1.75      3.28  0.7919    50263   38675  101996  129430  194491
2.00      3.75  0.8738    44203   34187   88400  111375  167143
2.25      4.22  0.9276    38438   30177   75382   94763  139902
2.50      4.69  0.9593    33373   26715   64270   80319  118185
2.75      5.15  0.9777    29056   23711   54670   67578   99416
3.00      5.62  0.9881    25580   21264   47230   58010   82981
I ran SPRT(0, 4.5) with Elo = 0 just to compare our results. After 50000 simulations:
Code: Select all
[...]
50000/ 50000 Passes: 2516 Fails: 47484 <Games>/simulation: 35718
Shortest simulation: 2330 games (simulation 27400).
Longest simulation: 313906 games (simulation 30444).
Average number of games per simulation: 35718
Median of the distribution: 28462
There are 15276 simulations with score > 50% that failed SPRT.
There are 171 simulations with score = 50% that failed SPRT.
Estimated elapsed time: 1270.26 seconds.
Speed: 1405947 games/second.
Distribution of the length of simulations:
Code: Select all
[...]
From 27000 to 27999 games: 990 simulations ( 1.98 %); accumulated: 49.12 %.
From 28000 to 28999 games: 948 simulations ( 1.90 %); accumulated: 51.01 %.
[...]
From 34000 to 34999 games: 784 simulations ( 1.57 %); accumulated: 61.43 %.
From 35000 to 35999 games: 752 simulations ( 1.50 %); accumulated: 62.94 %.
[...]
From 68000 to 68999 games: 198 simulations ( 0.40 %); accumulated: 89.93 %.
From 69000 to 69999 games: 210 simulations ( 0.42 %); accumulated: 90.35 %.
[...]
From 85000 to 85999 games: 101 simulations ( 0.20 %); accumulated: 94.87 %.
From 86000 to 86999 games: 94 simulations ( 0.19 %); accumulated: 95.06 %.
[...]
From 126000 to 126999 games: 27 simulations ( 0.05 %); accumulated: 98.96 %.
From 127000 to 127999 games: 18 simulations ( 0.04 %); accumulated: 99.00 %.
[...]
Our results do not contradict each other. There are small differences, as expected.
I must compile your sources to benefit from multi-threading.
Comparing your results with Michel's script:
Code: Select all
SPRT(0, 4.5). Results with Michel's script (drawelo = 290):
 Elo  BayesElo    Pass  Avg run
0.00      0.00  0.0500    35185
0.25      0.47  0.0886    40631
0.50      0.94  0.1521    46616
0.75      1.41  0.2488    52337
1.00      1.87  0.3795    56423
1.25      2.34  0.5303    57484
1.50      2.81  0.6758    55097
1.75      3.28  0.7938    50177
2.00      3.75  0.8766    44215
2.25      4.22  0.9292    38381
2.50      4.69  0.9604    33248
2.75      5.15  0.9781    28960
3.00      5.62  0.9880    25453
Your results are very, very similar... even more than mine. Does anyone still doubt it?
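The Elo and BayesElo columns in these tables are two scales for the same strength difference. As a rough sketch, assuming the usual BayesElo win/loss formulas and a logistic Elo scale (the function name `bayeselo_to_elo` is only illustrative), the conversion at drawelo = 290 could look like this:

```python
import math

def bayeselo_to_elo(bayeselo, drawelo):
    """Convert a BayesElo difference to a logistic Elo difference."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - bayeselo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + bayeselo) / 400.0))
    p_draw = 1.0 - p_win - p_loss
    score = p_win + 0.5 * p_draw          # expected score per game
    return -400.0 * math.log10(1.0 / score - 1.0)
```

With drawelo = 290, `bayeselo_to_elo(2.34, 290.0)` lands close to the 1.25 shown in the Elo column; small differences come from the rounding of the tabulated values.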
Regards from Spain.
Ajedrecista.
-
- Posts: 1971
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: SPRT questions.
Hello:
psq test is now in a pending status. At this moment:
Code: Select all
LLR: 1.79 (-2.94,2.94) [0.00,4.50]
Total: 279297 W: 45731 L: 44667 D: 188899
sprt @ 60+0.05 th 1
drawelo ~ 285.7191, estimated from the sample of 279297 games; gain ~ 2.4396 Bayeselo, if I am not wrong. Using alpha = 0.05 = beta and 50000 simulations, starting at +45731 -44667 =188899 instead of +0 -0 =0. Here are my results of simulation of SPRT(0, 4.5):
Code: Select all
[...]
49996/ 50000 Passes: 42119 Fails: 7877 <Games>/simulation: 313407
49997/ 50000 Passes: 42120 Fails: 7877 <Games>/simulation: 313407
49998/ 50000 Passes: 42121 Fails: 7877 <Games>/simulation: 313406
49999/ 50000 Passes: 42121 Fails: 7878 <Games>/simulation: 313406
50000/ 50000 Passes: 42122 Fails: 7878 <Games>/simulation: 313406
Shortest simulation: 279744 games (simulation 49024).
Longest simulation: 775708 games (simulation 38781).
Average number of games per simulation: 313406
Median of the distribution: 296424
There are 7878 simulations with score > 50% that failed SPRT.
There are 0 simulations with score = 50% that failed SPRT.
So, this test could have right now a probability of passing of circa 84.24% (if my input parameters are valid enough). Some extra info:
Code: Select all
Shortest simulation: PASS after 279744 games ( +45841 -44722 =189181).
Medians of simulations: PASS after 296424 games ( +48518 -47341 =200565).
PASS after 296424 games ( +48545 -47368 =200511).
Longest simulation: FAIL after 775708 games (+126582 -123997 =525129).
Summary of the distribution of the length of simulations:
Code: Select all
From 279000 to 279999 games: 14 simulations ( 0.03 %); accumulated: 0.03 %.
From 280000 to 280999 games: 1197 simulations ( 2.39 %); accumulated: 2.42 %.
From 281000 to 281999 games: 2517 simulations ( 5.03 %); accumulated: 7.46 %.
From 282000 to 282999 games: 2706 simulations ( 5.41 %); accumulated: 12.87 %.
From 283000 to 283999 games: 2428 simulations ( 4.86 %); accumulated: 17.72 %.
From 284000 to 284999 games: 2285 simulations ( 4.57 %); accumulated: 22.29 %.
From 285000 to 285999 games: 1951 simulations ( 3.90 %); accumulated: 26.20 %.
From 286000 to 286999 games: 1827 simulations ( 3.65 %); accumulated: 29.85 %.
From 287000 to 287999 games: 1619 simulations ( 3.24 %); accumulated: 33.09 %.
From 288000 to 288999 games: 1370 simulations ( 2.74 %); accumulated: 35.83 %.
From 289000 to 289999 games: 1283 simulations ( 2.57 %); accumulated: 38.39 %.
From 290000 to 290999 games: 1115 simulations ( 2.23 %); accumulated: 40.62 %.
From 291000 to 291999 games: 979 simulations ( 1.96 %); accumulated: 42.58 %.
From 292000 to 292999 games: 918 simulations ( 1.84 %); accumulated: 44.42 %.
From 293000 to 293999 games: 855 simulations ( 1.71 %); accumulated: 46.13 %.
From 294000 to 294999 games: 881 simulations ( 1.76 %); accumulated: 47.89 %.
From 295000 to 295999 games: 732 simulations ( 1.46 %); accumulated: 49.35 %.
From 296000 to 296999 games: 730 simulations ( 1.46 %); accumulated: 50.81 %.
[...]
From 312000 to 312999 games: 398 simulations ( 0.80 %); accumulated: 66.70 %.
From 313000 to 313999 games: 372 simulations ( 0.74 %); accumulated: 67.44 %.
From 314000 to 314999 games: 341 simulations ( 0.68 %); accumulated: 68.12 %.
[...]
From 366000 to 366999 games: 106 simulations ( 0.21 %); accumulated: 89.83 %.
From 367000 to 367999 games: 105 simulations ( 0.21 %); accumulated: 90.04 %.
[...]
From 398000 to 398999 games: 58 simulations ( 0.12 %); accumulated: 94.95 %.
From 399000 to 399999 games: 55 simulations ( 0.11 %); accumulated: 95.06 %.
[...]
From 471000 to 471999 games: 11 simulations ( 0.02 %); accumulated: 98.99 %.
From 472000 to 472999 games: 10 simulations ( 0.02 %); accumulated: 99.01 %.
[...]
From 502000 to 502999 games: 8 simulations ( 0.02 %); accumulated: 99.49 %.
From 503000 to 503999 games: 8 simulations ( 0.02 %); accumulated: 99.50 %.
From 504000 to 504999 games: 4 simulations ( 0.01 %); accumulated: 99.51 %.
[...]
From 511000 to 511999 games: 7 simulations ( 0.01 %); accumulated: 99.59 %.
[...]
From 568000 to 568999 games: 2 simulations ( 0.00 %); accumulated: 99.89 %.
From 569000 to 569999 games: 1 simulation ( 0.00 %); accumulated: 99.90 %.
From 570000 to 570999 games: 4 simulations ( 0.01 %); accumulated: 99.90 %.
[...]
From 732000 to 732999 games: 0 simulations ( 0.00 %); accumulated: 99.99 %.
From 733000 to 733999 games: 1 simulation ( 0.00 %); accumulated: 100.00 %.
From 734000 to 734999 games: 1 simulation ( 0.00 %); accumulated: 100.00 %.
From 735000 to 735999 games: 0 simulations ( 0.00 %); accumulated: 100.00 %.
[...]
From 773000 to 773999 games: 0 simulations ( 0.00 %); accumulated: 100.00 %.
From 774000 to 774999 games: 0 simulations ( 0.00 %); accumulated: 100.00 %.
From 775000 to 775999 games: 1 simulation ( 0.00 %); accumulated: 100.00 %.
With my input parameters: median - starting point = 17127 games. The end is probably near.
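The drawelo, Bayeselo gain and LLR figures in this post can be checked from the W/L/D counts alone. The sketch below is my reading of the calculation, not the framework's source: it assumes the BayesElo model, estimates drawelo from the sample, and uses hypothetical function names.

```python
import math

def estimate_bayeselo(wins, losses, draws):
    """Estimate (bayeselo, drawelo) from the observed win and loss frequencies."""
    n = wins + losses + draws
    pw, pl = wins / n, losses / n
    elo = 200.0 * math.log10(pw / pl * (1.0 - pl) / (1.0 - pw))
    drawelo = 200.0 * math.log10((1.0 - pl) / pl * (1.0 - pw) / pw)
    return elo, drawelo

def bayeselo_probs(elo, drawelo):
    """Win, loss and draw probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss

def llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0), in BayesElo units."""
    _, drawelo = estimate_bayeselo(wins, losses, draws)
    w0, l0, d0 = bayeselo_probs(elo0, drawelo)
    w1, l1, d1 = bayeselo_probs(elo1, drawelo)
    return (wins * math.log(w1 / w0)
            + losses * math.log(l1 / l0)
            + draws * math.log(d1 / d0))

elo, drawelo = estimate_bayeselo(45731, 44667, 188899)
value = llr(45731, 44667, 188899, 0.0, 4.5)
print(f"bayeselo ~ {elo:.4f}, drawelo ~ {drawelo:.4f}, LLR ~ {value:.2f}")
```

Under these assumptions the three numbers come out close to the ones quoted in this post; the stopping bounds (-2.94, 2.94) are ln(0.05/0.95) and ln(0.95/0.05) for alpha = beta = 0.05.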
Regards from Spain.
Ajedrecista.
-
- Posts: 10301
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: SPRT questions.
Ajedrecista wrote:
psq test is now in a pending status. [...] With my input parameters: median - starting point = 17127 games. The end is probably near.
The expected number of games is based on some assumptions that I think do not hold in these games.
The results of the games are not independent variables, assuming both programs play white and black from the same position.
I also think that nobody answered question 2 that I asked.
Note that b is not constant in question 2: the idea is that you stop the test if it passes SPRT(0,b) for some 0<b<=6, or if it fails SPRT(0,b) for some 0<b<=6.
After every game you can check whether there is some b<=6 for which the test passes SPRT(0,b), and whether there is some b<=6 for which it fails SPRT(0,b).
This is clearly a different test from normal SPRT, and the expected number of games is clearly less than for SPRT(0,6), because it is possible that after a lot of games SPRT(0,6) is not decided but the test has already passed SPRT(0,5).
The question is what price you pay for it.
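To make the check concrete, here is a small sketch of the scan over b for a single W/L/D snapshot. It is only an illustration under stated assumptions: BayesElo units, a fixed drawelo passed in by the caller, a grid of b values with a step of 0.25, and hypothetical function names. It does not model checking after every game, which is what changes the error rates.

```python
import math

def bayeselo_probs(elo, drawelo):
    """Win, loss and draw probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, p_loss, 1.0 - p_win - p_loss

def llr_zero_vs_b(wins, losses, draws, b, drawelo):
    """LLR of H1: elo = b against H0: elo = 0."""
    w0, l0, d0 = bayeselo_probs(0.0, drawelo)
    w1, l1, d1 = bayeselo_probs(b, drawelo)
    return (wins * math.log(w1 / w0)
            + losses * math.log(l1 / l0)
            + draws * math.log(d1 / d0))

def scan_bounds(wins, losses, draws, drawelo, b_max=6.0, step=0.25,
                alpha=0.05, beta=0.05):
    """Return the grid values of b for which SPRT(0,b) has passed, and failed."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    passed, failed = [], []
    for i in range(1, int(round(b_max / step)) + 1):
        b = i * step
        value = llr_zero_vs_b(wins, losses, draws, b, drawelo)
        if value >= upper:
            passed.append(b)
        elif value <= lower:
            failed.append(b)
    return passed, failed
```

For example, `scan_bounds(45731, 44667, 188899, 285.72)` scans the counts quoted earlier in the thread. Note that on one snapshot both lists can be non-empty, because the LLR is positive for b near the measured gain and negative for b well above it, so the combined rule would need to say which outcome wins.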
-
- Posts: 919
- Joined: Sat May 31, 2014 8:28 am
Re: SPRT questions.
You should direct this question to Kai Laskos, as he has solved analytically the equations that govern this behavior. He is working on a model that can be used to predict various aspects, including the effect of changing the Elo bounds.
Regards,
Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.