2-SPRT

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Post Reply
Michel
Posts: 2003
Joined: Sun Sep 28, 2008 11:50 pm

2-SPRT

Post by Michel » Wed Jan 28, 2015 12:25 pm

I started to run some tests again and just to do something different I decided to use the 2-SPRT instead of the SPRT. Recall that the 2-SPRT optimizes the worst case behaviour of a sequential test which is when the real elo is precisely in the middle of the upper and lower elo bounds. Also unlike the SPRT the 2-SPRT has an absolute upper bound on its running time.

Here is the script I wrote some time ago to calculate the 2-SPRT parameters.

http://hardy.uhasselt.be/Toga/sprt2.py

To validate the results the script can do simulation and it can even compute the operating characteristic (OC) and average sample number (ASN) of the test exactly!!

Here are some notes on the formulas

http://hardy.uhasselt.be/Toga/support_whitehead.pdf

These are mainly notes to myself, so they are neither very rigorous nor very readable. You can see that the formulas used by the script are based on approximations but these hold to a high degree of accuracy for small elo differences.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

User avatar
Ajedrecista
Posts: 1376
Joined: Wed Jul 13, 2011 7:04 pm
Location: Madrid, Spain.
Contact:

Re: 2-SPRT.

Post by Ajedrecista » Wed Jan 28, 2015 1:38 pm

Hello Michel:
Michel wrote:I started to run some tests again and just to do something different I decided to use the 2-SPRT instead of the SPRT. Recall that the 2-SPRT optimizes the worst case behaviour of a sequential test which is when the real elo is precisely in the middle of the upper and lower elo bounds. Also unlike the SPRT the 2-SPRT has an absolute upper bound on its running time.

Here is the script I wrote some time ago to calculate the 2-SPRT parameters.

http://hardy.uhasselt.be/Toga/sprt2.py

To validate the results the script can do simulation and it can even compute the operating characteristic (OC) and average sample number (ASN) of the test exactly!!

Here are some notes on the formulas

http://hardy.uhasselt.be/Toga/support_whitehead.pdf

These are mainly notes to myself, so they are neither very rigorous nor very readable. You can see that the formulas used by the script are based on approximations but these hold to a high degree of accuracy for small elo differences.
The paper is very interesting, thanks for share it. I guess that the script is also very valuable, but since I do not have Python and do not know how to run a script (my bad! :O), I can not get info from it.

I see that the point is to minimize the expected length of a test whose patch has an Bayeselo gain of e = (theta_0 + theta_1)/2 using the notation of the paper. I hope 2-SPRT does not harm the average run of other cases such as e = theta_0.

Could you post some examples of the saving of the number of games, please? I remember that it was posted in SF Google Group and I recall a number of 10% less games for the case of e = (theta_0 + theta_1)/2. Is it right? Here are two links that I found:

https://groups.google.com/forum/#!msg/f ... jQl8P_jAsJ

https://groups.google.com/forum/#!msg/f ... 4hXAj7t2IJ

For example, in a normal SPRT(-1.5, 4.5) with e = 1.5, d = 240, alpha = 0.05 = beta and 20000 simulations (446.06 seconds with a speed of 1305193 games/second), I get an average run of circa (29110 ± 326) games with 95% confidence, a median run of 22178 games, a shortest run of 1338 games (+211 -314 =813, fail), a longest run of 242686 games (+48992 -48427 =145267, fail), 10117 passes and 9883 fails (near to the expected 50% - 50%), and circa 99.39% of finished simulations with less than 128000 games (useful info for Fishtest). These values match with yours? If that, in this case I expect something like 26200 games on average using 2-SPRT.

Thanks in advance.

------------------------

On other side, I made a correction here. I think I am right... but I am very, very far from perfection. ;) Any confirmation is welcome.

Regards from Spain.

Ajedrecista.

Michel
Posts: 2003
Joined: Sun Sep 28, 2008 11:50 pm

Re: 2-SPRT.

Post by Michel » Wed Jan 28, 2015 2:44 pm

Yes it hurts a bit at theta0 and theta1. It has to because the SPRT is precisely optimal at these values. The choice between the SPRT and 2-SPRT is mainly psychological. For me improving the worst case behaviour is important since testing basically freezes my workflow.

Here is some sample output

Code: Select all

$ python sprt2.py --expected
Test parameters:
alpha=0.05
beta=0.05
elo0=-1.50
elo1=4.50
draw_elo=240

Continuation region:
1.00277097*N-83.412576 < S < 1.00000000*N+83.412576

Worst case expected running time&#58; 25363
Maximal running time&#58; 60205

Approximate expected running time&#58;

 Elo           SPRT       SPRT2     Hoeffding
=====          =====      =====     =========
-4.50           9570      12052           
-3.90          10587      13078           
-3.30          11815      14270           
-2.70          13309      15649           
-2.10          15129      17219           
-1.50&#40;H0&#41;      17322      18953           
-0.90          19891      20768           
-0.30          22714      22516           
 0.30          25459      24000           
 0.90          27544      25011           
 1.50          28336      25363      24448
 2.10          27544      25011           
 2.70          25458      24000           
 3.30          22714      22516           
 3.90          19890      20767           
 4.50&#40;H1&#41;      17322      18952           
 5.10          15128      17219           
 5.70          13309      15649           
 6.30          11814      14270           
 6.90          10586      13078           
 7.50           9570      12052 
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Michel
Posts: 2003
Joined: Sun Sep 28, 2008 11:50 pm

Re: 2-SPRT.

Post by Michel » Mon Feb 02, 2015 8:46 am

The best sequential test is actually the "Schwarz" test.

http://hardy.uhasselt.be/Toga/ref_schwarz.pdf

Here is an old .pdf I found

http://hardy.uhasselt.be/Toga/schwarz_results.pdf

One can see that the Schwarz test is almost equivalent to the SPRT at elo0,elo1 but the worst case behaviour is substantially better (similar to the 2-SPRT).

The results in the above .pdf were established by simulation so they took a long time to generate. Meanwhile I know how to compute the parameters of a Schwarz test theoretically and I have a script for that as well. Since I now plan to switch to the Schwarz test for my own testing, I will clean up the script and post it here.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

User avatar
Ajedrecista
Posts: 1376
Joined: Wed Jul 13, 2011 7:04 pm
Location: Madrid, Spain.
Contact:

Comparison of our SPRT results.

Post by Ajedrecista » Thu Feb 05, 2015 12:18 pm

Hello Michel:
Michel wrote:The results in the above .pdf were established by simulation so they took a long time to generate.
Just to check my SPRT simulator, I ran it in my Intel Pentium D930 of year 2006. The programme is 32-bit and single core. I understand the SPRT power as the probability to accept the patch. Here are my results:

Code: Select all

alpha&#58;            0.0500
beta&#58;             0.0500

drawelo_fixed&#58;  240.0000

bayeselo_0&#58;      -3.0000
bayeselo_1&#58;       3.0000

Simulations&#58;   10000 &#40;each time&#41;

Code: Select all

Comparison of results &#40;M = Michel's; J = Jesús')&#58;

Bayeselo     <Games &#40;M&#41;>     <Games &#40;J&#41;>     Power&#40;M&#41;     Power&#40;J&#41;     Elapse time &#40;J&#41;
========     ===========     ===========     ========     ========     ===============
  -7.5          7709.9           7906         0.0006       0.0008         79.55 sec. 
  -7            8276.9           8470         0.0012       0.0005         83.23 sec.
  -6.5          8894.5           9101         0.0017       0.0011         88.23 sec.
  -6            9604.4           9840         0.0028       0.0012         93.72 sec.
  -5.5         10440.0          10619         0.0045       0.0035        102.09 sec.
  -5           11426.9          11617         0.0074       0.0060        104.10 sec.
  -4.5         12587.8          12787         0.0115       0.0091        112.02 sec.
  -4           13926.1          14122         0.0193       0.0194        121.12 sec.
  -3.5         15532.2          15755         0.0312       0.0294        132.37 sec.
  -3           17409.8          17593         0.0483       0.0476        149.39 sec.
  -2.5         19561.5          19713         0.0798       0.0771        162.78 sec.
  -2           21918.5          22128         0.1240       0.1210        179.91 sec.
  -1.5         24377.8          24903         0.1842       0.1790        196.77 sec.
  -1           26456.1          26916         0.2729       0.2687        214.55 sec.
  -0.5         27987.2          28501         0.3799       0.3791        228.82 sec.
   0           28431.7          28899         0.4996       0.5010        234.46 sec.
In my case: power(-7.5) > power(-7)... well, bad luck! I see that in general my average number of games is slightly higher than yours... I suppose that we choosed the same value for drawelo parameter. The opposite for the values of power: yours are higher in general.

The sum of my elapsed times is 2283.11 seconds = 0:38:03.11 if I am right. Of course I spent extra time in changing the Bayeselo value each time in the input notepad, but less than an hour for 160000 simulations in SPRT(-3, 3) is not bad in such an old computer IMHO. I do not know the number of simulations you ran, but I would say that we agree in our results in general terms.

Your PDF is trimmed at the right margin (the last numbers are not visible). For 0 Bayeselo, the four values are: 28431.7, 0.4996; 25349.3, 0.4993. Just as a side note: my results for average number of games are rounded to the closest integers.

Good luck with your Schwarz tests! I stay tuned.

Regards from Spain.

Ajedrecista.

Michel
Posts: 2003
Joined: Sun Sep 28, 2008 11:50 pm

Re: Comparison of our SPRT results.

Post by Michel » Fri Feb 06, 2015 7:40 pm

In my case: power(-7.5) > power(-7)... well, bad luck! I see that in general my average number of games is slightly higher than yours... I suppose that we choosed the same value for drawelo parameter. The opposite for the values of power: yours are higher in general.
Well I don't remember how I generated the SPRT results... This is just an old .pdf I found from the time I was learning about sequential testing.

Sequential testing involves really beautiful math. It is in a sense a solved problem as for a given prior there is a way to generate an optimal solution (using dynamic programming) that minimizes the average duration of the test for that prior, given overall type 1 and type 2 error bounds. Asymptotically things improve and the optimal solution depends only on the support of the prior (places where it is non-zero). The SPRT, 2-SPRT and the Schwarz tests are all asymptotic solutions of this type.
Your PDF is trimmed at the right margin (the last numbers are not visible). For 0 Bayeselo, the four values are: 28431.7, 0.4996; 25349.3, 0.4993. Just as a side note: my results for average number of games are rounded to the closest integers.
Ah. It is not really trimmed, but really stops at 0 (since the table is symmetric the values >0 can be derived from those <0). But I agree the layout could be much better.
Good luck with your Schwarz tests! I stay tuned.
I am adding a section to my document on the 2-SPRT about the Schwarz test. To simplify things I am again using Taylor series approximations to the likelihood functions. After that I will implement the formulas in a script.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Michel
Posts: 2003
Joined: Sun Sep 28, 2008 11:50 pm

Re: Comparison of our SPRT results.

Post by Michel » Fri Feb 06, 2015 11:31 pm

The SPRT, 2-SPRT and the Schwarz tests are all asymptotic solutions of this type.
Addendum to avoid potential confusion: by a lucky mathematical accident the optimality properties of the SPRT are not just asymptotic. They hold always!
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

User avatar
Ajedrecista
Posts: 1376
Joined: Wed Jul 13, 2011 7:04 pm
Location: Madrid, Spain.
Contact:

Re: Comparison of our SPRT results.

Post by Ajedrecista » Sat Feb 07, 2015 9:22 am

Hello again:
Michel wrote:
Ajedrecista wrote:Your PDF is trimmed at the right margin (the last numbers are not visible). For 0 Bayeselo, the four values are: 28431.7, 0.4996; 25349.3, 0.4993. Just as a side note: my results for average number of games are rounded to the closest integers.
Ah. It is not really trimmed, but really stops at 0 (since the table is symmetric the values >0 can be derived from those <0). But I agree the layout could be much better.
I was not talking about the lack of values for Bayeselo = {0.5, 1, ...}. Since elo0 + elo1 = 0, I already understood that you said that the values for elo_i are the same (in number of games) to the values of -elo_i, while power(elo_i) = 1 - power(-elo_i) in this special case of elo0 = -elo1. What I wanted to say is that for 0 Bayeselo results I only see {28431, 0.499; 25349, 0.499} instead of {28431.7, 0.4996; 25349.3, 0.4993}, that is, the last number of each result is missing due to the margin, but I managed to get the full results just copying and pasting the full text of the PDF into a Notepad.

I agree: the layout could be better just writing the value of drawelo for this example (probably 240) and fixing the issue of the right margin, as well as the values of alpha and beta (I know that they could be derived easily with power(elo0) if we suppose that alpha = beta, and it looks like alpha = 1/20 = beta this time) and the number of simulations... but if this info is quite old and you do not remember the details, there is nothing to be done. Otherwise it is perfect.

Once again, good luck with your new tests.

Regards from Spain.

Ajedrecista.

Post Reply