LOS (again)

Rebel · Post by **Rebel** » Tue Oct 30, 2012 1:14 pm

I have a couple of questions about the LOS table at the CPW.

http://chessprogramming.wikispaces.com/LOS+Table

1. Are bigger tables (2000-5000 games) available?

2. Or, even better, what is the formula behind the table?

3. If a change you make is expected to gain only a small amount of Elo, at which LOS percentage do you decide to terminate a match? There are two possibilities:

a) A negative match score. Say you have played 1000 games so far and you are at -17 (wins minus losses). The LOS table gives 74% reliability. Is that enough for you to abort the match and discard the change you made?

b) A positive match score. After 1000 games you are at +25, which corresponds to a reliability of 83%. What do you decide: take the 83% for granted and count the change as an improvement, or test further?

lucasart · Post by **lucasart** » Tue Oct 30, 2012 1:44 pm

Rebel wrote: 2. Or, even better, what is the formula behind the table?

Yes, it's very easy: just calculate the empirical mean and standard deviation, and use the Gaussian distribution to calculate probabilities.

hgm · Post by **hgm** » Tue Oct 30, 2012 2:33 pm

Cutting a match short erodes your confidence. For example, if you test until the score leaves the 1.96-standard-deviation interval around 50%, and stop the test as soon as that happens, you can only be 90% confident that the engines really differ in strength. Normally 1.96 standard deviations corresponds to 95% confidence, but only when the score is evaluated after a predetermined number of games, without paying attention to intermediate results. The possibility of stopping early exactly doubles the probability that you accept a fluke (from 5% to 10%), because you don't give the fluke the opportunity to correct itself through an average result in the remainder of the test.

That doesn't necessarily mean that doing this is bad. It is simply a trade-off. On changes that have a huge effect you will be able to stop very early, which saves an enormous number of games. But on near-neutral changes you will need many more games to reach the same confidence. Which approach is more profitable depends on how many 'large' improvements you test compared to near-neutral changes, where 'large' is a relative concept.

lucasart · Post by **lucasart** » Tue Oct 30, 2012 3:33 pm

If you download and compile the latest cutechess from GitHub, you'll see that cutechess-cli now supports the SPRT (Sequential Probability Ratio Test).

Michel · Post by **Michel** » Tue Oct 30, 2012 4:22 pm

lucasart wrote: If you download and compile the latest cutechess from GitHub, you'll see that cutechess-cli now supports the SPRT (Sequential Probability Ratio Test).

Great!

Ajedrecista · Post by **Ajedrecista** » Tue Oct 30, 2012 9:32 pm

Hello Ed:

Rebel wrote: I have a couple of questions about the LOS table at the CPW.

I do exactly what Lucas said: I estimate the mean and standard deviation of a normal distribution from the match statistics (wins, draws and losses) and compute a probability from them.

Back in June I sent a PM to Gerd Isenberg, who forwarded it to Edmund Moshammer. I calculated larger tables with this Fortran 95 code, assuming a draw ratio of 32%:

```fortran
program LOS_

  implicit none

  integer, parameter :: dp = kind(1d0)        ! portable double precision
  integer, parameter :: partitions = 200000   ! even number of Simpson subintervals
  integer :: n, i, j_max, j
  real(kind=dp) :: score, sigma, three_sqrt_of_two_pi, S1, S2, x, h, h2, k_for_LOS, LOS, wins

  write(*,*)
  write(*,'(A)') 'Number of games:'
  write(*,*)
  read(*,*) n
  write(*,*)
  if (n <= 0) then
    write(*,'(A)') 'Incorrect number of games.'
    write(*,*)
    write(*,'(A)') 'Please close and try again. Press Enter to exit.'
    read(*,'()')
    stop
  end if

  ! Stop at wins - losses = 3.24*sqrt(n): beyond that LOS is practically 100%.
  j_max = int(3.24d0*sqrt(n + 0d0))
  three_sqrt_of_two_pi = 3d0*sqrt(2d0*acos(-1d0))

  open(unit=111, file='log.txt', status='unknown', action='write')
  write(111,'(A,I6)') 'Number of games: ', n
  write(111,'(A)') 'Draw ratio: 32% (fixed).'
  write(111,*)
  write(111,'(A)') 'Wins - losses:     LOS:'

  do j = 0, j_max                  ! j = wins - losses
    wins = 3.4d-1*n + 5d-1*j       ! wins + losses = 68% of n (draw ratio = 32%)
    score = wins/n + 1.6d-1        ! draws contribute 0.32/2 = 0.16
    ! Per-game variance = score*(1 - score) - draw_ratio/4, with draw_ratio/4 = 0.08.
    ! The formula for calculating sigma was taken from this thread (first seen in post #22):
    ! http://immortalchess.net/forum/showthread.php?t=2237
    sigma = sqrt((score*(1d0 - score) - 8d-2)/n)

    k_for_LOS = abs(5d-1 - score)/sigma

    ! Composite Simpson's rule for the integral of exp(-x*x/2) from 0 to k_for_LOS
    h = k_for_LOS/partitions
    h2 = h + h

    x = -h
    S1 = 0d0                       ! odd nodes: h, 3h, ..., (partitions-1)h
    do i = 1, partitions-1, 2
      x = x + h2
      S1 = S1 + exp(-5d-1*x*x)
    end do

    x = 0d0
    S2 = 0d0                       ! even interior nodes: 2h, 4h, ..., (partitions-2)h
    do i = 2, partitions-2, 2
      x = x + h2
      S2 = S2 + exp(-5d-1*x*x)
    end do

    ! LOS = 0.5 + (integral from 0 to k_for_LOS) / sqrt(2*pi)
    LOS = 5d-1 + h*(1d0 + 4d0*S1 + 2d0*S2 + exp(-5d-1*k_for_LOS*k_for_LOS))/three_sqrt_of_two_pi

    write(111,'(A,I4,A,F6.2,A)') '  ', j, '         ', 1d-2*nint(1d4*LOS), ' %'
  end do

  close(111)
  write(*,'(A)') 'End of the calculations.'
  write(*,*)

end program LOS_
```

So, to answer your questions:

1.- I computed tables for 2000, 5000, 10000, 20000, 50000 and 100000 games and uploaded them with the PM, but the link expired in July. My method is an approximation that gets closer as the number of games grows, but even with 2000 games my numbers are very close to Edmund's (Edmund is the author of the LOS table on CPW).

2.- I do the following:

```
score = (wins + draws/2)/(wins + draws + loses)
sigma = sqrt{[score*(1 - score) - draw_ratio/4]/(wins + draws + loses)}

k = (0.5 - score)/sigma
```

I then integrate the standard normal density from -infinity to k (the normal CDF, which can be written in terms of the erf function) and call this number LOS. A look at the code may help.

3.- I use LOS to estimate the probability of being wrong when I assume that one engine is better than the other while in reality the opposite is true. That probability is min(LOS, 1 - LOS), but of course this is my own interpretation and it may not be right.

3.- a) 74.27% is the LOS of the engine at +17; the engine at -17 has an LOS of roughly 100 - 74.27 = 25.73% (more than a 1/4 probability of being wrong, which is a lot).

3.- b) The probability of being wrong is roughly 100 - 83.11 = 16.89%, which is still very high IMHO. I like long tests, so I would continue a little longer... but note that I am not being rigorous here: it is only a guess.

------------------------

If you like, I can recompute these larger tables, or even release a program where you set the number of games and the draw ratio and get a text file laid out like the LOS table hosted at CPW.

Out of curiosity, I computed and printed LOS values for 1000 games in a few seconds (I think fewer than ten) and obtained LOS ~ 74.28% for +17 and LOS ~ 83.13% for +25; both values are very close to the official CPW ones.

Regards from Spain.

Ajedrecista.

Edmund · Post by **Edmund** » Wed Oct 31, 2012 1:34 am

Rebel wrote: 1. Are bigger tables (2000-5000 games) available?

As Jesús pointed out, he generated some tables for larger numbers of games. I generated the ones available on CPW. The difference between his and mine is that he used the normal distribution, which for large n is a good approximation of the multinomial distribution that is actually needed.
You can find the C++ code I used to generate the numbers here:
http://chessprogramming.wikispaces.com/ ... /149187701
IIRC I had to use an arbitrary-precision arithmetic library to calculate the statistics for larger n.

You can find a lively discussion of your Q3 here:
http://talkchess.com/forum/viewtopic.ph ... at&start=0

Ajedrecista · Post by **Ajedrecista** » Wed Oct 31, 2012 9:07 am

Hello:

In my second point I wrote the wrong value of k: it lacks the absolute value, unlike the code. If you use the code but replace k with the value from my post, the result is 1 - LOS instead of LOS. What my code actually uses is k = |0.5 - score|/sigma (with the absolute value, which is correct in the code). I do not know whether the code can be made faster, because my programming skills are very limited. As far as I can tell, the code in my previous post is correct; at least I have not found any errors yet.

Regards from Spain.

Ajedrecista.

Rebel · Post by **Rebel** » Wed Oct 31, 2012 11:55 am

Hi Jesús,

Ajedrecista wrote: 1.- I computed tables for 2000, 5000, 10000, 20000, 50000 and 100000 games and uploaded them with the PM, but the link expired in July. My method is an approximation that gets closer as the number of games grows, but even with 2000 games my numbers are very close to Edmund's (Edmund is the author of the LOS table on CPW).

Is it possible for you to email those larger tables? I would like to put them on my website as a reference. My email address is in a PM.

2.- I do the following:
```
score = (wins + draws/2)/(wins + draws + loses)
sigma = sqrt{[score*(1 - score) - draw_ratio/4]/(wins + draws + loses)}
k = (0.5 - score)/sigma
```
I then integrate the standard normal density from -infinity to k (the normal CDF, which can be written in terms of the erf function) and call this number LOS. A look at the code may help.

I have tried to import your code above into my utility, which on request calculates the current state of a match running in 4 threads, and something is wrong with the sigma calculation. Here is the code after fixing the compiler errors I initially got:

```c
score = (wins + draws/2)/(wins + draws + loses);
sigma = sqrt(score*(1 - score) - draw_ratio/4)/(wins + draws + loses);
LOS = (0.5 - score)/sigma;
printf("LOS = %.1f  (score = %.1f) (sigma = %1.f)\n\n",LOS,score,sigma);
```

The output:

```
658-960-696 (2314) match score 1138.0 - 1176.0 (49.2%)

Won-loss 658-696 = -38 (2314 games) draws 41.5

LOS = -nan  (score = 0.5) (sigma = -nan)
```

As you can see, the draw ratio in this match is 41.5%. But what is wrong with the formula?

Rebel · Post by **Rebel** » Wed Oct 31, 2012 12:28 pm

Related to the subject at hand: I have this match running at an average of 2 seconds.

```
658-960-696 (2314) match score 1138.0 - 1176.0 (49.2%)

Won-loss 658-696 = -38 (2314 games) draws 41.5
```

A previous match at a 0.5-second average gave 51.0% after 4000 games. Now, with a time control 4 times longer, it is extremely unlikely that the remaining 1700 games will raise the current 49.2% to 51%; I bet not even to 50%.

It has been a fixed pattern (at least for me) since Chrilly introduced AUTO232 somewhere in the early 90s: longer time controls are much more reliable. While the 4000 games at 0.5 seconds indicate a possible small Elo gain, the result at the longer time control will be decisive in my considerations.

Any similar or opposite experiences?
