## LOS (again)

**Moderators:** bob, hgm, Harvey Williamson

### LOS (again)

I have a couple of questions about the LOS table at the CPW.

http://chessprogramming.wikispaces.com/LOS+Table

1. Are bigger tables available (2000-5000 games)?

2. Or, even better, what is the formula behind the table?

3. If a change you make has only a small estimated Elo gain, at which LOS percentage do you decide to terminate a match? There are two possibilities:

a) A negative match score. Say you have played 1000 games so far and you have a -17 score. The LOS table states a 74% reliability. Is that enough for you to abort the match and drop the change you made?

b) A positive match score. After 1000 games you have a +25 score, indicating a reliability of 83%. Do you take the 83% for granted and count the change as an improvement, or do you test further?

### Re: LOS (again)

> Rebel wrote: 2. Or even better what's the formula of the table?

Yes, it's very easy: just calculate the empirical mean and stdev, and use the Gaussian distribution to calculate probabilities.

Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
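A minimal sketch of the recipe in the reply above (empirical mean and standard deviation, then the Gaussian distribution), not anyone's actual tool. The function name `los_normal` and the 349-319-332 example result are made up for illustration; the example is simply one way to get a +17 score in 1000 games with roughly 32% draws.

```python
import math

def los_normal(wins, draws, losses):
    """Likelihood of superiority under a normal approximation.

    Assumes at least one decisive game, otherwise sigma is zero.
    """
    n = wins + draws + losses
    # Per-game score: win = 1, draw = 0.5, loss = 0.
    mean = (wins + 0.5 * draws) / n
    # Empirical variance of a single game's score around that mean.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    sigma = math.sqrt(var / n)          # standard error of the mean score
    z = (mean - 0.5) / sigma            # distance from a 50% score, in STDs
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF at z

# Hypothetical result: +17 after 1000 games with about 32% draws.
print(f"LOS = {100.0 * los_normal(349, 319, 332):.2f}%")
```

For this made-up result the value lands near the 74% that the CPW table gives for +17 in 1000 games.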

### Re: LOS (again)

Cutting short a match erodes your confidence. For example, when you test until the score exceeds the 1.96 STD interval around 50%, and abort the test immediately when it does, you can only have 90% confidence that the engines were indeed of different strength. Normally 1.96 STD corresponds to a confidence of 95%, but only when the result is taken over a predetermined number of games (without paying attention to intermediate results). The possibility to stop early exactly doubles the probability that you will accept a fluke (from 5% to 10%), because you don't give it the opportunity to correct itself by an average result in the remainder of the test.

That doesn't necessarily mean that it would be bad to do this. It is simply a trade-off. On changes that have a huge effect you will be able to stop very early, which will save an enormous number of games. But on near-neutral changes you will now need many more games to reach the same confidence. What is the most profitable depends on how many 'large' improvements you test compared to near-neutral changes. Where 'large' is a relative concept.
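A small Monte Carlo sketch of the early-stopping effect, under assumptions that are not in the post: two engines of equal strength, a 32% draw ratio, a planned match of 400 games, and a stopping bound fixed at 1.96 STD of the planned length. The function name `false_positive_rates` and all parameter values are illustrative.

```python
import math
import random

def false_positive_rates(n_games=400, trials=2000, draw_ratio=0.32, seed=1):
    """Return (fixed_length_rate, early_stop_rate) for two equal engines."""
    rng = random.Random(seed)
    p_win = (1.0 - draw_ratio) / 2.0                  # equal engines: P(win) == P(loss)
    sigma_game = math.sqrt((1.0 - draw_ratio) / 4.0)  # per-game STD of the score
    bound = 1.96 * sigma_game * math.sqrt(n_games)    # 1.96 STD of the planned match, in points
    fixed_hits = 0
    early_hits = 0
    for _ in range(trials):
        excess = 0.0      # points above a 50% score so far
        crossed = False   # did the score ever leave the interval?
        for _ in range(n_games):
            r = rng.random()
            if r < p_win:
                excess += 0.5        # win
            elif r < 2.0 * p_win:
                excess -= 0.5        # loss (a draw leaves excess unchanged)
            if abs(excess) >= bound:
                crossed = True       # an early-stopping tester would abort here
        if abs(excess) >= bound:
            fixed_hits += 1          # fluke accepted only at the planned end
        if crossed:
            early_hits += 1          # fluke accepted at any point on the way
    return fixed_hits / trials, early_hits / trials

fixed, early = false_positive_rates()
print(f"fixed length: {fixed:.3f}, stop at first crossing: {early:.3f}")
```

Every match that ends outside the interval has also crossed it on the way, so the early-stopping rate can never be lower than the fixed-length one; in runs like this it should come out at roughly twice the fixed-length rate.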

### Re: LOS (again)

If you download and compile the latest cutechess from GitHub, you'll see that cutechess-cli now has the SPRT (Sequential Probability Ratio Test).

### Re: LOS (again)

> if you download and compile the latest cutechess from github, you'll see that cutechess-cli now has the SPRT (Sequential Probability Ratio Test).

Great!

### Re: LOS (again).

Hello Ed:

> Rebel wrote: I have a couple of questions about the LOS table at the CPW.
>
> http://chessprogramming.wikispaces.com/LOS+Table
>
> 1. Are bigger tables available (2000-5000) games?
>
> 2. Or even better what's the formula of the table?
>
> 3. If a change you make is about a small estimated elo gain then at which LOS percentage do you decide to terminate a match? Arrived here there are 2 possibilities:
>
> a) a negative match score. Say you have played 1000 games so far and you have a -17 score. The LOS table states a 74% reliability. Is that enough for you to abort the match and dump the change you made?
>
> b) a positive match score. After 1000 games you have a +25 score indicating a reliable percentage of 83%. What do you decide, take the 83% for granted and count the change as an improvement or do you test further?

I do exactly what Lucas said: I calculate the mean and standard deviation of a normal distribution from the match statistics (wins, draws and losses) and then calculate a probability.

Back in June I PM'ed Gerd Isenberg, who sent my PM to Edmund Moshammer: I calculated larger tables with this Fortran 95 code, assuming a draw ratio of 32%:

```
program LOS_
  implicit none
  integer, parameter :: partitions = 200000  ! Number of Simpson sub-intervals (even).
  integer :: n, i, max, j
  ! KIND=3 is compiler-specific; kind numbers are not portable between Fortran compilers.
  real(KIND=3) :: score, sigma, three_sqrt_of_two_pi, S1, S2, x, h, h2, k_for_LOS, LOS, wins

  write(*,*)
  write(*,'(A)') 'Number of games:'
  write(*,*)
  read(*,*) n
  write(*,*)
  if (n <= 0) then
    write(*,'(A)') 'Incorrect number of games.'
    write(*,*)
    write(*,'(A)') 'Please close and try again. Press Enter to exit.'
    read(*,'()')
    stop
  end if

  max = int(3.24d0*sqrt(n + 0d0)) ! This limit avoids to calculate lots of LOS of the order of 100%, when n is big enough.
  three_sqrt_of_two_pi = 3d0*sqrt(2d0*acos(-1d0))

  open(unit=111, file='log.txt', status='unknown', action='write')
  write(111,'(A,I6)') 'Number of games: ', n
  write(111,'(A)') 'Draw ratio: 32% (fixed).'
  write(111,*)
  write(111,'(A)') 'Wins - loses: LOS:'

  do j = 0, max
    wins = 3.4d-1*n + 5d-1*j ! Draw ratio = 32%.
    score = wins/n + 1.6d-1 ! Draw ratio = 32%.
    sigma = sqrt((score*(1d0 - score) - 8d-2)/n)
    ! The formula for calculating sigma was taken from this thread (first seen in post #22):
    ! http://immortalchess.net/forum/showthread.php?t=2237
    k_for_LOS = abs(5d-1 - score)/sigma
    LOS = 0d0

    h = k_for_LOS/partitions ! Start of the composite Simpson's rule for approximate erf function.
    h2 = h + h

    ! S1: sum over the odd nodes h, 3h, ..., (partitions-1)*h.
    x = -h
    S1 = 0d0
    do i = 1, partitions-1, 2
      x = x + h2
      S1 = S1 + exp(-5d-1*x*x)
    end do

    ! S2: sum over the interior even nodes 2h, 4h, ..., (partitions-2)*h.
    x = 0d0
    S2 = 0d0
    do i = 2, partitions-2, 2
      x = x + h2
      S2 = S2 + exp(-5d-1*x*x)
    end do

    ! 0.5 is the area from -infinity to 0; the Simpson term is the area from 0 to k.
    LOS = 5d-1 + h*(1d0 + 4d0*S1 + 2d0*S2 + exp(-5d-1*k_for_LOS*k_for_LOS))/three_sqrt_of_two_pi
    write(111,'(A,I4,A,F6.2,A)') ' ', j, ' ', 1d-2*nint(1d4*LOS,KIND=3), ' %' ! j = wins - loses
  end do

  close(111)
  write(*,'(A)') 'End of the calculations.'
  write(*,*)
end program LOS_
```
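As a cross-check of the Simpson's-rule step in the program, the same integral can be compared with the closed form based on the error function. This is only a sketch: the function name `normal_cdf_simpson`, the smaller default number of partitions and the sample value of k are illustrative choices, not part of the Fortran code.

```python
import math

def normal_cdf_simpson(k, partitions=2000):
    """Normal CDF at k >= 0 via composite Simpson's rule on [0, k].

    partitions must be even, as in the Fortran program.
    """
    h = k / partitions
    # End points of the interval: exp(0) = 1 and exp(-k^2/2).
    total = 1.0 + math.exp(-0.5 * k * k)
    for i in range(1, partitions):
        x = i * h
        weight = 4.0 if i % 2 == 1 else 2.0   # odd nodes 4, even interior nodes 2
        total += weight * math.exp(-0.5 * x * x)
    # 0.5 covers the area from -infinity to 0; the rest is the area from 0 to k.
    return 0.5 + h * total / (3.0 * math.sqrt(2.0 * math.pi))

k = 0.65158  # sample value, close to the +17 case for 1000 games
exact = 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))
print(f"Simpson: {normal_cdf_simpson(k):.10f}  erf: {exact:.10f}")
```

Since `math.erf` gives the same quantity directly, the numerical integration is not strictly needed in a language that provides an error function.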

So, answering your questions:

1.- They were available for 2000, 5000, 10000, 20000, 50000 and 100000 games because I uploaded them with the PM, but the link expired in July. My method is an approximation that gets closer as the number of games grows, but even with 2000 games my numbers are very close to Edmund's (Edmund is the author of the LOS table at CPW).

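2.- I do the following, then I evaluate the erf function from -infinity to k and call that number LOS (a look at the code may help):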

```
score = (wins + draws/2)/(wins + draws + loses)
sigma = sqrt{[score*(1 - score) - draw_ratio/4]/(wins + draws + loses)}
k = (0.5 - score)/sigma
```
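The `draw_ratio/4` term in sigma is the per-game variance of the score written in terms of the score itself. A short derivation, where the symbols are introduced here only for illustration: $w$ and $d$ are the win and draw fractions, $x$ is the score of a single game (1, 1/2 or 0), $s$ is the mean score and $n$ the number of games.

```latex
\begin{aligned}
s &= w + \tfrac{d}{2}, \qquad E[x^2] = w + \tfrac{d}{4}, \\
\operatorname{Var}(x) &= E[x^2] - s^2
   = \left(s - \tfrac{d}{2}\right) + \tfrac{d}{4} - s^2
   = s(1 - s) - \tfrac{d}{4}, \\
\sigma &= \sqrt{\frac{s(1 - s) - d/4}{n}} .
\end{aligned}
```

Here $d$ is a fraction between 0 and 1 (0.32 for the tables), which matches the `8d-2` constant in the Fortran code.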

3.- I understand LOS as the probability of being wrong when assuming that one engine is better than the other when in reality it is just the opposite. This probability is min(LOS, 1 - LOS), but of course this is my own view and it does not have to be right.

3.- a) 74.27% is for the engine with +17; the engine with -17 has a LOS value of 100 - 74.27 = 25.73%, more or less (more than a 1/4 probability of being wrong, which is a lot).

3.- b) The probability of being wrong is 100 - 83.11 = 16.89%, more or less, which is still very high IMHO. I like long tests, so I would continue a little longer... but you can see that I am not being rigorous here: it is only a guess.

------------------------

If you like, I can recompute these larger tables, or I can even release a programme where you set the number of games and the draw ratio and get a Notepad (text) file that looks like the LOS table hosted at CPW.

Just out of curiosity, I computed and printed LOS values for 1000 games in a few seconds (I think fewer than ten) and I obtained LOS ~ 74.28% for +17 and LOS ~ 83.13% for +25; both values are very close to the official ones at CPW.

Regards from Spain.

Ajedrecista.

### Re: LOS (again)

> Rebel wrote: 1. Are bigger tables available (2000-5000) games?
>
> 2. Or even better what's the formula of the table?
>
> 3. If a change you make is about a small estimated elo gain then at which LOS percentage do you decide to terminate a match? Arrived here there are 2 possibilities:
>
> a) a negative match score. Say you have played 1000 games so far and you have a -17 score. The LOS table states a 74% reliability. Is that enough for you to abort the match and dump the change you made?
>
> b) a positive match score. After 1000 games you have a +25 score indicating a reliable percentage of 83%. What do you decide, take the 83% for granted and count the change as an improvement or do you test further?

As Jesús pointed out, he generated some tables for larger numbers of games. I generated the ones available on CPW. The difference between his and mine is that he used the normal distribution, which (for large n) is a good approximation to the multinomial distribution that is actually needed.

You can find the C++ code I used to generate the numbers under:

http://chessprogramming.wikispaces.com/ ... /149187701

IIRC I had to use an arbitrary-precision arithmetic library to calculate the statistics for larger n.

You can find a vivid discussion on your Q3 under

http://talkchess.com/forum/viewtopic.ph ... at&start=0

### Re: LOS (again).

Hello:

In my second point, I wrote a wrong value of k (without the absolute value, the opposite of what the code does; if you use the code but change k to the value from my post, the result will be 1 - LOS instead of LOS). In my code, k = |0.5 - score|/sigma (with the absolute value, which is correct). I do not know whether I can optimize my code any further for speed, because my programming skills are very limited. The code I posted in my previous post is fully correct, or at least I have not found any errors yet.

Regards from Spain.

Ajedrecista.

### Re: LOS (again).

Hi Jesús,

> Ajedrecista wrote: 1.- They were available for 2000, 5000, 10000, 20000, 50000 and 100000 games because I uploaded them in the PM, but this link expired in July. My method is an approximation which becomes closer with larger number of games, but with even 2000 games my numbers are clearly very close to Edmund's ones (Edmund is the author of that LOS table in CPW).

Is it possible for you to email those larger tables? I'd like to put them on my website as a reference. My email address is in PM.

> 2.- I do the following:
>
> `score = (wins + draws/2)/(wins + draws + loses)`
>
> `sigma = sqrt{[score*(1 - score) - draw_ratio/4]/(wins + draws + loses)}`
>
> `k = (0.5 - score)/sigma`
>
> I calculate erf function from -infinity to k, then I call this number LOS. A look to the code can be good.

I have tried to import your code above into my utility, which on request calculates the current situation in a match running in 4 threads, and something is wrong with the sigma calculation. The corrected code, after solving the compiler errors I initially got:

```
score = (wins + draws/2)/(wins + draws + loses);
sigma = sqrt(score*(1 - score) - draw_ratio/4)/(wins + draws + loses);
LOS = (0.5 - score)/sigma;
printf("LOS = %.1f (score = %.1f) (sigma = %1.f)\n\n",LOS,score,sigma);
```

The output:

```
658-960-696 (2314) match score 1138.0 - 1176.0 (49.2%)
Won-loss 658-696 = -38 (2314 games) draws 41.5
LOS = -nan (score = 0.5) (sigma = -nan)
```

As you can see, the draw ratio is 41.5% in this match. But what is wrong with the formula?
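The thread does not say what goes wrong in Rebel's C snippet, so the following is only a sketch of the formula as it appears in the Fortran program, applied to this 658-960-696 match. Compared with the C snippet, the division by the number of games sits inside the square root (as in the Fortran code), and the draw ratio is a fraction such as 0.415. That the NaN comes from passing the draw ratio as the percentage 41.5 is a guess, not something the thread confirms. The names `radicand` and `los_from_wdl` are made up, and the signed `score - 0.5` is a convention chosen here so that a losing score gives a LOS below 50%.

```python
import math

def radicand(score, draw_ratio):
    """Per-game variance term under the square root."""
    return score * (1.0 - score) - draw_ratio / 4.0

def los_from_wdl(wins, draws, losses):
    """LOS from a match result, normal approximation (a sketch)."""
    n = wins + draws + losses
    score = (wins + draws / 2.0) / n      # floating-point division
    draw_ratio = draws / n                # fraction in [0, 1], not a percentage
    sigma = math.sqrt(radicand(score, draw_ratio) / n)  # division by n inside the root
    k = (score - 0.5) / sigma             # signed: negative when the score is below 50%
    return 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))   # normal CDF from -infinity to k

score = 1138.0 / 2314.0
print("radicand with draw ratio 0.415:", radicand(score, 0.415))  # positive
print("radicand with draw ratio 41.5: ", radicand(score, 41.5))   # negative, no real root
print(f"LOS = {100.0 * los_from_wdl(658, 960, 696):.1f}%")
```

With the draw ratio as a fraction the term under the root is positive, so sigma is a real number and the LOS for this match comes out well below 50%, as expected for a 49.2% score.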

### Re: LOS (again).

And related to the subject at hand, I have this match running at 2 seconds average.

```
658-960-696 (2314) match score 1138.0 - 1176.0 (49.2%)
Won-loss 658-696 = -38 (2314 games) draws 41.5
```

A previous match at 0.5 seconds average gave 51.0% after 4000 games, and now, with a 4 times longer time control, it's extremely unlikely that the remaining 1700 games will raise the current 49.2% to 51%, not even to 50% I bet.

It has been a fixed pattern (at least for me) since Chrilly introduced AUTO232 somewhere in the early 90's: longer time controls are much more reliable, and while the 4000 games at 0.5 seconds indicate a possible small Elo gain, the result of the longer time control will be decisive in my considerations.

Any similar or opposite experiences?