CCRL 40/4 lists updated (11th August 2012)

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Vinvin
Posts: 5298
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: CCRL 40/4 lists updated (11th August 2012)

Post by Vinvin »

Modern Times wrote:I agree George, as I said before if SSE4 gives even 5 Elo I would be surprised
I have the figure of a "10% speed-up" for SSE4 (mainly from the popcount instruction); that would mean about +7 Elo.
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: CCRL 40/4 lists updated (11th August 2012)

Post by Modern Times »

Vinvin wrote:
Modern Times wrote:I agree George, as I said before if SSE4 gives even 5 Elo I would be surprised
I have the figure of a "10% speed-up" for SSE4 (mainly from the popcount instruction); that would mean about +7 Elo.
How many thousands of games do you need to prove (or disprove) a +7 Elo improvement ? And have we seen such a test ?
Ajedrecista
Posts: 2126
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: CCRL 40/4 lists updated (11th August 2012).

Post by Ajedrecista »

Hello Ray:
Modern Times wrote:
Vinvin wrote:
Modern Times wrote:I agree George, as I said before if SSE4 gives even 5 Elo I would be surprised
I have the figure of a "10% speed-up" for SSE4 (mainly from the popcount instruction); that would mean about +7 Elo.
How many thousands of games do you need to prove (or disprove) a +7 Elo improvement ? And have we seen such a test ?
I think that Vincent gets a +7 Elo gain for a 10% speed-up from this formula:
Gain = 50·ln(1.1)/ln(2) ≈ 6.88 ≈ 7
That 1.1 stands for the 10% speed-up (110/100 = 1.1). This formula assumes a gain of 50 Elo for each doubling of speed (the number 2 in the formula), but some tests conducted by Peter Österlund and Adam Hair (just citing two of them) showed that the gain from doubling speed in modern engines can be more than 50 Elo. Links here and here.
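
The formula above can be sketched in a few lines of Python (chosen here only for brevity). The function name `elo_gain_from_speedup` and the parameter `elo_per_doubling` are made up for illustration; the default of 50 Elo per doubling is simply the assumption the formula makes, not a measured value.

```python
import math

def elo_gain_from_speedup(speedup, elo_per_doubling=50.0):
    """Elo gain for a speed ratio, assuming a fixed Elo gain per doubling of speed."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # log2(speedup) is the number of doublings; each doubling is worth elo_per_doubling.
    return elo_per_doubling * math.log(speedup) / math.log(2)

# A 10% speed-up (ratio 1.1) at the assumed 50 Elo per doubling.
print(round(elo_gain_from_speedup(1.1), 2))  # 6.88
```

Passing a different `elo_per_doubling` shows how strongly the estimate depends on that assumption.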

Answering your question on how many games are needed to detect a 7 Elo gap: it depends on the confidence/LOS you want. I have run my programme for a LOS of 97.5% in a one-sided test (a confidence of 95% in a two-sided test):

Code:

Minimum_number_of_games, ® 2012.

 Calculation of the minimum number of games in a match between two engines to ensure an Elo gain with a given LOS value:

Write down the wanted Elo gain between 0.1 and 40 Elo (it will be rounded up to 0.01 Elo):

7

 Write down the likelihood of superiority (in percentage) between 90% and 99.9% (LOS will be rounded up to 0.01%):

97.5

Write down the clock rate of the CPU (in GHz), only for timing the elapsed time of the calculations:

3
_______________________________________________________________________________

Score for a wanted gain of  7.00 Elo:  51.0072 %
Standard deviation for 97.50 % of LOS:  1.0072 %

A LOS value of 97.50 % is equivalent to 95.00 % confidence in a two-sided test.

Minimum number of needed games:       9464 games.
_______________________________________________________________________________

End of the calculations. Approximated elapsed time:  17 ms.

Thanks for using Minimum_number_of_games. Press Enter to exit.
You would need to run at least 9,000 games or so to verify this improvement statistically. I hope that this info is useful. Please keep up the great work with the CCRL rating lists!

Regards from Spain.

Ajedrecista.
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: CCRL 40/4 lists updated (11th August 2012).

Post by Modern Times »

But the assumption here is that the +50 Elo due to speed doubling (or whatever the number is) is a linear progression. I believe (but can't prove) that this is not true.
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CCRL 40/4 lists updated (11th August 2012).

Post by lkaufman »

Our data for Komodo gave 90 Elo per doubling, which is about 1.3 Elo per 1% of speed, which means 10% ≈ 13 Elo. But of course the other engines also gain something from SSE, maybe 5 Elo, so we are back to 8 Elo assuming both engines use SSE. But some of the opposing engines don't have SSE capability, so this means the value of SSE for Komodo is between 8 and 13, let's say 10. Yes, there is some decrease in the value of each doubling with greater depth for a given engine, but it seems that as the engines improve the value of doubling has not been shrinking. It all depends on the nature of the improvement. Since the doubling value has been calculated from recent versions where the range of speed is fairly small, I think this factor is negligible for the present calculation.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCRL 40/4 lists updated (11th August 2012).

Post by Laskos »

lkaufman wrote:Our data for Komodo gave 90 per doubling,
You must state the time control and the computer used. I believe it's around 90 points at something like a 2'+2'' TC, and around 60-70 at 40'/40 repeating on strong hardware. It could become even less than 50 at 120'/40 repeating on several cores. And all of that is pretty much independent of the absolute strength of the engine; better engines see somewhat diminishing returns at higher Elo.
ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Re: CCRL 40/4 lists updated (11th August 2012).

Post by ernest »

Ajedrecista wrote: Calculation of the minimum number of games in a match between two engines to ensure an Elo gain with a given LOS value:
Don't you need, somewhere in your calculation, an assumption of the draw rate?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCRL 40/4 lists updated (11th August 2012).

Post by Laskos »

Ajedrecista wrote:Minimum number of needed games:       9464 games.
You must run no less than 9000 games for verifying this improvement statistically.
Rule of thumb: (580/gain)^2 games, with the "gain" here being 7 Elo points, therefore (580/7)^2 ≈ 6,900, or about 7,000 games. The 580 is for ~35% draws; it would be replaced by ~660 for 20% draws and ~450 for 50% draws.
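
As a quick check of this rule of thumb, here is a minimal Python sketch that uses only the constants quoted in this post. The names `RULE_OF_THUMB_CONSTANT` and `games_needed` are hypothetical, and the table is keyed by draw rate in percent only because those are the three cases mentioned.

```python
# Constants quoted in the post, keyed by approximate draw rate in percent.
RULE_OF_THUMB_CONSTANT = {20: 660, 35: 580, 50: 450}

def games_needed(gain_elo, draw_percent=35):
    """Approximate number of games from the (constant / gain)^2 rule of thumb."""
    constant = RULE_OF_THUMB_CONSTANT[draw_percent]
    return (constant / gain_elo) ** 2

# Number of games suggested for a 7 Elo gain at each quoted draw rate.
for draw_percent in sorted(RULE_OF_THUMB_CONSTANT):
    games = games_needed(7, draw_percent)
    print(f"{draw_percent}% draws: about {games:,.0f} games")
```

Because the gain is squared in the denominator, halving the Elo gain you want to detect multiplies the number of games by four.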

Kai
Ajedrecista
Posts: 2126
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: CCRL 40/4 lists updated (11th August 2012).

Post by Ajedrecista »

Hi Ernest!
ernest wrote:
Ajedrecista wrote: Calculation of the minimum number of games in a match between two engines to ensure an Elo gain with a given LOS value:
Don't you need, somewhere in your calculation, an assumption of the draw rate?
You are right, but for this particular calculation I prefer not to take draws into account. Here is an extract from the source code:

Code:

sigma = (mu - 5d-1)/k  ! The original equation is: mu - k*sigma - 0.5 = 0.

n = (mu*(1d0 - mu))/(sigma*sigma)  ! The draw ratio is not taken into account: the results of n are more reliable.
! The formula for calculating sigma was taken from this thread (first seen in post #22):
! http://immortalchess.net/forum/showthread.php?t=2237
! A detailed explanation of the expression for the standard deviation (sigma) is found in the section 3.2 of this PDF:
! http://centaur.reading.ac.uk/4549/1/2003b_ICGA_J_H_Self-Play_Statistical_Significance.pdf
I do not use the draw ratio here because high draw ratios distort the calculation: the extreme case (all draws) would give 0 games, if I am not wrong. I assume that the draw ratio is 0%, which is the worst case, and I only take the worst case into account. Another very similar approach is seen here in example 3 (I do not know why I do not have a link to this site in my source code): you will see that this example is solved with 2401 games (in my case: 2400 for those conditions), but then the author specifies that the number of games can be reduced a little with some assumptions (I guess that this is where the draw ratio would appear in my model).

The same goes for me: I only present the worst case, so more games means total statistical safety within this LOS (if you calculate the result with 97.5% LOS, it will be wrong 1 time in 40)... but I built that programme to get a rough idea (I mean, not to discuss whether the minimum number of needed games is 5000 or 5020, but whether it is 3000 or 6000 (big differences)). Once you have the number of games for the worst case, you can reduce it a little... but by how much? This task is up to you, sorry. I am too conservative in my predictions, which is why I consider the worst case.
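
For readers who want to try the two formulas from the extract, here is a rough Python sketch (the programme itself is Fortran). The extract does not show how `mu` and `k` are obtained, so two assumptions are made here: `mu` is the expected score from the usual logistic Elo curve, and `k` is the one-sided standard normal quantile for the chosen LOS. The function names are made up for illustration.

```python
import math
from statistics import NormalDist

def expected_score(elo_gain):
    """Expected score for an Elo difference (assumed logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gain / 400.0))

def minimum_games(elo_gain, los=0.975):
    """Worst-case (0% draws) number of games, following the extract's two formulas."""
    mu = expected_score(elo_gain)
    k = NormalDist().inv_cdf(los)          # assumed: normal quantile, about 1.96 for 97.5%
    sigma = (mu - 0.5) / k                 # from mu - k*sigma - 0.5 = 0
    n = mu * (1.0 - mu) / (sigma * sigma)  # no draws: per-game variance is mu*(1 - mu)
    return math.ceil(n)

print(minimum_games(7.0))
```

Under these assumptions the result for 7 Elo and 97.5% LOS lands within a couple of games of the 9464 printed by the programme; the small difference presumably comes from rounding details that the extract does not show.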

In this post by Kai you see another form; in my programme LOS_and_Elo_uncertainties_calculator, I defined K = gain · sqrt(n)... in fact, this K is around 560, 585, ... (I do not remember well) for 95% confidence and 30% or 35% draws. This approach is also good and fast enough.

I hope that I have answered you. Please ask again if you still have doubts.

Regards from Spain.

Ajedrecista.
Last edited by Ajedrecista on Thu Aug 16, 2012 8:59 pm, edited 1 time in total.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCRL 40/4 lists updated (11th August 2012).

Post by Laskos »

Ajedrecista wrote:I assume that draw ratio is 0%, but it is the worst case and I am only taking into account the worst case.
For 0% draws the rule of thumb is (700/gain)^2, which for gain = 7 points gives 10,000 games, close to your 9,464 games.
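
One possible way to see where a constant near 700 could come from is sketched below in Python. This derivation is an assumption, not something stated in the thread: it linearises the logistic Elo curve around a 50% score and uses the no-draw variance of 1/4 per game. The name `zero_draw_constant` is hypothetical.

```python
import math
from statistics import NormalDist

def zero_draw_constant(los=0.975):
    """Constant C in n ≈ (C / gain)^2 for 0% draws, from a small-gain approximation."""
    k = NormalDist().inv_cdf(los)  # one-sided normal quantile, about 1.96 for 97.5%
    # Near a 50% score: score - 0.5 ≈ gain * ln(10) / 1600 and mu * (1 - mu) ≈ 1/4,
    # so n ≈ (k / 2)^2 / (gain * ln(10) / 1600)^2 = (800 * k / (ln(10) * gain))^2.
    return 800.0 * k / math.log(10.0)

c = zero_draw_constant()
print(round(c), round((c / 7) ** 2))
```

With these assumptions the constant comes out a little below 700, so the rule of thumb with 700 errs slightly on the side of more games.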

Kai