Current skill command (Crafty) results

bob · Post by **bob** » Wed Jul 21, 2010 5:44 pm

The other thread is the wrong place for the skill discussion, so I am starting things over here.

First, here are the 23.3 results. The R07 version has a new skill change that slows the NPS down in proportion to the skill level, to help minimize the Beal effect.

The -n after the R07 versions is the skill setting used; it was 1, then 10 through 100 in steps of 10. R06 is our best 23.3 version so far and will likely be the release version once I get the skill feature into a usable state. It is about +55 over 23.2.

Code: Select all

Name                  Elo    +    - games score oppo. draws
Crafty-23.3R06-1     2924    5    5 30000   65%  2807   22%  
Crafty-23.3R07-100   2923    5    5 30000   65%  2807   22%  
Crafty-23.2          2867    5    5 30000   58%  2807   22% 
Crafty-23.3R07-90    2756    5    5 30000   43%  2807   22%  
Crafty-23.3R07-80    2753    6    6 11622   43%  2807   22%  
Crafty-23.3R07-70    2700    5    5 30000   36%  2807   20%  
Crafty-23.3R07-60    2555    5    5 30000   20%  2807   15%  
Crafty-23.3R07-50    2384    6    6 30000    8%  2807    9%   
Crafty-23.3R07-40    2215    9    9 30000    3%  2807    4%   
Crafty-23.3R07-30    1943   18   18 30000    1%  2807    1%   
Crafty-23.3R07-20    1804   28   28 30000    0%  2807    0%   
Crafty-23.3R07-10    1665   42   42 30000    0%  2807    0%   
Crafty-23.3R07-1     1548   60   60 30000    0%  2807    0%

The ratings at the bottom are not accurate, because the weakest opponent in this test is over 2600, but it does show that the Elo can be spread all over the place with this command. I'm not quite so happy with skill 70-90, as that is a pretty minimal change. But this is the first cut. I have an alternative way to reduce the Beal effect that I am testing next. The -80 test is not finished, but it seems to fit in with the others pretty well even with almost 20K games remaining.
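To illustrate why ratings near a 0% score are poorly constrained, here is a minimal sketch using the standard logistic Elo curve. This is an assumption about how to read the table; the rating tool that produced it may use a different model (for example one that treats draws separately), and the function names are made up for illustration.

```cpp
#include <cmath>

// Expected score for a player rated `diff` Elo points above the opposition,
// using the standard logistic Elo curve (assumed model, see lead-in).
double expected_score(double diff) {
    return 1.0 / (1.0 + std::pow(10.0, -diff / 400.0));
}

// Inverse mapping: rating difference implied by a score fraction.
// Only defined for 0 < score < 1; it diverges as score approaches 0 or 1.
double elo_diff(double score) {
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

Under this model a 65% score maps to roughly +108 Elo over the opposition and a 1% score to roughly -798. As the score approaches 0% the implied difference grows without bound, so a handful of games either way moves the estimate a long distance.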

Dann Corbit · Post by **Dann Corbit** » Wed Jul 21, 2010 8:30 pm

Here were my results (I don't have your mighty cluster so the significance is much lower):

Code: Select all

   Program                  Elo    +   -   Games   Score   Av.Op.  Draws
 1 Crafty-232ap00         : 3344  133 121    55    90.0 %   2963   12.7 %
 2 Crafty-23.2a-skill-mod : 3270  113 105    55    83.6 %   2986   14.5 %
 3 Crafty-232ap50         : 3179  102  97    55    75.5 %   2984   12.7 %
 4 Crafty-232ap10         : 3100   88  86    55    63.6 %   3003   18.2 %
 5 Crafty-232ap01         : 2945   87  88    55    39.1 %   3022   16.4 %
 6 Crafty-232am01         : 2889   90  94    55    30.0 %   3036   16.4 %
 7 Crafty-232am10         : 2788  113 126    55    18.2 %   3049    3.6 %
 8 Crafty-232am50         : 2486    0   0    55     0.0 %   3086    0.0 %

I see the same pattern that you do (and I extended to the negative and the pattern continues). However, I get a strange effect for a setting of zero. Can you run a skill of zero on your mighty cluster to see if you get the same behavior? The only real change I made to the code was to allow any skill number from -100 to +100 instead of from +1 to +100.

Dann Corbit · Post by **Dann Corbit** » Wed Jul 21, 2010 9:35 pm

I strongly suspect that the effect I am seeing is due to one of these assignments:

Code: Select all

/* option.c */
null_depth = null_depth * skill / 100;
check_depth = check_depth * skill / 100;
LMR_depth = LMR_depth * skill / 100;
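To make the arithmetic of these assignments concrete, here is a small sketch of the same integer scaling. The helper name and any starting depth passed to it are hypothetical, not values taken from the Crafty source.

```cpp
// Scale a depth parameter by a skill percentage with integer arithmetic,
// mirroring the form of the assignments quoted above.
// Integer division truncates toward zero.
int scale_by_skill(int depth, int skill) {
    return depth * skill / 100;
}
```

With a starting depth of 3, skill 100 leaves it at 3, skill 50 gives 1, and both skill 1 and skill 0 give 0. A negative skill such as -50 gives -1. Whether a zero or negative depth matters depends on how the search uses these values, which the thread does not show.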

Dann Corbit · Post by **Dann Corbit** » Wed Jul 21, 2010 11:18 pm

Mystery solved. Pilot error. See:
http://www.talkchess.com/forum/viewtopi ... 190#363190

Mangar · Post by **Mangar** » Thu Jul 22, 2010 6:27 pm

Hi,

For Spike I add a random value to eval and reduce nps to reach a given Elo value by the following formula:

RandF = max(0, min(150, (2800 - Elo) / 5))
(with Eval = Eval() + (rand() % RandF - RandF / 2),
where 100 = one pawn)

and

Nps = 20 ^ ((Elo - 1100.0) / 500.0 + 1.0)

I send the CPU to sleep for 1/16 sec. as often as necessary to reach the nps. This results in very low CPU usage for low Elo values.
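The two formulas above can be sketched as follows. The function names are made up for illustration, and the guard for RandF == 0 is an addition that is not in the post: taken literally, rand() % RandF is undefined once Elo reaches 2800 and RandF becomes zero.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>

// Width of the random eval noise in centipawns (100 = one pawn).
int rand_f(int elo) {
    return std::max(0, std::min(150, (2800 - elo) / 5));
}

// Target nodes per second for a requested Elo.
double target_nps(int elo) {
    return std::pow(20.0, (elo - 1100.0) / 500.0 + 1.0);
}

// Add noise in [-f/2, f - 1 - f/2] to a static evaluation.
// Guarding f == 0 is an assumption added here to avoid a modulo by zero.
int noisy_eval(int eval, int elo) {
    int f = rand_f(elo);
    if (f == 0) return eval;
    return eval + (std::rand() % f - f / 2);
}
```

For example, Elo 1100 gives a target of 20 nps, Elo 1600 gives 400, and Elo 2100 gives 8000. The noise width reaches its 150-centipawn cap at Elo 2050 and below.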

IMHO if this formula works for Spike it should work for most engines. Sadly I only had some tests by human players to tune the factors and no artificial test. The random value is not needed, but it "smells" more like a weak human player if sometimes simple pawn losses are not seen.

Have you got a comparable formula to reduce strength?

Greetings Volker

bob · Post by **bob** » Fri Jul 23, 2010 4:09 am

No. What I did was to come up with an idea, and then test it on the cluster at various settings to see what happens. The problem is that if you want to take an engine like Crafty and get it down into the 800 range from its normal 2800, that is a _huge_ drop, and it is difficult to come up with a suite of opponents that brackets ratings from sub-800 to 2800+.

I'd like to find something that is hardware platform independent, but that seems even harder.

Mangar · Post by **Mangar** » Fri Jul 23, 2010 5:01 pm

Hm,

I think "my" way of reducing nodes searched per second is pretty hardware independent (it does not depend on CPU speed) if you find a way to wait 1/16 second on every machine. But as far as I know you have plenty of experience with this kind of stuff. (I learned how to sync threads in Linux from your code.)
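One way to read this claim: the number of pauses depends only on the node count, the elapsed time, and the target rate, not on how fast the CPU is. A minimal sketch under that reading follows; the function names and the idea of calling it periodically from the search loop are assumptions, not Spike's actual code.

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <thread>

// Number of 1/16-second pauses needed so that `nodes` searched in
// `elapsed_sec` averages out to no more than `target_nps`.
int sleep_slices_needed(std::uint64_t nodes, double elapsed_sec, double target_nps) {
    const double slice = 1.0 / 16.0;
    double wanted_sec = static_cast<double>(nodes) / target_nps;
    double deficit = wanted_sec - elapsed_sec;
    if (deficit <= 0.0) return 0;  // already at or below the target rate
    return static_cast<int>(std::ceil(deficit / slice));
}

// Hypothetical hook: pause long enough to bring the average rate down.
void throttle(std::uint64_t nodes, double elapsed_sec, double target_nps) {
    int n = sleep_slices_needed(nodes, elapsed_sec, target_nps);
    for (int i = 0; i < n; ++i)
        std::this_thread::sleep_for(std::chrono::microseconds(62500));
}
```

A faster machine reaches a given node count sooner and therefore sleeps more, so the average rate converges on the same target. The actual pause length still depends on the operating system's timer behavior.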

Greetings Volker

jhaglund · Post by **jhaglund** » Mon Jul 26, 2010 5:58 pm

Mangar wrote:
if you find a way to wait 1/16 second on every machine

Sleep(62); //62.5/1000

bob · Post by **bob** » Mon Jul 26, 2010 6:21 pm

jhaglund wrote:
if you find a way to wait 1/16 second on every machine
Sleep(62); //62.5/1000

Not guaranteed. In fact, sleep(1) is supposed to sleep for _one_ second, according to POSIX. nanosleep() is supposed to sleep for either (a) the indicated number of nanoseconds, or (b) the indicated number of nanoseconds rounded up to the operating system clock resolution, which for most Linux kernels is 1/100th of a second, but can vary from that.
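Since the wake-up time is only a lower bound, one option is to measure what a 1/16-second request actually costs on a given machine. This sketch uses the C++11 standard library rather than Sleep() or nanosleep() directly; the function name is made up for illustration.

```cpp
#include <chrono>
#include <thread>

// Request a 1/16-second sleep and return how long the call really took,
// in milliseconds. The call blocks for at least the requested time and may
// overshoot, depending on the scheduler and timer resolution.
double sleep_sixteenth_ms() {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    std::this_thread::sleep_for(std::chrono::microseconds(62500));
    std::chrono::duration<double, std::milli> took = clock::now() - start;
    return took.count();
}
```

Calling this a few times at startup gives a rough idea of the overshoot, which a throttling loop could subtract from later requests.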

jhaglund · Post by **jhaglund** » Tue Jul 27, 2010 4:30 pm

This was for "windoze"... works for me....

Sleep(1000); // = 1 sec.
Sleep(62); // about 1/16th
Sleep(125); // = 1/8th
etc...

so?

#include <iostream>
#include <time.h> // nanosleep(), POSIX
using namespace std;

int main() {
    int skill;
    cout << " Enter skill (1-100): ";
    if (!(cin >> skill) || skill < 1 || skill > 100)
        return 1; // reject input outside 1-100
    cout << " Level: " << skill << endl;
    // Delay grows as skill drops: 100 -> 0 (no sleep), 90 -> 10, ..., 10 -> 90.
    // nanosleep() takes a timespec, not an int; the unit here is nanoseconds.
    struct timespec delay = {0, 100 - skill};
    if (delay.tv_nsec > 0)
        nanosleep(&delay, nullptr);
    return 0;
}
...