program style, risk aversion

Don · Post by **Don** » Wed Dec 26, 2012 2:21 pm

Adam Hair wrote: My test is under way and ~1200 of 7800 games have been played so far.

Here are my handicaps (40 moves in X seconds):

Name                    TC
Houdini 3              40/14
Critter 1.4            40/22
Komodo 5               40/30
Rybka 4.1              40/30
Stockfish 2.2.2        40/40
Naum 4.2               40/84
Hannibal 1.2           40/108
Gull 1.2               40/110
Spike 1.4              40/130
Spark 1.0              40/140
Protector 1.4.0        40/175
Quazar 0.4             40/180
Zappa Mexico II        40/280

Mine are running as well. I wish I had as many programs as you do, but at least I have 1 or 2 that you don't have.

I came pretty close to adjusting them after a couple of false starts. I am making my first micro-adjustment now to bring up the rear.

Rank    ELO     +/-    Games    Score  Player
---- ------- ------ -------- --------  ----------------------------
   1  3003.6   10.0     3357   51.013  Houdini3     
   2  3002.7   10.0     3358   50.864  Ivanhoe9.47b 
   3  3000.0   10.0     3360   50.432  kdev-4518.00 
   4  2998.5   10.0     3360   50.193  c16          
   5  2996.2   10.0     3360   49.821  spike14      
   6  2996.1   10.0     3360   49.792  sf23         
   7  2995.0   10.0     3359   49.613  hiarcs14     
   8  2994.0   10.0     3358   49.464  TogaII_2.0   
   9  2990.0   10.0     3360   48.810  spark1-0

Laskos · Post by **Laskos** » Wed Dec 26, 2012 2:24 pm

Adam Hair wrote: My test is under way and ~1200 of 7800 games have been played so far.

Nice, you have pretty long time controls (some 6 times longer than mine), and a large variety of engines, therefore your results will be more useful.

Adam Hair · Post by **Adam Hair** » Wed Dec 26, 2012 2:38 pm

Don wrote: Mine are running as well. I wish I had as many as you do but at least I have 1 or 2 that you don't have.

I came pretty close to adjusting them after a couple of false starts. I am making my first micro-adjustment now to bring up the rear.

My results will not be bunched quite as tightly as Kai's and yours. I am aiming for the scores to be between 45% and 55%, though it may turn out better than that.
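
For reference, a score window like 45% to 55% can be translated into an Elo spread. The snippet below is a minimal sketch that assumes the standard logistic Elo model; the function names are made up for illustration and do not come from any rating tool mentioned in this thread.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (0 < score < 1),
    assuming the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def score_from_elo(diff: float) -> float:
    """Expected score for a given Elo advantage under the same model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


if __name__ == "__main__":
    # Translate the edges of a 45%..55% score window into Elo.
    for pct in (45.0, 50.0, 55.0):
        print(f"{pct:.1f}% -> {elo_from_score(pct / 100.0):+.1f} Elo")
```

Under that model, a 45% to 55% window corresponds to roughly 35 Elo on either side of equality.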

Don · Post by **Don** » Wed Dec 26, 2012 3:12 pm

Adam Hair wrote: My results will not be bunched quite as tightly as Kai's and yours. I am aiming for the scores to be between 45% and 55%, though it may turn out better than that.

My hope is that a test run the way I am running mine (with time-adjusted equality) can be used to check your numerical methods. For that reason I should have tried harder to run more of the same programs you are running. I noticed that you are running older versions of Stockfish and Critter and that we actually have only 2 or 3 programs in common.

So what I might do is run the same batch of programs again when this completes but with only perhaps 3/4 of the adjustments I used in this tournament. I would prefer to use no adjustment at all, but some of the adjustments required are so huge that I fear the test would be meaningless. It's hard to draw any conclusions when there are programs many hundreds of ELO apart because you are not going to get much of anything but losses. I would have to run hundreds of thousands of games to see past the noise.

So the first test gives data we can trust and the second test gives us data to experiment with various formulas.
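
To put rough numbers on the point about programs many hundreds of Elo apart, here is a back-of-the-envelope sketch. It assumes the logistic Elo model and ignores draws (every game is treated as an independent win or loss), so the counts are only indicative. The function names and the 5 Elo target are illustrative choices, not anything taken from the tests described above.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_std_error(elo_diff: float, games: int) -> float:
    """Approximate standard error (in Elo) of the measured difference.

    Uses the delta method: the binomial standard error of the score,
    sqrt(p * (1 - p) / n), multiplied by the slope dElo/dp, which is
    400 / (ln(10) * p * (1 - p)). Draws are ignored (assumption).
    """
    p = expected_score(elo_diff)
    return 400.0 / (math.log(10.0) * math.sqrt(games * p * (1.0 - p)))


def games_needed(elo_diff: float, target_se: float) -> int:
    """Games needed for the Elo standard error to drop to target_se."""
    p = expected_score(elo_diff)
    return math.ceil((400.0 / (math.log(10.0) * target_se)) ** 2 / (p * (1.0 - p)))


if __name__ == "__main__":
    # Compare an even match with increasingly lopsided pairings.
    for diff in (0, 200, 400, 600, 800):
        print(f"{diff:4d} Elo apart: score {expected_score(diff):.3f}, "
              f"about {games_needed(diff, 5.0)} games for a 5 Elo standard error")
```

The required game count grows quickly as the expected score approaches 100%, because almost every game ends the same way and carries little information. That is the effect the time handicaps are meant to avoid.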
