program style, risk aversion

Don · Post by **Don** » Thu Dec 20, 2012 2:24 pm

Adam Hair wrote:
Ajedrecista wrote: Adam said that, taking his regression as the best estimator, then Kai's method is better than mine... at least I tried it! I am happy because I started this subtopic inside the main topic and some people came with their work... I do not know if Kai already used D/[µ*(1 - µ)] before the start of this thread, but I am sure that Adam thought about his regression model in these few days! Congratulations to both of you.

Regards from Spain.

Ajedrecista.
We can credit Kirill Kryukov for applying regression methods to determining the draw rate characteristics of engines. What I have done is simply a continuation of his ideas.

I was not aware that this had already been studied when I opened this thread.

If you want I can add your current formula to my crosstable script - I think I still have the data.

It is not certain that your method does not work better for other data. I do suspect some modification of your formula or Kai's formula is needed.

Thanks for starting this, Jesús (and Don too).

Laskos · Post by **Laskos** » Thu Dec 20, 2012 2:45 pm

Adam Hair wrote:
Ajedrecista wrote: Adam said that, taking his regression as the best estimator, then Kai's method is better than mine... at least I tried it! I am happy because I started this subtopic inside the main topic and some people came with their work... I do not know if Kai already used D/[µ*(1 - µ)] before the start of this thread, but I am sure that Adam thought about his regression model in these few days! Congratulations to both of you.

Regards from Spain.

Ajedrecista.
We can credit Kirill Kryukov for applying regression methods to determining the draw rate characteristics of engines. What I have done is simply a continuation of his ideas.

It is not certain that your method does not work better for other data. I do suspect some modification of your formula or Kai's formula is needed.

Thanks for starting this, Jesús (and Don too).

To both Jesus and Adam:

One thing that bothers me about the regression is that it takes into account that strength correlates with draw rate. For a round-robin this simply means that draw rate correlates with score (µ). I am not sure whether we have to account for this, or whether it is a property of the engines and simply means that stronger engines are less draw averse.

I adapted my recipe D/(µ(1-µ)) to take account of this correlation (though I am not sure it's needed) to D/((1-µ)*µ^1.10). I will call this "Kai modified", for taking into account the strength of the engines.

The IPON table now looks like this:

Code: Select all


Name                     Score      D       D_max     k       k*u*(1-u)  D/(u*(1-u))  Draw dev.  D/(u^1.10*(1-u))
Deep Junior 13.3         39.00%   34.00%   78.00%    0.4359   0.1037      1.429      -5.21%      1.57         
Houdini 3 STD            82.00%   24.00%   36.00%    0.6667   0.0984      1.626      -2.05%      1.66
Gull 1.2                 45.00%   39.00%   90.00%    0.4333   0.1073      1.576      -1.93%      1.71
Quazar 0.4               36.00%   37.00%   72.00%    0.5139   0.1184      1.606      -1.07%      1.78
HIARCS 14 WCSC 32b       48.00%   40.00%   96.00%    0.4167   0.1040      1.603      -1.06%      1.73
Komodo 5                 73.00%   34.00%   54.00%    0.6296   0.1241      1.725      -0.19%      1.78
Protector 1.4.0          39.00%   39.00%   78.00%    0.5      0.1190      1.639      -0.16%      1.80
Deep Shredder 12         45.00%   40.00%   90.00%    0.4444   0.1100      1.616      -0.08%      1.75
Deep Fritz 13 32b        51.00%   40.00%   98.00%    0.4082   0.1020      1.601      -0.07%      1.71
Hannibal 1.2             45.00%   40.00%   90.00%    0.4444   0.1100      1.616      -0.04%      1.75
spark-1.0                41.00%   39.00%   82.00%    0.4756   0.1151      1.612      -0.03%      1.76
Zappa Mexico II          32.00%   35.00%   64.00%    0.5469   0.1190      1.608       0.00%      1.80
Critter 1.4a             71.00%   37.00%   58.00%    0.6379   0.1314      1.797       0.10%      1.86
Spike 1.4 32b            42.00%   40.00%   84.00%    0.4762   0.1160      1.642       0.24%      1.79
Deep Sjeng c't 2010 32b  43.00%   41.00%   86.00%    0.4767   0.1169      1.673       0.45%      1.82
Naum 4.2                 50.00%   42.00%   100.00%   0.42     0.1050      1.680       0.63%      1.80
MinkoChess 1.3           31.00%   36.00%   62.00%    0.5806   0.1242      1.683       0.69%      1.89
Chiron 1.5               52.00%   42.00%   96.00%    0.4375   0.1092      1.683       0.87%      1.80
Stockfish 2.2.2 JA       69.00%   40.00%   62.00%    0.6452   0.1380      1.870       1.68%      1.94
Deep Rybka 4.1           68.00%   40.00%   64.00%    0.625    0.1360      1.838       2.41%      1.91
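
As a rough sketch (not the spreadsheet actually used for the table), the derived columns can be recomputed from Score and D alone. The relation D_max = 2*min(u, 1-u) is not stated in the thread; it is inferred from the table values, and the function and key names are made up for illustration.

```python
def draw_metrics(score, draw_rate, exponent=1.10):
    """Recompute the derived table columns for one engine.

    score and draw_rate are fractions, e.g. 0.39 for 39.00%.
    D_max = 2*min(u, 1-u) is an assumption inferred from the table.
    """
    d_max = 2.0 * min(score, 1.0 - score)  # largest draw rate compatible with the score
    k = draw_rate / d_max
    spread = score * (1.0 - score)
    return {
        "D_max": d_max,
        "k": k,
        "k*u*(1-u)": k * spread,
        "Kai": draw_rate / spread,  # D/(u*(1-u))
        "Kai modified": draw_rate / ((1.0 - score) * score ** exponent),
    }


# (Score, D) pairs copied from three rows of the table above.
rows = {
    "Deep Junior 13.3": (0.39, 0.34),
    "Houdini 3 STD": (0.82, 0.24),
    "Naum 4.2": (0.50, 0.42),
}

for name, (u, d) in rows.items():
    m = draw_metrics(u, d)
    print(f"{name:18s} k={m['k']:.4f}  Kai={m['Kai']:.3f}  Kai modified={m['Kai modified']:.2f}")
```

Passing a different `exponent` makes it easy to see how sensitive the "Kai modified" ranking is to the 1.10 choice.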

Now I give correlations:

Correlation[Adam, Jesus] = 0.61
Correlation[Adam, Kai] = 0.81
Correlation[Adam, Kai modified] = 0.89
Correlation[Jesus, Kai] = 0.79

So, my "strength adjusted" correlates well with the regression (0.89), but again, I don't know if we have to account for absolute strength, and not just for relative scores.
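
For anyone who wants to check correlations of this kind, here is a minimal Pearson correlation sketch over three columns of the table. It assumes that the "Draw dev." column is Adam's regression-based measure, which the thread does not state explicitly, and it works from the rounded table values, so it need not reproduce the figures quoted above.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (Draw dev. in %, D/(u*(1-u)), D/(u^1.10*(1-u))) for the 20 table rows, top to bottom.
table = [
    (-5.21, 1.429, 1.57), (-2.05, 1.626, 1.66), (-1.93, 1.576, 1.71),
    (-1.07, 1.606, 1.78), (-1.06, 1.603, 1.73), (-0.19, 1.725, 1.78),
    (-0.16, 1.639, 1.80), (-0.08, 1.616, 1.75), (-0.07, 1.601, 1.71),
    (-0.04, 1.616, 1.75), (-0.03, 1.612, 1.76), (0.00, 1.608, 1.80),
    (0.10, 1.797, 1.86), (0.24, 1.642, 1.79), (0.45, 1.673, 1.82),
    (0.63, 1.680, 1.80), (0.69, 1.683, 1.89), (0.87, 1.683, 1.80),
    (1.68, 1.870, 1.94), (2.41, 1.838, 1.91),
]

draw_dev = [row[0] for row in table]
kai = [row[1] for row in table]
kai_modified = [row[2] for row in table]

print(f"Correlation[Draw dev., Kai]          = {pearson(draw_dev, kai):.2f}")
print(f"Correlation[Draw dev., Kai modified] = {pearson(draw_dev, kai_modified):.2f}")
```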

Kai

Don · Post by **Don** » Thu Dec 20, 2012 3:23 pm

Adam,

I am going to run another time adjusted match on my spare machine - a laptop quad.

I'm including the awesome Hiarcs program as well as Critter, Ivanhoe, Komodo dev, Spark, Spike, Stockfish and Houdini. Not a lot of variety but some. Every program is at its default for contempt.

When this completes I will run another match that is not time adjusted. The goal will be to see if we can determine draw aversion without being forced to time-adjust programs.

I suspect that a compromise can be used if the formulas work but not perfectly. For example, the formulas may work fine as long as the programs are within 100 Elo of each other.

As I've already stated, I also believe it's valid to take a guess at the adjustments and then make minor corrections to the various handicaps as the match progresses if/when it is seen that you are off. The idea is to end the match with every program having the same score. I'm going to do that here, but I am going to start with handicaps that are already very good, so I will do a pass or two to get them as close as possible and then restart the match; after that, only very minor adjustments if one program pulls away or lags behind the others. I'll even document that. My autotester allows very fine control over the level of play.
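
One way such an adjustment pass could be sketched is shown below. This is not the autotester's actual logic: the logistic score-to-Elo conversion is the standard one, but the figure of 70 Elo per doubling of thinking time, the damping factor, and the starting times are assumptions chosen purely for illustration.

```python
import math

ELO_PER_DOUBLING = 70.0  # assumed gain from doubling thinking time; not from the thread


def implied_elo_offset(score):
    """Elo offset implied by a score fraction under the logistic model."""
    score = min(max(score, 0.01), 0.99)  # keep the logarithm finite
    return -400.0 * math.log10(1.0 / score - 1.0)


def adjust_time(seconds, score, damping=0.5):
    """Nudge a program's time handicap so its score moves toward 50%.

    damping < 1 applies only part of the estimated correction, since
    scores from a few dozen games are noisy.
    """
    doublings = -damping * implied_elo_offset(score) / ELO_PER_DOUBLING
    return seconds * 2.0 ** doublings


# Hypothetical starting handicaps (seconds) paired with score fractions.
current = {"leader": (20.0, 0.59), "middle": (15.0, 0.50), "trailer": (60.0, 0.26)}
for name, (secs, score) in current.items():
    print(f"{name:8s} {secs:6.1f}s -> {adjust_time(secs, score):6.1f}s")
```

A program scoring above 50% gets less time and one scoring below 50% gets more, and repeating the pass shrinks the corrections as the scores converge.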

It would be good if someone else repeats this procedure using some of the same programs.

This will likely take a few days. Please stand by .....

Don

Don wrote:
Adam Hair wrote:
Ajedrecista wrote: Adam said that, taking his regression as the best estimator, then Kai's method is better than mine... at least I tried it! I am happy because I started this subtopic inside the main topic and some people came with their work... I do not know if Kai already used D/[µ*(1 - µ)] before the start of this thread, but I am sure that Adam thought about his regression model in these few days! Congratulations to both of you.

Regards from Spain.

Ajedrecista.
We can credit Kirill Kryukov for applying regression methods to determining the draw rate characteristics of engines. What I have done is simply a continuation of his ideas.

It is not certain that your method does not work better for other data. I do suspect some modification of your formula or Kai's formula is needed.

Thanks for starting this, Jesús (and Don too).
Are you going to run that match? I would love to see it. Probably take you a couple of days to get the programs adjusted.

There is so much difference in strengths even in the IPON top 10 that you have to give enormous time advantages to some programs. So you might have to consider running the top programs at a pretty fast time control to have a test that does not take weeks.

Don · Post by **Don** » Thu Dec 20, 2012 3:39 pm

I think you are right that there may be a trend that the stronger programs may be less draw averse (even when time-adjusted) on average, but that needs to be studied with a really large variety of programs. It may come down to the quality of the evaluation function, regardless of the depth or search setting. But I am quite sure that there are other factors in play in the evaluation itself that could make a program more willing to lose to avoid a draw.

The other issue that makes this less interesting to me is that the contempt factor plays too big a role in how the numbers come out. In other words, we are not quite measuring the "true" playing style of the engine. Houdini, for example, went from appearing very dynamic to being a draw lover when I merely changed the contempt from the default 1 to 0. However, setting it to zero in Houdini is still much more aggressive than most programs, so it appears to me at the moment (still subject to more study) that Houdini is intrinsically a very cautious and not very dynamic program. That is masked by its incredible strength: it is strong enough that it tends to win anyway, so this may go unnoticed. Komodo uses a draw score of -7 (contempt 7) by default and many programs have it set to zero. For this study it would be ideal to set them all to zero, but that is not always possible, such as in Houdini's case.

To draw any firm conclusions we need a lot more games with a bigger variety of programs.

Don · Post by **Don** » Thu Dec 20, 2012 5:30 pm

For my new study I am going to use these programs - here is my first attempt at adjusting them. I just restarted this with my second attempt, basically trying to get Toga into the game and beefing up Houdini and Stockfish. I want to get pretty close before I actually start the official run. So it's likely to be tomorrow before I even get the numbers right, as I have to run more games than this once I close in.

Code: Select all

Rank    ELO     +/-    Games    Score  Player
---- ------- ------ -------- --------  ----------------------------
   1  3020.8   67.9       75   59.333  hiarcs14     
   2  3012.1   67.9       75   58.000  spark1-0     
   3  3007.2   66.6       78   57.051  spike14      
   4  3000.0   65.7       80   56.250  kdev-4518.00 
   5  2982.0   67.9       75   53.333  Ivanhoe9.47b 
   6  2969.2   67.9       75   51.333  c16          
   7  2924.1   65.7       80   44.375  sf23         
   8  2916.8   68.4       74   43.243  Houdini3     
   9  2798.9   68.4       74   26.351  TogaII

Laskos · Post by **Laskos** » Thu Dec 20, 2012 7:03 pm

Do you have Junior? It comes close to first in draw-averseness in several tests, including mine.

Don · Post by **Don** » Thu Dec 20, 2012 7:10 pm

Laskos wrote:
Do you have Junior? It comes close to first in draw-averseness in several tests, including mine.

I wish I did, but I don't. I had this idea that it would be the least draw-averse, but already many of my misconceptions have been shattered.

Adam Hair · Post by **Adam Hair** » Fri Dec 21, 2012 2:11 am

Don wrote:
Adam Hair wrote:
Ajedrecista wrote: Adam said that, taking his regression as the best estimator, then Kai's method is better than mine... at least I tried it! I am happy because I started this subtopic inside the main topic and some people came with their work... I do not know if Kai already used D/[µ*(1 - µ)] before the start of this thread, but I am sure that Adam thought about his regression model in these few days! Congratulations to both of you.

Regards from Spain.

Ajedrecista.
We can credit Kirill Kryukov for applying regression methods to determining the draw rate characteristics of engines. What I have done is simply a continuation of his ideas.

It is not certain that your method does not work better for other data. I do suspect some modification of your formula or Kai's formula is needed.

Thanks for starting this, Jesús (and Don too).
Are you going to run that match? I would love to see it. Probably take you a couple of days to get the programs adjusted.

There is so much difference in strengths even in the IPON top 10 that you have to give enormous time advantages to some programs. So you might have to consider running the top programs at a pretty fast time control to have a test that does not take weeks.

I will probably start tomorrow night.

Adam Hair · Post by **Adam Hair** » Fri Dec 21, 2012 4:28 am

Let me describe in more detail how I derived my numbers. First, I found a regression model for draw rate versus Elo difference. Then, I found the average deviation for each engine from the regression model over all the matches it played. I threw out 3 outliers. Then, I plotted draw deviation versus engine Elo. This is what I got:

The draw deviation numbers would seem to indicate that the stronger engines are less draw averse. However, we expect the weaker engines, when given longer time controls, to have more draws. Leaving out any consideration of style, the stronger engines can in some sense be considered the equivalent of weaker engines with longer thinking times. This leads me to believe that engine Elo is positively correlated to draw rate.

I proceeded to throw out the outliers, determined a linear model for draw deviation vs Elo, and used that to correct the draw deviation for each engine.

I believe that the time adjusted data will prove whether or not I am right. The same engines that were found to be less draw averse (or more draw averse) from the Elo corrected data should fall into those same categories in the time adjusted test.
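
The Elo correction step described above can be sketched as an ordinary least-squares line fit followed by subtracting the fitted trend. The data below are invented for illustration, the function names are hypothetical, and the sketch omits both the outlier removal and the first-stage regression of draw rate on Elo difference, whose functional form is not given in the thread.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def elo_corrected(elos, deviations):
    """Subtract the linear Elo trend from each engine's draw deviation."""
    intercept, slope = fit_line(elos, deviations)
    return [d - (intercept + slope * e) for e, d in zip(elos, deviations)]


# Hypothetical engines: Elo ratings and average draw deviations in percent.
elos = [2700.0, 2800.0, 2900.0, 3000.0, 3100.0]
deviations = [-1.5, -0.2, 0.1, 1.4, 1.7]

for elo, raw, corrected in zip(elos, deviations, elo_corrected(elos, deviations)):
    print(f"Elo {elo:.0f}: raw {raw:+.2f}%  corrected {corrected:+.2f}%")
```

After the correction, a positive value means more draws than the Elo trend alone predicts and a negative value means fewer.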

Don · Post by **Don** » Fri Dec 21, 2012 3:33 pm

Adam Hair wrote: I will probably start tomorrow night.

I finally got my players time-adjusted. I think I am pretty close.

There is great disparity between the top programs and the weakest: Houdini 3 is adjusted to 6.5 seconds, whereas Toga II 2.0, the weakest program, requires 124 seconds.

By the way, an interesting side issue here:

If anyone thinks progress in computer chess is mostly about hardware, I think this disproves that as Toga II is stronger than the original Fruit program it is based on, which amazed everyone when it was released. That is about a 20 to 1 handicap. This takes us back about 6.5 years and I don't think hardware was 20 times slower then.
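
The handicap arithmetic can be checked quickly. The two times are the ones quoted above; expressing the ratio as a number of speed doublings is only an illustrative way of reading it.

```python
import math

houdini_seconds = 6.5   # Houdini 3 time allotment from the post
toga_seconds = 124.0    # Toga II 2.0 time allotment from the post

ratio = toga_seconds / houdini_seconds
doublings = math.log2(ratio)  # how many times speed would have to double

print(f"Time handicap: {ratio:.1f} to 1")
print(f"Equivalent to about {doublings:.2f} doublings of speed")
```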

Re: My numeric method for determine draw trends of each engi
