CLOP for Noisy Black-Box Parameter Optimization

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

Daniel Shawul wrote:
Note that, in general, you cannot use CLOP to estimate the strength of the optimised program accurately. Win rates produced by CLOP are biased. Win rate over all samples is pessimistic. Local win rate tends to be optimistic. The real win rate of optimal parameters is somewhere in-between.
Is that the reason why the Elo I calculate from the winning-rate percentage sometimes does not match the one displayed in the GUI with 95% confidence? For example, I saw a 47% winning rate, suggesting -21 Elo, but only -5 Elo is displayed. Also, since the currently selected parameters do not get all the games played (some early games are probably truncated), how are the mean and confidence intervals estimated?
That is strange. This is the code to compute the numbers:

Code: Select all

/////////////////////////////////////////////////////////////////////////////
void MainWindow::updateWinRate(int Column, double W, double D, double L)
/////////////////////////////////////////////////////////////////////////////
{
 double Total = W + D + L;
 double Score = W + 0.5 * D;
 double Rate = Score / Total;

 double TotalVariance = W * (1.0 - Rate) * (1.0 - Rate) +
                        D * (0.5 - Rate) * (0.5 - Rate) +
                        L * (0.0 - Rate) * (0.0 - Rate);
 double Margin = 1.96 * std::sqrt(TotalVariance) / Total;

 const double EloMul = 400.0 / std::log(10.0);

 addTableItem( 0, Column, W);
 addTableItem( 1, Column, D);
 addTableItem( 2, Column, L);
 addTableItem( 3, Column, Total);
 addTableItem( 5, Column, EloMul * pexperiment->reg.Rating(Rate + Margin));
 addTableItem( 6, Column, EloMul * pexperiment->reg.Rating(Rate));
 addTableItem( 7, Column, EloMul * pexperiment->reg.Rating(Rate - Margin));
 addTableItem( 9, Column, Rate + Margin);
 addTableItem(10, Column, Rate);
 addTableItem(11, Column, Rate - Margin);
}

Code: Select all

/////////////////////////////////////////////////////////////////////////////
// Outcome model (missing multiplier for draws, should do 1 - w - l?)
/////////////////////////////////////////////////////////////////////////////
double CRegression::ResultProbability(double Rating, COutcome outcome) const
{
 switch(outcome)
 {
  case COutcome::Win:
   return CLogistic::f(Rating - DrawRating);
  case COutcome::Loss:
   return CLogistic::f(-Rating - DrawRating);
  case COutcome::Draw:
   return CLogistic::f(Rating - DrawRating) *
          CLogistic::f(-Rating - DrawRating);
 }
 return 1.0;
}

/////////////////////////////////////////////////////////////////////////////
// Compute win rate for a rating
/////////////////////////////////////////////////////////////////////////////
double CRegression::WinRate(double Rating) const
{
 double W = ResultProbability(Rating, COutcome::Win);
 double L = ResultProbability(Rating, COutcome::Loss);
 return W + 0.5 * (1.0 - W - L);
}

/////////////////////////////////////////////////////////////////////////////
double CRegression::Rating(double Rate) const
/////////////////////////////////////////////////////////////////////////////
{
 // Invert WinRate by bisection over the rating interval [-10, 10]
 const int Iter = 30;
 double Max = 10.0;
 double Min = -10.0;

 for (int i = Iter; --i >= 0;)
 {
  double Middle = (Max + Min) * 0.5;
  if (Rate > WinRate(Middle))
   Min = Middle;
  else
   Max = Middle;
 }

 return (Max + Min) * 0.5;
}
So, compared to bayeselo, I don't "scale" the ratings. This may cause a small difference compared to usual Elo, when EloDraw is not zero.

In the next version, I will scale the ratings like in bayeselo. This may avoid confusion. But the difference you observe seems bigger than it should be. Anyway, you have all the necessary code above to understand how ratings are computed.

Rémi
Rémi Coulom

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

Rémi Coulom wrote: That is strange. This is the code to compute the numbers:
Now that I re-read the code, I understand it is probably buggy when EloDraw != 0, because probabilities don't sum to 1. I will fix it.

I never use EloDraw != 0.

This should be only a display bug: it should not affect the optimization algorithm itself.

Rémi
Rémi Coulom

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

Rémi Coulom wrote:
Rémi Coulom wrote: That is strange. This is the code to compute the numbers:
Now that I re-read the code, I understand it is probably buggy when EloDraw != 0, because probabilities don't sum to 1. I will fix it.

I never use EloDraw != 0.

This should be only a display bug: it should not affect the optimization algorithm itself.

Rémi
OK, I re-re-read the code (including comments), and it is not a bug, sorry for the confusion.

Rémi
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Daniel Shawul »

Here is a snapshot

Code: Select all

Samples = 15880
TotalWeight = 7584.93
Wins = 5725
Draws = 4024
Losses = 6131
WinningRate = 0.487217
And the Elo displayed is +18, when it clearly should have been less than 0.

Code: Select all

LCB=-9 < mean=+18 < UCB=+45
I should mention that I restarted the experiment once, around the 7000-game mark, and also that the winning rate has been improving steadily.
It started from a winning rate of 0.43 and reached this level, so maybe the engine really is better if we disregard the old games.

BTW, I am using QLR, not CLOP.
Edit: I see QLR uses a different Elo calculation than CLOP, so maybe that is the problem.
Rémi Coulom

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

Daniel Shawul wrote: BTW, I am using QLR, not CLOP.
Edit: I see QLR uses a different Elo calculation than CLOP, so maybe that is the problem.
You should definitely use CLOP instead of QLR. They use the same file format, so data obtained with QLR can be opened with CLOP, and an experiment started with QLR can be continued with CLOP. The algorithm of CLOP is better than the algorithm of QLR.

I remember I fixed some things related to Elo calculation between QLR and CLOP, but I don't remember exactly what. So it may fix your problem.

Rémi
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Ferdy »

Rémi Coulom wrote:
Ferdy wrote: Hi Rémi, from the readme below, I cannot find the dummy.exe file in CLOP-0.0.8.tar.bz2. Is there any CLOP compilation for Windows? Thanks.
I don't have easy access to a Windows machine, so I did not release the Windows binary, sorry. You'll have to compile everything from source.

Dummy.exe is produced by compiling that file:
CLOP-0.0.8/programs/clop/src/real/Dummy.c

Rémi
I was able to compile the CLOP GUI for Windows and have done some tests. I have some questions here.

1. Is there a way for this GUI to plot the number of games along the x-axis and the win rate along the y-axis? I just want to know what happens to the win rate as the number of games increases.

2. I am doing some interaction: for example, if I give queen_value 960 1040 (default 1000, delta -40/+40), and then after say around 5000 games I am not happy with this range and change it to 995 1005 (delta -5/+5), would this be just fine?

3. If I use, for example, only one parameter to tune, say queen_value 995 1005 (-5/+5 around the default), how many games would you run before you could say that the result is reliable?

4. Suppose I run and test for only one starting position, for example the Sicilian Najdorf (e2e4 c7c5 g1f3 d7d6 d2d4 c5d4 f3d4 g8f6 b1c3 a7a6), test it against 20 different engines with the tuned engine playing both colors, and tune some search and eval parameters such as futility_margin and piece mobilities for the eval. Can CLOP also handle optimizing parameters for a certain opening?

Thank you Remi for sharing this application.
Rémi Coulom

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

Ferdy wrote: 1. Is there a way for this GUI to plot the number of games along the x-axis and the win rate along the y-axis? I just want to know what happens to the win rate as the number of games increases.
You cannot do it with the GUI. But you can open the data file with the command-line version, redirect the output to a file, and then plot it with a tool such as gnuplot, for instance.
2. I am doing some interaction: for example, if I give queen_value 960 1040 (default 1000, delta -40/+40), and then after say around 5000 games I am not happy with this range and change it to 995 1005 (delta -5/+5), would this be just fine?
No, there is a problem with shrinking range. I will fix it. But you should not want to do that. CLOP is good at figuring out the right range. For tuning a parameter such as Queen value, just use a wide range (100-2000, for instance). It is very likely CLOP will do better by itself than if you try to tweak the range manually.

The only good reason to manually change the range during optimization is to start with a narrow range in order to indicate a good point in parameter space, then widen the range and let CLOP do its magic.
3. If I use, for example, only one parameter to tune, say queen_value 995 1005 (-5/+5 around the default), how many games would you run before you could say that the result is reliable?
You should not use such a narrow range. The more games, the more accurate the result. You can get an idea of how CLOP is uncertain about the value by looking at the interval over which it takes samples. But use a wide default interval.
4. Suppose I run and test for only one starting position, for example the Sicilian Najdorf (e2e4 c7c5 g1f3 d7d6 d2d4 c5d4 f3d4 g8f6 b1c3 a7a6), test it against 20 different engines with the tuned engine playing both colors, and tune some search and eval parameters such as futility_margin and piece mobilities for the eval. Can CLOP also handle optimizing parameters for a certain opening?
If your connection script plays games from that position only, then yes, CLOP will optimize for that position. But you should make sure there is some randomness in the experiment. If the program and opening are deterministic, it is likely to cause problems because the same game will be played many times. The usual approach is to randomize openings. You can also use opponents with a random evaluation, or randomize your own.

Rémi
Ferdy

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Ferdy »

You cannot do it with the GUI. But you can open the data file with the command-line version, redirect the output to a file, and then plot it with a tool such as gnuplot, for instance.
I got this one, nice table.


No, there is a problem with shrinking range. I will fix it. But you should not want to do that. CLOP is good at figuring out the right range. For tuning a parameter such as Queen value, just use a wide range (100-2000, for instance). It is very likely CLOP will do better by itself than if you try to tweak the range manually.

The only good reason to manually change the range during optimization is to start with a narrow range in order to indicate a good point in parameter space, then widen the range and let CLOP do its magic.
I will note that: start narrow, then widen.


You should not use such a narrow range. The more games, the more accurate the result. You can get an idea of how CLOP is uncertain about the value by looking at the interval over which it takes samples. But use a wide default interval.
One of my concerns here is to speed up the process: minimize the number of games played and get a respectable value, not necessarily a very optimal one. For example, I am working on getting good approximate piece values for Capablanca chess pieces such as the archbishop and chancellor. I am targeting only 5k games, after which I will use what CLOP has got so far. Of course, I will test the setting produced by CLOP against my original setting.
For normal chess I can only play around 10k games, stop the CLOP tuning, and test the CLOP setting against my default. From this I got a promising result for the CLOP-tuned version, but as I increased the number of test games beyond 2k, the advantage of the CLOP-tuned engine kept shrinking according to BayesElo. I am still satisfied with the situation, though, because the CLOP-tuned engine is always leading.

Also, as a handicap for the CLOP-tuned engine: I use CLOP to tune parameters against engines 1, 2, and 3 only, using book xxx, but for the one-on-one test I play the CLOP-tuned engine against engines 1 through 6 using book yyy, and then test the default engine against engines 1 through 6 with the same yyy book. Even with this, the CLOP-tuned engine leads, as mentioned.
Zlaire
Posts: 62
Joined: Mon Oct 03, 2011 9:40 pm

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Zlaire »

After 20,000 games I got values that would result in this PSQ for king endgame:

Code: Select all

-1604 -1233  -863  -863  -863  -863 -1233 -1604
-1233  -492  -122  -122  -122  -122  -492 -1233
 -863  -122   248   248   248   248  -122  -863
 -863  -122   248   619   619   248  -122  -863
 -863  -122   248   619   619   248  -122  -863
 -863  -122   248   248   248   248  -122  -863
-1233  -492  -122  -122  -122  -122  -492 -1233
-1604 -1233  -863  -863  -863  -863 -1233 -1604
(values in centipawns, using the ampli-bias approach mentioned by Joona Kiiski, so two parameters for the table)

In total for this run I have 11 parameters, so 20,000 games isn't enough, but isn't this really far off? (A white king in the center vs. a black king in the corner would be worth 2.5 queens...)

Also the suggested values are swinging quite wildly.
Rémi Coulom

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

Zlaire wrote:After 20,000 games I got values that would result in this PSQ for king endgame:

Code: Select all

-1604 -1233  -863  -863  -863  -863 -1233 -1604
-1233  -492  -122  -122  -122  -122  -492 -1233
 -863  -122   248   248   248   248  -122  -863
 -863  -122   248   619   619   248  -122  -863
 -863  -122   248   619   619   248  -122  -863
 -863  -122   248   248   248   248  -122  -863
-1233  -492  -122  -122  -122  -122  -492 -1233
-1604 -1233  -863  -863  -863  -863 -1233 -1604
(values in centipawns, using the ampli-bias approach mentioned by Joona Kiiski, so two parameters for the table)

In total for this run I have 11 parameters, so 20,000 games isn't enough, but isn't this really far off? (A white king in the center vs. a black king in the corner would be worth 2.5 queens...)

Also the suggested values are swinging quite wildly.
I am curious to take a look, if you can send your .dat and .clop files to me.

Rémi