Tool for automatic black-box parameter optimization released

Rémi Coulom · Post by **Rémi Coulom** » Sun Jun 20, 2010 2:19 am

Hi,

A new version of my QLR tool is available. Download and screenshots are there:
http://remi.coulom.free.fr/QLR/
I first made a version that only works for Go (Win/Loss), but the only feedback I got from the computer-go mailing list was from chess programmers who tried to apply it to their programs. So now you can use the "chess" version of QLR. I thank Gian-Carlo in particular, for his enthusiastic beta testing.

If you have to optimize parameters, this tool should be much more efficient than manual optimization with bayeselo. Especially if you have a few correlated parameters. QLR should work very well up to a dozen parameters (not many more yet).

QLR is a bit complicated to set up, because you have to write a script for running games with values passed as parameters. In the future, I'll probably provide a ready-made script for xboard programs.

Rémi

Edmund · Post by **Edmund** » Sun Jun 20, 2010 8:26 am

Thats a great tool, thank you for this contribution!

I just started a sample run of 1300 games using your sample program, with its highest probability to win at x=0.3456789, y=0.3456789

Though I removed the sleep-call to speedup the process. My first observation was that it still just plays around 1 to 2 games per second. Now I am wondering is this due to my old computer, the slow startup of the sample-program which needs restarting after every game or the complexity of the algorithm? Should point 2 be decisive, maybe in a next version of the tool it would be possible to chance the protocol, to keep the communication alive from test to test. Should 3 be the reason, I would be very much interested in the complexity of the algorithms involved. I don't know much about Quadratic Logistic Regression, but your pdf slides on the side were very interesting.

Next, in Chess engine testing, I (and I would imagine many others too) use multiple opponents for testing. Does this change anything on the layout of the test?

Furthermore, I can post the output after my 1300 games sample run. Would you mind commenting a little on the information gained out of the given variables. (I rounded the values to 2 decimal places)

Code: Select all

Hessian&#58;
    p1      p2
p1  -2.16   1.92
p2  1.92    -27.77


Eigenvektor&#58;
            1       2
Eigenvalue  -27.91  -2.01

p1          -0.07   0.997
p2          0.997   0.07

Statistics&#58;
            Mean    Max
p1          0.32    0.31
p2          0.34    0.34
95% UCB     44.29   44.51
Elo         9.68    9.88
95% LCB     -24.93  -24.75

TotalWeight = 815.11

Look · Post by **Look** » Sun Jun 20, 2010 9:01 am

Hi,

Can someone provide clear instructions on how to use this for a UCI engine like StockFish?

Edmund · Post by **Edmund** » Sun Jun 20, 2010 9:18 am

Look wrote:Hi,

Can someone provide clear instructions on how to use this for a UCI engine like StockFish?

If you first read the "Readme.txt", then the "DummyExperiment.qlr" and finally the "DummyScript.py", you should have a pretty good idea of what is going on.

Readme wrote:First step: connection script
-----------------------------
The first step to connect a program to QLR consists in writing a connection
script.

Rémi Coulom · Post by **Rémi Coulom** » Sun Jun 20, 2010 11:31 am

Edmund wrote:Thats a great tool, thank you for this contribution!

I just started a sample run of 1300 games using your sample program, with its highest probability to win at x=0.3456789, y=0.3456789

Though I removed the sleep-call to speedup the process. My first observation was that it still just plays around 1 to 2 games per second. Now I am wondering is this due to my old computer, the slow startup of the sample-program which needs restarting after every game or the complexity of the algorithm? Should point 2 be decisive, maybe in a next version of the tool it would be possible to chance the protocol, to keep the communication alive from test to test. Should 3 be the reason, I would be very much interested in the complexity of the algorithms involved. I don't know much about Quadratic Logistic Regression, but your pdf slides on the side were very interesting.

I just ran DummyExperiment.qlr in linux with my old PIV, and 1300 games were played in 80 seconds after I commented out the "sleep". So I am a bit surprised that you are so slow. QLR also has a console version that is faster because it does not have to update the GUI after each sample. The console version ran 1300 samples in 40 seconds. Most of this time is due to the overhead of running the python interpreter for each game. Running 1300 samples of an artificial problem (with game results generated internally) takes 0.1 seconds.

Collecting more than one sample in one run is on the TODO list.

Next, in Chess engine testing, I (and I would imagine many others too) use multiple opponents for testing. Does this change anything on the layout of the test?

In the current version, you can use the "Replications" parameter to run one game for each opponent. If opponents are of similar strength (say, within 100 Elo of each other), this should work very well. In the future, I will let QLR estimate the strength of each replication, so that it works well even against opponents of very different strengths.

Code: Select all

Hessian&#58;
    p1      p2
p1  -2.16   1.92
p2  1.92    -27.77

This is the matrix of second-order derivatives of the regression. Negative values on the diagonal indicate that the quadratic regression has a maximum in that direction. Colors in the matrix display indicate confidence in the sign of the second-order derivative. You want the diagonal to be blue. The second derivative for variable p2 has a larger magnitude. That is because the effect of p2 on strength is more important. This Hessian matrix is normalized, that is to say each parameter is scaled to take values between -1 and 1.

Off-diagonal values indicate correlation. p1 and p2 are not strongly correlated.

Code: Select all

Eigenvektor&#58;
            1       2
Eigenvalue  -27.91  -2.01

p1          -0.07   0.997
p2          0.997   0.07

A more meaningful representation of the Hessian matrix is made of its eigenvectors and eigenvalues. When all the eigenvalues are negative, the quadratic regression has a maximum. Otherwise, it does not.

Code: Select all

Statistics&#58;
            Mean    Max
p1          0.32    0.31
p2          0.34    0.34
95% UCB     44.29   44.51
Elo         9.68    9.88
95% LCB     -24.93  -24.75

TotalWeight = 815.11

UCB and LCB are "Upper Confidence Bound" and "Lower Confidence Bound". "Mean" is the weighted sample mean. It is the most robust estimate of optimal parameters. "Max" is the maximum of the quadratic regression. "Max" may be unavailable or unstable when the Hessian matrix is indefinite or close to indefinite.

Rémi

Look · Post by **Look** » Sun Jun 20, 2010 2:38 pm

Edmund wrote:
Look wrote:Hi,

Can someone provide clear instructions on how to use this for a UCI engine like StockFish?
If you first read the "Readme.txt", then the "DummyExperiment.qlr" and finally the "DummyScript.py", you should have a pretty good idea of what is going on.

Readme wrote:First step: connection script
-----------------------------
The first step to connect a program to QLR consists in writing a connection
script.

Thanks, but I still do not know how to approach this. Point is I am not expert on GUI, and script demanded here seems to be like a GUI(or UI). It seems to me that there could be a single project with two engines and UI all compiled in one binary file where UI is changed so that it makes engines play and report the result. Another approach seems that one can have UI and engines independently compiled, and UI communicate with engines via their UCI options. Maybe it makes sense to have specialized UI and have engines with options prefixed with something like QLR_option .This way it could be better cause authors do not need their specialized UI and they can use this general UI for their work. ideas?

marcelk · Post by **marcelk** » Sun Jun 20, 2010 2:58 pm

Rémi Coulom wrote:Hi,

QLR should work very well up to a dozen parameters (not many more yet).

Rémi

Hello Rémi,

It looks interesting. Two questions immediately arise:

1. What is holding it back in scaling up to more parameters?

2. Do you have measurements that can give an indication of the effect that applying QLR has on existing engines? For example, can we see and judge your or your testers' results?

Edmund · Post by **Edmund** » Sun Jun 20, 2010 2:58 pm

Look wrote:
Edmund wrote:
Look wrote:Hi,

Can someone provide clear instructions on how to use this for a UCI engine like StockFish?
If you first read the "Readme.txt", then the "DummyExperiment.qlr" and finally the "DummyScript.py", you should have a pretty good idea of what is going on.

Readme wrote:First step: connection script
-----------------------------
The first step to connect a program to QLR consists in writing a connection
script.
Thanks, but I still do not know how to approach this. Point is I am not expert on GUI, and script demanded here seems to be like a GUI(or UI). It seems to me that there could be a single project with two engines and UI all compiled in one binary file where UI is changed so that it makes engines play and report the result. Another approach seems that one can have UI and engines independently compiled, and UI communicate with engines via their UCI options. Maybe it makes sense to have specialized UI and have engines with options prefixed with something like QLR_option .This way it could be better cause authors do not need their specialized UI and they can use this general UI for their work. ideas?

Probably the quickest solution would be to take an existing UI like Winboard that handles the engine match.
So what is needed is a script that calls winboard (with the match settings as parameters) and finally parses the output pgn file created for the outcome of the match.

Rémi Coulom · Post by **Rémi Coulom** » Sun Jun 20, 2010 3:26 pm

marcelk wrote:1. What is holding it back in scaling up to more parameters?

A quadratic regression has n*(n+1) parameters. The method for optimizing parameters has a cost that is the cube of this number. So, the algorithmic cost is O(n^6). This problem could be solved in the future with more clever numerical methods.

Also, I don't do any kind of clever regularization, so really many samples would be necessary to estimate those n*(n+1) parameters. This also can be solved by using a stronger prior on the regression.

So, in short, there is not fundamental obstacle to making QLR work better with a high number of parameters in the future.

2. Do you have measurements that can give an indication of the effect that applying QLR has on existing engines? For example, can we see and judge your or your testers' results?

Gian-Carlo sent some interesting data to me, but I'd rather not publish it without his agreement. Maybe he'll make some comments. Most of my own experiments were with artificial problems.

The plot on the web page of QLR shows such an artificial problem:
http://remi.coulom.free.fr/QLR/
The thick blue line is the "true" probability of winning. I can measure the performance of QLR by estimating the average distance to optimal over multiple replications of the experiment.

QLR is on average 10 Elo points from optimal after 250 games, and 1 Elo point from optimal after 5000 games.

A magic aspect of QLR is that it approaches optimal performance faster than 1/sqrt(n) when the function to be maximized is smooth. I'll write a paper with more details.

Rémi

Daniel Shawul · Post by **Daniel Shawul** » Sun Jun 20, 2010 5:25 pm

Thanks a lot. Exactly the tool I was looking for for some time.
I haven't tried it yet but my plan is to hook it to cutechess-cli/bayeselo combo I am already using, and use a cluster to run many games in parallel.
Will this setup work? (probably GCP used it already). Thanks

Tool for automatic black-box parameter optimization released

Tool for automatic black-box parameter optimization released

Re: Tool for automatic black-box parameter optimization rele

Re: Tool for automatic black-box parameter optimization rele

Re: Tool for automatic black-box parameter optimization rele

Re: Tool for automatic black-box parameter optimization rele

Re: Tool for automatic black-box parameter optimization rele

Re: Tool for automatic black-box parameter optimization rele

Re: Tool for automatic black-box parameter optimization rele

Re: Tool for automatic black-box parameter optimization rele

Re: Tool for automatic black-box parameter optimization rele