Tool for automatic black-box parameter optimization released

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

UncombedCoconut
Posts: 319
Joined: Fri Dec 18, 2009 11:40 am
Location: Naperville, IL

Re: Tool for automatic black-box parameter optimization rele

Post by UncombedCoconut »

Daniel Shawul wrote:P.S.: I don't have admin access to install software on the cluster. Cutechess-cli required Qt too, but Ilari suggested that only two or so of the DLLs are required.
But don't worry about it, I will get somebody to do the compile of console QLR for 64-bit Linux.
Thanks again.
If this is under Linux, you should be fine building (or installing) the necessary libraries in your home directory and setting the LD_LIBRARY_PATH environment variable (e.g., export LD_LIBRARY_PATH=$HOME/lib/).
This is how the cutechess-cli.sh script works, in fact. (It allows the program to load cutechess's libraries without requiring you to install them.)
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Tool for automatic black-box parameter optimization rele

Post by Rémi Coulom »

Daniel Shawul wrote:Evolutionary methods for optimization have been discussed here before, so my first thought was that yours was a GA. These can indeed find a global optimum despite the user's initial guesses for the parameters. It takes a lot more objective-function evaluations (games) though.
I don't know how important this is to the topic of chess evaluation terms, as we already have a "good" estimate of the parameters before the experiment anyway.
If anybody has ever observed a local optimum when optimizing parameters, I'd be curious to know. I never observed any. But it is very probable that some might exist.

In fact, I implemented plenty of algorithms to compare their performances. Local quadratic regression performs better than the others for any function that is smooth and flat at the maximum. The second best algorithm in my tests is the cross-entropy method (CEM). CEM is a population-based method, a bit similar to a genetic algorithm.
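For readers unfamiliar with CEM, the core loop is: sample a population from a distribution, keep the best fraction (the "elites"), refit the distribution to the elites, and repeat. A generic sketch in Python, not Rémi's implementation; the function, names, and settings are all illustrative:

```python
import numpy as np

def cem_maximize(f, mean, std, n_samples=50, n_elite=10, n_iters=30):
    """Cross-entropy method: sample a Gaussian population, keep the
    elite fraction, refit the Gaussian to the elites, repeat."""
    mean, std = np.asarray(mean, float), np.asarray(std, float)
    for _ in range(n_iters):
        pop = np.random.normal(mean, std, size=(n_samples, mean.size))
        scores = np.array([f(p) for p in pop])
        elite = pop[np.argsort(scores)[-n_elite:]]   # best n_elite samples
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Toy smooth objective with its maximum at (1, -2):
np.random.seed(0)
best = cem_maximize(lambda p: -((p[0] - 1)**2 + (p[1] + 2)**2),
                    mean=[0.0, 0.0], std=[3.0, 3.0])
```

The resemblance to a GA is visible here: a population, selection of the fittest, and a "mutation" distribution that narrows as the population converges.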
For Newton-Raphson, I recall that computation of the Hessian matrix can be done with (n^2 + n)/2 evaluations, since the matrix is symmetric.
To reduce the cost of extra function evaluations, we can use quasi-Newton methods like BFGS.
Also, we can use radial basis functions to approximate the response and determine optimal values with far fewer samples. The reason for my question was that I had set a limit of n^2 samples per iteration for the methods I am familiar with. I now understand that your method makes many function calls per iteration but converges faster than the rest overall, which is the final goal.
In fact, I compute the full inverse of the Hessian, with a Cholesky decomposition. The full inverse is not required for Newton-Raphson, but some other algorithms I implemented require it. With the full inverse, it is easy to generate random samples according to the posterior distribution, or estimate variance.
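As a rough sketch of the linear algebra Rémi describes (illustrative numbers, not QLR's code): for a negative-definite Hessian H at the maximum, the Gaussian (Laplace) approximation of the posterior has covariance (-H)^-1, and a Cholesky factor of that covariance gives both cheap sampling and variance estimates:

```python
import numpy as np

# Illustrative negative-definite Hessian of the log-likelihood at its maximum.
H = np.array([[-4.0,  1.0],
              [ 1.0, -3.0]])

cov = np.linalg.inv(-H)           # Laplace approximation: covariance = (-H)^-1
L = np.linalg.cholesky(cov)       # lower-triangular factor, cov = L @ L.T

# A Newton-Raphson step only needs a linear solve, not the full inverse:
grad = np.array([0.2, -0.1])
delta = -np.linalg.solve(H, grad)

# With the factorized covariance, sampling from the Gaussian posterior
# approximation and estimating variances is cheap:
rng = np.random.default_rng(0)
theta_max = np.array([1.0, 2.0])
samples = theta_max + rng.standard_normal((10000, 2)) @ L.T
var_est = samples.var(axis=0)     # approaches diag(cov) as samples grow
```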

Rémi
User avatar
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Tool for automatic black-box parameter optimization rele

Post by marcelk »

Rémi Coulom wrote: If anybody has ever observed a local optimum when optimizing parameters, I'd be curious to know.
Rémi
I have seen them in 1-parameter hill climbing: Sometimes one must change two parameters simultaneously to continue climbing.

The Deep Thought team also reported such local optima in the ICCA Journal, Dec. 1997. That's why they added linear combinations of parameters to the process.

I would expect that the impact of local optima reduces when increasing the number of parameters that participate in the local search.
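The two-at-once effect is easy to reproduce when parameters move in integer steps, as evaluation weights usually do. A toy illustration (the fitness function and numbers are invented, not from any engine):

```python
# Invented fitness with a diagonal ridge; integer parameters, as is
# typical for evaluation weights. Global maximum at (0, 0).
def fitness(x, y):
    return -(x - y)**2 - 0.01 * (x + y)**2

x, y = 5, 5
here = fitness(x, y)
single_moves = [fitness(x + dx, y + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
both_down = fitness(x - 1, y - 1)

stuck = all(s < here for s in single_moves)  # every one-parameter step is worse
escapes = both_down > here                   # lowering both together improves
```

From (5, 5), every single-parameter step of ±1 falls off the ridge, yet moving both parameters down together climbs it, so 1-parameter hill climbing stalls at a point that is not the optimum.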
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Tool for automatic black-box parameter optimization rele

Post by Rémi Coulom »

marcelk wrote:I have seen them in 1-parameter hill climbing: Sometimes one must change two parameters simultaneously to continue climbing.
Can you tell for which parameter you observed a local optimum?
The Deep Thought team has also reported having such local optima in ICCA Dec. 1997. That's why they added linear combinations of parameters to the process.
IIRC, they used supervised learning from game records. The function they optimize is deterministic, and discrete. I am not surprised they may have local optima, when they get close to the maximum.

This is a very different approach from black-box optimization of winning rate. I would like to know a parameter that has local optima of playing strength.

Rémi
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Tool for automatic black-box parameter optimization rele

Post by Don »

Rémi Coulom wrote:
marcelk wrote:I have seen them in 1-parameter hill climbing: Sometimes one must change two parameters simultaneously to continue climbing.
Can you tell for which parameter you observed a local optimum?
The Deep Thought team has also reported having such local optima in ICCA Dec. 1997. That's why they added linear combinations of parameters to the process.
IIRC, they used supervised learning from game records. The function they optimize is deterministic, and discrete. I am not surprised they may have local optima, when they get close to the maximum.

This is a very different approach from black-box optimization of winning rate. I would like to know a parameter that has local optima of playing strength.

Rémi
I have to assume local optima exist. There are many terms in chess that must be tuned in combination; for instance, the bishop and knight values must be properly balanced. It's more important to get them correct in relation to each other than to get the average precisely right. There is not that much difference Elo-wise between an average value close to 3 pawns and one close to 4 pawns, but there is a lot of difference if you choose 3.6 for the knight and 3.0 for the bishop.

It's not clear to me, however, whether this would prevent individual optimization of terms. For instance, if both values are too high but the relation between them is more or less correct, it may still be an improvement to lower just one of them (and the other will follow). Would that be considered a local hill if it were still possible to make progress without changing them both at the same time?
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Tool for automatic black-box parameter optimization rele

Post by Rémi Coulom »

Don wrote:I have to assume local optima exist. There are many terms in chess that must be tuned in combination; for instance, the bishop and knight values must be properly balanced. It's more important to get them correct in relation to each other than to get the average precisely right. There is not that much difference Elo-wise between an average value close to 3 pawns and one close to 4 pawns, but there is a lot of difference if you choose 3.6 for the knight and 3.0 for the bishop.

It's not clear to me, however, whether this would prevent individual optimization of terms. For instance, if both values are too high but the relation between them is more or less correct, it may still be an improvement to lower just one of them (and the other will follow). Would that be considered a local hill if it were still possible to make progress without changing them both at the same time?
What you describe is not a local optimum, but correlation. There is indeed a very strong correlation between the values of knight and bishop. QLR is particularly good at figuring out that correlation, and will have no problem with it.
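One way to see why a quadratic model copes with such correlation: the cross term lets the fitted surface align with the knight–bishop ridge. A sketch with invented numbers (the "Elo" surface and its optimum are made up for illustration, and the fit here is plain least squares, not QLR's regression):

```python
import numpy as np

# Invented "Elo vs (knight, bishop)" surface: a steep ridge along n - b,
# a shallow one along n + b, with its maximum at n = 3.375, b = 3.125.
def elo(n, b):
    return -10.0 * (n - b - 0.25)**2 - 0.5 * (n + b - 6.5)**2

rng = np.random.default_rng(1)
n = rng.uniform(2.5, 4.0, 200)
b = rng.uniform(2.5, 4.0, 200)
y = elo(n, b)

# Full quadratic regression, including the n*b cross term that captures
# the correlation between the two piece values.
X = np.column_stack([np.ones_like(n), n, b, n * n, b * b, n * b])
c = np.linalg.lstsq(X, y, rcond=None)[0]

# Stationary point of the fitted quadratic: solve grad = 0.
A = np.array([[2 * c[3], c[5]],
              [c[5], 2 * c[4]]])
opt = np.linalg.solve(A, [-c[1], -c[2]])   # recovers (3.375, 3.125)
```

Because the model includes the cross term, the recovered optimum respects the n - b relationship rather than treating the two values independently.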

QLR may fail if the performance has two bumps like this:
[image: winning rate as a function of the parameter, showing two bumps]

Note that QLR is still very likely to find the right maximum with that kind of function, but it has a non-zero probability to get stuck in the suboptimal bump.
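The getting-stuck risk can be mimicked with a single quadratic fitted around the wrong bump: samples concentrated near the suboptimal bump produce a fit whose maximum stays there. (The bimodal curve and all numbers below are invented for illustration.)

```python
import numpy as np

# Invented bimodal performance curve: global maximum near x = 1,
# a smaller, narrower bump near x = -2.
def perf(x):
    return np.exp(-(x - 1)**2) + 0.6 * np.exp(-4.0 * (x + 2)**2)

# Suppose sampling has concentrated around the suboptimal bump:
xs = np.linspace(-2.5, -1.5, 21)
p = np.polyfit(xs, perf(xs), 2)        # local quadratic fit
vertex = -p[1] / (2 * p[0])            # maximum of the fitted parabola
# The fit is concave (p[0] < 0) with its maximum near -2, so a local
# search trusting it has no reason to leave the suboptimal bump.
```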

Rémi
Look

Re: Tool for automatic black-box parameter optimization rele

Post by Look »

In fact, I implemented plenty of algorithms to compare their performances. Local quadratic regression performs better than the others for any function that is smooth and flat at the maximum. The second best algorithm in my tests is the cross-entropy method (CEM). CEM is a population-based method, a bit similar to a genetic algorithm.
Feature request: could you include CEM in the next version too? That would give users the option to try both methods.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Tool for automatic black-box parameter optimization rele

Post by Zach Wegner »

User request: could you actually look at the code, or even just the scripts? CEM is already included.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Tool for automatic black-box parameter optimization rele

Post by Rémi Coulom »

Zach Wegner wrote:User request: could you actually look at the code, or even just the scripts? CEM is already included.
The other methods included in QLR will in fact not work with chess data: they only handle a binary outcome. Also, CEM does not work in parallel (with more than one processor). I'll think about providing the option later.

Right now, there are more important problems. Some users have reported to me that QLR may fail on some data. This usually happens when a parameter has a small influence on playing strength. The fitted quadratic's leading coefficient may then become positive (a convex fit), and QLR will only sample at both extremities of the parameter range. I have to fix that first.
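The mechanism behind this failure mode can be simulated (this is an illustration, not QLR's actual code): when the measured winning rate is pure noise, roughly half of the noise realizations give the fitted quadratic a positive leading coefficient, and an upward-opening parabola on a bounded range is maximized at an endpoint:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 41)          # the parameter's allowed range
convex_fits = 0
for seed in range(100):
    rng = np.random.default_rng(seed)
    # The parameter has essentially no effect: the measured winning
    # rate is just noise around 50%.
    winrate = 0.5 + rng.normal(0.0, 0.05, x.size)
    a, b, c = np.polyfit(x, winrate, 2)
    if a > 0:                           # convex fit
        fitted = np.polyval([a, b, c], x)
        # An upward-opening parabola on a bounded range peaks at an
        # endpoint, so an optimizer trusting this fit would sample
        # only at the extremities of the range.
        assert fitted.argmax() in (0, x.size - 1)
        convex_fits += 1
```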

If you try QLR and observe something strange, please send me your data; I am curious to take a look. And even if you try QLR and it works well for you, I would be glad to hear about it.

Rémi
Look

Re: Tool for automatic black-box parameter optimization rele

Post by Look »

If you try QLR and observe something strange, please send your data to me, I am curious to take a look. Well, even if you try QLR and it works well for you, I would be glad to hear about it.

Rémi
OK, since you ask: the current situation is that I am playing around with it, and it should be promising. Unfortunately the problem is time control: I did not have any obvious success with something like a 1-second TC for games, so I am now trying longer TCs. There are also many details to take into consideration; basically, IMO, everything needs sufficient trial and experimentation.