Tool for automatic black-box parameter optimization released

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Tool for automatic black-box parameter optimization rele

Post by Daniel Shawul »

I can email it to you if you want.
mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Tool for automatic black-box parameter optimization rele

Post by mathmoi »

Bonjour Rémi,

It would be much appreciated.

My email is mathieu@mathieupage.com

Thanks
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Tool for automatic black-box parameter optimization rele

Post by Daniel Shawul »

Dammit Mathieu, please read the poster's name :)
I had the same problem once but fortunately found the files stashed on
an old hard drive. I hope I am not 'infringing' copyright or anything.
Enjoy
IQ
Posts: 162
Joined: Thu Dec 17, 2009 10:46 am

Re: Tool for automatic black-box parameter optimization rele

Post by IQ »

Hi,

I just glanced over the presentation PDF and wonder why you do not use a full Markov chain Monte Carlo (MCMC) approach, since you have already stated your problem in a Bayesian context. Advantages might be:
a) much less computational burden (no Hessian, hill climbing, etc.)
b) a more parsimonious model, as you already use a sigma-sampler and give the posterior
c) faster convergence, especially for large parameter spaces, and robustness to identification problems
d) it is easy to extend the model to incorporate all kinds of dependencies/correlations in the parameter space

The key insight is that an MCMC chain converges to the full posterior distribution, so that calculating expected values is really easy, and the Bernstein-von Mises theorem ensures that the expected values converge to the maximum likelihood estimate if n is large enough. Since the "largeness" of n is purely technical, you can just replicate the existing data multiple times to get this kind of convergence, with the net result usually still being faster than many other optimization methods in the case of large parameter spaces, especially with interdependencies/correlations.

Did you investigate this? Or are there some reasons against it that I am missing?
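As a sketch of the random-walk Metropolis-Hastings idea suggested above, here is a toy example that samples the posterior of a single Bernoulli win-rate parameter from a hypothetical game record (the model and the numbers are illustrative, not QLR's actual model):

```python
import math
import random

def log_posterior(p, wins, losses):
    # Log-likelihood of a Bernoulli win-rate model with a flat prior on (0, 1).
    if not 0.0 < p < 1.0:
        return float("-inf")
    return wins * math.log(p) + losses * math.log(1.0 - p)

def metropolis_hastings(wins, losses, n_samples=20000, step=0.05, seed=0):
    # Random-walk Metropolis-Hastings: propose p' ~ N(p, step) and accept
    # with probability min(1, posterior(p') / posterior(p)).
    rng = random.Random(seed)
    p = 0.5
    samples = []
    for _ in range(n_samples):
        proposal = p + rng.gauss(0.0, step)
        delta = log_posterior(proposal, wins, losses) - log_posterior(p, wins, losses)
        if math.log(rng.random()) < delta:
            p = proposal
        samples.append(p)
    return samples

# 60 wins out of 100 games: the chain should concentrate near p = 0.6.
samples = metropolis_hastings(wins=60, losses=40)
posterior_mean = sum(samples[5000:]) / len(samples[5000:])  # discard burn-in
```

Averaging the post-burn-in samples gives the posterior expectation directly, which is the "calculating expected values is easy" point above.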
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Tool for automatic black-box parameter optimization rele

Post by Rémi Coulom »

mathmoi wrote:Sorry for the thread necromancy, but is this tool still available somewhere?

Thanks
I became a commercial programmer recently, so I decided to not give away my secret weapons any more. Still the previous version was GPL, and I am sure a few members of this forum have a copy. I don't mind if they distribute it.

Rémi
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Tool for automatic black-box parameter optimization rele

Post by Rémi Coulom »

IQ wrote:Hi,

I just glanced over the presentation PDF and wonder why you do not use a full Markov chain Monte Carlo (MCMC) approach, since you have already stated your problem in a Bayesian context. Advantages might be:
a) much less computational burden (no Hessian, hill climbing, etc.)
b) a more parsimonious model, as you already use a sigma-sampler and give the posterior
c) faster convergence, especially for large parameter spaces, and robustness to identification problems
d) it is easy to extend the model to incorporate all kinds of dependencies/correlations in the parameter space

The key insight is that an MCMC chain converges to the full posterior distribution, so that calculating expected values is really easy, and the Bernstein-von Mises theorem ensures that the expected values converge to the maximum likelihood estimate if n is large enough. Since the "largeness" of n is purely technical, you can just replicate the existing data multiple times to get this kind of convergence, with the net result usually still being faster than many other optimization methods in the case of large parameter spaces, especially with interdependencies/correlations.

Did you investigate this? Or are there some reasons against it that I am missing?
Hi Roberto,

My method computes the maximum of the posterior with gradient descent (conjugate gradient, in fact), and it works very well. QLR also has an MCMC sampler for the posterior. The efficiency of MCMC sampling algorithms depends on having a reasonable approximation of the posterior distribution. By computing the maximum and the Hessian at the maximum, I can get a good Gaussian approximation that is then used by MCMC. In fact, the posterior samples I plotted in those slides were obtained by MCMC, IIRC.

But estimating the posterior distribution is not the main problem at all in this kind of algorithm. The main problem is converging to parameters that produce the optimal winning rate, which is different. Each sample of the posterior is a quadratic form, and most of these samples don't even have a maximum. This is what makes the problem difficult.

The last version of QLR I made public had convergence problems. It is still very usable, and should behave well in most simple cases. If you wish to optimize just one parameter that has a strong influence on playing strength, then there should be no problem.
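The pipeline described above (maximize the posterior, take the Hessian at the maximum, and use the resulting Gaussian) is the Laplace approximation. A minimal one-dimensional sketch, using plain gradient ascent and finite differences in place of QLR's actual conjugate-gradient code, on a toy win-rate log-posterior:

```python
import math

def log_posterior(p, wins=60, losses=40):
    # Toy Bernoulli win-rate log-posterior (flat prior on (0, 1)).
    if not 0.0 < p < 1.0:
        return float("-inf")
    return wins * math.log(p) + losses * math.log(1.0 - p)

def laplace_approximation(f, x0=0.5, lr=1e-4, h=1e-5, iters=10000):
    # Step 1: find the posterior maximum by gradient ascent
    # (a stand-in for the conjugate-gradient maximization).
    x = x0
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x += lr * grad
    # Step 2: the second derivative at the maximum gives the width of the
    # Gaussian approximation: sigma^2 = -1 / f''(x_max).
    second = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return x, math.sqrt(-1.0 / second)

mode, sigma = laplace_approximation(log_posterior)
```

The resulting Gaussian N(mode, sigma^2) can then serve as a proposal or starting distribution for the MCMC stage.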

Rémi
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Tool for automatic black-box parameter optimization rele

Post by Daniel Shawul »

Hi Remi
My apologies, I thought it was offline for other reasons. Of course I won't give it to anyone anymore.
FlavusSnow
Posts: 89
Joined: Thu Apr 01, 2010 5:28 am
Location: Omaha, NE

Re: Tool for automatic black-box parameter optimization rele

Post by FlavusSnow »

I'll first comment that I wasn't able to get a copy before it fell from the face of the earth ;)

But regardless, a lot of people were instantly trying to apply this to evaluation parameters. Couldn't it also be applied specifically to search parameters? Essentially, parameters whose sole purpose is to determine when to extend or reduce, or when to apply a certain pruning method or quiescence search, including the number of plies to extend or reduce, etc.

Eval is certainly important, but if past history is any indication, large Elo breakthroughs are usually found in the search.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: Tool for automatic black-box parameter optimization rele

Post by Rémi Coulom »

Daniel Shawul wrote:Hi Remi
My apologies, I thought it was offline for other reasons. Of course I won't give it to anyone anymore.
Don't worry. No problem with that. I distributed it under GPL, so in fact anybody could re-distribute it with or without modifications. I simply won't distribute my new version.

Rémi
IQ
Posts: 162
Joined: Thu Dec 17, 2009 10:46 am

Re: Tool for automatic black-box parameter optimization rele

Post by IQ »

Rémi Coulom wrote: My method computes the maximum of the posterior with gradient descent (conjugate gradient, in fact), and it works very well. QLR also has an MCMC sampler for the posterior. The efficiency of MCMC sampling algorithms depends on having a reasonable approximation of the posterior distribution. By computing the maximum and the Hessian at the maximum, I can get a good Gaussian approximation that is then used by MCMC. In fact, the posterior samples I plotted in those slides were obtained by MCMC, IIRC.
This is the point I was talking about. You already use MCMC, but only as a second step, after you approximate with a Gaussian. I suggested skipping this approximation step and integrating the step-1 maximization into a full Bayesian framework. The fact that the quadratic forms sometimes do not have a maximum would make it even better suited, since e.g. a Metropolis-Hastings algorithm has a higher acceptance probability in regions with more probability mass. Usually this approximates the posterior and makes it easy to calculate expectations, and as n gets larger it converges to maximum likelihood. This would get rid of the intermediate maximization step in a robust way. And if n is not large enough, you could just replicate the data. But maybe it's difficult to find a useful prior for the quadratic forms?
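The data-replication trick can be illustrated with the Gaussian (Laplace) width of the same toy win-rate posterior: replicating the data k times leaves the maximum-likelihood estimate unchanged but shrinks the posterior standard deviation by a factor of sqrt(k), which is the concentration the Bernstein-von Mises argument relies on:

```python
import math

def posterior_sd(wins, losses):
    # Width of the Gaussian (Laplace) approximation of the win-rate
    # posterior: sigma = sqrt(-1 / f''(p_mle))
    # for f(p) = wins * log(p) + losses * log(1 - p).
    p = wins / (wins + losses)                    # maximum-likelihood estimate
    second = -wins / p**2 - losses / (1 - p)**2   # f'' at the maximum
    return math.sqrt(-1.0 / second)

base = posterior_sd(60, 40)
replicated = posterior_sd(600, 400)  # the same data replicated 10 times
# base / replicated == sqrt(10): the posterior is sqrt(10) times narrower.
```

So replication sharpens the posterior around the MLE without changing its location; whether that helps in practice still depends on the model, as noted above.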