Tool for automatic black-box parameter optimization released

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Tool for automatic black-box parameter optimization rele

Post by mcostalba »

Rémi Coulom wrote:This usually happens when a parameter has a small influence on playing strength.
This is by far the common case; the contrary is much rarer. It is very easy to stumble upon parameters that contribute almost nothing to playing strength.

In the real world, chess evaluation functions are, for the most part, an unnecessary mess. They have grown like that for historical reasons, because authors wrote the various sub-parts as a whole, without adding one parameter at a time. This is not necessarily bad design; it is not always possible to distill one drop at a time while writing. So you end up with a sub-part, say king safety, that taken as a whole works, but that doesn't mean all its parameters carry the same weight: some work, some have little impact, and many have no measurable impact.

I would suggest that if your method is to be applied to real-world engines, it is mandatory that it be robust to "no-effect" parameters because, as I have said, they are the norm rather than the exception.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Tool for automatic black-box parameter optimization rele

Post by Don »

I suspect that Komodo has a few positional terms that contribute nothing to the strength of the program.

That's because we attempt to keep changes that add 1-3 Elo, and of course it's very difficult to determine that accurately, as it would take over 100,000 games to know with much certainty when the change is so small.
mcostalba wrote:
Rémi Coulom wrote:This usually happens when a parameter has a small influence on playing strength.
This is by far the common case; the contrary is much rarer. It is very easy to stumble upon parameters that contribute almost nothing to playing strength.

In the real world, chess evaluation functions are, for the most part, an unnecessary mess. They have grown like that for historical reasons, because authors wrote the various sub-parts as a whole, without adding one parameter at a time. This is not necessarily bad design; it is not always possible to distill one drop at a time while writing. So you end up with a sub-part, say king safety, that taken as a whole works, but that doesn't mean all its parameters carry the same weight: some work, some have little impact, and many have no measurable impact.

I would suggest that if your method is to be applied to real-world engines, it is mandatory that it be robust to "no-effect" parameters because, as I have said, they are the norm rather than the exception.
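A rough back-of-the-envelope sketch of why such small changes need so many games, assuming the usual logistic Elo model, a 30% draw rate and a 95% (z=1.96) criterion; the counts shift with the draw rate:

Code: Select all

import math

def elo_to_score(elo):
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def games_needed(elo_diff, draw_ratio=0.30, z=1.96):
    """Games until the true score margin equals z standard errors."""
    margin = elo_to_score(elo_diff) - 0.5    # score margin to resolve
    var = 0.25 - draw_ratio / 4.0            # per-game score variance near 50%
    return math.ceil(var * (z / margin) ** 2)

for d in (1, 2, 3, 5):
    print(f"{d} Elo -> ~{games_needed(d):,} games")

Under these assumptions 3 Elo needs roughly 36K games and 1 Elo more than 300K, which is why a 1-3 Elo change is so hard to confirm.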
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Tool for automatic black-box parameter optimization rele

Post by mcostalba »

Don wrote:I suspect that Komodo has a few positional terms that contribute nothing to the strength of the program.

That's because we attempt to keep changes that add 1-3 Elo, and of course it's very difficult to determine that accurately, as it would take over 100,000 games to know with much certainty when the change is so small.
I understand your point very well, and I second it 100%.

In evaluation you end up with, say, a 10-parameter subset that, taken as a whole, you know works, because if you remove that part the engine turns out weaker. But you don't have the magnifying lens to "resolve" that subset into its constituents: you have to treat it as an atomic part. This is due to test resolution not being sufficient, and it is a fundamental problem, because in the real world test resolution can never be sufficient when you are talking about 1-3 Elo. So you need a different approach there, one that takes this point into account.
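One pragmatic way to act on that, sketched below with invented names (a hypothetical king-safety table), is to expose the whole sub-part through a single scale knob and let the tuner move only that:

Code: Select all

# Hypothetical illustration: treat a parameter subset as one atomic knob
# by scaling the whole sub-term instead of tuning its members separately.
KING_SAFETY = {"attack_weight": 12, "shelter_bonus": 8, "open_file_malus": 15}

def king_safety_term(features, scale_pct=100):
    """Sub-part score with one tunable knob: a percentage scale."""
    raw = sum(KING_SAFETY[name] * features.get(name, 0) for name in KING_SAFETY)
    return raw * scale_pct // 100

# The tuner now sees one parameter (scale_pct, say 50..150) whose Elo
# effect is the combined effect of the whole subset, which a test of
# realistic size has some chance of resolving.
print(king_safety_term({"attack_weight": 3, "shelter_bonus": 1}, scale_pct=120))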
Look

Re: Tool for automatic black-box parameter optimization rele

Post by Look »

Rémi Coulom wrote:Right now, there are more important problems. Some users reported to me that QLR may fail on some data. This usually happens when a parameter has a small influence on playing strength. The regression may then become positive, and QLR will only sample at both extremities of the parameter range. I have to fix that first.

Rémi
Have they set the window too narrow? Is the window off the (main) optimum?
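A toy picture of the failure mode Rémi describes (this is not his QLR code): if the fitted quadratic's curvature comes out positive, which is easy with a near-flat parameter plus noise, its maximum over the window sits at an endpoint, so sampling concentrates at the extremities:

Code: Select all

def quad_argmax(a, b, c, lo, hi):
    """Maximizer of the fitted a*x^2 + b*x + c over the window [lo, hi]."""
    candidates = [lo, hi]
    if a < 0:                        # concave fit: interior vertex is a maximum
        vertex = -b / (2 * a)
        if lo <= vertex <= hi:
            candidates.append(vertex)
    return max(candidates, key=lambda x: a * x * x + b * x + c)

print(quad_argmax(-0.01, 0.4, 50, 0, 100))  # concave: samples near x=20
print(quad_argmax(+0.01, 0.4, 50, 0, 100))  # convex: samples the boundary, x=100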
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Tool for automatic black-box parameter optimization rele

Post by bob »

Don wrote:I suspect that Komodo has a few positional terms that contribute nothing to the strength of the program.

That's because we attempt to keep changes that add 1-3 Elo, and of course it's very difficult to determine that accurately, as it would take over 100,000 games to know with much certainty when the change is so small.
That seems dangerous to me, in the same way that using 20 games to measure a +100 improvement is dangerous. To handle this, I run my usual 30K-game matches, and when we get one of those "looks like 2-3 Elo at best" results, I simply run a second test that adds another 90K games to the total to get us into the +/-2 range, and if that is not enough, another 90K to get us into the +/-1 range. I've decided that it is better to discard changes, even if they might be slight improvements, unless we verify that they are good. You can still make several -1, -2 or -3 changes and go rapidly in the wrong direction.

If we try something and it is "close", we choose to discard it or to improve the accuracy, depending on how we feel about the change. Some are of the form "I wonder if this will improve the play?", and without verification those go away. Others are of the form "I am sure this is better.", and when one of those comes out at only +/-2 or 3, we go for more accuracy to prove it is better, or we toss it out.

Seems safer.
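For reference, a rough normal-approximation sketch of the error bars behind those stages, assuming a 30% draw rate and scores near 50% (the exact bars depend on the draw rate):

Code: Select all

import math

def elo_error_bar(games, draw_ratio=0.30, z=1.96):
    """Approximate 95% error bar in Elo after 'games' games near a 50% score."""
    se_score = math.sqrt((0.25 - draw_ratio / 4.0) / games)
    return z * se_score * 1600.0 / math.log(10)  # d(Elo)/d(score) at 50% is ~695

for n in (30_000, 120_000, 210_000):             # the 30K, +90K, +90K stages
    print(f"{n:>7,} games -> +/-{elo_error_bar(n):.1f} Elo")

With these assumptions the three stages give roughly +/-3.3, +/-1.6 and +/-1.2 Elo.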



User avatar
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Tool for automatic black-box parameter optimization rele

Post by marcelk »

Rémi Coulom wrote:
marcelk wrote:I have seen them in 1-parameter hill climbing: Sometimes one must change two parameters simultaneously to continue climbing.
Can you tell for which parameter you observed a local optimum?


The entire vector. The hill climbing continued after injecting some noise (the annealing principle).
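A minimal sketch of that idea, with an invented two-parameter objective whose ridge blocks single-parameter moves; the random kick on stagnation is a much-simplified stand-in for annealing:

Code: Select all

import random

def climb(score, start, steps=2000, kick=3):
    """Coordinate hill climbing with a noise kick whenever progress stalls."""
    cur = list(start)
    best_v, best_s = list(cur), score(cur)
    stuck = 0
    for _ in range(steps):
        cand = list(cur)
        cand[random.randrange(len(cand))] += random.choice((-1, 1))
        if score(cand) > score(cur):             # single-parameter move
            cur, stuck = cand, 0
        else:
            stuck += 1
        if stuck > 50:                           # stagnated: perturb everything
            cur = [v + random.randint(-kick, kick) for v in cur]
            stuck = 0                            # accept even a worse point
        if score(cur) > best_s:
            best_v, best_s = list(cur), score(cur)
    return best_v, best_s

# Ridge objective: only moving both coordinates together reaches (25, 25).
score = lambda v: -10 * abs(v[0] - v[1]) - abs(v[0] - 25)
print(climb(score, [0, 10]))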
User avatar
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Tool for automatic black-box parameter optimization rele

Post by marcelk »

Rémi Coulom wrote:
marcelk wrote:I have seen them in 1-parameter hill climbing: Sometimes one must change two parameters simultaneously to continue climbing.
Can you tell for which parameter you observed a local optimum?
The Deep Thought team also reported such local optima in the Dec. 1997 ICCA Journal. That's why they added linear combinations of parameters to the process.
IIRC, they used supervised learning from game records. The function they optimized is deterministic and discrete. I am not surprised it may have local optima when they get close to the maximum.
They got stuck more than 100 Elo points below the maximum.
I would like to know a parameter that has local optima of playing strength.
Got one here below. This is a parameter related to faker-pawns. (Faker = a candidate pawn that doesn't have sufficient own-pawn support to become a passer.) The optimum value in this case would be v=23, as in this method I'm minimizing an error function instead of maximizing an Elo number, but that is an artificial difference.

Almost all other parameters I tune with hill "climbing" (descending) display nice, almost perfect parabolic error functions, as they should. But this is one of the few parameters that behave more erratically, and in this case you can observe local minima at v=21 and v=32.

I investigated: as far as I can tell, the wobbly curve is real and not the result of undersampling or an implementation bug (at best it might indicate that more parameters are needed to describe this feature completely).

[Image: plot of the error function over v, with the global minimum at v=23 and local minima at v=21 and v=32]
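To check a sweep for this effect mechanically, a small sketch over invented data shaped like the curve above (global minimum at v=23, bumps producing the local minima at v=21 and v=32):

Code: Select all

# Invented stand-in for the measured errors: a parabola centred on v=23
# plus two bumps that create the local minima marcelk observed.
errors = {v: (v - 23) ** 2 / 40 + (0.8 if v in (22, 31) else 0.0)
          for v in range(15, 40)}

def local_minima(errs):
    """Values strictly below both neighbours in a sampled 1-D error curve."""
    vs = sorted(errs)
    return [v for a, v, b in zip(vs, vs[1:], vs[2:])
            if errs[v] < errs[a] and errs[v] < errs[b]]

print(local_minima(errors))   # -> [21, 23, 32]: every basin, not just one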
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Tool for automatic black-box parameter optimization rele

Post by Daniel Shawul »

I would also imagine that if there is high correlation between the parameters to be tuned, more than one optimum could be found. For instance, take one parameter for rook on open file (RO) and another for general rook mobility (RM). Two local optima could occur (higher RO and lower RM, or vice versa), which could give equal or different performance. If we believed that a higher RO bonus would be good and started tuning from there, it may be very difficult for the tuner to try out lower-RO, higher-RM combinations... Unlike population-based methods, which try out multiple hills at the same time, a hill-climbing method can only find the global optimum by chance. A combination of population- and gradient-based methods has been used to avoid this specific problem: pick up the hills with the former and use the latter for faster convergence.
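A toy version of that hybrid, with everything invented: the objective below has two separate hills (high RO / low RM and the reverse), a random population lands on both, and greedy local steps then refine each:

Code: Select all

import random

def strength(ro, rm):
    """Invented fitness with two optima: (RO=40, RM=10) and (RO=10, RM=40)."""
    return max(-(ro - 40) ** 2 - (rm - 10) ** 2,
               -(ro - 10) ** 2 - (rm - 40) ** 2)

def refine(ro, rm, steps=300):
    """Greedy local search: the fast-convergence half of the hybrid."""
    for _ in range(steps):
        cand = (ro + random.randint(-1, 1), rm + random.randint(-1, 1))
        if strength(*cand) > strength(ro, rm):
            ro, rm = cand
    return ro, rm

# Population half: random starts have a chance to land on every hill.
population = [(random.randint(0, 50), random.randint(0, 50)) for _ in range(10)]
refined = [refine(ro, rm) for ro, rm in population]
print(max(refined, key=lambda p: strength(*p)))   # best basin found, by chance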
Look

Re: Tool for automatic black-box parameter optimization rele

Post by Look »

Daniel Shawul wrote:I would also imagine that if there is high correlation between the parameters to be tuned, more than one optimum could be found. For instance, take one parameter for rook on open file (RO) and another for general rook mobility (RM). Two local optima could occur (higher RO and lower RM, or vice versa), which could give equal or different performance. If we believed that a higher RO bonus would be good and started tuning from there, it may be very difficult for the tuner to try out lower-RO, higher-RM combinations... Unlike population-based methods, which try out multiple hills at the same time, a hill-climbing method can only find the global optimum by chance. A combination of population- and gradient-based methods has been used to avoid this specific problem: pick up the hills with the former and use the latter for faster convergence.
I was thinking along these lines too: more correlation between parameters may introduce local optima. After Marcel van Kervinck's post about faker pawns, I think other explanations could be given as well. Take parameters that relate considerably to both search and eval: to tune them, one has to search deeply enough, and if not, the tuning may not work as expected.
mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Tool for automatic black-box parameter optimization rele

Post by mathmoi »

Sorry for the thread necromancy, but is this tool still available somewhere?

Thanks