tuning for the uninformed

flok · Post by **flok** » Fri Nov 24, 2017 10:15 am

sandermvdb wrote:The basic idea is pretty simple: calculate the error of the evaluation when it is compared to the actual outcome of the positions.

What do you mean by that?
Do you mean the following:
- fen as input
- calc move with an eval val
- calc eval of the move that should've been moved
- compare these two (how? percentual difference? or what?)

sandermvdb · Post by **sandermvdb** » Fri Nov 24, 2017 10:30 am

flok wrote:
sandermvdb wrote:The basic idea is pretty simple: calculate the error of the evaluation when it is compared to the actual outcome of the positions.
What do you mean by that?
Do you mean the following:
- fen as input
- calc move with an eval val
- calc eval of the move that should've been moved
- compare these two (how? percentual difference? or what?)

No. I mean:
- fen as input
- calculate evaluation
- calculate error: compare evaluation score with the actual outcome. If the evaluation calculates that white has a big advantage but black wins -> big error. The exact formula is described on the cpw. This is my (pseudo) implementation, where K=1.3:

Code: Select all

public double calculateTotalError&#40;) &#123;
	double totalError = 0;
	for &#40;Entry<String, Double> entry &#58; fens.entrySet&#40;)) &#123; // fens contains all positions, including the outcome
		ChessBoard cb = new ChessBoard&#40;entry.getKey&#40;));
		totalError += Math.pow&#40;entry.getValue&#40;) - calculateSigmoid&#40;Eval.calculateScore&#40;cb&#41;), 2&#41;;
	&#125;
	totalError /= fens.size&#40;);
	return totalError;
&#125;

public double calculateSigmoid&#40;int score&#41; &#123;
	return 1 / &#40;1 + Math.pow&#40;10, -1.3 * score / 400&#41;);
&#125;

CheckersGuy · Post by **CheckersGuy** » Sat Nov 25, 2017 12:04 am

sandermvdb wrote:
flok wrote:If anyone is willing to explain the Texel tuning method tht would be great!

Sofar I understand I have to let it play (well, run QS + eval on FENs) millions of games and then do something with the evaluation-value. But what? I don't understand the wiki explanation.
The basic idea is pretty simple: calculate the error of the evaluation when it is compared to the actual outcome of the positions. Lower a particular evaluation parameter and check if the error has improved, if not, higher the parameter, if again not improved, keep the original value. Do this for all parameters until you have reached the lowest error.

This is the local search algorithm but I would assume that it is better to run some gradient based algorithm first (Maybe gradient descent or gauss-newton). Then if the error doesn't change by much anymore I would switch to local search.

I am going to implement texel tuning this week and see what I get

Henk · Post by **Henk** » Sat Nov 25, 2017 12:30 pm

All does not work if search space has a great many local optima and only very few global optima that you are interested in. But simulated annealing taking too long.

CheckersGuy · Post by **CheckersGuy** » Sat Nov 25, 2017 2:58 pm

Henk wrote:All does not work if search space has a great many local optima and only very few global optima that you are interested in. But simulated annealing taking too long.

Local search and any other practical algorithm to minimize the error will end up in a local optimum.

Henk · Post by **Henk** » Sat Nov 25, 2017 3:48 pm

CheckersGuy wrote:
Henk wrote:All does not work if search space has a great many local optima and only very few global optima that you are interested in. But simulated annealing taking too long.
Local search and any other practical algorithm to minimize the error will end up in a local optimum.

Wasn't it that if it optimizes enough parameters you won't get trapped in a local optimum. I can't remember.

Robert Pope · Post by **Robert Pope** » Mon Nov 27, 2017 6:38 pm

Henk wrote:
CheckersGuy wrote:
Henk wrote:All does not work if search space has a great many local optima and only very few global optima that you are interested in. But simulated annealing taking too long.
Local search and any other practical algorithm to minimize the error will end up in a local optimum.
Wasn't it that if it optimizes enough parameters you won't get trapped in a local optimum. I can't remember.

I don't think there is ever a guarantee of that without additional information about the domain. There's always the chance of a global minimum that is far from the rest of the "good" solutions that you will never hit except by very good chance.

Consider trying to find the minimum of this function (without actually knowing the function ahead of time):

y=-x, for 4999<x<=5000
y=x^2, for all other x

There is basically no chance someone is going to find the global minimum at x=5000. Then add another 100 dimensions for chess tuning.

AlvaroBegue · Post by **AlvaroBegue** » Mon Nov 27, 2017 6:52 pm

Robert Pope wrote:
Henk wrote:
CheckersGuy wrote:
Henk wrote:All does not work if search space has a great many local optima and only very few global optima that you are interested in. But simulated annealing taking too long.
Local search and any other practical algorithm to minimize the error will end up in a local optimum.
Wasn't it that if it optimizes enough parameters you won't get trapped in a local optimum. I can't remember.
I don't think there is ever a guarantee of that without additional information about the domain. There's always the chance of a global minimum that is far from the rest of the "good" solutions that you will never hit except by very good chance.

Consider trying to find the minimum of this function (without actually knowing the function ahead of time):

y=-x, for 4999<x<=5000
y=x^2, for all other x

There is basically no chance someone is going to find the global minimum at x=5000. Then add another 100 dimensions for chess tuning.

That function is not qualitatively similar to the loss function being minimized in chess tuning. Adding many more dimensions actually ameliorates the problem of getting stuck in local minima.

The fear of getting stuck in a local minimum is likely overblown. If your evaluation function is linear, the corresponding optimization problem is convex, which implies there is only one critical point, which is the global minimum. If your evaluation function is something like a deep neural network with ReLU activations, the minimization problem is not convex and there are gazillions of critical points, but because of the high dimensionality most of them are saddle points and not minima. There are results from solid-state physics (something about randomized polynomials) that indicate that all the local minima have values contained in a narrow region above the true minimum, so it doesn't really matter which one you find.

brtzsnr · Post by **brtzsnr** » Mon Nov 27, 2017 10:56 pm

AlvaroBegue wrote: That function is not qualitatively similar to the loss function being minimized in chess tuning. Adding many more dimensions actually ameliorates the problem of getting stuck in local minima.

Agreed. Most articles on local minima in high dimentional spaces point to this paper https://arxiv.org/pdf/1406.2572.pdf which I quote from related work

These works indicate that for typical, generic functions chosen from a random Gaussian ensemble of functions, local minima with high error are exponentially rare in the dimensionality of the problem, but saddle points with many negative and approximate plateau directions are exponentially likely.

So most local minima are very close to global minima. Choosing a test set of positions that exhibits a good distribution of your features is more important.

Robert Pope · Post by **Robert Pope** » Tue Nov 28, 2017 5:41 pm

brtzsnr wrote:
AlvaroBegue wrote: That function is not qualitatively similar to the loss function being minimized in chess tuning. Adding many more dimensions actually ameliorates the problem of getting stuck in local minima.
Agreed. Most articles on local minima in high dimentional spaces point to this paper https://arxiv.org/pdf/1406.2572.pdf which I quote from related work

These works indicate that for typical, generic functions chosen from a random Gaussian ensemble of functions, local minima with high error are exponentially rare in the dimensionality of the problem, but saddle points with many negative and approximate plateau directions are exponentially likely.
So most local minima are very close to global minima. Choosing a test set of positions that exhibits a good distribution of your features is more important.

I don't disagree at all. I was just commenting on Henk's supposition that there is a way to avoid getting stuck in local minima. And there isn't, unless your domain is very clean. The local mimimum you end up in might be good enough, but that doesn't mean that you'll ever find the true global minimum.

tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed

Re: tuning for the uninformed