tuning for the uninformed

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: tuning for the uninformed

Post by Robert Pope »

Henk wrote:
CheckersGuy wrote:
Henk wrote:None of that works if the search space has a great many local optima and only very few global optima that you are interested in. But simulated annealing takes too long.
Local search and any other practical algorithm to minimize the error will end up in a local optimum.
Wasn't it the case that if you optimize enough parameters you won't get trapped in a local optimum? I can't remember.
I don't think there is ever a guarantee of that without additional information about the domain. There's always the chance of a global minimum that lies far from the rest of the "good" solutions, one you will never hit except by sheer luck.

Consider trying to find the minimum of this function (without actually knowing the function ahead of time):

y = -x, for 4999 < x <= 5000
y = x^2, for all other x

There is basically no chance someone is going to find the global minimum at x=5000. Then add another 100 dimensions for chess tuning.
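A tiny sketch of what I mean (toy code of my own, not a real tuner): a greedy local search on exactly that function settles at x = 0 from any reasonable start and never visits the spike at x = 5000.

Code: Select all

def f(x):
    if 4999 < x <= 5000:
        return -x          # global minimum: f(5000) = -5000
    return x * x           # everywhere else a smooth bowl with its minimum at x = 0

def hill_climb(x, step=1.0, iters=10000):
    """Greedy local search: move to a neighbouring point only if it improves f."""
    for _ in range(iters):
        best = min((x - step, x, x + step), key=f)
        if best == x:
            break
        x = best
    return x

print(hill_climb(0.0))    # -> 0.0, stuck at the local minimum
print(hill_climb(200.0))  # -> 0.0 again; the region near 5000 is never explored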
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tuning for the uninformed

Post by AlvaroBegue »

Robert Pope wrote:
Henk wrote:
CheckersGuy wrote:
Henk wrote:None of that works if the search space has a great many local optima and only very few global optima that you are interested in. But simulated annealing takes too long.
Local search and any other practical algorithm to minimize the error will end up in a local optimum.
Wasn't it the case that if you optimize enough parameters you won't get trapped in a local optimum? I can't remember.
I don't think there is ever a guarantee of that without additional information about the domain. There's always the chance of a global minimum that lies far from the rest of the "good" solutions, one you will never hit except by sheer luck.

Consider trying to find the minimum of this function (without actually knowing the function ahead of time):

y = -x, for 4999 < x <= 5000
y = x^2, for all other x

There is basically no chance someone is going to find the global minimum at x=5000. Then add another 100 dimensions for chess tuning.
That function is not qualitatively similar to the loss function being minimized in chess tuning. Adding many more dimensions actually ameliorates the problem of getting stuck in local minima.

The fear of getting stuck in a local minimum is likely overblown. If your evaluation function is linear, the corresponding optimization problem is convex, which implies there is only one critical point, which is the global minimum. If your evaluation function is something like a deep neural network with ReLU activations, the minimization problem is not convex and there are gazillions of critical points, but because of the high dimensionality most of them are saddle points and not minima. There are results from solid-state physics (something about randomized polynomials) that indicate that all the local minima have values contained in a narrow region above the true minimum, so it doesn't really matter which one you find.
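To illustrate the convex case, a toy sketch (my own code, not RuyDos): a linear evaluation w·x trained with the logistic (cross-entropy) loss. Gradient descent from two very different starting points lands on the same weights, because there is only one basin.

Code: Select all

import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1000 "positions" with 5 features each and 0/1 results.
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w):
    z = X @ w                                     # linear evaluation
    p = sigmoid(z)                                # predicted score
    loss = np.mean(np.logaddexp(0.0, z) - y * z)  # cross-entropy, numerically stable
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def gradient_descent(w, lr=0.5, steps=5000):
    for _ in range(steps):
        _, g = loss_and_grad(w)
        w = w - lr * g
    return w

# Two very different starting points converge to the same weights.
w1 = gradient_descent(np.zeros(5))
w2 = gradient_descent(rng.normal(scale=5.0, size=5))
print(np.allclose(w1, w2, atol=1e-3))             # True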
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: tuning for the uninformed

Post by brtzsnr »

AlvaroBegue wrote: That function is not qualitatively similar to the loss function being minimized in chess tuning. Adding many more dimensions actually ameliorates the problem of getting stuck in local minima.
Agreed. Most articles on local minima in high-dimensional spaces point to this paper, https://arxiv.org/pdf/1406.2572.pdf, from whose related-work section I quote:
These works indicate that for typical, generic functions chosen from a random Gaussian ensemble of functions, local minima with high error are exponentially rare in the dimensionality of the problem, but saddle points with many negative and approximate plateau directions are exponentially likely.
So most local minima are very close to global minima. Choosing a test set of positions that exhibits a good distribution of your features is more important.
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: tuning for the uninformed

Post by Robert Pope »

brtzsnr wrote:
AlvaroBegue wrote: That function is not qualitatively similar to the loss function being minimized in chess tuning. Adding many more dimensions actually ameliorates the problem of getting stuck in local minima.
Agreed. Most articles on local minima in high-dimensional spaces point to this paper, https://arxiv.org/pdf/1406.2572.pdf, from whose related-work section I quote:
These works indicate that for typical, generic functions chosen from a random Gaussian ensemble of functions, local minima with high error are exponentially rare in the dimensionality of the problem, but saddle points with many negative and approximate plateau directions are exponentially likely.
So most local minima are very close to global minima. Choosing a test set of positions that exhibits a good distribution of your features is more important.
I don't disagree at all. I was just commenting on Henk's supposition that there is a way to avoid getting stuck in local minima. And there isn't, unless your domain is very clean. The local minimum you end up in might be good enough, but that doesn't mean that you'll ever find the true global minimum.
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: tuning for the uninformed

Post by CheckersGuy »

I do have another question about how you guys deal with that problem.
How do you generate a lot of games that aren't too similar? Or, to phrase it differently, how do you generate a lot of unique positions?
Do you use a very large opening book as a starting point?

Greetings

Robin
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tuning for the uninformed

Post by AlvaroBegue »

CheckersGuy wrote:I do have another question about how you guys deal with that problem.
How do you generate a lot of games that aren't too similar? Or, to phrase it differently, how do you generate a lot of unique positions?
Do you use a very large opening book as a starting point?

Greetings

Robin
I downloaded games from CCRL, took positions from those games and analyzed them with my program RuyDos. I saved positions on which the evaluation function was being called after searching 1000 nodes. I then labelled each position by running one very quick SF8-vs-SF8 game.

https://bitbucket.org/alonamaloh/ruy_tu ... th_results

EDIT: In that file each position has been replaced by the position from which quiescence search got its score.
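If anyone wants to do something similar, the labelling step looks roughly like this (a rough sketch using python-chess, not my actual code; the engine path, file names and per-move time limit are just placeholders):

Code: Select all

import chess
import chess.engine

ENGINE_PATH = "./stockfish"   # placeholder path to a UCI engine
MOVETIME = 0.01               # very fast per-move limit for the labelling games

def label_position(fen, engine):
    """Play one quick engine-vs-engine game from `fen` and return the
    result from White's point of view: 1.0, 0.5 or 0.0."""
    board = chess.Board(fen)
    while not board.is_game_over(claim_draw=True):
        played = engine.play(board, chess.engine.Limit(time=MOVETIME))
        board.push(played.move)
    outcome = board.result(claim_draw=True)       # "1-0", "0-1" or "1/2-1/2"
    return {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}[outcome]

if __name__ == "__main__":
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    with open("positions.fen") as fens, open("labelled.txt", "w") as out:
        for line in fens:
            fen = line.strip()
            if fen:
                out.write(f"{fen};{label_position(fen, engine)}\n")
    engine.quit()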
zenpawn
Posts: 349
Joined: Sat Aug 06, 2016 8:31 pm
Location: United States

Re: tuning for the uninformed

Post by zenpawn »

CheckersGuy wrote: This is the local search algorithm, but I would assume that it is better to run some gradient-based algorithm first (maybe gradient descent or Gauss-Newton), then switch to local search once the error doesn't change much anymore.
Could you describe these pre-local-search algorithms for a non-mathematician? :)

Also, suppose a file of positions is labeled with evals from an oracle engine instead of game results. Is there a way to convert the error formula used in Texel tuning to take an eval instead of a game result? Perhaps the sigmoid of the oracle's eval minus the tuning engine's sigmoid, instead of result minus sigmoid?
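In formula form, what I am imagining is something like this (a sketch of my own, with K being the usual Texel scaling constant that still has to be tuned):

Code: Select all

K = 1.2  # scaling constant; an assumption, tune it as in normal Texel tuning

def sigmoid(cp):
    """Map a centipawn score to an expected score in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-K * cp / 400.0))

def squared_error(own_eval_cp, target):
    return (target - sigmoid(own_eval_cp)) ** 2

# Classic Texel tuning: the target is the game result (1, 0.5 or 0).
e_result = squared_error(own_eval_cp=35, target=1.0)

# The variant above: the target is the sigmoid of the oracle engine's eval.
e_oracle = squared_error(own_eval_cp=35, target=sigmoid(120))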
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: tuning for the uninformed

Post by jdart »

Texel tuning is basically a form of logistic regression. There is a large literature on this, and a lot of explanatory material online. As initially described it is slightly unorthodox in that it uses a squared-error loss function to handle the three outcomes of win, loss, and draw, but the general principle is still the same.

--Jon
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: tuning for the uninformed

Post by jdart »

Henk wrote:None of that works if the search space has a great many local optima and only very few global optima that you are interested in. But simulated annealing takes too long.
If the loss function is convex, and it generally is, then you will be able to converge to a minimum using local search or gradient descent.

Nobody uses simulated annealing. There are much better algorithms.

For large training sets IMO you want to use a gradient descent method such as ADAM.
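For concreteness, a bare-bones sketch of an ADAM update on the Texel-style squared-error loss (the linear evaluation eval = w·x is an assumption to keep the example short; a real engine's evaluation is more involved):

Code: Select all

import numpy as np

K = 1.2  # sigmoid scaling constant (assumed; tune it for your engine)

def sigmoid(s):
    return 1.0 / (1.0 + 10.0 ** (-K * s / 400.0))

def adam_tune(X, results, steps=10000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize mean((result - sigmoid(w.x))^2) with the ADAM update rule.
    X: positions-by-features matrix; results: per-position scores in {0, 0.5, 1}."""
    w = np.zeros(X.shape[1])
    m = np.zeros_like(w)                 # first-moment estimate
    v = np.zeros_like(w)                 # second-moment estimate
    c = K * np.log(10) / 400.0           # d(sigmoid)/ds = c * p * (1 - p)
    for t in range(1, steps + 1):
        p = sigmoid(X @ w)
        grad = X.T @ (-2.0 * (results - p) * p * (1 - p) * c) / len(results)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        m_hat = m / (1 - beta1 ** t)     # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Usage: w = adam_tune(features, results), one feature row per labelled position.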

--Jon
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: tuning for the uninformed

Post by jdart »

Proof of strict convexity for the logistic loss function:

http://qwone.com/~jason/writing/convexLR.pdf
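For readers who do not want to open the PDF, the core of the argument is short (sketched here in my own notation). For a linear evaluation e_i = w^T x_i and labels y_i in {0, 1}, the logistic loss is

L(w) = sum_i [ log(1 + exp(e_i)) - y_i * e_i ]

and its Hessian is

H(w) = sum_i sigma(e_i) * (1 - sigma(e_i)) * x_i x_i^T,

a nonnegative combination of rank-one positive semidefinite matrices. Hence L is convex, and strictly convex when the feature matrix has full column rank.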

--Jon