Texel tuning method question

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning method question

Post by jdart »

Typically with gradient descent you tune all parameters simultaneously.

The better methods for this use a variable step size, one that decreases as the method approaches convergence.

This has been an area in which there have been some fairly recent developments. See http://ruder.io/optimizing-gradient-descent/ for an overview.

A popular method is ADAM. That is what I use.
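
For illustration, here is a minimal sketch of the basic idea (not my actual code): plain gradient descent with a step size that shrinks as the run progresses. grad_E is an assumed helper returning the gradient of the tuning error for every parameter.

Code: Select all

def tune(params, grad_E, steps=1000, lr0=1.0, decay=0.001):
    for t in range(steps):
        lr = lr0 / (1.0 + decay * t)   # step size decreases over time
        g = grad_E(params)
        for i in range(len(params)):
            params[i] -= lr * g[i]     # move against the gradient
    return params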

--Jon
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Texel tuning method question

Post by AlvaroBegue »

jdart wrote:Typically with gradient descent you tune all parameters simultaneously.

The better methods for this use a variable step size, one that decreases as the method approaches convergence.

This has been an area in which there have been some fairly recent developments. See http://ruder.io/optimizing-gradient-descent/ for an overview.

A popular method is ADAM. That is what I use.

--Jon
One important observation is that you don't need to evaluate the error on your entire training dataset before you start changing your parameters. Instead, you can look at a small random subsample of your training dataset (say, 100 positions) and change the parameters a little bit to reduce the error evaluated on that subset. This is called stochastic gradient descent with mini-batches. There are many learning algorithms (procedures for deciding how to change the parameters at each step after you evaluate the gradient); Adam is a reasonable choice.
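
A bare-bones sketch of mini-batch SGD (illustration only; positions is the list of training positions and batch_grad(params, batch) is an assumed helper returning the gradient of the error over just that batch):

Code: Select all

import random

def sgd(params, positions, batch_grad, epochs=10, batch_size=100, lr=0.1):
    for _ in range(epochs):
        random.shuffle(positions)
        for start in range(0, len(positions), batch_size):
            batch = positions[start:start + batch_size]
            g = batch_grad(params, batch)
            for i in range(len(params)):
                params[i] -= lr * g[i]   # small step after every mini-batch
    return params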

If you were using batch gradient descent (i.e., evaluating the gradient over the entire training set at each step), there are other options that might be better. In RuyTune I used L-BFGS, but I am not convinced that was a good choice. In any case, I would recommend using mini-batches for most situations.
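
For the batch case, a rough sketch using SciPy's L-BFGS (illustration only, not RuyTune's actual code; error and error_grad are assumed helpers for E and its gradient over the whole training set):

Code: Select all

import numpy as np
from scipy.optimize import minimize

def tune_batch(initial_params, error, error_grad):
    # error(x): tuning error E over the whole training set for parameter vector x
    # error_grad(x): gradient of E at x
    result = minimize(error, x0=np.asarray(initial_params, dtype=float),
                      jac=error_grad, method="L-BFGS-B")
    return result.x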
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Texel tuning method question

Post by Ferdy »

Cheney wrote:A question on this is if I can walk one parameter until it no longer improves and then walk the next one?
In doing so you probably deprive the other parameters of the chance to reach their optimal values. This is just the same as tuning the parameters one by one :).
Peter mentioned this, and I use it in tuning: sort the parameters according to the number of successful reductions of the error E. Example:

Code: Select all

c = 0
param = [
    ["pawn", 100, c],
    ["knight", 300, c],
    ["bishop", 300, c],
]
Change pawn to 101, and if it is successful (E is reduced), increment that parameter's counter.
That would become:

Code: Select all

param = [
    ["pawn", 101, 1],
    ["knight", 300, 0],
    ["bishop", 300, 0],
]
After one parameter cycle you may get, for example:

Code: Select all

param = [
    ["pawn", 101, 1],
    ["knight", 300, 0],
    ["bishop", 301, 1],
]
That means the trials that changed the knight value did not improve E.

So instead of the pawn, knight, bishop sequence, you change it to pawn, bishop, knight, as in:

Code: Select all

param = [
    ["pawn", 101, 1],
    ["bishop", 301, 1],
    ["knight", 300, 0],
]
The idea is to test first those parameters that are sensitive to the training set you use. This allows those parameters to find their optimal values as early as possible; the knight can be tested later since its value already seems close to optimal.
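
In rough code, the re-ordering after a cycle could be done like this (just a sketch of the idea):

Code: Select all

param = [
    ["pawn", 101, 1],
    ["knight", 300, 0],
    ["bishop", 301, 1],
]

# sort descending on the success counter (third field)
param.sort(key=lambda p: p[2], reverse=True)
# next cycle order: pawn, bishop, knight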


You may try a GA (genetic algorithm) too; it is interesting.

http://talkchess.com/forum/viewtopic.ph ... 37&t=57246
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Texel tuning method question

Post by Cheney »

I will certainly try this out as well to see if it helps, but for some reason I am just not getting positive results. I think I am missing something. I have reread all the posts in this thread. Would someone mind shedding some light on my scenario? I am sorry if some of this is repetitive.

From what I have read, some use "end of PV" nodes, some use quiet nodes, there are comments about not running qsearch and just using the eval, and comments about comparing my results against the results of a better engine. All of this confuses me :) and makes me think my process is wrong.

Based on what I have read about how to test, this is what I know and have done:

First, my qsearch is plain fail-soft - no delta pruning, no TT, etc.

I have engine version 1 and version 2. They played 64k games. Version 2 is a bit better than version 1 as it has the pawn parameters I wish to test and tune.

From these games, I extracted 8+ million positions, excluding opening book positions and positions after a mate was found. Each FEN was saved with the game result (0 for a black win, 0.5 for a draw, 1 for a white win).

I had the engine compute a K value. All the engine did was load each FEN (and the game result), run qsearch, get the returned score, store it and the result in a vector, and once all FENs were searched, I calculated the K with minimal E.
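
In rough code terms the K search looks like this (a simplified sketch, not my exact implementation; the sigmoid form varies between engines; scores[i] is the stored qsearch score of position i and results[i] its game result):

Code: Select all

import math

def mean_sq_error(K, scores, results):
    e = 0.0
    for s, r in zip(scores, results):
        predicted = 1.0 / (1.0 + math.exp(-K * s))  # map score to expected result
        e += (r - predicted) ** 2
    return e / len(scores)

def find_k(scores, results):
    # simple scan over candidate K values; a finer search can refine the minimum
    return min((k / 100.0 for k in range(1, 201)),
               key=lambda K: mean_sq_error(K, scores, results))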

So far, at no point did I compare anything with the real scores from the real games that were played or any other scores from other engines.

Next, I used the gradient search (local search seems to be another name for it) to tune a few values.

First, with K set (as discovered in a previous paragraph), I'd qsearch all the positions again with all the eval parameters at their "base" settings. The E determined here is now the best E.

Second, I'd have eval-param1 increased by 1, qsearch all the positions, and using all the new returned scores and the known K, I could determine a new E and if it is better than the best E.

Third, depending on whether a new best E was found, I'd either decrease eval-param1 or move on to the next eval-param, saving the new best E.

This would repeat until a full round over all parameters produced no improvement in E. I'd then set the final tuned parameters into the eval function as the new base/default values and play the engine against a previous version, but I do not end up with a winning engine.

That's the process. Am I missing something? Should I compute K from the scores from the real games first? Should I only tune parameters when my engine loses to the older version? Etc.?

Thank you :)
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning method question

Post by jdart »

qsearch is plain fail-soft - no delta pruning, no TT, etc.
Well, delta pruning alone is probably good for a sizeable Elo gain, right there. So I would do that.

There is no need to tune parameters one at a time, and you don't have to limit yourself to tuning a few values. A reasonable optimization algorithm can tune several hundred simultaneously, no problem. Look at Adagrad or ADAM - they are not really hard to code. GitHub has some example code.
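
For example, a bare-bones Adam update looks roughly like this (a sketch, not my engine's code; grad_E is assumed to return the gradient of the tuning error for every parameter):

Code: Select all

import math

def adam(params, grad_E, steps=1000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    m = [0.0] * len(params)  # running mean of gradients
    v = [0.0] * len(params)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_E(params)
        for i in range(len(params)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i]
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params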

The process, I believe, should be relatively insensitive to K. K just determines how scores map to predicted outcomes. I use 0.75, so the predicted outcome is:

1.0/(1.0+exp(-0.75*val))

where val is the score.

What is your validation procedure? Once you have done a tuning run, you should have a method (self-play, or games against an opponent gauntlet) to validate whether the changes do or don't improve strength.

--Jon
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Texel tuning method question

Post by Ferdy »

Cheney wrote:I will certainly try this out as well to see if it helps, but for some reason I am just not getting positive results. I think I am missing something. I have reread all the posts in this thread. Would someone mind shedding some light on my scenario? I am sorry if some of this is repetitive.
If you cannot get a positive result, that could mean your parameter values are already optimal, or there are bugs in the eval (I spent more time reviewing my eval code before running the tuning), or your training sets are not good. An extreme case: all the wins go to the side with a passer and you are tuning a passer bonus, so every time such a position is encountered, increasing the passer bonus is always good. The set should be balanced by also having positions with a passer where the result is a draw or a loss.
Cheney wrote:From what I have read, some use "end of PV" nodes, some use quiet nodes, there are comments about not running qsearch and just using the eval, and comments about comparing my results against the results of a better engine. All of this confuses me :) and makes me think my process is wrong.
The Texel method uses the qsearch score. Another method just uses the eval score; this is the method that obviously requires quiet training sets.
Cheney wrote:Based on what I have read about how to test, this is what I know and have done:

First, my qsearch is plain fail-soft - no delta pruning, no TT, etc.

I have engine version 1 and version 2. They played 64k games. Version 2 is a bit better than version 1 as it has the pawn parameters I wish to test and tune.

From these games, I extracted 8+ million positions, excluding opening book positions and positions after a mate was found. Each FEN was saved with the game result (0 for a black win, 0.5 for a draw, 1 for a white win).

I had the engine compute a K value. All the engine did was load each FEN (and the game result), run qsearch, get the returned score, store it and the result in a vector, and once all FENs were searched, I calculated the K with minimal E.

So far, at no point did I compare anything with the real scores from the real games that were played or any other scores from other engines.

Next, I used the gradient search (local search seems to be another name for it) to tune a few values.

First, with K set (as discovered in a previous paragraph), I'd qsearch all the positions again with all the eval parameters at their "base" settings. The E determined here is now the best E.

Second, I'd have eval-param1 increased by 1, qsearch all the positions, and using all the new returned scores and the known K, I could determine a new E and if it is better than the best E.

Third, depending on whether a new best E was found, I'd either decrease eval-param1 or move on to the next eval-param, saving the new best E.
If there is a new best E, save the param values, move on to the next param, and use all of the best params in the succeeding tests. If the best E is not reduced, change the delta of the param and test again. If all deltas have been tried, move on to the next param, of course.
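
In rough code, one cycle of that local search could look like this (a sketch only; compute_E is assumed to qsearch all positions and return E):

Code: Select all

def one_cycle(params, best_e, compute_E, deltas=(+1, -1)):
    # params: dict of param name -> value
    for name in params:
        for delta in deltas:
            params[name] += delta
            e = compute_E(params)
            if e < best_e:
                best_e = e          # keep the improvement, go to next param
                break
            params[name] -= delta   # revert and try the next delta
    return best_e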
Cheney wrote:Should I only tune parameters when my engine loses to the older version?
Tune parameters when you have changes to eval and search.

Sample development scenario:

Code: Select all

1. Create base engine
   a. Create base-a engine, based on the base engine
   b. Test base vs base-a; this tests the reliability of your test environment
      and the randomness of the engine
   c. Make sure that the result is even
   d. If the result is even, go to (2)
2. Create base-1 engine based on the base engine; add bonus x for knight outpost (value x is your first estimate)
3. Test base vs base-1
4. If the result is not really good, but only a small Elo reduction or equal:
    a. Create base-1a, based on base-1
    b. Tune all/whatever params with your auto-tuner
    c. Apply the good params to base-1a
    d. Test base vs base-1a
5. Then the next eval modification ...

To test that your auto-tuner is working, set your pawn to 30 and your knight to 100. Then tune just these 2 params and see if it can bring the pawn value back close to 100 and the knight value to 300, for example. This also checks the quality of the training sets used. After tuning, test it in actual games.
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Texel tuning method question

Post by Cheney »

There is no need to tune parameters one at a time
I did look at the pseudocode on CPW for the local search and have a similar model. Are you saying I could, for example:

1. Set chain pawn +1
2. Set Backward pawn +1
3. Set rook on 7th -1
4. ... set a few other parameters to +/-1
5. Now run qsearch across all stored positions, look for a better E, and repeat the steps

Is this accurate?

Thank you :)
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Texel tuning method question

Post by Cheney »

Ferdy wrote:If you cannot get a positive result, that could mean your parameter values are already optimal or there are bugs in the eval
I went as far as starting over with my original eval (tapered eval with PSQ) and adding the pawn-structure code. In base, I set the bonuses/penalties to 0, and in base-1 I set the best values from tuning; base-1 loses at 48% over 5000 games. If I set my manually tuned values instead, base-1 wins at 53%. I thought this was odd but put it down to the values being over-tuned. Some parameters were auto-tuned close to what I had tuned manually, but two others went completely in the opposite direction (a bonus, for example, went negative).
Ferdy wrote:Sample development scenario
It seems like I follow this process. I went right into tuning the pawn items rather than tuning just a single parameter like knight outpost or changing the pawn/knight piece values. I will test those and see where it goes.
Ferdy wrote:This would also check the goodness of your training sets used.
Would it be better to download a bunch of games from CCRL or another database and tune against that? Peter's explanation is that one can play two versions of one's own engine to get a test set, but thinking about it, those positions are probably not ideal since the engine is likely to play similar positions. Using your passer example:
or your training sets are not good. An extreme case: all the wins go to the side with a passer and you are tuning a passer bonus
I guess I have to review the games to determine this somehow.

Thank you for the details and help :)
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning method question

Post by jdart »

What gradient descent does is find the minimum E, which is basically finding the bottom of a big multi-dimensional bowl. As long as the method always steers the parameters towards a point with lower E, it will eventually converge to a minimum (though this is not guaranteed to be a global minimum, except under certain conditions). But to do this you need to use the gradient (derivative) to select the direction in which to tune each parameter.
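
If you don't have an analytical gradient, you can estimate it numerically; each component tells you, for that parameter alone, which direction lowers E, so all parameters can be stepped at once. A rough sketch (compute_E is assumed to return E for a given parameter vector):

Code: Select all

def numeric_gradient(params, compute_E, h=1.0):
    base = compute_E(params)
    grad = []
    for i in range(len(params)):
        params[i] += h
        grad.append((compute_E(params) - base) / h)  # approximate dE/dparam_i
        params[i] -= h
    return grad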

--Jon
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Texel tuning method question

Post by Cheney »

Thanks Jon :)

I do follow that concept but what I am stumped on is the idea of testing multiple parameters simultaneously.
Jon wrote: A reasonable optimization algorithm can tune several hundred simultaneously
If I take four parameters and increase them all by 1, then test, and I find a lower E, maybe two of the parameters were adjusted in the correct direction but the other two were not. If the idea of setting multiple at once is correct, I am stumped on how to figure out which parameters need to be adjusted in the other direction. Threading is the first thing that comes to mind (adjust param1 in thread 1, adjust param2 in thread 2, and so on), but I am not sure I need to go to that extent.