Texel tuning method question

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning method question

Post by jdart »

Cheney wrote:qsearch is a plain fail soft - no delta pruning, no TT, etc.
Well, delta pruning is probably good for a sizeable ELO gain, right there. So I would do that.

There is no need to tune parameters one at a time and you don't have to limit yourself to tuning a few values. A reasonable optimization algorithm can tune several hundred simultaneously, no problem. Look at Adagrad or ADAM - they are not really hard to code. Github has some example code.
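For illustration, a minimal sketch of one ADAM update step over the full parameter vector (the names are made up and the hyperparameter defaults are the usual ones - this is just the idea, not code from any particular engine):

Code: Select all

#include <cmath>
#include <vector>

// One ADAM step. params and grad hold one entry per eval parameter;
// m and v are running first/second moment estimates, initially all zero;
// t is the 1-based iteration count.
void adamStep(std::vector<double> &params, const std::vector<double> &grad,
              std::vector<double> &m, std::vector<double> &v, int t,
              double lr = 0.001, double b1 = 0.9, double b2 = 0.999,
              double eps = 1e-8)
{
    for (size_t i = 0; i < params.size(); ++i) {
        m[i] = b1 * m[i] + (1.0 - b1) * grad[i];           // 1st moment
        v[i] = b2 * v[i] + (1.0 - b2) * grad[i] * grad[i]; // 2nd moment
        double mhat = m[i] / (1.0 - std::pow(b1, t));      // bias correction
        double vhat = v[i] / (1.0 - std::pow(b2, t));
        params[i] -= lr * mhat / (std::sqrt(vhat) + eps);  // update
    }
}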

The process should, I believe, be relatively insensitive to K. K just determines how scores map to predicted outcomes. I use 0.75, so the predicted outcome is:

1.0/(1.0+exp(-0.75*val))

where val is the score.
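In code, that mapping and the mean squared error E it produces over a training set are just this (a sketch - Sample is a placeholder for however you store each position's qsearch score and game result):

Code: Select all

#include <cmath>
#include <vector>

struct Sample { double qscore; double result; }; // result: 0, 0.5 or 1

// Predicted outcome for a score, with K = 0.75.
double predict(double val, double K = 0.75) {
    return 1.0 / (1.0 + std::exp(-K * val));
}

// Mean squared error E over the training set for a given K.
double meanError(const std::vector<Sample> &set, double K) {
    double e = 0.0;
    for (const Sample &s : set) {
        double d = s.result - predict(s.qscore, K);
        e += d * d;
    }
    return e / set.size();
}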

What is your validation procedure? Once you have done a tuning run, you should have a method - self-play or games against an opponent gauntlet - to validate that changes do, or don't, improve strength.

--Jon
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Texel tuning method question

Post by Ferdy »

Cheney wrote:I will certainly try this out as well to see if this helps but for some reason, I am just not getting positive results with this. I am thinking I am missing something. I reread all the posts to this thread. Would someone mind shedding some light on my scenario? I am sorry if some of this is repetitive.
If you cannot get a positive result, that could mean your parameter values are already optimal, or there are bugs in the eval (I spent more time reviewing my eval code before running the tuning), or your training sets are not good - in an extreme case, every position with a passer is a win for the side with the passer, and you are tuning a passer bonus, so every time such a position is encountered, increasing the passer bonus always looks good. The set should be balanced by also having positions with a passer where the result is a draw or a loss.
Cheney wrote:What I have read about is some use "end of PV" nodes, use quiet nodes, comments about not running qsearch and just using the eval, and comments about comparing my results against results of a better engine. All of this confuses me :) and makes me think my process is wrong.
The Texel method uses the qsearch score. Another method just uses the eval score; that is the method that obviously requires quiet training sets.
Cheney wrote:Based on what I have read on the process on how to test, this is what I know and have done:

First, my qsearch is a plain fail soft - no delta pruning, no TT, etc.

I have engine version 1 and version 2. They played 64k games. Version 2 is a bit better than version 1 as it has the pawn parameters I wish to test and tune.

From these games, I extracted 8+ million positions, excluding opening book positions and positions after a mate was found. Each FEN was saved with the game result (0 for a black win, 0.5 for a draw, 1 for a white win).

I had the engine compute a K value. All the engine did was load each FEN (and the game result), run qsearch, store the returned score and the result in a vector, and once all FENs were searched, I calculated the K with minimal E.

So far, at no point did I compare anything with the real scores from the real games that were played or any other scores from other engines.

Next, I used the gradient search (local search seems to be another name for it) to tune a few values.

First, with K set (as discovered in a previous paragraph), I'd qsearch all the positions again with all the eval parameters at their "base" settings. The E determined here is now the best E.

Second, I'd increase eval-param1 by 1, qsearch all the positions, and using all the new returned scores and the known K, determine a new E and whether it is better than the best E.

Third, based on whether a new best E was found or not, I'd either decrease eval-param1 or move on to the next eval-param while saving the new best E.
If there is a new best E, save the param value, move on to the next param, and use all the best params in the succeeding tests. If the best E is not reduced, change the delta of the param and test again. If all deltas have been tried, move on to the next param, of course.
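In sketch form the whole loop looks like this (evalError() is a placeholder that runs qsearch on every training position with the current param values and returns E):

Code: Select all

#include <vector>

double evalError(const std::vector<int> &params); // placeholder, returns E

std::vector<int> localSearch(std::vector<int> params)
{
    double bestE = evalError(params);
    bool improved = true;
    while (improved) {
        improved = false;
        for (size_t i = 0; i < params.size(); ++i) {
            params[i] += 1;                       // try increasing first
            double e = evalError(params);
            if (e < bestE) { bestE = e; improved = true; continue; }
            params[i] -= 2;                       // then try decreasing
            e = evalError(params);
            if (e < bestE) { bestE = e; improved = true; continue; }
            params[i] += 1;                       // no improvement: restore
        }
    }
    return params;
}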
Cheney wrote:Should I only tune parameters when my engine loses to the older version?
Tune parameters when you have changes to eval and search.

Sample development scenario:

Code: Select all

1. Create base engine
   a. Create base-a engine, based from base engine
   b. Test base vs base-a, this would test the reliability of your test environment
       and randomness of the engine.
   c. Make sure that the result is even
   d. If result is even goto (2)
2. Create base-1 engine, based from the base engine; add a bonus x for knight outpost. The value x is your first estimate.
3. Test base vs base-1
4. If the result is not really good - a small elo reduction or equal:
    a. Create base-1a, based from base-1 and tune it.
    b. Tune all/whatever param with your auto-tuner
    c. Apply good param to base-1a
    d. Test base vs base-1a
5. Then next eval modification ...

To test that your auto-tuner is working, set your pawn value to 30 and your knight to 100. Then tune just these 2 params and see if the tuner can bring your pawn value close to 100 and your knight value close to 300, for example. This would also check the goodness of your training sets. After tuning, test it in actual games.
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Texel tuning method question

Post by Cheney »

jdart wrote:There is no need to tune parameters one at a time
I did look at the pseudo code on CPW for the local search and have a similar model. Are you saying I could, for example:

1. Set chain pawn +1
2. Set Backward pawn +1
3. Set rook on 7th -1
4. ... set a few other parameters to +/-1
5. Now run qsearch across all stored positions and look for a better E and repeat steps

Is this accurate?

Thank you :)
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Texel tuning method question

Post by Cheney »

Ferdy wrote:If you cannot get a positive result, that could mean your parameter values are already optimal, or there are bugs in the eval
I went as far as starting over with my original eval - tapered eval with PSQ. I added the pawn code for pawn structure. In base, I set the bonuses/penalties to 0, and in base-1 I set the best values from tuning; base-1 loses at 48% over 5000 games. If I set the manually tuned values, base-1 wins at 53%. I thought this was odd, but I put it down to the values being overtuned. Some parameters were auto-tuned close to what I tuned manually, but two others went completely in the opposite direction (a bonus, for example, went negative).
Ferdy wrote:Sample development scenario
It seems like I follow this process. I went right into tuning the pawn items rather than tuning just a single parameter like knight outpost or changing the pawn/knight piece values. I will test those and see where it goes.
Ferdy wrote:This would also check the goodness of your training sets.
Would it be better to download a bunch of games from CCRL or from another database and tune against those? Peter's explanation is that one can play two versions of one's engine to get a test set, but thinking about it, those positions are probably not ideal, as the engine is likely to play similar positions. Using your passer example:
Ferdy wrote:or your training sets are not good - in an extreme case, every position with a passer is a win for the side with the passer, and you are tuning a passer bonus
I guess I have to review the games to determine this somehow.

Thank you for the details and help :)
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning method question

Post by jdart »

What gradient descent does is find the minimum E, which is basically finding the bottom of a big multi-dimensional bowl. As long as the method always steers the parameters towards a point with lower E, it will eventually converge to a minimum (though this is not guaranteed to be a global minimum, except under certain conditions). But to do this you need to use the gradient (derivative) to select the direction in which to tune the parameters.

--Jon
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Texel tuning method question

Post by Cheney »

Thanks Jon :)

I do follow that concept but what I am stumped on is the idea of testing multiple parameters simultaneously.
Jon wrote: A reasonable optimization algorithm can tune several hundred simultaneously
If I take four parameters and increase them all by 1, then test, and I find a lower E, maybe two of the parameters were adjusted in the correct direction but the other two were not. If the idea of setting multiple at once is correct, I am stumped on how to figure out which parameters need to be adjusted in the other direction. Threading is the first thing that comes to mind - adjust param1 in thread 1, adjust param2 in thread 2, and so on, - but I am not sure I need to go to this extent.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Texel tuning method question

Post by AlvaroBegue »

Cheney wrote:Thanks Jon :)

I do follow that concept but what I am stumped on is the idea of testing multiple parameters simultaneously.
Jon wrote: A reasonable optimization algorithm can tune several hundred simultaneously
If I take four parameters and increase them all by 1, then test, and I find a lower E, maybe two of the parameters were adjusted in the correct direction but the other two were not. If the idea of setting multiple at once is correct, I am stumped on how to figure out which parameters need to be adjusted in the other direction. Threading is the first thing that comes to mind - adjust param1 in thread 1, adjust param2 in thread 2, and so on, - but I am not sure I need to go to this extent.
Compute the derivative of the loss function (the thing you are trying to minimize) with respect to each of the parameters. The resulting vector is called the gradient. That will give you the direction of steepest increase, so try going a little bit in the opposite direction.

The gradient can be computed with automatic differentiation, or with the chain rule and some patience. :)
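Concretely, for the E used in this thread, E = 1/N * Sum (R - s)^2 with s = 1/(1+exp(-K*q)) and q the position's score, the chain rule gives, per position:

dE/dp = -2 * (R - s) * K * s * (1 - s) * dq/dp

The only engine-specific part is dq/dp - how much the score moves when the parameter moves. For an eval term that is linear in the parameter, that is just the feature count.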
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: Texel tuning method question

Post by Robert Pope »

Cheney wrote:Thanks Jon :)

I do follow that concept but what I am stumped on is the idea of testing multiple parameters simultaneously.
Jon wrote: A reasonable optimization algorithm can tune several hundred simultaneously
If I take four parameters and increase them all by 1, then test, and I find a lower E, maybe two of the parameters were adjusted in the correct direction but the other two were not. If the idea of setting multiple at once is correct, I am stumped on how to figure out which parameters need to be adjusted in the other direction. Threading is the first thing that comes to mind - adjust param1 in thread 1, adjust param2 in thread 2, and so on, - but I am not sure I need to go to this extent.
Instead of randomly changing parameters, you need to calculate which direction each parameter needs to change to reduce E, plus how strongly each one moves it (maybe a shift of 10 on P1 changes the score by 10, but the same shift on P2 only changes it by 2). Then you can shift all of the parameters that influence E at the same time, and in the right direction. Assuming your steps aren't too big, that should get you closer to the goal weights on all of those parameters at once.

I cheat in my optimization: rather than change them all at once, I do them one after another, but limit how far any one parameter can shift each time. And maybe one pass will focus on just the material weights, and then another pass on the piece-square tables.
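A sketch of one such pass (evalError is a placeholder for the E computation over the training set; maxStep is the per-parameter limit):

Code: Select all

#include <algorithm>
#include <vector>

double evalError(const std::vector<double> &params); // placeholder, returns E

// One pass: probe each parameter with a finite difference, then move it
// against the slope, but never by more than maxStep. Later parameters
// already see the moves made by earlier ones.
void clampedPass(std::vector<double> &params, double lr, double maxStep)
{
    for (size_t i = 0; i < params.size(); ++i) {
        double e0 = evalError(params);
        params[i] += 1.0;
        double slope = evalError(params) - e0;    // ~ dE/dp_i
        params[i] -= 1.0;
        params[i] += std::clamp(-lr * slope, -maxStep, maxStep);
    }
}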
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning method question

Post by jdart »

The simplest case is one where the parameter you are tuning is just multiplied by some factor (in the position) to get the score.

So score = x*p where x is something like the number of doubled pawns in the position, and p is the doubled pawn parameter value.

You also have a value for the position (score in fractions of a pawn) and the actual result of the game (0 for a Black win, 1/2 for a draw, 1 for a White win). Now, with that, plus the x value for White and Black, you can actually compute the derivative of the error function (how much the error changes, and in what direction, when p changes, for a particular position) with some calculus - it is not actually very complex. See https://github.com/jdart1/arasan-chess/ ... /tuner.cpp around line 1357. Or you can estimate it.

The gradient for the test set is actually a vector: the entry for parameter p is the per-position gradient value for p, summed over all the positions in the test set and divided by the number of test set entries. This is how much the mean error will change when p changes.

Now with that gradient vector and an optimization method you can compute the step size and direction for each parameter. Apply those changes, then repeat the whole gradient computation process.
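For the simple linear case above, the whole gradient computation looks roughly like this (a sketch, not the actual Arasan code - the names are placeholders):

Code: Select all

#include <cmath>
#include <vector>

struct Sample {
    std::vector<double> x; // feature counts (White minus Black), one per param
    double result;         // game result: 0, 0.5 or 1
};

// Gradient of E = 1/N * Sum (R - s(q))^2, where q = dot(params, x)
// and s(q) = 1/(1+exp(-K*q)).
std::vector<double> gradient(const std::vector<Sample> &set,
                             const std::vector<double> &params, double K)
{
    std::vector<double> g(params.size(), 0.0);
    for (const Sample &smp : set) {
        double q = 0.0;
        for (size_t j = 0; j < params.size(); ++j)
            q += params[j] * smp.x[j];
        double s = 1.0 / (1.0 + std::exp(-K * q));
        double c = -2.0 * (smp.result - s) * K * s * (1.0 - s);
        for (size_t j = 0; j < params.size(); ++j)
            g[j] += c * smp.x[j];        // chain rule: dq/dp_j = x_j
    }
    for (double &gj : g)
        gj /= set.size();                // mean over the test set
    return g;
}

Feed the resulting vector into plain gradient descent, Adagrad, or ADAM to get the step for each parameter.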

--Jon
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Texel tuning method question

Post by Cheney »

Thanks for the replies :). I have been trying to study up a bit on calculus and read some docs/examples on derivatives and gradient descent.
Alvaro wrote:Compute the derivative of the loss function (the thing you are trying to minimize) with respect to each of the parameters.
Here's where I am lost.

First, just to be clear, isn't my loss function (the "sum of the squares"):
E = 1/N * Sum(i=1..N) (R(i) - Sigmoid(Q(i)))^2

If this is correct:

* All the derivative examples I covered were basic ones, like y = x^2 + x - 4, so I do not know how to calculate this equation's derivative "with respect" to anything :). I need to keep studying.

* Also, since any qsearch result is a single value based on a bunch of parameters, how could one calculate the derivative with respect to a single parameter that is fed into the QSearch function? I believe Peter Osterlund referred to it as "a black box".

I looked at Arasan's code and it looks like "computeTexelDeriv" is where the dT is actually calculated. Unfortunately, I need to look at this code a lot longer to understand what is going on.

Because of this, I looked for additional answers and found that Peter wrote something like this (going from memory here :)):
dE / dP ~= E(p(i)+1) - E(p(i))

This is the direction in which I have been thinking and which I am starting to test. Basically, with all parameters at their defaults, calculate E, then adjust each parameter one at a time, getting a new E for each parameter. This dE/dP is more like a delta, or change in E. It tells me the direction in which to increase or decrease that parameter, but not the rate or pace at which to make the change. Once I collect the dE/dP for each parameter, I could start to walk all the parameters simultaneously, but I do not know the rate at which each parameter needs to change.
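In code, what I have in mind is something like this (eta being exactly the rate I do not know how to choose, and evalError being my E computation over all the stored positions):

Code: Select all

#include <vector>

double evalError(const std::vector<double> &params); // E over all positions

// One iteration: estimate dE/dP for every parameter by finite
// differences, then move every parameter at once, scaled by eta.
void descendStep(std::vector<double> &params, double eta)
{
    double e0 = evalError(params);
    std::vector<double> grad(params.size());
    for (size_t i = 0; i < params.size(); ++i) {
        params[i] += 1.0;                  // dE/dP ~= E(p+1) - E(p)
        grad[i] = evalError(params) - e0;
        params[i] -= 1.0;                  // restore
    }
    for (size_t i = 0; i < params.size(); ++i)
        params[i] -= eta * grad[i];        // step against the gradient
}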

Is this line of thinking OK or valid? Maybe this is where and why I need to understand the derivative and its purpose?

Thanks again :)