Tuning parameters.

laurietunnicliffe · Post by **laurietunnicliffe** » Sat Jul 06, 2024 6:56 am

Can someone please explain this like I'm a 6 year old?

If I want to tune my PSTs, which contain 64 (squares) x 6 (pieces), that is 384 values.
Can I only tune 1 parameter at a time? (That's a lot of tests to tune all 384 values.)
If I only tune one at a time, then when I tune the next one the result may depend on the previous value, so the previous value is no longer best.
?????

shawn · Post by **shawn** » Sat Jul 06, 2024 7:12 am

No, that will take too many resources. Texel tuning is the way to tune PSQTs and hand-crafted evaluation (HCE) in general. Andrew Grant has a very nice paper on this: https://github.com/AndyGrant/Ethereal/b ... Tuning.pdf

laurietunnicliffe · Post by **laurietunnicliffe** » Sun Jul 07, 2024 7:12 am

In section 2:
"Peter would loop over the set of evaluation weights and adjust them slightly. He would then recompute an error using the original value of K."

This seems to be the only reference to modifying the weights (e.g. the PST), but when you "adjust them slightly", do you increase them or decrease them? Do they all go in one direction, do some increase and some decrease, or do you randomly increase/decrease? Do you adjust all the weights in the eval?

I kinda get the process of minimizing errors, and if it was only 1 parameter then it's obvious, but with 1000 parameters, how can you change them ALL together and get any indication of which way to adjust next?

Maybe the answer is in the math, but remember I'm a 6 year old.

op12no2 · Post by **op12no2** » Sun Jul 07, 2024 10:07 am

Think of it in terms of features and weights.

Start with just material.

For each position measure the feature value for PNBRQK as (number of white - number of black). So for each position you have 6 feature values (the king feature is always 0, so its weight never moves). You also have 6 global weights (not per position) - start with say 100,300,300,500,900 for PNBRQ. For each position measure the loss, then adjust the weights (all of them) based on a fraction of that loss and the corresponding feature value: weight[j] -= loss * 0.001 * feature[j].

Repeat.

To extend for PSTs you would add 48 features for the pawn PST and 64 features for each of the other pieces, measured as white piece on square k - black piece on square k.

The loss is typically sigmoid(eval) - WDL, where WDL is 1 for white win, 0.5 for a draw and 0 for a black win.

Every now and again you can measure the overall loss, but it's not actually necessary.

Non-linear features are more tricky. It's common to batch the updates using the mean loss over a batch. It's common to shuffle the positions between 'epochs' (an epoch is one pass through the positions). 0.001 is called the learning rate; tweak as required. Optimisers like Adagrad and Adam can give more stability. You could leave the pawn weight fixed at 100 to give the whole thing a root to hang on to.
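The per-position update described above could be sketched roughly as follows. This is only an illustration: it uses material features for PNBRQ (the king difference is always zero, so it is left out), the function names are made up for the example, and the sigmoid scaling constant of 1/400 is an assumption that does not come from this thread.

```python
import math
import random


def sigmoid(score, k=1.0 / 400.0):
    # Maps a centipawn score to an expected result in (0, 1).
    # The scaling constant k is an assumed value; engines usually fit it.
    return 1.0 / (1.0 + math.exp(-k * score))


def evaluate(weights, features):
    # Linear evaluation: sum of weight * feature value.
    return sum(w * f for w, f in zip(weights, features))


def sgd_epoch(weights, positions, lr=0.001):
    """One pass over the positions, updating every weight after each one.

    positions is a list of (features, wdl) pairs, where features[j] is
    (white count - black count) for piece j and wdl is 1, 0.5 or 0.
    """
    order = positions[:]          # shuffle a copy between epochs
    random.shuffle(order)
    for features, wdl in order:
        loss = sigmoid(evaluate(weights, features)) - wdl
        for j, f in enumerate(features):
            # Prediction too high -> loss > 0 -> weights with a positive
            # feature go down; prediction too low -> they go up.
            weights[j] -= lr * loss * f
    return weights


if __name__ == "__main__":
    weights = [100.0, 300.0, 300.0, 500.0, 900.0]   # P, N, B, R, Q
    # Toy data: white is a pawn up but lost; white is a knight up and won.
    data = [([1, 0, 0, 0, 0], 0.0), ([0, 1, 0, 0, 0], 1.0)]
    for _ in range(10):
        sgd_epoch(weights, data)
    print(weights)
```

Note how the sign of the loss and the sign of the feature value together decide the direction of each change, so some weights go up while others go down, and weights whose feature is 0 in a position are not touched by that position.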

laurietunnicliffe · Post by **laurietunnicliffe** » Sun Jul 07, 2024 10:21 am

weight[j] -= loss * 0.001 * weight[j].

So you are subtracting a small portion to create the next weight, but what if reducing the weight is the wrong direction?
What if it should have been increased? And why should ALL of the weights be reduced???

hgm · Post by **hgm** » Sun Jul 07, 2024 10:21 am

One way is to use 'steepest descent': From your currently best set of parameters you calculate the effect of a small change in each of those, while keeping all others the same. After having done that you change them all simultaneously, in proportion to the effect they had when changed alone. If this makes it better, you can try a bigger step in that same 'direction' (i.e. multiplying all changes by the same factor). If it gets worse you can take a smaller step in the same direction until it gets better. Once you cannot improve any further by stepping in the same direction, you repeat this process from the new best set of parameters.

E.g. if you have 5 weights, and a change of +1% in one of those resulted in a decrease of the mean error of 2, 5, -3, 0, 1, respectively, you would continue by changing them by +2%, +5%, -3%, none, +1%. Or +0.2%, +0.5%, -0.3%, none, +0.1%, respectively.

The reason you have to change all parameters at once is that the function you are trying to minimize only gets smaller if you change the parameters a lot in a very special combination. I call this the 'river-valley' problem. If you want to find the sea starting at the river Rhine somewhere in Germany, you would need very many steps if you alternately moved precisely North-South or East-West. It would be much better to determine first in which direction the valley runs. Then you can take a big step downstream, until you hit the next bend, rather than stranding on the banks almost immediately.
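A rough sketch of the steepest-descent procedure described in this post, under some assumptions: `error_fn` stands for whatever computes the mean error of a parameter list (it is a placeholder, not something from this thread), the probe size and starting step are arbitrary, and the step is doubled or halved rather than chosen in any cleverer way.

```python
def numeric_gradient(error_fn, params, delta=1e-3):
    # Effect of a small change in each parameter, all others kept the same.
    base = error_fn(params)
    grad = []
    for i in range(len(params)):
        probe = list(params)
        probe[i] += delta
        grad.append((error_fn(probe) - base) / delta)
    return grad


def line_search(error_fn, params, grad, best, step):
    # Shrink the step until moving all parameters together improves things.
    while step > 1e-12:
        trial = [p - step * g for p, g in zip(params, grad)]
        err = error_fn(trial)
        if err < best:
            break
        step *= 0.5
    else:
        return params, best       # no improving step in this direction
    # Then keep doubling the step in the same direction while it helps.
    while True:
        bigger = [p - 2 * step * g for p, g in zip(params, grad)]
        bigger_err = error_fn(bigger)
        if bigger_err >= err:
            return trial, err
        step *= 2
        trial, err = bigger, bigger_err


def steepest_descent(error_fn, params, step=0.1, rounds=200):
    params = list(params)
    best = error_fn(params)
    for _ in range(rounds):
        grad = numeric_gradient(error_fn, params)
        new_params, new_best = line_search(error_fn, params, grad, best, step)
        if new_best >= best:
            break                 # cannot improve any further
        params, best = new_params, new_best
    return params, best


if __name__ == "__main__":
    # A narrow diagonal "valley": moving along one axis at a time is slow.
    def valley(p):
        return 10.0 * (p[0] - p[1]) ** 2 + 0.1 * (p[0] + p[1] - 2.0) ** 2

    print(steepest_descent(valley, [5.0, -3.0]))
```

Each parameter moves in proportion to the effect it had when changed alone, and parameters whose change made the error worse move the other way, which matches the +2%, +5%, -3% example above.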

shawn · Post by **shawn** » Sun Jul 07, 2024 10:26 am

Based on the description I would say that Peter adjusted every weight randomly, but don't take my word for it. The better method of tuning the weights would be to define an objective function, a score representing the error of your evaluation from a certain, pre-scored dataset, and then use gradient descent to optimize your values. Admittedly my understanding of gradient descent is not yet enough to convey it without misleading you, but I can provide you some additional resources.

Code implementation of the tuner: https://github.com/GediminasMasaitis/texel-tuner
Discord servers where you can find HCE experts: https://www.chessprogramming.org/Comput ... d_Channels

op12no2 · Post by **op12no2** » Sun Jul 07, 2024 10:35 am

laurietunnicliffe wrote: ↑Sun Jul 07, 2024 10:21 am weight[j] -= loss * 0.001 * weight[j].

So you are subtracting a small portion to create the next weight, but what if reducing the weight is the wrong direction?
What if it should have been increased? And why should ALL of the weights be reduced???

Sorry, that was a typo - I fixed the original post, but not quickly enough!

You can adjust all the weights because you know the corresponding feature values for each position.

If the result is too big for a position, decrease the weights a little in proportion to the corresponding feature values for that position, and vice versa - that's essentially it.

mar · Post by **mar** » Sun Jul 07, 2024 11:20 am

laurietunnicliffe wrote: ↑Sat Jul 06, 2024 6:56 am Can someone please explain this like I'm a 6 year old?

If I want to tune my PSTs, which contain 64 (squares) x 6 (pieces), that is 384 values.
Can I only tune 1 parameter at a time? (That's a lot of tests to tune all 384 values.)
If I only tune one at a time, then when I tune the next one the result may depend on the previous value, so the previous value is no longer best.
?????

well, you can do something much simpler, it's going to take longer (you'll probably get stuck in a local optimum either way) but it doesn't matter for a proof of concept:
say your parameters are in centipawns, you can do several steps (say +-1 or going incremental like +1, +2 and so on) in the direction that lowers the error and then loop through the rest of the parameters. this will be really slow but you won't have to worry about non-linear parameters, for example
I'd recommend using prefiltered quiet positions, so you can skip the qsearch used in original texel tuning and make things run faster.
and of course, you definitely want to parallelize this
as others have pointed out I'd probably start with something simpler like material so that you know it converges to sane values and that your implementation is correct - probably a good idea to pin pawn opening value to a fixed number, say 75cp
you also don't have to start "from scratch" but rather use your existing values and let the optimizer correct them
and you don't have to wait for full convergence either, you can run as many "epochs" (=cycles) as feasible and then verify your new values by playing games
a good idea is to turn all your eval parameters into a single "feature vector" that you can tune and unpack back into arrays, constants.
this will be the output of the tuner that you can print/copy out
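The "single feature vector" idea could look something like the sketch below. The layout (material values first, then each table in order) and the function names are only assumptions made for the example; any fixed layout works as long as packing and unpacking agree.

```python
def pack(material, psts):
    """Flatten material values and piece-square tables into one flat list."""
    vector = list(material)
    for table in psts:
        vector.extend(table)
    return vector


def unpack(vector, n_material, table_sizes):
    """Split a tuned flat list back into material values and tables."""
    material = vector[:n_material]
    psts = []
    pos = n_material
    for size in table_sizes:
        psts.append(vector[pos:pos + size])
        pos += size
    return material, psts


if __name__ == "__main__":
    material = [100, 300, 300, 500, 900]
    psts = [[0] * 64 for _ in range(6)]      # one table per piece type
    vector = pack(material, psts)            # this is what the tuner changes
    tuned_material, tuned_psts = unpack(vector, len(material), [64] * 6)
    print(tuned_material, len(tuned_psts))
```

The tuner then only ever sees one list of numbers, and the unpacked result is what you print and copy back into the engine.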

another option would be to use SPSA instead of texel-style tuning:
then you'd probably want to generate your psqts from a small set of parameters, fruit-style, basically using simple formulas instead of lookup tables, which are much harder to tune (like for mobility, for example)

of course, these days handcrafted evaluation is completely obsolete so I'd suggest to invest time elsewhere (like switching to nn-based evaluation instead)

F. Bluemers · Post by **F. Bluemers** » Sun Jul 07, 2024 12:34 pm

pdf wrote: In section 2:
"Peter would loop
over the set of evaluation weights and adjust them slightly. He would then recompute an error using the
original value of K."

He would recompute the error after each adjustment and use it if the error after it was smaller.
He would then loop over the weights as long as there was an improvement.
See https://www.chessprogramming.org/Texel% ... ing_Method
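The loop described here (try a small adjustment, keep it only if the error got smaller, repeat as long as anything improved) might be sketched like this. `error_fn` is a placeholder for the mean error over the training positions with K held fixed, and the step of 1 is an assumption based on the parameters being in centipawns.

```python
def local_search(error_fn, params, step=1):
    """Adjust one parameter at a time; keep a change only if error drops."""
    params = list(params)
    best = error_fn(params)
    improved = True
    while improved:                      # loop while any pass helped
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):  # try increasing, then decreasing
                params[i] += delta
                err = error_fn(params)
                if err < best:
                    best = err           # keep the adjustment
                    improved = True
                    break
                params[i] -= delta       # otherwise undo it
    return params, best


if __name__ == "__main__":
    # Toy error with a known minimum at (7, -3), standing in for the
    # real mean error over a set of scored positions.
    def toy_error(p):
        return (p[0] - 7) ** 2 + (p[1] + 3) ** 2

    print(local_search(toy_error, [0, 0]))
```

So each weight is tried in both directions and the error itself decides which one is kept, which is why nothing has to be known in advance about whether a weight should go up or down. The cost is one full error computation per trial, which is why this gets slow with hundreds of parameters.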
