Texel tuning speed

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Texel tuning speed

Post by xr_a_y »

I am trying Texel tuning and have a question about expected speed of the method.

It seems Weini can run the qsearch needed for each position, for each evaluation of the error, in around 0.06 milliseconds.

Let's say I have only 100 000 positions and want to optimize 10 parameters.
That will require at least 100 000 x 10 x 100 qsearch calls, so about 1h40min of computation.

The same thing for 1M positions would take about 17 hours.
For 10M positions it would take about a week.
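For what it's worth, the arithmetic can be checked directly (a sketch assuming the quoted 0.06 ms per qsearch call, and that each of ~100 optimizer passes probes every parameter over every position):

```python
# Back-of-the-envelope cost of naive Texel tuning: every probe of a
# parameter re-runs qsearch over the whole position set.
MS_PER_QSEARCH = 0.06          # measured cost of one qsearch call, in ms
positions = 100_000
parameters = 10
iterations = 100               # optimizer passes over all parameters

calls = positions * parameters * iterations
hours = calls * MS_PER_QSEARCH / 1000 / 3600
print(f"{calls:,} qsearch calls -> {hours:.2f} h")
```

which comes out to 10^8 qsearch calls, i.e. roughly an hour and forty minutes, scaling linearly in the number of positions and parameters.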

Is that the timing you also get?
AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Texel tuning speed

Post by AndrewGrant »

xr_a_y wrote: Thu Aug 30, 2018 12:16 am I am trying Texel tuning and have a question about expected speed of the method.

It seems Weini can run the qsearch needed for each position, for each evaluation of the error, in around 0.06 milliseconds.

Let's say I have only 100 000 positions and want to optimize 10 parameters.
That will require at least 100 000 x 10 x 100 qsearch calls, so about 1h40min of computation.

The same thing for 1M positions would take about 17 hours.
For 10M positions it would take about a week.

Is that the timing you also get?
In Ethereal I use Stochastic Gradient Descent for the Texel process. All of my evaluation terms are linear in nature, and the entire evaluation function is wrapped in a tracing function. So when I call the evaluation (after using a qsearch to ensure a quiet position), I am given back a vector of coefficients for each eval term. I.e. if White has 5 pawns and Black has 7, then the first entry in the vector, which corresponds to the pawn value, is -2.

Using this method it only takes 1 qsearch call per position, regardless of the number of terms. Startup for that is usually < 10 seconds.

From there, the actual Gradient Descent can usually reach ~ convergence in < 10,000 iterations, which is no more than a couple of minutes on my 32-thread machine.
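Sketching the idea (hypothetical names, not Ethereal's actual code; assumes the eval really is a dot product of the parameter vector with per-position coefficient vectors, and the usual Texel sigmoid with an engine-specific constant K):

```python
import math

K = 1.13  # example sigmoid scaling constant; engine-specific in practice

def sigmoid(score):
    return 1.0 / (1.0 + 10.0 ** (-K * score / 400.0))

# Done ONCE per position: qsearch to reach a quiet position, then a
# tracing eval that records how often each term fires.
# coeffs[i][j] = (white count - black count) of term j in position i.
def gradient(params, coeffs, results):
    """One pass of the gradient of the mean squared error.
    No qsearch or eval call is needed here: the linear eval is just a
    dot product of params with the cached coefficient vectors."""
    grad = [0.0] * len(params)
    for cs, r in zip(coeffs, results):
        score = sum(p * c for p, c in zip(params, cs))   # linear eval
        s = sigmoid(score)
        # d/dp_j (r - s)^2 = -2 (r - s) * s (1 - s) * ln(10) K/400 * c_j
        common = -2.0 * (r - s) * s * (1.0 - s) * math.log(10.0) * K / 400.0
        for j, c in enumerate(cs):
            grad[j] += common * c
    return [g / len(coeffs) for g in grad]
```

The key point is that the expensive qsearch-plus-trace happens once per position at startup; every subsequent gradient step is pure arithmetic on the cached coefficients.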

I would advise looking into this possibility, as the basic Texel tuning method is horrendously slow and inefficient.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning speed

Post by jdart »

I am using batch gradient descent with ADAM. This method tunes all parameters simultaneously.

I actually don't use qsearch but a one-ply search during tuning.

10 million tuning positions x 750 or so parameters takes about 130 iterations to converge.

My tuner is multithreaded. On a 24 core machine tuning takes 1.5-2 hrs.

--Jon
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

I use plain GD; SGD seems to be sensitive to the order of the training positions, so I never tried it. My error function calls the evaluator (with the pawn hash disabled) rather than the quiescence search; of course I tried both, but the difference in the final tuning is negligible.
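The error function being minimized is the usual Texel one; below is a generic sketch with a hypothetical `evaluate(pos, params)` and an engine-specific constant K, using central differences for the plain-GD gradient (two full error evaluations per parameter, which is where the cost comes from):

```python
def texel_error(params, positions, results, evaluate, K=1.2):
    """Mean squared error between game results (1/0.5/0) and the
    predicted winning probability. `evaluate` and K are placeholders
    for engine specifics."""
    total = 0.0
    for pos, r in zip(positions, results):
        s = 1.0 / (1.0 + 10.0 ** (-K * evaluate(pos, params) / 400.0))
        total += (r - s) ** 2
    return total / len(positions)

def gd_step(params, positions, results, evaluate, lr=1.0, h=1.0):
    """One plain gradient-descent step via central differences."""
    new = list(params)
    for j in range(len(params)):
        up = list(params); up[j] += h
        dn = list(params); dn[j] -= h
        g = (texel_error(up, positions, results, evaluate)
             - texel_error(dn, positions, results, evaluate)) / (2 * h)
        new[j] = params[j] - lr * g
    return new
```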

When I start with reasonable values, the number of iterations through the GD needed to reach good convergence usually lies between 100 and 200. Tuning ~400 evaluation terms with 7.5 million positions takes roughly 2 hrs on my 4 GHz 6950X using 20 threads (with hyperthreading).

There are still a number of things that could be improved to increase tuning speed; e.g. at each iteration I read and parse the training positions from disk, and reading them once and storing them in binary form in memory would already increase speed.
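A cheap version of that fix can be sketched as follows (hypothetical helper names; any binary serialization would do): parse the EPD once, write the parsed positions to a binary cache, and reload the cache on subsequent runs.

```python
import os
import pickle

def load_positions(epd_path, cache_path, parse_epd):
    """Parse the EPD file once; afterwards load the already-parsed
    positions from a binary cache. `parse_epd` is engine-specific."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    with open(epd_path) as f:
        positions = [parse_epd(line) for line in f if line.strip()]
    with open(cache_path, "wb") as f:
        pickle.dump(positions, f)
    return positions
```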
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

I meant, of course, that I read and parse the examples from disk for each call to the error function, not for each iteration. Windows probably caches the file in memory, but parsing the examples takes quite some time.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Texel tuning speed

Post by mar »

I don't remember the times but IIRC eval cache helps a ton. Of course you want to parallelize as well.
Texel tuning is very fast because you don't have to play actual games.
Martin Sedlak
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Texel tuning speed

Post by xr_a_y »

Thanks for the advice. I'll first try a simple GD, then look at Andrew's very interesting method.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

mar wrote: Thu Aug 30, 2018 9:23 am I don't remember the times but IIRC eval cache helps a ton. Of course you want to parallelize as well.
Texel tuning is very fast because you don't have to play actual games.
I think using an eval cache when tuning is an error: each time you modify a term and call the evaluator, you will get the cached value instead of the new one. The same holds for quiescence: if you use TT pruning in quiescence, you have to disable it.
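The staleness problem can be sketched in a few lines (hypothetical names; `raw_eval` stands in for the real evaluation function). A cache keyed only on the position keeps returning the score computed with the old parameters, so it must be bypassed while tuning:

```python
def make_eval(raw_eval):
    """Wrap an evaluation function with a position-keyed cache.
    During tuning the parameters change between calls, so the cache
    would return stale scores -- the `tuning` flag bypasses it."""
    cache = {}
    def evaluate(pos, params, tuning=False):
        if not tuning and pos in cache:
            return cache[pos]        # stale if params changed!
        score = raw_eval(pos, params)
        if not tuning:
            cache[pos] = score
        return score
    return evaluate
```

With the flag off, a second call after changing the parameters silently returns the old score, which is exactly the bug described above.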

Texel tuning works better than I expected, but you really need a high number of training samples covering all the terms you want to tune, otherwise you'll get strange results.
User avatar
Ronald
Posts: 160
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Re: Texel tuning speed

Post by Ronald »

I used the "quiet-labeled.epd" set created by Zurichess for Texel tuning, which contains 750,000 quiet positions. Because they are quiet, you don't need to call quiescence; you can call the eval function directly, which saves a lot of time.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

Ronald wrote: Thu Aug 30, 2018 10:14 am I used the "quiet-labeled.epd" set created by Zurichess for Texel tuning, which contains 750,000 quiet positions. Because they are quiet, you don't need to call quiescence; you can call the eval function directly, which saves a lot of time.
When I started with Texel tuning I used the same set. Recently I generated a larger (quiet) set from the many computer games I have collected over the years, in the hope that these positions have a wider spread because they were played by many different engines with different opening lines. On the other hand, you only want games played (or at least with their outcome analyzed) by the strongest engines, so I don't know whether it is any better.