## Texel tuning speed

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

xr_a_y
Posts: 1104
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

### Texel tuning speed

I am trying Texel tuning and have a question about the expected speed of the method.

It seems Weini is able to run the qsearch needed for each position, for each evaluation of the error, in around 0.06 milliseconds.

Let's say I have only 100 000 positions and want to optimize 10 parameters.
That will require, say, at least 100 000 x 10 x 100 qsearch calls, so about 1h40min of computation.

The same thing for 1M positions would take the better part of a day.
For 10M positions it would take over a week.

Is that the timing you also get?
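
A back-of-the-envelope sketch of that cost model (the iteration count of 100 is an assumption, as in the estimate above; only the per-qsearch time is a measured figure):

```python
# Rough cost model for "naive" Texel tuning: every parameter tweak
# re-evaluates the error over all positions, and each error evaluation
# runs one qsearch per position.
positions = 100_000
params = 10
iterations = 100          # assumed number of sweeps over the parameter set
qsearch_ms = 0.06         # measured qsearch-plus-error cost per position

total_qsearch = positions * params * iterations
hours = total_qsearch * qsearch_ms / 1000 / 3600
print(f"{total_qsearch:,} qsearch calls, ~{hours:.1f} h")
```

The cost scales linearly, so 1M positions is ten times this and 10M a hundred times.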

AndrewGrant
Posts: 558
Joined: Tue Apr 19, 2016 4:08 am
Location: U.S.A
Full name: Andrew Grant
Contact:

### Re: Texel tuning speed

xr_a_y wrote:
Wed Aug 29, 2018 10:16 pm
I am trying Texel tuning and have a question about the expected speed of the method.

It seems Weini is able to run the qsearch needed for each position, for each evaluation of the error, in around 0.06 milliseconds.

Let's say I have only 100 000 positions and want to optimize 10 parameters.
That will require, say, at least 100 000 x 10 x 100 qsearch calls, so about 1h40min of computation.

The same thing for 1M positions would take the better part of a day.
For 10M positions it would take over a week.

Is that the timing you also get?
In Ethereal I use Stochastic Gradient Descent for the Texel process. All of my evaluation terms are linear in nature, and the entire evaluation function is wrapped in a tracing function. So when I call the evaluation (after using a qsearch to ensure a quiet position), I am given back a vector of coefficients for each eval term. I.e. if White has 5 pawns and Black has 7, then the first entry in the vector, which corresponds to the Pawn value, is -2.

Using this method it only takes 1 qsearch call per position, regardless of the number of terms. Startup for that is usually < 10 seconds.

From there, the actual Gradient Descent can usually reach approximate convergence in < 10,000 iterations, which is no more than a couple of minutes on my 32-thread machine.

I would advise looking into this possibility, as the base Texel tuning method is horrendously slow and inefficient.
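
The coefficient-vector trick can be sketched as follows, assuming a purely linear evaluation and the usual Texel sigmoid (the constant K, the function names, and the error derivative are illustrative, not Ethereal's actual code):

```python
import math

K = 1.0 / 400.0   # assumed sigmoid scaling constant (placeholder value)

def sigmoid(score):
    # Maps a centipawn score to an expected game result in [0, 1].
    return 1.0 / (1.0 + 10.0 ** (-K * score))

def evaluate(coeffs, params):
    # With a purely linear evaluation, each position is reduced ONCE
    # (after the quieting qsearch) to a coefficient vector, e.g.
    # coeffs[0] = white pawns - black pawns. The eval is then just a
    # dot product, so no qsearch is needed while the parameters move.
    return sum(c * p for c, p in zip(coeffs, params))

def gradient(dataset, params):
    # dataset: (coeffs, result) pairs, result in {0.0, 0.5, 1.0}.
    grad = [0.0] * len(params)
    for coeffs, result in dataset:
        s = sigmoid(evaluate(coeffs, params))
        # d/dp_j of (result - s)^2, using s' = ln(10) * K * s * (1 - s)
        common = -2.0 * (result - s) * s * (1.0 - s) * K * math.log(10.0)
        for j, c in enumerate(coeffs):
            grad[j] += common * c
    return [g / len(dataset) for g in grad]
```

One gradient evaluation then touches only the cached coefficient vectors, which is why the qsearch cost is paid once at startup rather than once per parameter per iteration.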

jdart
Posts: 3923
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

### Re: Texel tuning speed

I am using batch gradient descent with ADAM. This method tunes all parameters simultaneously.

I actually don't use qsearch but a one-ply search during tuning.

10 million tuning positions x 750 or so parameters takes about 130 iterations to converge.

My tuner is multithreaded. On a 24-core machine tuning takes 1.5-2 hrs.

--Jon
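
For reference, the Adam update used here can be sketched as follows (this is the standard Kingma-and-Ba formulation with the usual default hyperparameters, not Arasan's actual tuner code):

```python
import math

class Adam:
    # Standard Adam optimizer with bias-corrected moment estimates.
    def __init__(self, n, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = [0.0] * n   # first-moment (mean) estimates
        self.v = [0.0] * n   # second-moment (uncentered variance) estimates
        self.t = 0           # step counter for bias correction

    def step(self, params, grad):
        self.t += 1
        for j, g in enumerate(grad):
            self.m[j] = self.b1 * self.m[j] + (1 - self.b1) * g
            self.v[j] = self.b2 * self.v[j] + (1 - self.b2) * g * g
            mhat = self.m[j] / (1 - self.b1 ** self.t)
            vhat = self.v[j] / (1 - self.b2 ** self.t)
            params[j] -= self.lr * mhat / (math.sqrt(vhat) + self.eps)
        return params
```

The per-parameter step normalization is what makes it practical to tune hundreds of terms with very different scales simultaneously.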

Joost Buijs
Posts: 1035
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

### Re: Texel tuning speed

I use plain GD; SGD seems to be sensitive to the order of the training positions, so I never tried it. My error function calls the evaluator (with the pawn hash disabled) and not the quiescence search; of course I tried both, but the difference in final tuning is negligible.

When I start with reasonable values, the number of iterations through the GD needed to reach good convergence usually lies between 100 and 200. Tuning ~400 evaluation terms with 7.5 million positions takes roughly 2 hrs on my 4 GHz 6950X using 20 threads (with hyperthreading).

There are still a number of things that could be improved to increase tuning speed, e.g. each iteration I read and parse the training positions from disk; reading them once and storing them in binary form in memory would already increase speed.
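
The parse-once-and-cache idea can be sketched as below, assuming zurichess-style EPD lines with a `c9 "1-0";` result opcode (the file names and parsing details are illustrative):

```python
import os
import pickle

RESULTS = {'"1-0"': 1.0, '"1/2-1/2"': 0.5, '"0-1"': 0.0}

def parse_epd_line(line):
    # Split '<fen fields> c9 "1-0";' into the position and a result label.
    fen, _, rest = line.partition(" c9 ")
    return fen.strip(), RESULTS[rest.strip().rstrip(";")]

def load_positions(epd_path, cache_path):
    # Parse the text file once; later runs load the binary cache instead.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    with open(epd_path) as f:
        positions = [parse_epd_line(l) for l in f if l.strip()]
    with open(cache_path, "wb") as f:
        pickle.dump(positions, f)
    return positions
```

After the first run, every subsequent call to the error function pays only a binary load instead of a full text parse.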

Joost Buijs
Posts: 1035
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

### Re: Texel tuning speed

I meant of course that I read and parse the examples from disk for each call to the error function and not for each iteration. Windows probably caches the examples in memory, but parsing them takes quite some time.

mar
Posts: 2108
Joined: Fri Nov 26, 2010 1:00 pm
Location: Czech Republic
Full name: Martin Sedlak

### Re: Texel tuning speed

I don't remember the times, but IIRC an eval cache helps a ton. Of course you want to parallelize as well.
Texel tuning is very fast because you don't have to play actual games.
Martin Sedlak

xr_a_y
Posts: 1104
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

### Re: Texel tuning speed

Thanks for the advice. I'll first try simple GD, then look at Andrew's very interesting method.

Joost Buijs
Posts: 1035
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

### Re: Texel tuning speed

mar wrote:
Thu Aug 30, 2018 7:23 am
I don't remember the times, but IIRC an eval cache helps a ton. Of course you want to parallelize as well.
Texel tuning is very fast because you don't have to play actual games.
I think using an eval cache when tuning is an error: each time you modify a term and call the evaluator, you will get the cached value instead of the new one. The same holds for quiescence: if you use TT pruning in quiescence, you have to disable it.

Texel tuning works better than I expected, but you really need a high number of training samples covering all the terms you want to tune, otherwise you'll get strange results.

Ronald
Posts: 105
Joined: Tue Jan 23, 2018 9:18 am
Location: Rotterdam
Full name: Ronald Friederich
Contact:

### Re: Texel tuning speed

I used the "quiet-labeled.epd" set created by Zurichess for Texel tuning, which contains 750,000 quiet positions. Because they are quiet, you don't need to call quiescence; you can call the eval function directly. This saves a lot of time.
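
With pre-quieted, pre-labeled positions the error function reduces to one direct eval call per position; a minimal sketch, where `eval_fn` and the scaling constant are placeholders:

```python
def texel_error(dataset, eval_fn, k=1.0 / 400.0):
    # dataset: (position, result) pairs with result in {0.0, 0.5, 1.0}.
    # The positions are already quiet, so eval_fn is called directly,
    # with no qsearch in the loop.
    e = 0.0
    for pos, result in dataset:
        predicted = 1.0 / (1.0 + 10.0 ** (-k * eval_fn(pos)))
        e += (result - predicted) ** 2
    return e / len(dataset)
```

This is the mean squared error the tuner minimizes; the qsearch only mattered for making positions quiet, which the data set has already done offline.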

Joost Buijs
Posts: 1035
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

### Re: Texel tuning speed

Ronald wrote:
Thu Aug 30, 2018 8:14 am
I used the "quiet-labeled.epd" set created by Zurichess for Texel tuning, which contains 750,000 quiet positions. Because they are quiet, you don't need to call quiescence; you can call the eval function directly. This saves a lot of time.
When I started with Texel tuning I used the same set. Recently I generated a larger (quiet) set from the many computer games I have collected over the years, in the hope that these positions have a wider spread, because they were played by many different engines with different opening lines. On the other hand, you only want games played (or at least with the outcome analyzed) by the strongest engines, so I don't know if it is any better.