Texel tuning speed

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Texel tuning speed

Post by xr_a_y »

I am trying Texel tuning and have a question about expected speed of the method.

It seems Weini can run the qsearch needed for each position, for each evaluation of the error, in around 0.06 milliseconds.

Let's say I have only 100 000 positions and want to optimize 10 parameters.
That will require at least 100 000 x 10 x 100 qsearch calls, so about 1h40min of computation.

The same thing for 1M positions would take about 17 hours.
For 10M positions it would take about a week.
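For what it's worth, the arithmetic can be checked directly (a sketch assuming the quoted 0.06 ms per qsearch call, and that each of ~100 optimizer passes probes every parameter over every position):

```python
# Back-of-the-envelope cost of naive Texel tuning: every probe of a
# parameter re-runs qsearch over the whole position set.
MS_PER_QSEARCH = 0.06          # measured cost of one qsearch call, in ms
positions = 100_000
parameters = 10
iterations = 100               # optimizer passes over all parameters

calls = positions * parameters * iterations
hours = calls * MS_PER_QSEARCH / 1000 / 3600
print(f"{calls:,} qsearch calls -> {hours:.2f} h")
```

which comes out to 10^8 qsearch calls, i.e. roughly an hour and forty minutes, scaling linearly in the number of positions and parameters.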

Is that the timing you also get?
AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Texel tuning speed

Post by AndrewGrant »

xr_a_y wrote: Thu Aug 30, 2018 12:16 am I am trying Texel tuning and have a question about expected speed of the method.

It seems Weini can run the qsearch needed for each position, for each evaluation of the error, in around 0.06 milliseconds.

Let's say I have only 100 000 positions and want to optimize 10 parameters.
That will require at least 100 000 x 10 x 100 qsearch calls, so about 1h40min of computation.

The same thing for 1M positions would take about 17 hours.
For 10M positions it would take about a week.

Is that the timing you also get?
In Ethereal I use Stochastic Gradient Descent for the Texel process. All of my evaluation terms are linear in nature, and the entire evaluation function is wrapped in a tracing function. So when I call the evaluation (after using a qsearch to ensure a quiet position), I am given back a vector of coefficients for each eval term. I.e. if White has 5 pawns and Black has 7, then the first entry in the vector, which corresponds to the pawn value, is -2.

Using this method it only takes 1 qsearch call per position, regardless of the number of terms. Startup for that is usually < 10 seconds.

From there, the actual Gradient Descent can usually reach ~ convergence in < 10,000 iterations, which is no more than a couple of minutes on my 32-thread machine.
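Sketching the idea (hypothetical names, not Ethereal's actual code; assumes the eval really is a dot product of the parameter vector with per-position coefficient vectors, and the usual Texel sigmoid with an engine-specific constant K):

```python
import math

K = 1.13  # example sigmoid scaling constant; engine-specific in practice

def sigmoid(score):
    return 1.0 / (1.0 + 10.0 ** (-K * score / 400.0))

# Done ONCE per position: qsearch to reach a quiet position, then a
# tracing eval that records how often each term fires.
# coeffs[i][j] = (white count - black count) of term j in position i.
def gradient(params, coeffs, results):
    """One pass of the gradient of the mean squared error.
    No qsearch or eval call is needed here: the linear eval is just a
    dot product of params with the cached coefficient vectors."""
    grad = [0.0] * len(params)
    for cs, r in zip(coeffs, results):
        score = sum(p * c for p, c in zip(params, cs))   # linear eval
        s = sigmoid(score)
        # d/dp_j (r - s)^2 = -2 (r - s) * s (1 - s) * ln(10) K/400 * c_j
        common = -2.0 * (r - s) * s * (1.0 - s) * math.log(10.0) * K / 400.0
        for j, c in enumerate(cs):
            grad[j] += common * c
    return [g / len(coeffs) for g in grad]
```

The key point is that the expensive qsearch-plus-trace happens once per position at startup; every subsequent gradient step is pure arithmetic on the cached coefficients.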

I would advise looking into this possibility, as the basic Texel tuning method is horrendously slow and inefficient.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning speed

Post by jdart »

I am using batch gradient descent with ADAM. This method tunes all parameters simultaneously.

I actually don't use qsearch but a one-ply search during tuning.

10 million tuning positions x 750 or so parameters takes about 130 iterations to converge.

My tuner is multithreaded. On a 24 core machine tuning takes 1.5-2 hrs.

--Jon
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

I use plain GD; SGD seems to be sensitive to the order of the training positions, so I never tried it. My error function calls the evaluator (with the pawn hash disabled) rather than the quiescence search; of course I tried both, but the difference in the final tuning is negligible.
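The error function being minimized is the usual Texel one; below is a generic sketch with a hypothetical `evaluate(pos, params)` and an engine-specific constant K, using central differences for the plain-GD gradient (two full error evaluations per parameter, which is where the cost comes from):

```python
def texel_error(params, positions, results, evaluate, K=1.2):
    """Mean squared error between game results (1/0.5/0) and the
    predicted winning probability. `evaluate` and K are placeholders
    for engine specifics."""
    total = 0.0
    for pos, r in zip(positions, results):
        s = 1.0 / (1.0 + 10.0 ** (-K * evaluate(pos, params) / 400.0))
        total += (r - s) ** 2
    return total / len(positions)

def gd_step(params, positions, results, evaluate, lr=1.0, h=1.0):
    """One plain gradient-descent step via central differences."""
    new = list(params)
    for j in range(len(params)):
        up = list(params); up[j] += h
        dn = list(params); dn[j] -= h
        g = (texel_error(up, positions, results, evaluate)
             - texel_error(dn, positions, results, evaluate)) / (2 * h)
        new[j] = params[j] - lr * g
    return new
```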

When I start with reasonable values, the number of iterations through the GD needed to reach good convergence usually lies between 100 and 200. Tuning ~400 evaluation terms with 7.5 million positions takes roughly 2 hrs on my 4 GHz 6950X using 20 threads (with hyperthreading).

There are still a number of things that could be improved to increase tuning speed; e.g. at each iteration I read and parse the training positions from disk, and reading them once and storing them in binary form in memory would already increase speed.
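A cheap version of that fix can be sketched as follows (hypothetical helper names; any binary serialization would do): parse the EPD once, write the parsed positions to a binary cache, and reload the cache on subsequent runs.

```python
import os
import pickle

def load_positions(epd_path, cache_path, parse_epd):
    """Parse the EPD file once; afterwards load the already-parsed
    positions from a binary cache. `parse_epd` is engine-specific."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    with open(epd_path) as f:
        positions = [parse_epd(line) for line in f if line.strip()]
    with open(cache_path, "wb") as f:
        pickle.dump(positions, f)
    return positions
```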
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

I meant, of course, that I read and parse the examples from disk for each call to the error function, not for each iteration. Windows probably caches the file in memory, but parsing the examples takes quite some time.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Texel tuning speed

Post by mar »

I don't remember the times but IIRC eval cache helps a ton. Of course you want to parallelize as well.
Texel tuning is very fast because you don't have to play actual games.
Martin Sedlak
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Texel tuning speed

Post by xr_a_y »

Thanks for the advice. I'll first try a simple GD, then look at Andrew's very interesting method.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

mar wrote: Thu Aug 30, 2018 9:23 am I don't remember the times but IIRC eval cache helps a ton. Of course you want to parallelize as well.
Texel tuning is very fast because you don't have to play actual games.
I think using an eval cache when tuning is an error: each time you modify a term and call the evaluator, you will get the cached value instead of the new one. The same holds for quiescence: if you use TT pruning in quiescence, you have to disable it.
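The staleness problem can be sketched in a few lines (hypothetical names; `raw_eval` stands in for the real evaluation function). A cache keyed only on the position keeps returning the score computed with the old parameters, so it must be bypassed while tuning:

```python
def make_eval(raw_eval):
    """Wrap an evaluation function with a position-keyed cache.
    During tuning the parameters change between calls, so the cache
    would return stale scores -- the `tuning` flag bypasses it."""
    cache = {}
    def evaluate(pos, params, tuning=False):
        if not tuning and pos in cache:
            return cache[pos]        # stale if params changed!
        score = raw_eval(pos, params)
        if not tuning:
            cache[pos] = score
        return score
    return evaluate
```

With the flag off, a second call after changing the parameters silently returns the old score, which is exactly the bug described above.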

Texel tuning works better than I expected, but you really need a high number of training samples covering all the terms you want to tune, otherwise you'll get strange results.
User avatar
Ronald
Posts: 160
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Re: Texel tuning speed

Post by Ronald »

I used the "quiet-labeled.epd" set created by Zurichess for Texel tuning, which contains 750,000 quiet positions. Because they are quiet, you don't need to call quiescence; you can call the eval function directly, which saves a lot of time.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

Ronald wrote: Thu Aug 30, 2018 10:14 am I used the "quiet-labeled.epd" set created by Zurichess for Texel tuning, which contains 750,000 quiet positions. Because they are quiet, you don't need to call quiescence; you can call the eval function directly, which saves a lot of time.
When I started with Texel tuning I used the same set. Recently I generated a larger (quiet) set from the many computer games I have collected over the years, in the hope that these positions have a wider spread because they were played by many different engines with different opening lines. On the other hand, you only want games played (or at least with their outcome analyzed) by the strongest engines, so I don't know whether it is any better.