Texel tuning speed

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ronald
Posts: 160
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Re: Texel tuning speed

Post by Ronald »

I would expect that a correct outcome of the position is important for the tuning result. If you use the outcomes of games that were probably played by engines of different strength, the "true" outcome may differ more often from the game outcome and thus create error. There's only one way to find out, however: did you already make a comparison?
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

Ronald wrote: Thu Aug 30, 2018 11:16 am I would expect that a correct outcome of the position is important for the tuning result. If you use the outcomes of games that were probably played by engines of different strength, the "true" outcome may differ more often from the game outcome and thus create error. There's only one way to find out, however: did you already make a comparison?
No, I haven't made a comparison yet; when I can find some time for it I will.

Somehow I have the feeling that, with enough games, statistics will make up for the lower quality (or uncertain outcomes) of the games played by weaker engines.

Most of my time I spend working on a new version of my engine, which at the moment is a basic search with material and piece-square tables only. Right now I'm busy implementing YBW using C++11 threads; in the past I always used Windows threads, and things like mutexes, locks, and condition variables behave somewhat differently now. Maybe lazy SMP would be a better choice; it's at least ten times easier to implement.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Texel tuning speed

Post by mar »

Joost Buijs wrote: Thu Aug 30, 2018 9:57 am I think using an eval cache when tuning is an error: each time you modify a term and call the evaluator you will get the cached value instead of the new value. The same holds for quiescence: if you use TT pruning in quiescence you have to disable it.
Not at all; you clear it before each iteration: modify your parameter(s), then run through the set of positions with the eval cache enabled.
Martin Sedlak
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

mar wrote: Thu Aug 30, 2018 11:53 am
Joost Buijs wrote: Thu Aug 30, 2018 9:57 am I think using an eval cache when tuning is an error: each time you modify a term and call the evaluator you will get the cached value instead of the new value. The same holds for quiescence: if you use TT pruning in quiescence you have to disable it.
Not at all; you clear it before each iteration: modify your parameter(s), then run through the set of positions with the eval cache enabled.
You're right, that is something I didn't think of. I only use a pawn eval cache, which is currently disabled when tuning. To be honest, I haven't looked at it very carefully yet and have never spent any effort on making it faster.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning speed

Post by jdart »

By the way, I have also tried a recent method called SVRG (https://papers.nips.cc/paper/4937-accel ... uction.pdf). It is an improved version of SGD. But I couldn't get it to work. I am pretty sure that is an implementation issue and not a defect in the algorithm, which does seem promising.

--Jon
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Texel tuning speed

Post by Sven »

Ronald wrote: Thu Aug 30, 2018 10:14 am I used the "quiet-labeled.epd" set created by Zurichess for Texel tuning, which contains 750,000 quiet positions. Because the positions are quiet you don't need to call quiescence; you can call the eval function directly, which saves a lot of time.
I do exactly the same for Jumbo ("quiet-labeled.epd" and only calling eval), and I think my tuning runs quite fast. I have also implemented parallel computation of the E function for different training positions. Jumbo currently has 2 * 137 = 274 eval parameters (MG + EG), which I always tune all at once (about 10 of them are excluded from tuning, e.g. the EG pawn material value). But my eval function is not very complex, and more than half of the parameters belong to the PST. Basically I use the original Texel tuning method. When using 8 threads in parallel, the time needed for one iteration, i.e. one walk over all parameters where some of them are modified, is about 2-3 minutes. The number of iterations until convergence obviously depends on many factors and can't be predicted, but I never observed many more than 100 iterations, so the longest tuning run I can remember took roughly 4 hours (I'm not sure, though, since the last time was already several months ago). The first complete tuning of Jumbo (performed at the end of 2017) gave an improvement of about 100 Elo points.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Texel tuning speed

Post by Sven »

Joost Buijs wrote: Thu Aug 30, 2018 12:05 pm
mar wrote: Thu Aug 30, 2018 11:53 am
Joost Buijs wrote: Thu Aug 30, 2018 9:57 am I think using an eval cache when tuning is an error: each time you modify a term and call the evaluator you will get the cached value instead of the new value. The same holds for quiescence: if you use TT pruning in quiescence you have to disable it.
Not at all; you clear it before each iteration: modify your parameter(s), then run through the set of positions with the eval cache enabled.
You're right, that is something I didn't think of. I only use a pawn eval cache, which is currently disabled when tuning. To be honest, I haven't looked at it very carefully yet and have never spent any effort on making it faster.
I do not understand how an eval cache can help to speed up Texel tuning. To my knowledge, an eval cache stores the result of the whole evaluation call as one value per position. Since you need to clear that cache after modifying any eval parameter, and since all training positions are different from each other, a Texel tuning implementation that calls eval() (with 100% quiet positions) will not get any benefit from an eval cache: no position will be evaluated more than once for the same set of parameter values. The same basically holds for implementations calling qsearch(), since there it is only possible in rare cases that one training position in the input file leads to another training position via a capture sequence that is also part of the first position's qsearch(), but this is certainly an exception.

A pawn hash helps of course; I use this in Jumbo. I have also implemented some logic that avoids clearing the pawn hash whenever it is clear that doing so would not change anything, which speeds up the tuning run further.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Texel tuning speed

Post by Sven »

xr_a_y wrote: Thu Aug 30, 2018 12:16 am It seems Weini is able to run the qsearch needed for each position and each evaluation of the error in around 0.06 milliseconds.

Let's say I have only 100,000 positions and want to optimize 10 parameters.
It will require, let's say, at least 100,000 x 10 x 100 qsearch calls, so 2h40min of computation.
Do you mean that *one* qsearch() call takes 0.06 ms? That would be quite a lot; I don't think that is what you mean. On the other hand, 100,000 qsearch() calls plus one error calculation should also take much longer than 0.06 ms.

Therefore my question is: how often do you calculate the error function? Once per training set and per set of parameter values (as it is intended), or once per position (which I would not understand)?
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Texel tuning speed

Post by Sven »

Sven wrote: Thu Aug 30, 2018 11:24 pm I have also implemented parallel computation of the E function for different training positions.
This may sound confusing and is actually wrong; what I meant was that I compute eval() and the corresponding sigmoid value in parallel for different training positions.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

Sven wrote: Thu Aug 30, 2018 11:33 pm
Joost Buijs wrote: Thu Aug 30, 2018 12:05 pm
mar wrote: Thu Aug 30, 2018 11:53 am
Joost Buijs wrote: Thu Aug 30, 2018 9:57 am I think using an eval cache when tuning is an error: each time you modify a term and call the evaluator you will get the cached value instead of the new value. The same holds for quiescence: if you use TT pruning in quiescence you have to disable it.
Not at all; you clear it before each iteration: modify your parameter(s), then run through the set of positions with the eval cache enabled.
You're right, that is something I didn't think of. I only use a pawn eval cache, which is currently disabled when tuning. To be honest, I haven't looked at it very carefully yet and have never spent any effort on making it faster.
I do not understand how an eval cache can help to speed up Texel tuning. To my knowledge, an eval cache stores the result of the whole evaluation call as one value per position. Since you need to clear that cache after modifying any eval parameter, and since all training positions are different from each other, a Texel tuning implementation that calls eval() (with 100% quiet positions) will not get any benefit from an eval cache: no position will be evaluated more than once for the same set of parameter values. The same basically holds for implementations calling qsearch(), since there it is only possible in rare cases that one training position in the input file leads to another training position via a capture sequence that is also part of the first position's qsearch(), but this is certainly an exception.

A pawn hash helps of course; I use this in Jumbo. I have also implemented some logic that avoids clearing the pawn hash whenever it is clear that doing so would not change anything, which speeds up the tuning run further.
In the training set, not every position has to be unique: a position that is won in game A can be drawn or lost in game B, and this all averages out in the error function. When you train with unique positions only, you have no such statistical information at all.

Another possibility might be to prepare a training set with unique positions only, scored not 0, 0.5, or 1.0 but with an average between 0 and 1.0. I've been thinking about this, but I don't know whether it mathematically amounts to the same thing; that is something I still have to look at.