Texel tuning speed

Joost Buijs · Post by **Joost Buijs** » Fri Aug 31, 2018 7:43 am

Sven wrote: ↑Thu Aug 30, 2018 11:52 pm
xr_a_y wrote: ↑Thu Aug 30, 2018 12:16 am It seems Weini is able to run the qsearch needed for each position and each evaluation of the error in around 0.06 milliseconds.

Let's say I have only 100 000 positions and want to optimize 10 parameters.
It will require, let's say, at least 100 000 x 10 x 100 qsearch calls, so 2h40min of computation.
Do you mean *one* qsearch() call takes 0.06 ms? That would be quite a lot, I don't think that is what you mean. On the other hand, 100,000 qsearch() calls plus one error calculation should also take much longer than 0.06 ms.

Indeed, these numbers seem a bit weird. It has been a long time since I measured it, but the evaluation function of my engine takes ~700 processor cycles, roughly 175 ns at 4 GHz. Most of the time quiescence takes < 1 µs. I don't think this will be very different for Weini.
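As a rough check on the figures being discussed, the estimate can be recomputed directly. This is only a back-of-the-envelope sketch: the function name and the `passes` argument (standing for the factor of 100 in the quoted estimate) are made up for illustration. With the quoted inputs the product comes to about 6000 s, i.e. closer to 1h40min than to 2h40min, and at 1 µs per qsearch the same workload would take about 100 s.

```python
def tuning_time_seconds(positions, params, passes, qsearch_ms):
    """Total time if every error evaluation runs one qsearch per position."""
    qsearch_calls = positions * params * passes
    return qsearch_calls * qsearch_ms / 1000.0

# Figures quoted in the thread: 0.06 ms per qsearch versus about 1 us.
slow = tuning_time_seconds(100_000, 10, 100, 0.06)
fast = tuning_time_seconds(100_000, 10, 100, 0.001)
print(f"{slow / 60:.0f} min at 0.06 ms, {fast / 60:.1f} min at 1 us")
```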

mar · Post by **mar** » Fri Aug 31, 2018 9:48 am

Sven wrote: ↑Thu Aug 30, 2018 11:33 pm I do not understand how an eval cache can help to speed up texel tuning.

It depends on what positions you use. I extracted the positions from actual self-play games, so they weren't really "random" positions but were naturally ordered as the individual games progressed. That's why the eval cache helped a lot in my case.
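One plausible reading of this is that consecutive positions from the same game reach many of the same qsearch leaves, so a cache keyed by position hash gets hit often. The sketch below is not mar's code: `EvalCache`, its methods, and the integer keys are hypothetical stand-ins, and the cache has to be cleared whenever the tuned parameters change.

```python
class EvalCache:
    """Tiny evaluation cache keyed by a position hash (hypothetical interface)."""

    def __init__(self, evaluate):
        self._evaluate = evaluate   # function: position key -> score
        self._table = {}
        self.hits = 0
        self.misses = 0

    def clear(self):
        """Call this whenever the tuned parameters change."""
        self._table.clear()

    def probe(self, key):
        """Return the cached score, evaluating and storing it on a miss."""
        if key in self._table:
            self.hits += 1
            return self._table[key]
        self.misses += 1
        score = self._evaluate(key)
        self._table[key] = score
        return score

# Two consecutive positions of one game sharing most of their leaves.
cache = EvalCache(lambda key: key % 7)   # stand-in for a real evaluation
leaves_first = [11, 12, 13, 14]
leaves_next = [12, 13, 14, 15]
for key in leaves_first + leaves_next:
    cache.probe(key)
print(cache.hits, cache.misses)
```

With shuffled positions the overlap between neighbouring entries would mostly disappear, which fits the remark that the benefit depends on the positions used.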

xr_a_y · Post by **xr_a_y** » Fri Aug 31, 2018 9:53 am

Joost Buijs wrote: ↑Fri Aug 31, 2018 7:43 am
Sven wrote: ↑Thu Aug 30, 2018 11:52 pm
xr_a_y wrote: ↑Thu Aug 30, 2018 12:16 am It seems Weini is able to run the qsearch needed for each position and each evaluation of the error in around 0.06 milliseconds.

Let's say I have only 100 000 positions and want to optimize 10 parameters.
It will require, let's say, at least 100 000 x 10 x 100 qsearch calls, so 2h40min of computation.
Do you mean *one* qsearch() call takes 0.06 ms? That would be quite a lot, I don't think that is what you mean. On the other hand, 100,000 qsearch() calls plus one error calculation should also take much longer than 0.06 ms.
Indeed, these numbers seem a bit weird. It has been a long time since I measured it, but the evaluation function of my engine takes ~700 processor cycles, roughly 175 ns at 4 GHz. Most of the time quiescence takes < 1 µs. I don't think this will be very different for Weini.

Maybe I am doing something wrong, but I compute $E = \frac{1}{N} \sum_i (R_i - S_i)^2$, where $R_i$ is the game result and $S_i$ is the sigmoid of the score of the i-th position. So I run a qsearch for each i and each E computation. And I compute E quite often, using the simple minimum-finding algorithm given on the wiki page about Texel tuning.

Doing so, I can easily measure the time required for computing E once and thus the mean time needed for one qsearch. And for now Weini seems quite slow at this. But I don't call qsearch directly; there is some context to initialize around each qsearch that may explain the slowdown.

I'll be back with more timing soon.
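For reference, the error described above can be sketched as follows. This assumes the usual Texel form with a mean of squared differences and a base-10 sigmoid over centipawn scores; the scaling constant `k` is a placeholder that would normally be fitted to the data, and the function names are illustrative only.

```python
def sigmoid(score_cp, k=1.0):
    """Map a centipawn score to an expected result in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def mean_squared_error(results, scores_cp, k=1.0):
    """E = 1/N * sum_i (R_i - S_i)^2 with S_i = sigmoid(qsearch score of i)."""
    n = len(results)
    total = sum((r - sigmoid(s, k)) ** 2 for r, s in zip(results, scores_cp))
    return total / n

# Results are 1 (win), 0.5 (draw), 0 (loss) from White's point of view;
# the scores stand in for qsearch results in centipawns.
print(mean_squared_error([1.0, 0.5, 0.0], [120, 0, -80]))
```

Each call to `mean_squared_error` corresponds to one qsearch per position, which is where the run time goes.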

xr_a_y · Post by **xr_a_y** » Fri Aug 31, 2018 10:25 am

OK, after getting rid of some useless context, I now get 0.01 ms (10 µs) for the mean qsearch call. Given your comments, this is still too much ...

xr_a_y · Post by **xr_a_y** » Fri Aug 31, 2018 10:28 am

Sven wrote: ↑Thu Aug 30, 2018 11:52 pm Therefore my question is: how often do you calculate the error function, once per training set and per set of parameter values (as it is intended), or once per position (which I would not understand)?

I run one qsearch per position for each error computation.
I compute E once for each set of parameters.
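The scheme described here, one error computation per candidate parameter set, can be sketched as a simple coordinate search in the spirit of the local optimizer on the wiki page. This is an illustration rather than Weini's code: `local_optimize` and the toy `error` function are assumptions, and in a real tuner every `error` call would run one qsearch per training position.

```python
def local_optimize(initial, error):
    """Try +1/-1 on each parameter; keep any change that lowers the error."""
    best_params = list(initial)
    best_e = error(best_params)
    improved = True
    while improved:
        improved = False
        for i in range(len(best_params)):
            for step in (1, -1):
                trial = list(best_params)
                trial[i] += step
                e = error(trial)          # one full pass over the training set
                if e < best_e:
                    best_e, best_params, improved = e, trial, True
                    break
    return best_params, best_e

# Toy error with a known minimum, standing in for the real E.
calls = 0
def error(params):
    global calls
    calls += 1
    return (params[0] - 3) ** 2 + (params[1] + 2) ** 2

params, e = local_optimize([0, 0], error)
print(params, e, calls)
```

The counter shows why the cost of a single E matters: the number of error evaluations grows with both the number of parameters and the number of improvement rounds.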

xr_a_y · Post by **xr_a_y** » Fri Aug 31, 2018 11:01 am

Weini's classic search is currently running at only 400 knps single-threaded.

That means around 0.0025 ms per evaluation.

One call to qsearch often leads to 5 to 10 calls to the evaluation, so 0.01 ms per qsearch seems at least consistent with Weini's speed.

Weini is not using bitboards for move generation, and many evaluation terms are still based on a mailbox data structure.

I see other engines running at 2 Mnps on the same machine, which is 5 times faster than Weini. Maybe bitboard move generation and evaluation should be the next step forward ...
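The arithmetic in this post can be checked quickly. Like the post, the sketch treats one searched node as roughly one evaluation call, which is a simplification; the function name is made up for illustration.

```python
def ms_per_node(nodes_per_second):
    """Average time per node in milliseconds for a given search speed."""
    return 1000.0 / nodes_per_second

weini_ms = ms_per_node(400_000)        # 400 knps
other_ms = ms_per_node(2_000_000)      # 2 Mnps

# 5 to 10 evaluations per qsearch call, as estimated in the post.
qsearch_low_ms = 5 * weini_ms
qsearch_high_ms = 10 * weini_ms
print(weini_ms, other_ms, qsearch_low_ms, qsearch_high_ms)
```

At 400 knps this gives roughly 0.0125 to 0.025 ms per qsearch, the same order of magnitude as the measured 0.01 ms.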

Ronald · Post by **Ronald** » Fri Aug 31, 2018 11:23 am

Joost Buijs wrote: ↑Thu Aug 30, 2018 9:57 am In the training set not every position has to be unique and different from each other, a position that can be won in game A can be drawn or lost in game B, and this all averages out in the error function. When you train with unique positions only you have no statistical info whatsoever.

I think every position needs to be unique in the set, otherwise the tuning will be much less effective. The worst case is the same position 3 times with 3 different outcomes. This is a waste of time, because the total error over the 3 positions will be minimal when the eval is 0 (and thus probably all your parameters are 0). With the same position 2 times it depends on the outcomes: Win-Lose will also draw the eval to 0, and Draw-Win/Lose will blur the result. Most important in every position is that you get the result right.
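The arithmetic behind this can be made concrete. For several copies of one position, the prediction that minimizes the summed squared error is the mean of their results, so win/draw/loss pulls towards 0.5 (an eval of 0), while win/draw pulls towards 0.75. The helper names below are illustrative only.

```python
def best_prediction(results):
    """Prediction S minimizing sum((R - S)**2) over identical positions."""
    return sum(results) / len(results)

def squared_error(results, s):
    """Summed squared error of prediction s against all recorded results."""
    return sum((r - s) ** 2 for r in results)

three_outcomes = [1.0, 0.5, 0.0]   # same position: win, draw, loss
win_and_draw = [1.0, 0.5]          # same position: win, draw

print(best_prediction(three_outcomes), best_prediction(win_and_draw))
print(squared_error(three_outcomes, 0.5), squared_error(three_outcomes, 0.6))
```

Whether that mean is wasted effort or useful statistical information is exactly the point under debate in the thread.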

Joost Buijs · Post by **Joost Buijs** » Fri Aug 31, 2018 4:56 pm

Ronald wrote: ↑Fri Aug 31, 2018 11:23 am
Joost Buijs wrote: ↑Thu Aug 30, 2018 9:57 am In the training set not every position has to be unique and different from each other, a position that can be won in game A can be drawn or lost in game B, and this all averages out in the error function. When you train with unique positions only you have no statistical info whatsoever.
I think every position needs to be unique in the set, otherwise the tuning will be much less effective. The worst case is the same position 3 times with 3 different outcomes. This is a waste of time, because the total error over the 3 positions will be minimal when the eval is 0 (and thus probably all your parameters are 0). With the same position 2 times it depends on the outcomes: Win-Lose will also draw the eval to 0, and Draw-Win/Lose will blur the result. Most important in every position is that you get the result right.

Well, I clearly have a different view. Of course there are positions that are clearly won or lost, but for most positions this is not clear at all (otherwise chess would be solved), and then statistics come into play. Everybody is entitled to do it his own way, of course.

Robert Pope · Post by **Robert Pope** » Fri Aug 31, 2018 9:16 pm

Joost Buijs wrote: ↑Fri Aug 31, 2018 4:56 pm
Ronald wrote: ↑Fri Aug 31, 2018 11:23 am
Joost Buijs wrote: ↑Thu Aug 30, 2018 9:57 am In the training set not every position has to be unique and different from each other, a position that can be won in game A can be drawn or lost in game B, and this all averages out in the error function. When you train with unique positions only you have no statistical info whatsoever.
I think every position needs to be unique in the set, otherwise the tuning will be much less effective. The worst case is the same position 3 times with 3 different outcomes. This is a waste of time, because the total error over the 3 positions will be minimal when the eval is 0 (and thus probably all your parameters are 0). With the same position 2 times it depends on the outcomes: Win-Lose will also draw the eval to 0, and Draw-Win/Lose will blur the result. Most important in every position is that you get the result right.
Well, I clearly have a different view. Of course there are positions that are clearly won or lost, but for most positions this is not clear at all (otherwise chess would be solved), and then statistics come into play. Everybody is entitled to do it his own way, of course.

I think the question is, which do you gain more information from? e.g. the same position from 10 different games, 6 of which are wins, or 10 unique positions, 6 of which are wins? Taken to the extreme, 200,000 instances of the same position would be awful for training, though it would give you a better measure of that position's win probability. We are learning based on both the board layout and the game score, so the more (realistic) variety we have in each, the better the training would be expected to go.

In practice, you are probably better off training on 200,000 positions, some of which appear multiple times, than filtering on unique positions and training on 190,000 positions.

Joost Buijs · Post by **Joost Buijs** » Sat Sep 01, 2018 7:09 am

Robert Pope wrote: ↑Fri Aug 31, 2018 9:16 pm
Joost Buijs wrote: ↑Fri Aug 31, 2018 4:56 pm
Ronald wrote: ↑Fri Aug 31, 2018 11:23 am
Joost Buijs wrote: ↑Thu Aug 30, 2018 9:57 am In the training set not every position has to be unique and different from each other, a position that can be won in game A can be drawn or lost in game B, and this all averages out in the error function. When you train with unique positions only you have no statistical info whatsoever.
I think every position needs to be unique in the set, otherwise the tuning will be much less effective. The worst case is the same position 3 times with 3 different outcomes. This is a waste of time, because the total error over the 3 positions will be minimal when the eval is 0 (and thus probably all your parameters are 0). With the same position 2 times it depends on the outcomes: Win-Lose will also draw the eval to 0, and Draw-Win/Lose will blur the result. Most important in every position is that you get the result right.
Well, I clearly have a different view. Of course there are positions that are clearly won or lost, but for most positions this is not clear at all (otherwise chess would be solved), and then statistics come into play. Everybody is entitled to do it his own way, of course.
I think the question is, which do you gain more information from? e.g. the same position from 10 different games, 6 of which are wins, or 10 unique positions, 6 of which are wins? Taken to the extreme, 200,000 instances of the same position would be awful for training, though it would give you a better measure of that position's win probability. We are learning based on both the board layout and the game score, so the more (realistic) variety we have in each, the better the training would be expected to go.

In practice, you are probably better off training on 200,000 positions, some of which appear multiple times, than filtering on unique positions and training on 190,000 positions.

Indeed, and this is why I am thinking of filtering the positions in such a way that I keep track of the WLD score. In fact this is exactly what my book generator does: just a binary tree with hashes and scores. The only things I have to change are to add the full position to a node and to output the position instead of the hash, in either binary or EPD format.
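A minimal sketch of that kind of filtering, assuming the input is a list of (EPD, result) pairs. It uses a plain dictionary keyed by the EPD string instead of the binary tree of hashes mentioned above, and `aggregate_wld` is a hypothetical name, so this only illustrates the idea of merging duplicates while keeping their win/loss/draw counts.

```python
from collections import defaultdict

def aggregate_wld(samples):
    """Merge duplicate positions; samples are (epd, result) pairs with
    result 1.0 (win), 0.5 (draw) or 0.0 (loss)."""
    slot = {1.0: 0, 0.0: 1, 0.5: 2}
    table = defaultdict(lambda: [0, 0, 0])   # epd -> [wins, losses, draws]
    for epd, result in samples:
        table[epd][slot[result]] += 1
    rows = []
    for epd, (wins, losses, draws) in table.items():
        games = wins + losses + draws
        mean_score = (wins + 0.5 * draws) / games
        rows.append((epd, wins, losses, draws, mean_score))
    return rows

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -"
after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -"
samples = [(start, 1.0), (start, 1.0), (start, 0.0), (start, 0.5),
           (after_e4, 0.5)]
rows = aggregate_wld(samples)
for row in rows:
    print(row)
```

Each output row keeps the full position together with its counts, so a tuner could weight the position by the number of games or train directly on the mean score.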
