Texel tuning speed

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Texel tuning speed

Post by Sven »

Joost Buijs wrote: Sat Sep 01, 2018 7:09 am
Robert Pope wrote: Fri Aug 31, 2018 9:16 pm
Joost Buijs wrote: Fri Aug 31, 2018 4:56 pm
Ronald wrote: Fri Aug 31, 2018 11:23 am
Joost Buijs wrote: Thu Aug 30, 2018 9:57 am In the training set not every position has to be unique and different from each other: a position that is won in game A can be drawn or lost in game B, and this all averages out in the error function. When you train with unique positions only you have no statistical info whatsoever.
I think every position needs to be unique in the set, otherwise the tuning will be much less effective. The worst case is 3 occurrences of the same position with 3 different outcomes. This is a waste of time because the total error over the 3 positions is minimal when the eval is 0 (and thus probably all your parameters are 0). With 2 occurrences it depends on the outcomes: Win-Loss will also pull the eval to 0, and Draw-Win/Loss will blur the result. The most important thing for every position is that you get the result right.
Well, I clearly have a different view. Of course there are positions that are clearly won or lost, but for most positions this is not clear at all (otherwise chess would be solved), and that is where statistics come into play. Everybody is entitled to do it in his own way, of course.
I think the question is, which do you gain more information from? e.g. the same position from 10 different games, 6 of which are wins, or 10 unique positions, 6 of which are wins? Taken to the extreme, 200,000 instances of the same position would be awful for training, though it would give you a better measure of that position's win probability. We are learning based on both the board layout and the game score, so the more (realistic) variety we have in each, the better the training would be expected to go.

In practice, you are probably better off training on 200,000 positions, some of which appear multiple times, than filtering on unique positions and training on 190,000 positions.
Indeed, and this is why I am thinking of filtering the positions in such a way that I keep track of the WLD score. In fact this is exactly what my book generator does: just a binary tree with hashes and scores. The only thing I have to change is to add the full position to a node, and to output the position instead of the hash, either in binary or in EPD format.
You are probably right that it does not hurt much, and may even help a bit, to avoid perfect uniqueness of training positions, for the reasons you have discussed. But we came from the question whether an eval cache may help for Texel tuning, and here I still believe it has no measurable influence: assuming that duplicate positions with different game results occur only as an exception, with mostly unique positions and the cache being cleared after each parameter change, there will be almost no benefit from that cache IMO.
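As an aside, the effect being debated can be made concrete: under the usual Texel squared-error objective, several copies of one position simply pull the predicted score toward the average of their results. A small self-contained illustration (the sigmoid scaling and the score range are arbitrary choices, not taken from any engine in this thread):

```python
def sigmoid(score, k=1.0):
    # Map a centipawn score to an expected result in [0, 1], Texel-style.
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))

def mse(score, results, k=1.0):
    # Mean squared error of one static score against several game results.
    p = sigmoid(score, k)
    return sum((r - p) ** 2 for r in results) / len(results)

# Three copies of the same position scored win, loss, draw: the error is
# minimized where sigmoid(score) equals the mean result 0.5, i.e. at
# score = 0, so the duplicates average out exactly as described above.
best = min(range(-200, 201), key=lambda s: mse(s, [1.0, 0.0, 0.5]))
```

With a W/L/D triple the minimum lands at eval 0, which is Ronald's worst case; with, say, two wins and a draw it would land at a positive score instead, which is the averaging Joost describes.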
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Texel tuning speed

Post by Joost Buijs »

Sven wrote: Sat Sep 01, 2018 2:37 pm
You are probably right that it does not hurt much, and may even help a bit, to avoid perfect uniqueness of training positions, for the reasons you have discussed. But we came from the question whether eval cache may help for texel tuning, and here I still believe that it has no measurable influence if I assume that duplicate positions with different game results do occur as an exception only, and with mostly unique positions and cache clearing after each parameter change there will be almost no benefit from that cache IMO.
My engine/evaluator doesn't use a full eval cache, so this is something I can't try. There is hardly any speed difference between enabling and disabling the pawn cache when training; maybe this holds for a full eval cache too. For the pawn cache it depends strongly upon the order in which I apply the training positions: when they are in game-chronological order, the pawn cache helps somewhat more.
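A minimal sketch of the duplicate-filtering idea mentioned earlier in the thread (one entry per position, with the W/L/D statistics folded into an averaged result); the plain-dict storage and the names here are illustrative, not the binary tree the book generator actually uses:

```python
from collections import defaultdict

def deduplicate(samples):
    # samples: iterable of (fen, result) with result in {1.0, 0.5, 0.0}
    # from White's point of view. Returns {fen: (averaged_result, count)}.
    totals = defaultdict(lambda: [0.0, 0])
    for fen, result in samples:
        totals[fen][0] += result
        totals[fen][1] += 1
    return {fen: (s / n, n) for fen, (s, n) in totals.items()}

# The same position seen three times (win, loss, draw) collapses into one
# training entry with the averaged result 0.5 and its occurrence count.
deduped = deduplicate([("fen_a", 1.0), ("fen_a", 0.0),
                       ("fen_a", 0.5), ("fen_b", 1.0)])
```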
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning speed

Post by jdart »

Since I do batch training, the parameters don't change until a whole iteration through the training set has completed, so there is no issue with turning caching on during a batch. If you use SGD, though, the parameters do change within the pass, and so you can't cache evals or partial evals.
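The distinction can be sketched as follows (hypothetical names; the point is only when the cache must be flushed): under batch training the parameters are frozen for an entire pass, so cached evals stay valid until the update, while under SGD every step invalidates them.

```python
class CachedEval:
    # Memoize eval results while the parameters are frozen; the cache must
    # be flushed on every parameter update (once per batch, or after every
    # single step under SGD, which makes caching useless there).
    def __init__(self, eval_fn):
        self.eval_fn = eval_fn
        self.cache = {}

    def __call__(self, params, pos_key):
        if pos_key not in self.cache:
            self.cache[pos_key] = self.eval_fn(params, pos_key)
        return self.cache[pos_key]

    def invalidate(self):
        self.cache.clear()

calls = []
cached = CachedEval(lambda params, key: calls.append(key) or params * key)
first = cached(2, 10)    # computed: eval_fn is called
second = cached(2, 10)   # same frozen params, served from cache
cached.invalidate()      # parameters are about to change
third = cached(3, 10)    # recomputed with the new parameters
```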

Also: my positions are not quiescent, so I have to run a search (at least a qsearch) to get to a quiet position. But I only run this search once every few iterations. The other iterations reuse the positions from the last search (even though the parameters may have changed) and just call the eval for the position at the end of the PV. This was a trick from the MMTO algorithm used in Go. It speeds things up quite a bit, since the iterations with search are much slower than those without.
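That trick can be sketched roughly like this (all function arguments are stand-ins, not Arasan's actual interfaces): the expensive qsearch resolves each position to a quiet PV leaf only every few iterations, and the cheap iterations in between just re-evaluate the stored leaves.

```python
def tune(positions, params, iterations, resolve_every,
         qsearch_leaf, evaluate, update):
    # qsearch_leaf: expensive, resolves a position to a quiet PV leaf.
    # evaluate: cheap static eval of a leaf under the current parameters.
    leaves = []
    for it in range(iterations):
        if it % resolve_every == 0:
            # Occasionally re-resolve; otherwise the leaves may be stale
            # with respect to the updated parameters, which is accepted.
            leaves = [qsearch_leaf(p, params) for p in positions]
        errors = [evaluate(leaf, params) for leaf in leaves]
        params = update(params, errors)
    return params

# Instrumented toy run: 6 iterations, re-resolving every 3rd one.
qcalls, ecalls = [], []
tune(positions=[0, 1], params=0, iterations=6, resolve_every=3,
     qsearch_leaf=lambda p, prm: qcalls.append(p) or p,
     evaluate=lambda leaf, prm: ecalls.append(leaf) or 0.0,
     update=lambda prm, errs: prm)
```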

--Jon
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Texel tuning speed

Post by mar »

Sven wrote: Sat Sep 01, 2018 2:37 pm But we came from the question whether eval cache may help for texel tuning, and here I still believe that it has no measurable influence if I assume that duplicate positions with different game results do occur as an exception only, and with mostly unique positions and cache clearing after each parameter change there will be almost no benefit from that cache IMO.
It seems I was too optimistic (it has been ~4 years since I did Texel tuning), but anyway, here are some numbers (single-threaded performance) rather than guesswork:

Code: Select all

65.4 time units with eval cache + pawn cache
71.1 time units without eval cache but with pawn cache
76.6 time units without eval cache and pawn cache
so about 17% in total; not so stellar, but those numbers hold for Cheng (YMMV)

Note that those positions follow the games, so the scheduling of worker threads is also important, to make sure each per-thread cache sees a couple of positions from the same game sequentially.

The best thing might be to pre-filter quiet positions (IIRC this is what Miguel did in Gaviota) - note that I didn't try this.
Martin Sedlak
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Texel tuning speed

Post by Sven »

mar wrote: Sat Sep 01, 2018 7:33 pm
It seems I was too optimistic (it's ~4 years ago I did Texel tuning), but anyway I have some numbers (single-threaded performance) rather than guesswork:

Code: Select all

65.4 time units with eval cache + pawn cache
71.1 time units without eval cache but with pawn cache
76.6 time units without eval cache and pawn cache
so 17%, not so stellar but those numbers hold for Cheng (YMMV)
Pawn cache helps, no doubt about that; I use it as well in my tuner. Your numbers indicate a speedup of 8% ((71.1-65.4)/71.1) for enabling the eval cache. Ok, measurable at least, but we can agree on the "not so stellar" :-)
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Texel tuning speed

Post by mar »

Sven wrote: Sat Sep 01, 2018 9:07 pm Pawn cache helps, no doubt about that, I use it as well in my tuner. Your numbers indicate a speedup of 8% ((71.1-65.4)/71.1) for enabling eval cache. Ok, measurable at least but we can agree on the "not so stellar" :-)
I think we can agree that the speedup would be 9% if you round properly :)

Anyway, I found some old test positions, prefiltered them, and tried to tune piece values. The result seemed fine, but I didn't do a full retune plus gameplay verification.
The speedup of doing a plain eval instead of a qsearch was about 6-fold.
Download link is here: http://www.crabaware.com/positions/a.7z
The format is <double outcome from white's POV><space><FEN><unix EOL>.
It contains about 12.7 million positions (Cheng self-play games; no book moves, no mate moves, no stm in check, no draws by material, "quiet", duplicate positions not filtered).
Perhaps it might be useful (at least to stress test parser/tuner).
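For reference, a line of that format could be parsed like this; the sample line below is the standard starting position with an invented draw outcome, not a line taken from the actual file:

```python
def parse_line(line):
    # Format: <double outcome from white's POV><space><FEN><unix EOL>.
    # The FEN itself contains spaces, so split on the first space only.
    outcome_str, fen = line.rstrip("\n").split(" ", 1)
    return float(outcome_str), fen

sample = "0.5 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1\n"
outcome, fen = parse_line(sample)
```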
Martin Sedlak
AndrewGrant
Posts: 1754
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Texel tuning speed

Post by AndrewGrant »

mar wrote: Sun Sep 02, 2018 1:53 am
Anyway I found some old testpositions, prefiltered them and tried to tune piece values. The result seemed fine but I didn't do a full retune + gameplay verification.
The speedup of doing eval over qsearch was about 6-fold.
Download link is here: http://www.crabaware.com/positions/a.7z
The format is <double outcome from white's POV><space><FEN><unix EOL>
Contains about 12.7 million positions (Cheng selfplay games, no book moves, no mate moves, no stm in check, no draws by material, "quiet", dup positions not filtered)
Perhaps it might be useful (at least to stress test parser/tuner).
I'll throw in my FEN sets for all ...

https://github.com/AndyGrant/TexelSets

Quick explanation of each data set ...

Ethereal4Per - ~350,000 self-play games at 1+0.01s, with very aggressive adjudication. 4 positions taken from each game. No filtering.
LaserSpecial - Jeffrey An played some ~40,000 games, and then sampled the search tree from various positions. These are (I think) quiet positions only.
Stockfish1Per - Downloaded 3GB of games from fishtest ... ~900,000 games. 1 position taken from each, with the known wins filtered out.
Stockfish3Per - Same game set as above, but 3 positions per game, with more aggressive known win/loss filtering.

I've had great success with LaserSpecial + Ethereal4Per.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Texel tuning speed

Post by xr_a_y »

Ok, I've implemented a simple GD (gradient descent) method:
* compute the gradient with a centered finite-difference scheme for each parameter (this costs 3 evaluations of E for each parameter), using 1 (the smallest possible difference for a piece value) as the "delta" in each direction.
* do a line search in the normalized gradient direction (-g/norm(g)); this costs one evaluation of E per iteration of the line search.

As I found evaluating E over the full set (1,364,312 positions) too expensive, I sample a random subset of reasonable size (some tens of thousands) at each gradient-method iteration. I am using the Ethereal.fens positions given earlier in this thread.

To my surprise, using just the piece values as parameters (for N, B, R, Q only) leads to nothing, as the line search fails to find any improvement at the first (or second ...) gradient-method iteration.

Starting from {N,B,R,Q} = {315,325,500,900}, the first normalized gradient looks like {0.7,0.04,0.47,0.48}.

As the piece values are integers, I use an initial gradient step of 2, 3 or 4, so that some of the new values change by at least 1.

Does somebody see what I'm doing wrong?
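For what it's worth, the scheme described above can be written down like this; a toy quadratic stands in for E, and the candidate step sizes and the integer rounding are assumptions for the sketch, not xr_a_y's actual code:

```python
import math

def gradient(E, params, delta=1.0):
    # Centered finite differences: (E(x + delta) - E(x - delta)) / (2*delta)
    # per parameter.
    g = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += delta
        down[i] -= delta
        g.append((E(up) - E(down)) / (2.0 * delta))
    return g

def line_search_step(E, params, g, steps=(1.0, 2.0, 4.0, 8.0)):
    # Try a few step sizes along -g/norm(g); keep the best improvement,
    # rounding to integers since piece values are integral.
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return params
    direction = [-x / norm for x in g]
    best, best_err = params, E(params)
    for step in steps:
        cand = [round(p + step * d) for p, d in zip(params, direction)]
        if E(cand) < best_err:
            best, best_err = cand, E(cand)
    return best

# Toy error function with its minimum at (325, 500):
E = lambda v: (v[0] - 325) ** 2 + (v[1] - 500) ** 2
params = [315, 490]
for _ in range(50):
    params = line_search_step(E, params, gradient(E, params))
```

One thing this toy setup does not capture: with a different random subset sampled each iteration, E itself changes between the gradient computation and the line search, so the comparison E(cand) < E(params) becomes noisy.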
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Texel tuning speed

Post by jdart »

Ok i've implemented a simple GD method
There are so many published, effective gradient methods that I don't think you should invent your own.

But in any case, I have found it helpful to always validate the gradients, at least while debugging. Compute the gradient, apply a small delta to each parameter, compute gradient*delta, and add it to the base objective value. Then also compute the new objective with param + delta as input. You should get the same number within rounding error. If you aren't computing the gradient right, you can't get convergence. If you do compute it right, the only way you won't get an improvement in the objective is if the step sizes are grossly too big or too small.
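That check can be written down directly (hypothetical names; the tolerance is an arbitrary choice): the first-order prediction base + g*delta should match the re-evaluated objective up to higher-order terms.

```python
def validate_gradient(objective, grad, params, delta=1e-4, tol=1e-6):
    # First-order check: objective(params + delta) should equal
    # objective(params) + sum(g_i * delta) up to O(delta^2) terms.
    base = objective(params)
    g = grad(params)
    predicted = base + sum(gi * delta for gi in g)
    actual = objective([p + delta for p in params])
    return abs(predicted - actual) <= tol

# Toy objective with a known analytic gradient; a deliberately wrong
# gradient (all zeros) is caught by the same check.
obj = lambda v: sum(x * x for x in v)
good = validate_gradient(obj, lambda v: [2.0 * x for x in v], [1.0, 2.0, 3.0])
bad = validate_gradient(obj, lambda v: [0.0 for x in v], [1.0, 2.0, 3.0])
```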

--Jon
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Texel tuning speed

Post by xr_a_y »

What may be too big or too small for changes in piece values?

For now I am near 1 indeed; maybe that is too small?