You are probably right that it does not hurt much, and may even help a bit, to avoid perfect uniqueness of training positions, for the reasons you have discussed. But we came from the question of whether an eval cache may help for Texel tuning, and here I still believe it has no measurable influence: if duplicate positions with different game results occur only as an exception, then with mostly unique positions and cache clearing after each parameter change there will be almost no benefit from that cache IMO.

Joost Buijs wrote: ↑Sat Sep 01, 2018 7:09 am
Indeed, and this is why I am thinking of filtering the positions in such a way that I keep track of the WDL score. In fact, exactly what my book generator does: just a binary tree with hashes and scores. The only thing I have to change is to add the full position to a node, and to output the position instead of the hash, either in binary or EPD format.

Robert Pope wrote: ↑Fri Aug 31, 2018 9:16 pm
I think the question is, which do you gain more information from? E.g. the same position from 10 different games, 6 of which are wins, or 10 unique positions, 6 of which are wins? Taken to the extreme, 200,000 instances of the same position would be awful for training, though it would give you a better measure of that position's win probability. We are learning based on both the board layout and the game score, so the more (realistic) variety we have in each, the better the training would be expected to go.

Joost Buijs wrote: ↑Fri Aug 31, 2018 4:56 pm
Well, I clearly have a different view. Of course there are positions that are clearly won or lost, but for most positions this is not clear at all (otherwise chess would be solved), and then statistics come into play. Everybody is entitled to do it in his own way, of course.

Ronald wrote: ↑Fri Aug 31, 2018 11:23 am
I think every position needs to be unique in the set, otherwise the tuning will be much less effective. Worst case is 3 times the same position with 3 different outcomes. This is a waste of time, because the total error over the 3 positions will be minimal when the eval is 0 (and thus probably all your parameters 0). 2 times the same position depends on the outcomes: Win-Lose will also draw the eval to 0, Draw-Win/Lose will blur the result. Most important in every position is that you get the result right.

Joost Buijs wrote: ↑Thu Aug 30, 2018 9:57 am
In the training set not every position has to be unique and different from the others; a position that is won in game A can be drawn or lost in game B, and this all averages out in the error function. When you train with unique positions only you have no statistical info whatsoever.

In practice, you are probably better off training on 200,000 positions, some of which appear multiple times, than filtering on unique positions and training on 190,000 positions.
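The "averages out" argument can be made concrete: in a standard Texel-style objective, n duplicates of a position with results r_1..r_n contribute exactly like one entry weighted by n with the average result, up to an additive constant that does not depend on the eval. A minimal sketch (the sigmoid and error functions are the usual textbook forms, not any particular engine's):

```python
# A minimal Texel-style error over (position, result) pairs, to show that
# duplicate positions are equivalent to one weighted entry per position.
# "evals" maps a position id to its (fixed) evaluation in centipawns.

def sigmoid(q, k=1.0 / 400):
    # logistic mapping from eval (centipawns) to expected score in [0, 1]
    return 1.0 / (1.0 + 10 ** (-k * q))

def error_flat(entries, evals):
    # entries: list of (position_id, result), duplicates allowed
    return sum((r - sigmoid(evals[p])) ** 2 for p, r in entries) / len(entries)

def error_collapsed(groups, evals, total):
    # groups: position_id -> (count, mean_result)
    # For each position, sum_i (r_i - s)^2 = n*(rbar - s)^2 + spread,
    # and the spread term does not depend on the eval at all.
    return sum(n * (rbar - sigmoid(evals[p])) ** 2
               for p, (n, rbar) in groups.items()) / total
```

The two errors differ only by a constant, so they have the same gradients and the same optimum; collapsing duplicates into (count, mean result) pairs loses nothing.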
Texel tuning speed
Moderators: hgm, Rebel, chrisw
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Texel tuning speed
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
-
- Posts: 1563
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: Texel tuning speed
My engine/evaluator doesn't use a full eval cache, so this is something I can't try. There is hardly any speed difference between enabling and disabling the pawn cache when training; maybe this holds for a full eval cache too. For the pawn cache it depends strongly on the order in which I apply the training positions: when they are in game-chronological order, the pawn cache helps somewhat more.

Sven wrote: ↑Sat Sep 01, 2018 2:37 pm
You are probably right that it does not hurt much, and may even help a bit, to avoid perfect uniqueness of training positions, for the reasons you have discussed. But we came from the question whether eval cache may help for texel tuning, and here I still believe that it has no measurable influence if I assume that duplicate positions with different game results do occur as an exception only, and with mostly unique positions and cache clearing after each parameter change there will be almost no benefit from that cache IMO.
-
- Posts: 4366
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Texel tuning speed
Since I do batch training, the parameters don't change until a whole iteration through the training set is complete, so there is no issue with turning caching on during a batch. If you use SGD, though, the parameters change with every step, so you can't cache evals or partial evals.
Also: my positions are not quiescent, so I have to run a search (at least a qsearch) to get to a quiet position. But I only run this search once every few iterations. The other iterations reuse the positions from the last search (even though the parameters may have changed) and just call the eval on the position at the end of the PV. This was a trick from the MMTO algorithm used in shogi. It speeds things up quite a bit, since the iterations with search are much slower than those without.
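A minimal sketch of the re-search schedule described above; `qsearch` and `evaluate` are hypothetical stand-ins for the engine's real functions, and the interval of 4 is just an example:

```python
from dataclasses import dataclass

RESEARCH_INTERVAL = 4  # run qsearch only on every 4th pass over the data

@dataclass
class Entry:
    position: object     # root training position
    result: float        # game outcome in [0, 1] from white's POV
    leaf: object = None  # cached quiet position from the last qsearch

def expected_score(cp, k=1.0 / 400):
    return 1.0 / (1.0 + 10 ** (-k * cp))

def tuning_pass(entries, iteration, qsearch, evaluate):
    """One pass over the training set; search only on scheduled passes."""
    err = 0.0
    for e in entries:
        if e.leaf is None or iteration % RESEARCH_INTERVAL == 0:
            e.leaf = qsearch(e.position)   # expensive: re-quiesce
        err += (e.result - expected_score(evaluate(e.leaf))) ** 2
    return err / len(entries)
```

Between scheduled passes the cached leaf is slightly stale with respect to the current parameters, but calling only the static eval there is what makes the cheap passes cheap.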
--Jon
-
- Posts: 2554
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Texel tuning speed
It seems I was too optimistic (it's ~4 years ago that I did Texel tuning), but anyway I have some numbers (single-threaded performance) rather than guesswork:

Sven wrote: ↑Sat Sep 01, 2018 2:37 pm
But we came from the question whether eval cache may help for texel tuning, and here I still believe that it has no measurable influence if I assume that duplicate positions with different game results do occur as an exception only, and with mostly unique positions and cache clearing after each parameter change there will be almost no benefit from that cache IMO.
Code: Select all
65.4 time units with eval cache + pawn cache
71.1 time units without eval cache but with pawn cache
76.6 time units without eval cache and pawn cache
so 17%, not so stellar, but those numbers hold for Cheng (YMMV)
Note that those positions follow the games, so scheduling worker threads is also important, to make sure each per-thread cache sees a run of consecutive positions from the same game.
The best thing might be to pre-filter quiet positions (IIRC this is what Miguel did in Gaviota) - note that I didn't try this.
Martin Sedlak
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Texel tuning speed
Pawn cache helps, no doubt about that; I use it as well in my tuner. Your numbers indicate a speedup of 8% ((71.1-65.4)/71.1) for enabling the eval cache. OK, measurable at least, but we can agree on the "not so stellar".

mar wrote: ↑Sat Sep 01, 2018 7:33 pm
It seems I was too optimistic (it's ~4 years ago I did Texel tuning), but anyway I have some numbers (single-threaded performance) rather than guesswork:

Code: Select all
65.4 time units with eval cache + pawn cache
71.1 time units without eval cache but with pawn cache
76.6 time units without eval cache and pawn cache

so 17%, not so stellar but those numbers hold for Cheng (YMMV)
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
-
- Posts: 2554
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Texel tuning speed
I think we can agree that the speedup would be 9% if you round properly
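The two figures come from different baselines: (71.1-65.4)/71.1 measures time saved as a fraction of the slower run, while 71.1/65.4 - 1 measures the speedup relative to the cached run, which rounds to 9%. A quick check:

```python
with_cache, without_cache = 65.4, 71.1

time_saved = (without_cache - with_cache) / without_cache  # fraction of the slow run
speedup = without_cache / with_cache - 1                   # relative to the fast run

print(round(time_saved * 100, 1))  # 8.0
print(round(speedup * 100, 1))     # 8.7
```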
Anyway, I found some old test positions, prefiltered them, and tried to tune piece values. The result seemed fine, but I didn't do a full retune + gameplay verification.
The speedup of doing eval over qsearch was about 6-fold.
Download link is here: http://www.crabaware.com/positions/a.7z
The format is <double outcome from white's POV><space><FEN><unix EOL>
Contains about 12.7 million positions (Cheng selfplay games, no book moves, no mate moves, no stm in check, no draws by material, "quiet", dup positions not filtered)
Perhaps it might be useful (at least to stress test parser/tuner).
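A minimal reader for the stated line format (the sample line below is an illustrative start-position FEN, not taken from the file):

```python
def parse_line(line):
    """Split '<outcome> <FEN>' into (float outcome, FEN string)."""
    outcome, fen = line.rstrip("\n").split(" ", 1)
    return float(outcome), fen

# illustrative sample line: a drawn game's outcome plus the start position
sample = "0.5 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1\n"
result, fen = parse_line(sample)
```

Splitting on the first space only is what keeps the FEN intact, since a FEN itself contains spaces.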
Martin Sedlak
-
- Posts: 1754
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Texel tuning speed
I'll throw in my FEN sets for all ...

mar wrote: ↑Sun Sep 02, 2018 1:53 am
Perhaps it might be useful (at least to stress test parser/tuner).
https://github.com/AndyGrant/TexelSets
Quick explanation of each data set ...
Ethereal4Per - ~350,000 self-play games at 1+.01s, with very aggressive adjudication. 4 positions taken from each game. No filtering.
LaserSpecial - Jeffrey An played some ~40,000 games, and then sampled the search tree from various positions. These are (I think) quiets only.
Stockfish1Per - Downloaded 3GB of games from fishtest ... ~900,000 games. 1 position taken from each, filtered out the known wins.
Stockfish3per - Same game set as above, but 3 positions with more aggressive known win/loss filtering.
I've had great success with LaserSpecial + Ethereal4Per.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: Texel tuning speed
OK, I've implemented a simple GD method:
* compute the gradient with a centered finite-difference scheme for each parameter (this costs 3 evaluations of E per parameter), using 1 (the smallest possible difference for a piece value) as the "delta" in each direction;
* do a line search in the normalized gradient direction (-g/norm(g)), which costs one evaluation of E per line-search iteration.
As I found evaluating E over lots of positions (1,364,312) too expensive, I draw a random subset of reasonable size (some 10,000s) at each iteration of the gradient method. I am using the Ethereal.fens positions given earlier in this thread.
To my surprise, using just the piece values as parameters (for N, B, R, Q only) leads to nothing, as the line search fails to find any improvement at the first (or second ...) iteration of the gradient method ...
Starting from {N,B,R,Q} = {315,325,500,900}, the first normalized gradient looks like {0.7,0.04,0.47,0.48}.
As piece values are integers, I use an initial gradient step of 2, 3 or 4 so that some new values change by at least 1.
Does somebody see what I'm doing wrong?
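For reference, the central-difference part of the scheme described above can be sketched on a toy material-only eval. The piece-count representation and the tiny data set below are illustrative assumptions, not the actual setup being debugged:

```python
# Toy version: central finite differences on a Texel-style error, with a
# hypothetical material-only eval. A position is reduced to its net piece
# counts (N, B, R, Q) from white's point of view.

def expected_score(cp, k=1.0 / 400):
    return 1.0 / (1.0 + 10 ** (-k * cp))

def material_eval(counts, values):
    return sum(c * v for c, v in zip(counts, values))

def E(values, data):
    # data: list of (piece_counts, game_result)
    return sum((r - expected_score(material_eval(c, values))) ** 2
               for c, r in data) / len(data)

def central_gradient(values, data, h=1.0):
    # h = 1 centipawn, the smallest step for integer piece values
    g = []
    for i in range(len(values)):
        up = list(values); up[i] += h
        dn = list(values); dn[i] -= h
        g.append((E(up, data) - E(dn, data)) / (2 * h))
    return g
```

With a 1-centipawn delta the per-parameter differences E(up) - E(dn) are tiny, so a gradient like this is very sensitive to noise from resampling the position subset between iterations.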
-
- Posts: 4366
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Texel tuning speed
There are so many published, effective gradient methods that I don't think you should invent your own.
But in any case, I have found it helpful to always validate the gradients, at least while debugging. Compute the gradient, apply a small delta to each parameter, compute gradient·delta, and add it to the base objective value. Then also compute the new objective with param + delta as input. You should get the same number to within rounding error. If you aren't computing the gradient correctly you can't get convergence. If you do compute it correctly, the only way you won't get improvement in the objective is if the step sizes are grossly too big or too small.
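A sketch of that first-order check; `f` and `grad` are placeholders for the real objective and gradient routines, and the delta/tolerance values are illustrative:

```python
# First-order check of a gradient against the objective it claims to
# differentiate: f(params + delta) should match f(params) + g . delta
# to within rounding, for a small uniform delta on every parameter.

def validate_gradient(f, grad, params, delta=1e-4, tol=1e-6):
    g = grad(params)
    base = f(params)
    bumped = [p + delta for p in params]
    predicted = base + sum(gi * delta for gi in g)  # base + g . delta
    actual = f(bumped)
    return abs(predicted - actual) < tol
```

The residual of a correct gradient shrinks quadratically as delta shrinks, while a wrong gradient leaves a residual proportional to delta itself, which is what makes the check discriminating.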
--Jon
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: Texel tuning speed
What may be too big or too small for changes in piece values?
For now I am near 1 indeed; that may be too small?