Gerd Isenberg wrote:Isn't Joel Veness' RootStrap or even TreeStrap as applied in Meep supposed to gain better or faster results than TD-Leaf?
I just started looking at that - very interesting. However, it seems that TreeStrap's biggest strength is being able to get quickly to reasonable values from a random start. I'd be curious to see how well it trains relative to TD-Leaf when we already have reasonable values, a more common jumping-off point for most of us.
Evaluation Tuning
Moderators: hgm, Rebel, chrisw
-
- Posts: 558
- Joined: Sat Mar 25, 2006 8:27 pm
Re: Evaluation Tuning
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Evaluation Tuning
Nobody doubts that reinforcement learning can improve evaluation functions, especially if you are starting from a really basic one.
But there isn't much published evidence (that I know of) that this method can improve such functions to the point that they outperform a good hand-coded evaluation. But maybe I'm wrong and this is possible.
--Jon
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Evaluation Tuning
Fabio Gobbato wrote:I'm trying a new tuning method. I have made only 2 attempts so far, both with good results, so I'm not sure yet that it's a very good method, but the first attempt gave +12 Elo and the second +8 Elo.
I extract 150 million positions; the quality of the games doesn't matter much, but it's better if the positions are quite different from each other.
From them I remove these positions:
- positions in check
- positions with fewer than 7 pieces (useful if the engine uses the tablebases)
- positions where the static eval differs from the qsearch score (i.e. I keep only quiet positions)
Then I run a depth-2 search on every position and store the position in a file together with the score returned.
Once I have this file, I compare every position's score with the static evaluation.
I compute the error for a position as the absolute difference between the two scores.
Then I use an optimization algorithm to minimize the total error.
The idea is that it's difficult for the evaluation to predict the result of the game, but it's much easier for it to approximate the eval from 2 plies ahead.
Another advantage is that this method minimizes the difference between related positions.
I have made only 2 iterations of this method, with a total improvement of +20 Elo.
If someone else would like to try it, we can compare results.
Nice idea, especially the collection of training positions - and you are using 150 million!! We could be similar in the use of eval: you are using the score from a depth-2 search, while I am using the score from the move comments produced by the engine. And you are using the static eval, similar to what Miguel is using, while I am using qsearch, similar to Texel. How many eval parameters have you allowed the tuner to tune?
Thanks for sharing.
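Fabio's scheme above could be sketched roughly as follows. This is a minimal illustration, not his actual code: the linear `static_eval(pos, params)` callback, the precomputed depth-2 target scores, and the naive coordinate-descent optimizer are all assumptions standing in for whatever he really uses.

```python
def tuning_error(positions, params, static_eval, target_scores):
    # Mean absolute difference between the static eval and the
    # stored depth-2 search score, over all training positions.
    total = 0.0
    for pos, target in zip(positions, target_scores):
        total += abs(static_eval(pos, params) - target)
    return total / len(positions)

def coordinate_descent(positions, params, static_eval, target_scores,
                       step=1, iterations=2):
    # Naive local search: nudge one parameter at a time and keep
    # the change only if the total error decreases.
    best = tuning_error(positions, params, static_eval, target_scores)
    for _ in range(iterations):
        for i in range(len(params)):
            for delta in (step, -step):
                params[i] += delta
                err = tuning_error(positions, params, static_eval,
                                   target_scores)
                if err < best:
                    best = err
                    break           # keep the improvement
                params[i] -= delta  # revert
    return params, best
```

Note that such a local search can stall in a local optimum; any optimizer that minimizes the same error would fit the description in the post.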
-
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Evaluation Tuning
jdart wrote:Nobody doubts that reinforcement learning can improve evaluation functions, especially if you are starting from a really basic one.
But there isn't much published evidence (that I know of) that this method can improve such functions to the point that they outperform a good hand-coded evaluation. But maybe I'm wrong and this is possible.
--Jon
I went from material-only to near Stockfish level (eval only), so it's possible. However, my learned eval function is also orders of magnitude slower than Stockfish's.
This is with an almost completely-generic function, though (neural network), and not tuning parameters on hand-coded features.
But no, it hasn't been published yet.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Evaluation Tuning
Gerd Isenberg wrote:Isn't Joel Veness' RootStrap or even TreeStrap as applied in Meep supposed to gain better or faster results than TD-Leaf?
That's very interesting! They somehow escaped my literature review.
TreeStrap does sound very interesting. It requires quite an invasive change to the engine, though, since training is also done on internal nodes. I will give it a try when I have time.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
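For reference, the TreeStrap(minimax) idea from Veness et al. boils down to this: after each search, regress the static eval of every interior node toward the minimax value of the subtree below it. A minimal sketch for a linear evaluation follows; the function name, learning rate, and sample format are assumptions for illustration, not Meep's actual code.

```python
def treestrap_update(params, node_samples, lr=0.1):
    # TreeStrap(minimax)-style update sketch for a linear static eval
    #   v(s) = params . features(s)
    # node_samples: (feature_vector, minimax_value) pairs, one per
    # interior node visited by the last search.
    for features, target in node_samples:
        v = sum(p * f for p, f in zip(params, features))
        err = target - v
        for i, f in enumerate(features):
            params[i] += lr * err * f  # gradient step on squared error
    return params
```

The invasive part the post mentions is collecting `node_samples`: the search has to hand back features and minimax values for internal nodes, not just the root or the leaf of the principal variation as in TD-Leaf.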
-
- Posts: 433
- Joined: Fri Jan 16, 2015 4:02 pm
Re: Evaluation Tuning
As Ferdinand noted, this is similar to Texel's tuning method. I have my own tuner https://bitbucket.org/zurichess/txt that was discussed here http://talkchess.com/forum/viewtopic.php?t=55696.
I used a set of 1,000,000 EPD positions to tune my engine. Parsing normally takes about 25% of the time, which is fine given that the tuner can be applied to any UCI engine and added almost no code to mine (except for some minor micro-optimizations).
Things that I noticed with my tuner:
* Going over 1 million positions doesn't help much, Elo-wise.
* The Elo improvement depends a lot on the starting conditions (starting values, games played, etc.).
* For the same starting conditions, a lower tuning error correlates strongly with higher Elo.
* Using quiet positions is better (though I haven't done as much testing on tactical positions).
* Positions from hyperbullet games are better, probably because the evaluation is more relevant there.
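For comparison, the Texel-style objective alluded to above (the "score" that correlates with Elo) is usually the mean squared error between the game result and a sigmoid of the engine's search score. A minimal sketch, where the scaling constant `k` and the `qscore` callback are placeholders to be fitted to your own engine:

```python
def texel_error(entries, qscore, k=1.2):
    # entries: (epd_string, result) pairs with result 1.0 / 0.5 / 0.0
    # from White's point of view.
    # qscore: callback returning the qsearch score in centipawns.
    # k maps centipawns to a win probability; it is normally fitted
    # once, before any eval parameter is touched.
    total = 0.0
    for epd, result in entries:
        p = 1.0 / (1.0 + 10.0 ** (-k * qscore(epd) / 400.0))
        total += (result - p) ** 2
    return total / len(entries)
```

Minimizing this error over the parameter set is what the bullet "lower score strongly correlates with higher ELO" refers to.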
-
- Posts: 217
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
Re: Evaluation Tuning
All my evaluation parameters are in an array, so I could tune all of them at once, but in these tests I tuned about 80 parameters.
-
- Posts: 858
- Joined: Wed Mar 08, 2006 9:24 pm
- Location: Germany
- Full name: Daniel Mehrmann
Re: Evaluation Tuning
Thanks for sharing your ideas.
Well, I'm working with Jörg's tuning idea with EPDs. At the moment it's just a playground. I need to find more time to work on it.
My code isn't ready yet, but it is usable.
However, here is some cleanly written code in the Fruit style, if someone needs it.
https://www.dropbox.com/sh/781g8fuonz6i ... aSvaLrLFTa
Regards
Daniel
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Evaluation Tuning
Fabio Gobbato wrote:All my evaluation parameters are in an array so I could tune all at once, but in these tests I tune about 80 parameters.
I am not into tuning now; I am reviewing and refactoring my code. I realized tuning is useless when the code is not thoroughly checked, because later you will find some bugs. I have done some tuning before with only around 20 params, just to test the tuner, and I could get +20 wins in around 700 games, with only 1.8 million training positions, after around 5 hours of tuning. I have difficulty growing my training positions because I am limited to the Shredder GUI's pgn output, where the games have move comments, as I extract the move evals from those. I will probably try your method of collecting games. It might cost me the 2-ply analysis time, but I could get a lot of positions - not 150 million of course. Or perhaps I will just run a tourney of different engines using cutechess-cli at a very fast TC and then extract the positions and move evals from there, with no need to run a depth-2 analysis.
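The last idea - pulling move evals out of the PGN comments that cutechess-cli writes - could be sketched like this. The assumed comment form is `{+0.31/10 0.015s}` (score/depth, then time); the exact format depends on the engine and tool settings, so treat the regex as an assumption to adapt.

```python
import re

# Matches annotations like "{+0.31/10 0.015s}": a signed decimal score
# in pawns, a slash, and the search depth. Book moves ("{book}") and
# plain comments are skipped because they don't match.
EVAL_RE = re.compile(r"\{\s*([+-]?\d+\.\d+)/(\d+)")

def extract_move_evals(pgn_text):
    # Returns a list of (score_in_pawns, depth) for each annotated move.
    return [(float(s), int(d)) for s, d in EVAL_RE.findall(pgn_text)]
```

To build a training set, you would pair each extracted eval with the position reached after the commented move, which requires replaying the game with a move generator; the snippet only handles the score extraction.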