Evaluation Tuning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: Evaluation Tuning

Post by Robert Pope »

Gerd Isenberg wrote:Isn't Joel Veness' RootStrap or even TreeStrap as applied in Meep supposed to gain better or faster results than TD-Leaf?
I just started looking at that - very interesting. However, it seems that TreeStrap's biggest strength is being able to quickly get to reasonable values from a random start. I'd be curious to see how well it trains relative to TD-Leaf when we already have reasonable values, a more common jumping off point for most of us.
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Evaluation Tuning

Post by jdart »

Nobody doubts that reinforcement learning can improve evaluation functions, especially if you are starting from a really basic one.

But there isn't much published evidence (that I know of) that this method can improve such functions to the point that they outperform a good hand-coded evaluation. But maybe I'm wrong and this is possible.

--Jon
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Evaluation Tuning

Post by Ferdy »

Fabio Gobbato wrote:I'm trying a new tuning method. I have made only 2 attempts so far, both with good results, so I'm not sure yet that it's a very good method, but the first attempt gained +12 Elo and the second +8 Elo.

I extract 150 million positions; the quality of the games doesn't matter much, but it's better if the positions are quite different from each other.
From them I remove these positions:
- in check
- with fewer than 7 pieces (useful if the engine uses tablebases)
- static eval different from the qsearch score (i.e. I keep only quiet positions)
Then I run a depth-2 search for every position and store the position with the returned score in a file.

Once I have this file I compare each position's score with the static evaluation.
I compute the error for a position as the absolute difference of the two scores.

Then I use an optimization algorithm to minimize the error.

The idea is that it's difficult for the evaluation to predict the result of the position, but it's easier for it to match the eval from 2 plies ahead.
Another advantage is that this method minimizes the difference between related positions.

I have made only 2 iterations of this method, with a total improvement of +20 Elo.
If someone else tries it we can compare results.
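A rough Python sketch of this pipeline, in case someone wants to try it: the engine calls (set_position, in_check, piece_count, static_eval, qsearch, search, set_params) are placeholders for whatever your own engine exposes, and the coordinate-descent loop is only a simple stand-in for the optimizer, which isn't specified above.

Code:
# Sketch of the tuning loop described above (hypothetical engine API).

def build_training_set(positions, engine):
    data = []
    for fen in positions:
        engine.set_position(fen)
        if engine.in_check():
            continue                        # drop positions in check
        if engine.piece_count() < 7:
            continue                        # drop tablebase-range positions
        if engine.static_eval() != engine.qsearch():
            continue                        # keep only quiet positions
        target = engine.search(depth=2)     # depth-2 score is the fixed target
        data.append((fen, target))
    return data

def total_error(params, data, engine):
    engine.set_params(params)
    err = 0
    for fen, target in data:
        engine.set_position(fen)
        err += abs(engine.static_eval() - target)   # |static eval - depth-2 score|
    return err

def tune(params, data, engine, step=1, iterations=2):
    best = total_error(params, data, engine)
    for _ in range(iterations):
        for i in range(len(params)):        # simple coordinate descent
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                e = total_error(trial, data, engine)
                if e < best:
                    params, best = trial, e
                    break
    return params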
Nice idea, especially the collection of training positions, and you are using 150 million! We may be similar in our use of eval: you take the score from a depth-2 search, while I take the score from the engine's move comments. And you compare against the static eval, similar to what Miguel does, while I use qsearch, similar to Texel. So how many eval parameters have you allowed the tuner to tune?
Thanks for sharing.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Evaluation Tuning

Post by matthewlai »

jdart wrote:Nobody doubts that reinforcement learning can improve evaluation functions, especially if you are starting from a really basic one.

But there isn't much published evidence (that I know of) that this method can improve such functions to the point that they outperform a good hand-coded evaluation. But maybe I'm wrong and this is possible.

--Jon
I went from material-only to near Stockfish level (eval only), so it's possible. However, my learned eval function is also orders of magnitude slower than Stockfish's.

This is with an almost completely-generic function, though (neural network), and not tuning parameters on hand-coded features.

But no, it hasn't been published yet.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Evaluation Tuning

Post by matthewlai »

Gerd Isenberg wrote:Isn't Joel Veness' RootStrap or even TreeStrap as applied in Meep supposed to gain better or faster results than TD-Leaf?
That's very interesting! They somehow escaped my literature review.

TreeStrap does sound very interesting. It requires quite an invasive change to the engine, though, since training is also done on internal nodes. I will give it a try when I have time.
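For reference, the TreeStrap(minimax) update amounts to regressing the static eval of every interior node of the search tree toward the value the search backs up to that node, which is why a hook inside the search is needed. A loose sketch, assuming a linear eval (dot product of a weight vector w with a feature vector phi) and a hypothetical tree_samples list recorded by that hook:

Code:
import numpy as np

# Loose TreeStrap(minimax)-style update for a linear eval V(s) = w . phi(s).
# tree_samples is assumed to come from a hook inside the search that records,
# for every interior node, its feature vector and the minimax value backed up
# to it.

def treestrap_update(w, tree_samples, lr=1e-4):
    for phi, backed_up_value in tree_samples:
        phi = np.asarray(phi, dtype=float)
        error = w @ phi - backed_up_value      # regress eval toward the search value
        w -= lr * error * phi
    return w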
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: Evaluation Tuning

Post by brtzsnr »

As Ferdinand noted, this is similar to Texel's tuning method. I have my own tuner https://bitbucket.org/zurichess/txt which was discussed here http://talkchess.com/forum/viewtopic.php?t=55696.

I used a set of 1,000,000 EPD positions to tune my engine. Parsing normally takes about 25% of the time, which is fine given that the tuner can be applied to any UCI engine and required almost no code in mine (apart from some minor micro-optimizations).

Things that I noticed with my tuner:

* Going over 1 million positions doesn't help that much, Elo-wise.
* The Elo improvement depends a lot on the starting conditions (starting values, games played, etc.).
* A lower error score strongly correlates with higher Elo for the same starting conditions.
* Using quiet positions is better (but I haven't done as much testing on tactical positions).
* Positions from hyperbullet games are better, probably because the evaluation is more relevant there.
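For anyone unfamiliar with the method being referenced: the Texel-style objective is the mean squared difference between the game result and a sigmoid of the engine's qsearch score. A minimal sketch, where K is the usual scaling constant that has to be fitted before tuning and qsearch is a placeholder for your engine's call:

Code:
import math

# Texel-style tuning objective: mean squared error between game results
# (1, 0.5, 0) and a sigmoid of the qsearch score in centipawns.
# K is a scaling constant, normally fitted once before tuning the eval.

def sigmoid(score_cp, K=1.13):
    return 1.0 / (1.0 + 10.0 ** (-K * score_cp / 400.0))

def texel_error(data, qsearch, K=1.13):
    # data: list of (fen, result) pairs; qsearch: engine callback (hypothetical)
    total = 0.0
    for fen, result in data:
        total += (result - sigmoid(qsearch(fen), K)) ** 2
    return total / len(data)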
User avatar
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Evaluation Tuning

Post by Fabio Gobbato »

All my evaluation parameters are in an array so I could tune them all at once, but in these tests I tuned about 80 parameters.
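One simple way to arrange such a flat parameter array (purely illustrative, with made-up names) is to keep a parallel list of term names, so a tuner can address any subset by index:

Code:
# Illustrative layout only: all eval weights live in one flat array so a
# tuner can perturb any subset by index without touching the engine code.

EVAL_PARAM_NAMES = ["pawn_mg", "pawn_eg", "knight_mg", "knight_eg",
                    "bishop_pair", "rook_open_file", "king_shelter"]
eval_params = [100, 120, 320, 330, 30, 20, 15]   # starting values

def perturbed(params, indices, delta):
    """Return a copy of params with the chosen entries nudged by delta."""
    trial = list(params)
    for i in indices:
        trial[i] += delta
    return trial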
User avatar
Daniel Mehrmann
Posts: 858
Joined: Wed Mar 08, 2006 9:24 pm
Location: Germany
Full name: Daniel Mehrmann

Re: Evaluation Tuning

Post by Daniel Mehrmann »

Thanks for sharing your ideas. :)

Well, I'm working with Jörg's tuning idea with EPDs. At the moment it's just a playground. I need to find more time to work on it.

My code isn't ready yet, but it is usable.
However, here is clean code written in the Fruit manner, if someone needs it.

https://www.dropbox.com/sh/781g8fuonz6i ... aSvaLrLFTa

Regards
Daniel
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Evaluation Tuning

Post by Ferdy »

Fabio Gobbato wrote:All my evaluation parameters are in an array so I could tune all at once, but in these tests I tune about 80 parameters.
I am not into tuning right now; I am reviewing and refactoring my code. I realized tuning is useless when the code hasn't been thoroughly checked, because later you will find bugs. I have done some tuning before, only around 20 params to test the tuner, and could get +20 wins in around 700 games from only 1.8 million training positions after around 5 hours of tuning. I have difficulty growing my set of training positions because I am limited to the Shredder GUI's PGN output, where the games carry move comments from which I extract the move evals. I will probably try your method of collecting games. I might have to spend the depth-2 analysis time, but I could get a lot of positions, though not 150 million of course. Or perhaps I will just run a tournament of different engines with cutechess-cli at a very fast TC and extract the positions and move evals from there, with no need to run a depth-2 analysis.
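If anyone wants to try the cutechess-cli route, here is a rough sketch of pulling the per-move evals back out of the PGN with python-chess. It assumes move comments in the usual "+0.25/8 0.012s" style; adjust the regex to whatever your setup actually writes:

Code:
import re
import chess.pgn

# Rough sketch: pull (FEN, eval) pairs out of a PGN whose move comments look
# like "+0.25/8 0.012s" (adjust SCORE_RE for your own output format).

SCORE_RE = re.compile(r"([+-]?\d+(?:\.\d+)?)/\d+")

def extract_positions(pgn_path):
    samples = []
    with open(pgn_path) as handle:
        while True:
            game = chess.pgn.read_game(handle)
            if game is None:
                break
            board = game.board()
            for node in game.mainline():
                board.push(node.move)
                m = SCORE_RE.search(node.comment)
                if m:
                    samples.append((board.fen(), float(m.group(1))))
    return samples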
I am not into tuning now, I am reviewing and refactoring my code, I realized tuning is useless when the code is not thoroughly checked and later you will find some bugs. I have done some tuning before that was only around 20 params to test the tuner, I could get +20 wins in around 700 games, in only 1.8 million training pos after around 5 hrs of tuning. I have difficulty raising my traning positions because I am limited to the shredder gui pgn output where there is move comments in the game, as I extract the move eval in this game. I will probably try your method of collecting pos. I might spend a 2 ply analysis time but I could get a lot of positions, not 150 million of course. Or perhaps I will just run a tourney of different engines using cutechess-cli at very fast TC then extract the pos and move eval there, no need to run a 2-depth analysis.