txt: automated chess engine tuning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: txt: automated chess engine tuning

Post by brtzsnr »

Joerg Oster wrote: Now I wonder how you can get tuned values in 3 - 5 hours ... :D
It depends on the size of your test sets, the number of parameters, and how well they are already optimized. What you can do is optimize on a 10000/100000 set first and then switch to 50000/1000000. You can load the values from the previous optimizations.
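As a rough illustration of reusing previous results (this is not txt's actual interface; the file format and names below are made up for the example), the tuned values could simply be written out after each round and read back as the starting point of the next, larger round:

Code: Select all

// Hypothetical sketch, not txt's actual interface: keep the tuned values in a
// small JSON file so a later optimization round, run on larger position sets,
// can start from the previous result instead of from scratch.
package main

import (
    "encoding/json"
    "os"
)

// Params maps an evaluation parameter name to its tuned value.
type Params map[string]int

func saveParams(path string, p Params) error {
    data, err := json.MarshalIndent(p, "", "  ")
    if err != nil {
        return err
    }
    return os.WriteFile(path, data, 0644)
}

func loadParams(path string) (Params, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    var p Params
    return p, json.Unmarshal(data, &p)
}

func main() {
    // Round 1: tune on the 10000/100000 sets (tuning loop omitted), then save.
    best := Params{"PawnValueMg": 89, "KnightValueMg": 325}
    if err := saveParams("round1.json", best); err != nil {
        panic(err)
    }

    // Round 2: load the round-1 values and continue on the 50000/1000000 sets.
    seed, err := loadParams("round1.json")
    if err != nil {
        panic(err)
    }
    _ = seed // starting point for the next, larger optimization round
}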
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: txt: automated chess engine tuning

Post by Ferdy »

brtzsnr wrote:Hi, all!

I wrote a small framework to do automated chess tuning based on Texel's Tuning Method (https://chessprogramming.wikispaces.com ... ing+Method). You can find the source code and instructions on how to use it here: https://bitbucket.org/brtzsnr/txt.

I'm still experimenting with it, so I cannot yet report any success. Nevertheless, feel free to experiment with txt and if you find it useful, please consider contributing.

Regards,
Would you explain further the use of small and big sets of training positions? As I understand it, in the Texel method all training positions are used.
Now split the EPDs into two sets, one large and one small. txt will optimize the score for the large set, but it will use the small set to guide the optimization. The small set should be at least 20x smaller, but it should also have a considerable number of positions in order to be a good estimator.
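As a concrete illustration of that split (not part of txt itself; the file names and the exact 20:1 ratio below are placeholders), the two sets could be produced like this:

Code: Select all

// Hypothetical sketch: split one EPD file into a small guiding set and a
// large set, keeping the small set roughly 20x smaller as suggested above.
package main

import (
    "bufio"
    "math/rand"
    "os"
)

func main() {
    f, err := os.Open("positions.epd") // hypothetical input file
    if err != nil {
        panic(err)
    }
    defer f.Close()

    var lines []string
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        lines = append(lines, sc.Text())
    }

    // Shuffle so both sets are drawn from the same distribution.
    rand.Shuffle(len(lines), func(i, j int) { lines[i], lines[j] = lines[j], lines[i] })

    cut := len(lines) / 21 // ~1 part small, ~20 parts large
    writeLines("small.epd", lines[:cut])
    writeLines("large.epd", lines[cut:])
}

func writeLines(path string, lines []string) {
    f, err := os.Create(path)
    if err != nil {
        panic(err)
    }
    defer f.Close()
    w := bufio.NewWriter(f)
    for _, l := range lines {
        w.WriteString(l + "\n")
    }
    w.Flush()
}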
I have developed a tuner which I hope to publish once I am done with some testing. It uses the scores recorded in the PGN during the game to calculate the average error, instead of the game result.

But later I will also try using the result.
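For reference, a minimal sketch of what such an average-error computation typically looks like in Texel-style tuning: each engine score (in centipawns) is mapped to an expected result with a logistic function and compared against the target, which can be the game result or, as described above, a score taken from the PGN. This is only an illustration in Go, not Ferdy's Python tool, and the scaling constant K has to be fitted to the engine:

Code: Select all

// Hedged sketch of a Texel-style average error: map each engine score
// (in centipawns) to an expected result with a logistic function and
// compare it against the target.
package main

import (
    "fmt"
    "math"
)

// winProbability converts a centipawn score into an expected result in [0, 1].
func winProbability(scoreCp, k float64) float64 {
    return 1.0 / (1.0 + math.Pow(10, -k*scoreCp/400.0))
}

// averageError is the mean squared difference between targets and predictions.
func averageError(scoresCp, targets []float64, k float64) float64 {
    var sum float64
    for i, s := range scoresCp {
        diff := targets[i] - winProbability(s, k)
        sum += diff * diff
    }
    return sum / float64(len(scoresCp))
}

func main() {
    scores := []float64{35, -120, 250}  // engine scores for three positions
    targets := []float64{1.0, 0.0, 1.0} // game results: win, loss, win
    fmt.Printf("average error: %.6f\n", averageError(scores, targets, 1.0))
}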
UCI engines that expose options can use the tool. It operates via UCI engine options, like the following; all spin-type options can be included in the tuning. Option names should not contain spaces. Later I will convert my check-type options to spin type so that they can be included in the optimization as well. The tool is written in Python.

Code: Select all

Deuterium v2015.1.35.194 64bit POPCNT
uci
id name Deuterium v2015.1.35.194
id author Ferdinand Mosca
option name Hash type spin default 64 min 8 max 2048
option name EvalHash type spin default 16 min 4 max 32
option name PawnHash type spin default 4 min 1 max 8
option name KingAndPawnHash type spin default 4 min 1 max 8
option name Log type check default false
option name Ponder type check default false
option name OwnBook type check default false
option name BookFile type string default <empty>
option name KingSafetyPercent type spin default 100 min 0 max 1000
option name MidGameMobilityPercent type spin default 100 min 0 max 1000
option name EndGameMobilityPercent type spin default 100 min 0 max 1000
option name OffensivePercent type spin default 100 min 100 max 10000
option name DefensivePercent type spin default 100 min 100 max 10000
option name PawnValueMg type spin default 89 min 70 max 110
option name PawnValueEg type spin default 118 min 90 max 150
option name KnightValueMg type spin default 325 min 300 max 350
option name KnightValueEg type spin default 315 min 300 max 350
option name BishopValueMg type spin default 328 min 300 max 350
option name BishopValueEg type spin default 327 min 300 max 350
option name RookValueMg type spin default 504 min 480 max 530
option name RookValueEg type spin default 498 min 480 max 530
option name QueenValueMg type spin default 996 min 950 max 1200
option name QueenValueEg type spin default 999 min 950 max 1200
option name PawnThreat type spin default 100 min 0 max 1000
option name KnightThreat type spin default 100 min 0 max 1000
option name BishopThreat type spin default 100 min 0 max 1000
option name RookThreat type spin default 100 min 0 max 1000
option name QueenThreat type spin default 100 min 0 max 1000
option name KingThreat type spin default 100 min 0 max 1000
option name Rook7thRankMg type spin default 15 min 0 max 200
option name Rook7thRankEg type spin default 50 min 0 max 200
option name Rook8thRankMg type spin default 15 min 0 max 200
option name Rook8thRankEg type spin default 50 min 0 max 200
option name TwoBishopAdv type spin default 50 min 20 max 80
option name KingShelterInitPenalty type spin default 80 min 50 max 200
option name KingShelterPercent type spin default 110 min 20 max 500
option name ConnectedPasserPercent type spin default 16 min 0 max 100
option name BlockedPasserPenaltyPercent type spin default 70 min 0 max 100
option name FutilityPruning type check default true
option name LateMoveReduction type check default true
option name NullMove type check default true
option name Razoring type check default true
option name GoodEvaluationPruning type check default true
option name RepetitionScore type spin default 0 min -500 max 500
option name InsufficientMaterialScore type spin default 0 min -500 max 500
option name FiftyMoveScore type spin default 0 min -500 max 500
option name StaleMateScore type spin default 0 min -30000 max 30000
option name TimeBufferMilliSec type spin default 60 min 0 max 1000
option name CheckExtension type check default true
option name EndingExtension type check default true
option name RecaptureExtension type check default true
option name PawnTo7thRankExtension type check default true
option name UseScorpioBitBases type check default false
option name AvailableScorpioEgbbMen type spin default 5 min 3 max 6
option name EgbbMinimumProbeDepth type spin default 8 min 1 max 128
option name ScorpioBitBasesPath type string default <empty>
option name ScorpioBitBasesCache type spin default 16 min 8 max 512
option name ScorpioBitBasesLoadType type combo default 4MEN var NONE var 4MEN var SMART var 5MEN
option name MultiPV type spin default 1 min 1 max 300
option name UCI_AnalyseMode type check default false
option name ClearHash type button
uciok
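To illustrate how a tuner can feed candidate values into an engine like this, here is a small Go sketch (not Ferdy's Python tool; the engine path and the chosen options are placeholders) that sets spin-type options through standard UCI setoption commands:

Code: Select all

// Hypothetical sketch: push candidate values into a UCI engine via
// "setoption name <Name> value <Value>", which is how a tuner can vary
// spin-type options like the ones listed above.
package main

import (
    "bufio"
    "fmt"
    "os/exec"
)

func main() {
    cmd := exec.Command("./deuterium") // hypothetical engine binary
    stdin, _ := cmd.StdinPipe()
    stdout, _ := cmd.StdoutPipe()
    if err := cmd.Start(); err != nil {
        panic(err)
    }

    send := func(s string) { fmt.Fprintln(stdin, s) }

    send("uci")
    // Candidate values produced by the tuner for this iteration.
    candidate := map[string]int{"PawnValueMg": 95, "KnightValueMg": 330}
    for name, value := range candidate {
        send(fmt.Sprintf("setoption name %s value %d", name, value))
    }
    send("isready")

    // Wait for the engine to acknowledge before sending positions.
    sc := bufio.NewScanner(stdout)
    for sc.Scan() {
        if sc.Text() == "readyok" {
            break
        }
    }

    send("quit")
    cmd.Wait()
}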
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: txt: automated chess engine tuning

Post by brtzsnr »

Ferdy wrote: Would you explain further the use of small and big sets of training positions? As I understand it, in the Texel method all training positions are used.
txt implements a simple hill-climbing algorithm: take the current values, mutate them slightly, and if the mutation gives a better score, accept the new values.

txt tries to optimize the parameters for the larger set. The small set is there only to weed out bad candidates and find potentially good ones. The current algorithm keeps a pool of candidates that have been measured on the small set. txt will then pick from the pool a set of parameters that has a small estimated value (on the small set). This set will be tested further on the large set.

For example, let's say one mutation makes Pawn = 1000 and Queen = 150. With these values almost any position will return a very high score, so this set will not be considered for testing on the larger set. This way we can save a lot of testing time.
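A rough Go sketch of this screening idea (not the actual txt code; it simplifies the candidate pool and tournament selection down to a single threshold, and the error functions are stand-ins):

Code: Select all

// Rough sketch of the screening idea described above: mutate the current
// values, estimate the error on the small set first, and only pay for the
// expensive large-set evaluation when the estimate looks promising.
package main

import "math/rand"

type params []int

// errorOn stands in for the Texel-style error of p on a position set;
// the real function would evaluate every position with these values.
func errorOn(p params, positions []string) float64 { return rand.Float64() }

func mutate(p params) params {
    q := append(params(nil), p...)
    q[rand.Intn(len(q))] += rand.Intn(11) - 5 // small random step
    return q
}

func tune(start params, small, large []string, iterations int) params {
    best := start
    bestErr := errorOn(best, large)
    threshold := errorOn(best, small)

    for i := 0; i < iterations; i++ {
        cand := mutate(best)

        // Cheap screening on the small set.
        if est := errorOn(cand, small); est > threshold {
            continue // candidate looks bad (e.g. Pawn=1000, Queen=150), skip it
        }

        // Expensive confirmation on the large set.
        if e := errorOn(cand, large); e < bestErr {
            best, bestErr = cand, e
            threshold = errorOn(best, small)
        }
    }
    return best
}

func main() {
    small := make([]string, 10000)  // e.g. 10000 positions
    large := make([]string, 100000) // e.g. 100000 positions
    _ = tune(params{89, 325, 328, 504, 996}, small, large, 1000)
}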

Does it make sense? It's late here.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: txt: automated chess engine tuning

Post by Ferdy »

brtzsnr wrote:
Ferdy wrote: Would you explain further the use of small and big sets of training positions? As I understand it, in the Texel method all training positions are used.
txt implements a simple hill-climbing algorithm: take the current values, mutate them slightly, and if the mutation gives a better score, accept the new values.

txt tries to optimize the parameters for the larger set. The small set is there only to weed out bad candidates and find potentially good ones. The current algorithm keeps a pool of candidates that have been measured on the small set. txt will then pick from the pool a set of parameters that has a small estimated value (on the small set). This set will be tested further on the large set.

For example, let's say one mutation makes Pawn = 1000 and Queen = 150. With these values almost any position will return a very high score, so this set will not be considered for testing on the larger set. This way we can save a lot of testing time.

Does it make sense? It's late here.
So you first use the small set to create candidates (combinations of parameters and values) that perform better than the default values, and once that is done you test those candidates on the larger set.
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

brtzsnr wrote:
Joerg Oster wrote: Now I wonder how you can get tuned values in 3 - 5 hours ... :D
It depends on the size of your test sets, the number of parameters, and how well they are already optimized. What you can do is optimize on a 10000/100000 set first and then switch to 50000/1000000. You can load the values from the previous optimizations.
I guess it also depends on the eval parameters and how common they are.

Here is an updated graph after about 14h
Image

It looks like the pool converges to the estimation, and not to the computed score.
Is this a problem? Will they converge again, later?

I cancelled the process and restarted it, using the large set for both estimation and computation. I will report later.
Jörg Oster
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: txt: automated chess engine tuning

Post by brtzsnr »

Joerg Oster wrote: It looks like the pool converges to the estimation, and not to the computed score.
Is this a problem? Will they converge again, later?

I cancelled the process and restarted it, using the large set for both estimation and computation. I will report later.
The relation is more like computed = a * estimated + b; that's why the lines intersect. Also, for me the Texel score decreases much faster (about 10x). I would say that your values are already good.

Also, if you restart the search you can use the previous values. This is what I do: I optimize first with 10000/100000, then increase to 200000/1000000, and do a final round with even larger numbers.

PS: You also use go depth 1, which for me is 2-3x slower.
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

brtzsnr wrote:
Joerg Oster wrote: It looks like the pool converges to the estimation, and not to the computed score.
Is this a problem? Will they converge again, later?

I cancelled the process and restarted it, using the large set for both estimation and computation. I will report later.
The relation is more like computed = a * estimated + b; that's why the lines intersect. Also, for me the Texel score decreases much faster (about 10x). I would say that your values are already good.
Of course. Almost all values have been more or less thoroughly tuned, so this is more fine-tuning.
My hope was that this tuning method could eventually work faster than Joona's SPSA-Tuner, and/or give better values.
At least it's fun to try (and learn!) something new.
brtzsnr wrote: Also, if you restart the search you can use the previous values. This is what I do: I optimize first with 10000/100000, then increase to 200000/1000000, and do a final round with even larger numbers.
This is really a nice option. 8-)
brtzsnr wrote: PS: You also use go depth 1, which for me is 2-3x slower.
For me as well, but this tuning session is done with SF's internal eval command. I slightly modified the output to fit the needs of your framework.
Jörg Oster
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

brtzsnr wrote:
Ferdy wrote: Would you explain further the use of small and big sets of training positions? As I understand it, in the Texel method all training positions are used.
txt implements a simple hill-climbing algorithm: take the current values, mutate them slightly, and if the mutation gives a better score, accept the new values.

txt tries to optimize the parameters for the larger set. The small set is there only to weed out bad candidates and find potentially good ones. The current algorithm keeps a pool of candidates that have been measured on the small set. txt will then pick from the pool a set of parameters that has a small estimated value (on the small set). This set will be tested further on the large set.

For example, let's say one mutation makes Pawn = 1000 and Queen = 150. With these values almost any position will return a very high score, so this set will not be considered for testing on the larger set. This way we can save a lot of testing time.

Does it make sense? It's late here.
I see you're doing a tournament selection to pick a set of values from the pool. (It's the first time I've read about it, but it's an easy concept to grasp.)
What if we did this twice and returned the average of the values of both sets?
My thinking is that both sets might have good estimations because different values were changed.

Does that make any sense?
Jörg Oster
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: txt: automated chess engine tuning

Post by brtzsnr »

Tournament selection is a common technique: https://en.wikipedia.org/wiki/Tournament_selection

It's useful in genetic algorithms because it keeps some variation in the population.

What you are proposing is to introduce crossovers, in addition to mutations, into the pool. It might work; there are different ways to do a crossover (sketched below):
* average (what you suggest)
* randomly pick half of the values from one set and the remaining values from the other set
* use a random split point: pick n, take the first n values from one set and the remaining values from the other set.
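A rough Go sketch of the three crossover variants, assuming a candidate is simply a slice of integer parameter values of equal length:

Code: Select all

// Rough sketch of the three crossover variants listed above.
package main

import "math/rand"

// averageCross returns the element-wise mean of two candidates.
func averageCross(a, b []int) []int {
    c := make([]int, len(a))
    for i := range a {
        c[i] = (a[i] + b[i]) / 2
    }
    return c
}

// uniformCross picks each value randomly from one parent or the other.
func uniformCross(a, b []int) []int {
    c := make([]int, len(a))
    for i := range a {
        if rand.Intn(2) == 0 {
            c[i] = a[i]
        } else {
            c[i] = b[i]
        }
    }
    return c
}

// splitCross takes the first n values from one parent and the rest from the other.
func splitCross(a, b []int) []int {
    n := rand.Intn(len(a) + 1)
    c := make([]int, len(a))
    copy(c, a[:n])
    copy(c[n:], b[n:])
    return c
}

func main() {
    a := []int{89, 325, 328, 504, 996}
    b := []int{95, 330, 320, 510, 1000}
    _ = averageCross(a, b)
    _ = uniformCross(a, b)
    _ = splitCross(a, b)
}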

Feel free to experiment. The only drawback is that you end up tuning the tuner, because you have to run several experiments to see which solution is better.

The current algorithm comes from my attempts to add passed-pawn evaluation, but so far, despite getting a nice improvement in the Texel score, I could not get an Elo improvement.
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

brtzsnr wrote: Tournament selection is a common technique: https://en.wikipedia.org/wiki/Tournament_selection

It's useful in genetic algorithms because it keeps some variation in the population.

What you are proposing is to introduce crossovers, in addition to mutations, into the pool. It might work; there are different ways to do a crossover:
* average (what you suggest)
* randomly pick half of the values from one set and the remaining values from the other set
* use a random split point: pick n, take the first n values from one set and the remaining values from the other set.

Feel free to experiment. The only drawback is that you end up tuning the tuner, because you have to run several experiments to see which solution is better.

The current algorithm comes from my attempts to add passed-pawn evaluation, but so far, despite getting a nice improvement in the Texel score, I could not get an Elo improvement.
Now I run into the problem of how to build the mean of the two value sets.

I have
  • v1, e1 := h.pool.value, h.pool.estimate

and after a second selection
  • v2, e2 := h.pool.value, h.pool.estimate


The simple
  • return (v1+v2)/2, (e1+e2)/2
does not work, of course.

Can you help, please?
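Assuming the value is a slice of parameters (which would explain why the element-wise sum is not available directly), a minimal sketch of an element-wise mean might look like the one below. This is only an illustration, not txt's actual code, and the averaged candidate's estimate would still have to be re-measured on the small set rather than derived from e1 and e2:

Code: Select all

// Illustrative sketch: if a candidate's value is a slice of parameters,
// the mean has to be built element by element.
package main

import "fmt"

func meanValues(v1, v2 []float64) []float64 {
    m := make([]float64, len(v1))
    for i := range v1 {
        m[i] = (v1[i] + v2[i]) / 2
    }
    return m
}

func main() {
    v1 := []float64{89, 325, 996}
    v2 := []float64{95, 331, 1004}
    fmt.Println(meanValues(v1, v2)) // [92 328 1000]
}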
Jörg Oster