txt: automated chess engine tuning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by tpetzke »

I also made another attempt where I modified qsearch() to save the FENs of all quiet positions it encountered during the search, each with a certain low probability. After running a few thousand games, I had a collection of several million positions.
The problem is: what do you do with those positions? If you take positions that actually occurred in the games, you can say there is a certain relation between the position and the final game outcome. A position that qSearch encountered somewhere in the tree has only a very weak relationship to the game outcome (if any at all).

I also considered this collection method because most of the positions an engine has to process are not those that actually occur later in the game. But you still need a method to qualify each position.

You could use the evaluation of a stronger engine for that position. That is fast, but then you just capture the knowledge of that engine (not so cool).

You could run a few hundred self-play games with this position as the starting point and translate the winning probability into a score. But this is beyond my available computing resources.

So unfortunately I have not found a good recipe so far.

Thomas...

=======
http://macechess.blogspot.com - iCE Chess Engine
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

brtzsnr wrote:Which values did you try to optimize? How common are they (e.g. white king on 8th rank in middle game is not very common)?
I'm just wondering if it would help the hill-climbing algorithm to do the first base score computation with all values set to their minimum ... just an idea.
Jörg Oster
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

Joerg Oster wrote:
brtzsnr wrote:Which values did you try to optimize? How common are they (e.g. white king on 8th rank in middle game is not very common)?
I'm just wondering if it would help the hill-climbing algorithm to do the first base score computation with all values set to their minimum ... just an idea.
Tried it, not a good idea. Forget it!
Jörg Oster
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

Joerg Oster wrote:
Joerg Oster wrote:
brtzsnr wrote:Which values did you try to optimize? How common are they (e.g. white king on 8th rank in middle game is not very common)?
I'm just wondering if it would help the hill-climbing algorithm to do the first base score computation with all values set to their minimum ... just an idea.
Tried it, not a good idea. Forget it!
But maybe you want to try this change in txt.go:

Code:

		// Mutate current values.
		// Pick few candidates with different mutation characteristics and estimate the score.
		testScore := math.Inf(+1)
=>		for t := -2; t <= 2; t++ {
This should allow more evenly distributed mutations in both directions, + and -. Shouldn't it?
At least in my tests it seemed to help a bit.
Jörg Oster
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: txt: automated chess engine tuning

Post by brtzsnr »

Base is the score without any adjustments. It doesn't make sense to change it because you want to know when a better set of parameters was found.

For the other thing: NormFloat64 produces both positive and negative numbers. It is a Gaussian distribution with mean 0 and standard deviation 1. Nevertheless I'll try with your code. Thanks.
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

brtzsnr wrote:Base is the score without any adjustments. It doesn't make sense to change it because you want to know when a better set of parameters was found.

For the other thing: NormFloat64 produces both positive and negative numbers. It is a Gaussian distribution with mean 0 and standard deviation 1. Nevertheless I'll try with your code. Thanks.
Hi,
do you plan to further improve upon this?
I noticed that most often only the first value/parameter is being changed, and the others remain unchanged. And depending on the first set of values, I very often get a different set of optimized values when repeating the tuning.

Therefore I changed txt.go to loop over all values and do an estimation run for each parameter. (hopefully no bug)
Here is the code:

Code:

// Run does a hill-climbing to find the base configuration.
func Run(format, run string, opts []Option) {
	flag.Parse()
	log.SetFlags(log.Ltime | log.Lshortfile)
	rand.Seed(time.Now().UnixNano())

	var err error
	var flogto *os.File
	if *logto != "" {
		flogto, err = os.Create(*logto)
		if err != nil {
			log.Fatal(err)
		}
		defer flogto.Close()
	}

	// Setup evaluator and compute the base score that needs to be improved.
	tmpl, err := template.New("option").Parse(format)
	if err != nil {
		log.Fatal(err)
	}
	fit, err := newFitness(tmpl, run, flag.Args())
	if err != nil {
		log.Fatal(err)
	}
	base, err := fit.evaluate("txtrun", nil, nil)
	if err != nil {
		log.Fatal("evaluate: ", err)
	}

	// Try a few starting vectors.
	vals := make([]float64, len(opts))
	for i, o := range opts {
		vals[i] = o.Min + (o.Max-o.Min)/4
	}

	// Do the optimization.
	start := time.Now()
	score := math.Inf(+1)

	test := make([]float64, len(opts))
	temp := make([]float64, len(opts))

	for step := 0; step < *steps; step++ {
		// Mutate current values.
		// Try each parameter individually and estimate the score.
		for i, o := range opts {
			copy(temp, vals) // start from the currently accepted values
			testScore := math.Inf(+1)
			d := (o.Max - o.Min) / (10 + float64(step/2))
			if d < 1 {
				d = 1
			}
			for t := -1; t <= 1; t++ {
				temp[i] = o.bound(vals[i] + float64(t)*d)

				score1, err := fit.evaluate("txtest", opts, temp)
				if err != nil {
					log.Fatal(err)
				}
				if score1 < testScore {
					testScore = score1
					copy(test, temp)
				}
			}

			// Test is a candidate, evaluate it on the full set.
			score1, err := fit.evaluate("txtrun", opts, test)
			if err != nil {
				log.Fatal("evaluate: ", err)
			}

			if score1+1e-8 < score { // Accept the candidate.
				score = score1
				vals[i] = test[i]

				if score < base {
					log.Println("#", step, "new score =", score1, "; base =", base)
					write(os.Stdout, "", tmpl, opts, vals)
				}
			}

			if flogto != nil {
				fmt.Fprintln(flogto, time.Now().Sub(start).Seconds(), base, score, score1)
			}
		}
	}
}
This works much better for me, and I get very similar optimum values when repeating the tuning process.
Maybe you want to give this a try as well.
Jörg Oster
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: txt: automated chess engine tuning

Post by brtzsnr »

Joerg Oster wrote: This works much better for me, and I get very similar optimum values when repeating the tuning process.
Maybe you want to give this a try as well.
I tried your suggestion and it did not work better.

The hill-climbing algorithm I implemented works as follows:
1) Pick a few variables to mutate (the number of variables is controlled by t)
2) Mutate slightly those variables
3) Estimate for each t and each new set of values
4) Pick the set with the best estimation
5) Run full evaluation

Mutating all variables introduces too much noise and the solution doesn't converge at all. Doing negative t doesn't make sense. However, if we consider the No Free Lunch Theorem http://en.wikipedia.org/wiki/No_free_lu ... timization, any search algorithm performs the same when averaged over all problems in the class. So in the end it is just a matter of luck. If your changes work better for you, please use them.

What I did so far is tune all evaluation parameters over the Easter weekend. I tuned only FigureBonus, Mobility, DoublePawn, BishopPair and PawnPSqT. The latest tests show that I'm about -5 to -10 ELO from the original evaluation (which included PSQT for every piece, but only Rook mobility). King PSQT might make a difference.

Nevertheless, the values I'm testing right now are here https://bitbucket.org/brtzsnr/zurichess ... =prob-king . The only thing I don't understand is the extremely low value, 17, for the Pawn in the middle game. The other numbers look perfectly OK.
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

brtzsnr wrote:
Joerg Oster wrote: This works much better for me, and I get very similar optimum values when repeating the tuning process.
Maybe you want to give this a try as well.
I tried your suggestion and it did not work better.

The hill-climbing algorithm I implemented works as follows:
1) Pick a few variables to mutate (the number of variables is controlled by t)
2) Mutate slightly those variables
3) Estimate for each t and each new set of values
4) Pick the set with the best estimation
5) Run full evaluation
Thanks for the explanation.
There are 2 things I don't fully understand, however.

Could you explain these 2 lines of code, please?

Code:

=>			mutationRate := ft / (ft + float64(len(vals)))
			for has := false; !has; { // loop until at least one value was modified
				for i, o := range opts {
=>					if i == 0 || rand.Float64() < mutationRate {
brtzsnr wrote:Mutating all variables introduces too much noise and the solution doesn't converge at all. Doing negative t doesn't make sense. However, if we consider the No Free Lunch Theorem http://en.wikipedia.org/wiki/No_free_lu ... timization, any search algorithm performs the same when averaged over all problems in the class. So in the end it is just a matter of luck. If your changes work better for you, please use them.
Well, it seems my approach only works for a small number of parameters and a limited range of values. Currently I'm tuning Stockfish's midgame and endgame penalties for doubled, isolated and backward pawns, 48 parameters in total. It doesn't look good so far.
brtzsnr wrote:What I did so far is tune all evaluation parameters over the Easter weekend. I tuned only FigureBonus, Mobility, DoublePawn, BishopPair and PawnPSqT. The latest tests show that I'm about -5 to -10 ELO from the original evaluation (which included PSQT for every piece, but only Rook mobility). King PSQT might make a difference.

Nevertheless, the values I'm testing right now are here https://bitbucket.org/brtzsnr/zurichess ... =prob-king . The only thing I don't understand is the extremely low value, 17, for the Pawn in the middle game. The other numbers look perfectly OK.
Jörg Oster
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: txt: automated chess engine tuning

Post by Joerg Oster »

brtzsnr wrote:
Joerg Oster wrote: This works much better for me, and I get very similar optimum values when repeating the tuning process.
Maybe you want to give this a try as well.
I tried your suggestion and it did not work better.

The hill-climbing algorithm I implemented works as follows:
1) Pick a few variables to mutate (the number of variables is controlled by t)
2) Mutate slightly those variables
3) Estimate for each t and each new set of values
4) Pick the set with the best estimation
5) Run full evaluation

Mutating all variables introduces too much noise and the solution doesn't converge at all. Doing negative t doesn't make sense. However, if we consider the No Free Lunch Theorem http://en.wikipedia.org/wiki/No_free_lu ... timization, any search algorithm performs the same when averaged over all problems in the class. So in the end it is just a matter of luck. If your changes work better for you, please use them.

What I did so far is tune all evaluation parameters over the Easter weekend. I tuned only FigureBonus, Mobility, DoublePawn, BishopPair and PawnPSqT. The latest tests show that I'm about -5 to -10 ELO from the original evaluation (which included PSQT for every piece, but only Rook mobility). King PSQT might make a difference.

Nevertheless, the values I'm testing right now are here https://bitbucket.org/brtzsnr/zurichess ... =prob-king . The only thing I don't understand is the extremely low value, 17, for the Pawn in the middle game. The other numbers look perfectly OK.
I understand that this random change of the values prevents us from getting stuck in a local optimum.
But it also bears the risk that some values are changed less often than others, or maybe not at all!? (Maybe that's the cause of your low pawn value?)

So how about a combination of both ideas?
In general, use your approach and randomly change a subset of all parameters. But every n-th iteration, say n = 10, loop over all parameters and try a random change on each one individually.
What do you think?
Jörg Oster
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: txt: automated chess engine tuning

Post by brtzsnr »

That should work. "Random restarts" is a common optimization for hill climbing.