Why computing K that minimizes the sigmoid func. value?...

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
petero2
Posts: 587
Joined: Mon Apr 19, 2010 5:07 pm
Location: Sweden
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by petero2 » Sun Nov 29, 2015 2:01 pm

cdani wrote:All is working nicely, wining strength after each iteration on all the parameters. Thanks to all!!

I have used mutithread to do the computations, because it was very slow.
Current e = 0.0796898969 (28.2294%)

Has someone computed the "e" value for example for Stockfish? It makes any sense compute it? I really don't understand well the formulas, nor why this works. If someone tries to explain it, please, don't use maths :-)

Particularly, what is the error "e"? Where is the "error", mistake, think that is done badly? :-)
E is basically a measure of how well the evaluation function is able to predict the game outcome. I don't think the actual value of E is very interesting, because there are at least three factors that contribute to E:

1. Mistakes in the game continuation causing the game outcome to not be equal to the game theoretical value of the evaluated position.

2. Tactical aspects of a position that can not reasonably be modeled by the evaluation function.

3. Bad values of evaluation function parameters.

The goal of the algorithm is to improve 3, without being mislead by the "noise" introduced by 1 and 2.

If there are lots of positions in the training data, the effects of 1 and 2 will on average not depend much on the parameter values, so by varying the parameters to reduce E, the parameter values get better. However, the individual contributions from 1, 2 and 3 to E is not known, so you can't say how good the evaluation function is based on the E value alone.

User avatar
cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by cdani » Sun Nov 29, 2015 8:08 pm

petero2 wrote: E is basically a measure of how well the evaluation function is able to predict the game outcome. I don't think the actual value of E is very interesting, because there are at least three factors that contribute to E:

1. Mistakes in the game continuation causing the game outcome to not be equal to the game theoretical value of the evaluated position.

2. Tactical aspects of a position that can not reasonably be modeled by the evaluation function.

3. Bad values of evaluation function parameters.

The goal of the algorithm is to improve 3, without being mislead by the "noise" introduced by 1 and 2.

If there are lots of positions in the training data, the effects of 1 and 2 will on average not depend much on the parameter values, so by varying the parameters to reduce E, the parameter values get better. However, the individual contributions from 1, 2 and 3 to E is not known, so you can't say how good the evaluation function is based on the E value alone.
Thanks! Now is clear. I hope this explanation is included here:
https://chessprogramming.wikispaces.com ... ing+Method

In this page there is another think I don't understand. You say:

"The 39.4 improvement came when I changed the criteria for which positions to include in the test set. Initially I removed also all positions where the q-search score deviated too much from the search score in the actual game (which conveniently is saved by cutechess-cli in PGN comments). I believed that including those positions would just raise the "noise level" of the data and cause a worse solution to be found. Apparently this is not the case. I now believe that even though including these positions causes noise, the q-search function has to deal with them all the time in real games, so trying to learn how those positions should be evaluated on average is still beneficial."

So if I understand well you say, "I won 39.4 changing the criteria, but apparently changing the criteria was not a good idea". So if it was a bad idea, how was able to win 39.4 elo?


Another think. I'm doing standard selfplay tests at short and long time controls after each iteration of the optimizing algorithm. Until now the new parameters has always won the previous version. After one iteration the version is quite worse, but the algorithm continues to run (in another computer), so it has not finished optimizing "E". Is an expected behavior? Or is supposed to be good to wait the algorithm to finish until no further improvement of "E" is possible, i.e. each new iteration on the parameters must be good?

Thanks!

User avatar
cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by cdani » Sun Nov 29, 2015 10:11 pm

Another think I just thought. If I have already won 25 elo in self play, I suppose is better to run a new bunch of 60,000 games with the latest best version, as the new parameter values when tested will behave better correlation with the results of the games, so they will be optimized better.

But if this is true, why not use games of a stronger engine like Stockfish? It will be even better.

Probably I did not understand something if this is not true.

petero2
Posts: 587
Joined: Mon Apr 19, 2010 5:07 pm
Location: Sweden
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by petero2 » Mon Nov 30, 2015 7:04 pm

cdani wrote:https://chessprogramming.wikispaces.com ... ing+Method

In this page there is another think I don't understand. You say:

"The 39.4 improvement came when I changed the criteria for which positions to include in the test set. Initially I removed also all positions where the q-search score deviated too much from the search score in the actual game (which conveniently is saved by cutechess-cli in PGN comments). I believed that including those positions would just raise the "noise level" of the data and cause a worse solution to be found. Apparently this is not the case. I now believe that even though including these positions causes noise, the q-search function has to deal with them all the time in real games, so trying to learn how those positions should be evaluated on average is still beneficial."

So if I understand well you say, "I won 39.4 changing the criteria, but apparently changing the criteria was not a good idea". So if it was a bad idea, how was able to win 39.4 elo?
It was a good idea. What I said was that before testing I thought the idea was bad, but after testing I realized that it was actually good.
cdani wrote:Another think. I'm doing standard selfplay tests at short and long time controls after each iteration of the optimizing algorithm. Until now the new parameters has always won the previous version. After one iteration the version is quite worse, but the algorithm continues to run (in another computer), so it has not finished optimizing "E". Is an expected behavior? Or is supposed to be good to wait the algorithm to finish until no further improvement of "E" is possible, i.e. each new iteration on the parameters must be good?
My intention was to let the optimization run until it could not improve E any more, and only then use the computed parameters to test if the playing strength was improved.

If you can afford the CPU time to test playing strength after each iteration, there is however no harm in that, except that you have to spend time performing the tests. It is even possible that some of the intermediate parameter values are stronger than the final values, but I don't know if that actually happens in practice.

petero2
Posts: 587
Joined: Mon Apr 19, 2010 5:07 pm
Location: Sweden
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by petero2 » Mon Nov 30, 2015 7:18 pm

cdani wrote:Another think I just thought. If I have already won 25 elo in self play, I suppose is better to run a new bunch of 60,000 games with the latest best version, as the new parameter values when tested will behave better correlation with the results of the games, so they will be optimized better.
That might be better. The first time I did this I got an additional +13 elo in hyper-bullet self test. Trying it again later has not helped much though.
cdani wrote:But if this is true, why not use games of a stronger engine like Stockfish? It will be even better.
That could work, but I think the games from the stronger engine should also be hyper-bullet games, so that they contain some mistakes leading to unbalanced positions.

Personally I did not want to improve my engine by using data generated by an engine that I had not created myself, even though I don't think it would have been morally or legally wrong to do so.

mvk
Posts: 589
Joined: Tue Jun 04, 2013 8:15 pm

Re: Why computing K that minimizes the sigmoid func. value?.

Post by mvk » Mon Nov 30, 2015 10:05 pm

petero2 wrote:2. Tactical aspects of a position that can not reasonably be modeled by the evaluation function.
Correct. But there are certain values you can expect from a raw eval or a qsearch that doesn't change much from program to program. Once search gets involved, the error drops, and does so different for each program. Here you can see the result on sqrt(E) for different searches.

Image

I think it can be useful to go from qsearch to 1 or 2 ply searches if you have many parameters with little coverage per parameter (such as piece/square tables with a parameter for each individual square).

The problem I still have with this method is that correlation isn't causation, and there will be selection bias in the input games. For example, a white bishop on h7 might be very bad on average in normal play. But if players know that, they will avoid the bad cases and then the input games only show the cases where it works. The data from such games is then biassed. Adding a search might solve that because you invite a bit of the tradeoff in the optimisation. But in the end, playing games from the new vector is still the best thing, if it weren't so expensive.
[Account deleted]

User avatar
cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by cdani » Mon Dec 07, 2015 10:29 pm

So I published the new version of Andscacs with this self tuning done.

The first 4th-6th iterations with the tuning where the best in improvements; I was testing after each iteration. The next after the last good was really bad, so I decided to generate a new bunch of games.

This time I obtained improvements, little ones, until the 9th iteration.

Here I tuned the piece values alone, not present in previous tunings, until the tuning finished. Was a little win. I have yet to try with partial tuning.

Then I added parameters related to passed pawns and I tuned all things related to passed pawns alone, obtaining a little win but only in the 4th first iterations.

Then I tuned some new parameters related with knights that converged quickly and was a little win.

A new general tuning try was bad from the first attempt, so I decided to stop here.

For some of the parameters I use, instead of an static value, a proportional value that is increased/reduced in powers of two and derived values. To tune them I used this function:

Code: Select all

int incrementa_redueix_proporcionalment(int v, int increment) {
	int r, j;
	if (increment == 0)
		return v;
	//64
	j = abs(increment);
	if (j == 1)
		r = FerPun(PunI(v) >> 5, PunF(v) >> 5); //2
	else if (j == 2)
		r = FerPun(PunI(v) >> 4, PunF(v) >> 4); //4
	else if (j == 3)
		r = FerPun(PunI(v) >> 3, PunF(v) >> 3); //8
	else if (j == 4)
		r = FerPun(PunI(v) >> 3, PunF(v) >> 3) + FerPun(PunI(v) >> 4, PunF(v) >> 4); //12
	else if (j == 5)
		r = FerPun(PunI(v) >> 2, PunF(v) >> 2); //16
	else if (j == 6)
		r = FerPun(PunI(v) >> 2, PunF(v) >> 2) + FerPun(PunI(v) >> 4, PunF(v) >> 4); //20
	else if (j == 7)
		r = FerPun(PunI(v) >> 2, PunF(v) >> 2) + FerPun(PunI(v) >> 3, PunF(v) >> 3); //24
	else if (j == 8)
		r = FerPun(PunI(v) >> 1, PunF(v) >> 1); //32
	else if (j == 9)
		r = FerPun(PunI(v) >> 1, PunF(v) >> 1) + FerPun(PunI(v) >> 3, PunF(v) >> 3); //40
	else if (j == 10)
		r = FerPun(PunI(v) >> 1, PunF(v) >> 1) + FerPun(PunI(v) >> 2, PunF(v) >> 2); //48
	else if (j == 11)
		r = FerPun(PunI(v) >> 1, PunF(v) >> 1) + FerPun(PunI(v) >> 2, PunF(v) >> 2) + FerPun(PunI(v) >> 3, PunF(v) >> 3); //56
	else if (j == 12)
		r = v; //64
	return increment > 0 ? v + r : v - r;
}
PunI means mg value, and PunF eg value.

I don't know if someone is using something similar. I have not seen something like this in other engines.

User avatar
Laskos
Posts: 9545
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Why computing K that minimizes the sigmoid func. value?.

Post by Laskos » Tue Dec 08, 2015 4:47 pm

cdani wrote:So I published the new version of Andscacs with this self tuning done.
Good tuning method for fooling the Similarity detector. 0.84 versus 0.83 has 50% similarity hit, much below 65%-75% of the successive versions of engines, and below 60% which starts to show positive as derivative.

User avatar
cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by cdani » Tue Dec 08, 2015 5:03 pm

Laskos wrote:
cdani wrote:So I published the new version of Andscacs with this self tuning done.
Good tuning method for fooling the Similarity detector. 0.84 versus 0.83 has 50% similarity hit, much below 65%-75% of the successive versions of engines, and below 60% which starts to show positive as derivative.
Curious. I never used this tool. But I think your result is very logical, as hand tuning is, well, very manual :-)

User avatar
Evert
Posts: 2924
Joined: Fri Jan 21, 2011 11:42 pm
Location: NL
Contact:

Re: Why computing K that minimizes the sigmoid func. value?.

Post by Evert » Tue Dec 08, 2015 6:16 pm

cdani wrote:So I published the new version of Andscacs with this self tuning done.

The first 4th-6th iterations with the tuning where the best in improvements; I was testing after each iteration. The next after the last good was really bad, so I decided to generate a new bunch of games.
Beware that finding a drop in strength after an iteration doesn't necessarily mean that it'll keep getting worse: the next iteration might be better again. The landscape unlikely to be some nice smooth surface with a well-defined minimum that you can find easily, sometimes you have to "climb a hill" to find the (locally optimal) minimum.
Did you check whether the residual of the evaluation was still getting better? If you see that a new set of evaluation parameters doesn't improve the residual it may be worth it to reduce the amount by which the parameters are adjusted.
For some of the parameters I use, instead of an static value, a proportional value that is increased/reduced in powers of two and derived values. To tune them I used this function:

Code: Select all

int incrementa_redueix_proporcionalment(int v, int increment) {
	int r, j;
	if (increment == 0)
		return v;
	//64
	j = abs(increment);
	if (j == 1)
		r = FerPun(PunI(v) >> 5, PunF(v) >> 5); //2
	else if (j == 2)
		r = FerPun(PunI(v) >> 4, PunF(v) >> 4); //4
	else if (j == 3)
		r = FerPun(PunI(v) >> 3, PunF(v) >> 3); //8
	else if (j == 4)
		r = FerPun(PunI(v) >> 3, PunF(v) >> 3) + FerPun(PunI(v) >> 4, PunF(v) >> 4); //12
	else if (j == 5)
		r = FerPun(PunI(v) >> 2, PunF(v) >> 2); //16
	else if (j == 6)
		r = FerPun(PunI(v) >> 2, PunF(v) >> 2) + FerPun(PunI(v) >> 4, PunF(v) >> 4); //20
	else if (j == 7)
		r = FerPun(PunI(v) >> 2, PunF(v) >> 2) + FerPun(PunI(v) >> 3, PunF(v) >> 3); //24
	else if (j == 8)
		r = FerPun(PunI(v) >> 1, PunF(v) >> 1); //32
	else if (j == 9)
		r = FerPun(PunI(v) >> 1, PunF(v) >> 1) + FerPun(PunI(v) >> 3, PunF(v) >> 3); //40
	else if (j == 10)
		r = FerPun(PunI(v) >> 1, PunF(v) >> 1) + FerPun(PunI(v) >> 2, PunF(v) >> 2); //48
	else if (j == 11)
		r = FerPun(PunI(v) >> 1, PunF(v) >> 1) + FerPun(PunI(v) >> 2, PunF(v) >> 2) + FerPun(PunI(v) >> 3, PunF(v) >> 3); //56
	else if (j == 12)
		r = v; //64
	return increment > 0 ? v + r : v - r;
}
PunI means mg value, and PunF eg value.

I don't know if someone is using something similar. I have not seen something like this in other engines.
I can't really tell what that code does (the function names are meaningless to me), perhaps if you can explain more clearly what it's supposed to do...?

Post Reply