Revisiting GA's for tuning evaluation weights



Re: Revisiting GA's for tuning evaluation weights

Post by CRoberson »

SuneF wrote:
ilari wrote: For a couple of years now I've been interested in tuning evaluation weights of a chess engine with a genetic algorithm. What has discouraged me are the disappointing results that others have gotten, and the huge amount of time it would take. But even though people smarter than me have tried it before and failed, there are a few reasons why I'm keen on discarding their results and doing it again:

Any advice, corrections, encouraging/discouraging anecdotes?
I'm mostly familiar with SA, but it is based on a very similar idea of random mutation.
I see a basic problem with these tuning methods, though.

The methods require that you be able to decide if a mutation is an improvement or not.

Basically you have two possible scenarios: either there is an improvement or there isn't.

First, consider the probability of actually producing an improvement by a random (per)mutation.
As the program gets stronger, making any improvement gets harder, i.e. less likely.
However, we can hand-tune the program pretty well, so we can assume it will be very hard to find an improvement by a random mutation.
This is the interesting domain to consider, I think.

If the program is already fairly well tuned, we can pretend there is a 5% chance that a mutation is an improvement and thus a 95% chance it isn't.

Case 1: We beat the odds and made an improvement (5%):

90% chance we detect this correctly as being an improvement (depends on number of games).
10% chance we detect this falsely as not being an improvement (depends on number of games).

Case 2: We did not manage to produce an improvement (95%):

90% chance we detect this correctly as not being an improvement (depends on number of games).
10% chance we detect this falsely as being an improvement (depends on number of games).
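
Combining these illustrative numbers (the 5% prior and the 90%/10% detection rates are just assumptions for the sake of argument), a quick back-of-the-envelope calculation shows how few of the "detected improvements" are real:

Code:
# Back-of-the-envelope check using the illustrative numbers above.
p_improvement = 0.05      # chance a random mutation is actually better
p_detect_good = 0.90      # chance a real improvement tests as better
p_detect_bad  = 0.10      # chance a regression falsely tests as better

true_positives  = p_improvement * p_detect_good          # 0.045
false_positives = (1 - p_improvement) * p_detect_bad     # 0.095

# Of all mutations that test as "better", only about a third really are:
print(true_positives / (true_positives + false_positives))  # ~0.32

So with these assumptions, roughly two out of three mutations that test as improvements would actually be regressions.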


Now, in SA, if the mutation is detected as an improvement the new state is accepted; otherwise it is only accepted with some probability given by the Boltzmann criterion.
However, you can already see the problem: there is a higher chance of accepting a regression that falsely tests as an improvement than of accepting a real improvement (95% x 10% > 5% x 90%).
In other words, the method seems to reach its limit when these probabilities even out.
Some math has been left out, but the Boltzmann criterion is not going to save it.
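
For reference, the acceptance step in question is the standard Metropolis/Boltzmann rule. A minimal sketch (with the measured score change from a short match standing in for the energy difference; the names are chosen here just for illustration) would be:

Code:
import math
import random

def accept(delta_score, temperature):
    # Metropolis/Boltzmann acceptance: always keep apparent improvements,
    # otherwise keep the change with probability exp(delta/T).
    # delta_score is the *measured* score change from a short match,
    # which is exactly the noisy quantity discussed above.
    if delta_score >= 0:
        return True
    return random.random() < math.exp(delta_score / temperature)

The rule itself is fine; the trouble is that the measured delta_score it acts on is dominated by match noise once the program is already well tuned.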

So yes, I'm just throwing out numbers here, and one could improve on this by detecting with better accuracy whether the change is good or bad, i.e. by playing more games.
But realistically, in SA you might need thousands of iterations, and probably an ensemble of a few dozen runs.
So millions of games would be needed for a serious test.
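
To get a feel for what "playing more games" costs, here is a rough estimate of the games needed to resolve a small Elo difference at about two standard deviations (the 0.4 per-game standard deviation is an assumed value; it depends on the draw rate):

Code:
import math

def games_needed(elo_diff, sigma_per_game=0.4, z=2.0):
    # Near a 50% score the expected score changes by ln(10)/1600,
    # roughly 0.0014 points per Elo; sigma_per_game ~ 0.4 is an assumed
    # per-game standard deviation (it depends on the draw rate).
    score_diff = elo_diff * math.log(10) / 1600
    return math.ceil((z * sigma_per_game / score_diff) ** 2)

print(games_needed(5))   # roughly 12,000 games to resolve ~5 Elo at 2 sigma

Measurements on that scale, repeated over thousands of iterations, are where the millions of games come from.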
Yes, that is exactly correct. I've been thinking about this myself, and I coded an SA that I've used for tuning features to improve test suite performance. One can isolate a set of features, tune those for a while, then move on to the next set of features. That seems to work in a reasonable time frame.

As HGM stated, the statistics you described mean that you don't need a special function to allow it to hill-climb. Thus, the algorithm is very simple to code! That is, if you are using it to improve game performance; the hill-climbing acceptance is still needed to improve test suite performance.
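
For what it's worth, a rough sketch of that kind of feature-subset tuning loop (score_test_suite() and all the other names are placeholders for illustration, not actual engine code) could look like this:

Code:
import math
import random

def tune_subset(weights, subset, score_test_suite, steps=1000,
                start_temp=1.0, cooling=0.995, step_size=5):
    # Simulated annealing over one subset of evaluation weights,
    # scored against a test suite (score_test_suite stands in for
    # whatever "positions solved" measure the engine reports).
    best = dict(weights)
    current = dict(weights)
    current_score = best_score = score_test_suite(current)
    temp = start_temp
    for _ in range(steps):
        candidate = dict(current)
        name = random.choice(subset)              # mutate one weight in the chosen subset
        candidate[name] += random.randint(-step_size, step_size)
        cand_score = score_test_suite(candidate)
        delta = cand_score - current_score
        # Accept improvements; accept regressions with Boltzmann probability.
        if delta >= 0 or random.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = dict(current), current_score
        temp *= cooling                           # cool down gradually
    return best, best_score

The point of restricting the mutation to one subset at a time is that the search space per run stays small enough to cover in a reasonable number of iterations.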