bhlangonijr wrote:What is the common approach to implement persistent positional learning in a chess engine?
I want to try it out in my engine, and I was thinking about writing code that automatically adjusts PST values and other factors after each game and then saves them to a "learning" file. Obviously the learning algorithm that adjusts the parameters is the most important part here.
Is there any open source engine which implements positional learning?
The main question is: how would you do that? Any thoughts?
Thanks,

An interesting and perhaps workable approach is based on something H.G. Muller proposed, which he called orthogonal multi-testing. The primary characteristic of this approach is that you can test several things simultaneously. Say you have 50 parameters to tune and you want to try one of two different values for each of them. You could play, for example, 10,000 games where in each game the players use a random combination of the 50 settings.
After the 10,000 games, about half of them will have a given parameter set one way and half the other way. The games of interest are those where a player with the parameter set one way faced a player with it set the other way, which is about half the games on average. The net score of these mini-matches indicates the direction of adjustment. What is good about this idea is that you are testing a huge number of evaluation terms at the same time, so in some sense it is orders of magnitude more efficient than testing a single change and playing a dedicated self-play match for it.
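The bookkeeping for this could be sketched as follows. This is an illustrative Python sketch, not H.G. Muller's actual implementation; the names (`NUM_PARAMS`, `random_settings`, `tally`) and the flat tally scheme are my own assumptions:

```python
import random

# Orthogonal multi-testing sketch: each of NUM_PARAMS parameters has two
# candidate values (encoded 0 and 1).  Every game, both players get an
# independent random setting for every parameter.  A game only tells us
# something about a parameter when the two players happened to disagree
# on it -- roughly half the games for each parameter.

NUM_PARAMS = 50

def random_settings():
    """One random 0/1 choice per parameter."""
    return [random.randrange(2) for _ in range(NUM_PARAMS)]

def tally(score, white, black, totals):
    """Credit a game result to every parameter the players disagreed on.

    score is from white's point of view: +1 win, 0 draw, -1 loss.
    totals[i] accumulates evidence in favor of value 1 of parameter i.
    """
    for i in range(NUM_PARAMS):
        if white[i] != black[i]:
            # Whichever player held value 1 gains (or loses) the point.
            totals[i] += score if white[i] == 1 else -score
    return totals

totals = [0] * NUM_PARAMS
for _ in range(10_000):
    w, b = random_settings(), random_settings()
    result = random.choice([1, 0, -1])  # stand-in for playing a real game
    tally(result, w, b, totals)

# A strongly positive totals[i] suggests value 1 of parameter i is better;
# strongly negative suggests value 0.
```

In a real setup the `result = random.choice(...)` line would of course be replaced by actually playing the game with the two settings vectors.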
This does not meet your criteria, however. To make it persistent and incremental you could do something very similar. Each positional term that you want to be adjustable has a canonical value, but before every game you modify all the values slightly, either one step lower or one step higher. For example, if your bishop mobility value is 8, you change it to either 7 or 9, chosen uniformly at random.
After the game is complete, the result can be viewed as a vote for or against ALL of the adjustments you made. You could accumulate a win/loss/draw record for each term (or use floating point values) and adjust the canonical value of a term once enough evidence has accumulated in favor of a change. To give an example, say you are using 8 for bishop mobility and testing 7 and 9, and at some point 9 builds up a 50 game lead over 7. Then you might decide to use 9 as the "canonical" or reference value from then on.
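A minimal sketch of this persistent scheme, assuming a 50-point promotion threshold as in the example above; the class name `Term`, the constant `PROMOTE_LEAD`, and the method names are illustrative, not from any particular engine:

```python
import random

PROMOTE_LEAD = 50  # evidence lead required before shifting the canonical value

class Term:
    """One tunable evaluation term with a canonical value."""

    def __init__(self, name, value, step=1):
        self.name = name
        self.value = value      # canonical value, e.g. bishop mobility = 8
        self.step = step
        self.lead = 0           # net score of "+step" games over "-step" games
        self.direction = 0      # perturbation used in the current game

    def perturb(self):
        """Pick the value actually played this game: one step up or down."""
        self.direction = random.choice([-1, 1])
        return self.value + self.direction * self.step

    def record(self, score):
        """Record a result (+1 win, 0 draw, -1 loss) for the perturbed side."""
        self.lead += self.direction * score
        if self.lead >= PROMOTE_LEAD:
            self.value += self.step   # e.g. 9 gained a 50 game lead over 7
            self.lead = 0
        elif self.lead <= -PROMOTE_LEAD:
            self.value -= self.step
            self.lead = 0

bishop_mobility = Term("bishop_mobility", 8)
playing_value = bishop_mobility.perturb()   # 7 or 9 for this game
# ... play the game with playing_value, then:
bishop_mobility.record(+1)                  # suppose the perturbed side won
```

Persisting the learning between sessions would amount to saving and reloading each term's `value` and `lead` in the "learning" file.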
Like every learning method, this one has strengths and weaknesses. In multi-testing, one of the weaknesses is the built-in assumption that the weight of one term has no correlation with the weights of the other terms. You will find that a LOT of learning algorithms build in this same assumption to one degree or another. In chess I'm sure there is correlation, but in my opinion it is fairly weak in most cases, especially when you are talking about minor adjustments. Even where it isn't, the method may be somewhat self-correcting, since no term is likely to get way out of line.
Another downside is that you never play with the "ideal" values while learning is on. This method is basically a "try it and test it" approach, so every game is played by a slightly "different" program.
I'm sure there are many improvements to the basic idea I outlined above, and of course I give no guarantees about how it will perform. It will require many games to make meaningful changes, but I don't know of ANY learning method that does not require a lot of games to produce meaningful results.
Another approach that may be more in line with what you want is called "temporal difference learning". It is based on feedback from each move to the move that precedes it. For example, if the program thinks the position is equal after move 35, but on move 36 it finds that it is winning a pawn, this indicates that the evaluation at move 35 was in error: the position was better than the program thought. Tiny incremental adjustments are made to the evaluation function so that it is ever so slightly biased toward being more positive in this case, or more negative in the case where the score is dropping. This is done recursively back through the moves of the game, so that winning the game gives some credit to all the positions of the game.

Look on the web and read up on the "credit assignment problem" and temporal difference learning. It's probably ideal for what you are looking for. It can be done once at the end of the game, with the scores then updated. If you are not using a floating point evaluation, you may have to figure out how to modify this to make it workable.
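The backward credit flow can be sketched in the TD(λ) style. This is a toy sketch under my own assumptions: `alpha` (learning rate), `lam` (how far credit propagates backwards), and the helper name `td_updates` are illustrative choices, not part of any particular engine:

```python
# Temporal-difference sketch: evals[t] is the program's score for the
# position after move t, from the program's point of view.  At the end of
# the game, each evaluation is pulled slightly toward the next one (and the
# last toward the game result), so an error that surfaces at move 36 feeds
# back to move 35, and with decay to the moves before it.

def td_updates(evals, final_result, alpha=0.01, lam=0.7):
    """Return the adjustment each evaluation should receive.

    evals: list of scores for successive positions.
    final_result: game outcome on the same scale (e.g. +1 / 0 / -1).
    alpha: learning rate; lam: backward-decay factor in [0, 1].
    """
    # Each position's target is the next evaluation; the last position's
    # target is the actual game result.
    targets = evals[1:] + [final_result]
    deltas = [t - e for e, t in zip(evals, targets)]  # one-step TD errors

    updates = [0.0] * len(evals)
    for t, delta in enumerate(deltas):
        # Propagate this error backwards with geometric decay lam.
        decay = 1.0
        for s in range(t, -1, -1):
            updates[s] += alpha * decay * delta
            decay *= lam
    return updates
```

In an engine you would not store the updates directly; instead each update would be distributed over the evaluation weights that were active in that position, which is where the "credit assignment problem" reading comes in.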