Tapered evaluation is a feature that is generally considered indispensable for strong engines nowadays. It is a logical feature (in the endgame I want my king in the center, but early in the game he will get mated if he is not tucked away in a corner), yet at the same time a rather arbitrary one. Why are we linearly interpolating between opening (or middlegame) and endgame? Why only between opening and endgame? Why do we determine the game phase from material alone, and why do we give each piece the particular phase value it has?

I would like to suggest a more theoretically sound approach to this problem, based on a fairly standard idea from machine learning. Instead of a tapered evaluation function I would recommend using conditional scores, computed as weighted averages under a Gaussian Mixture Model (GMM). A GMM interprets our data (a set of chess positions, in our case) as stemming from a number of Gaussian distributions, each with an associated weighting factor. Instead of calculating one score for the opening and one for the ending, we calculate a score for each of our GMM's K mixtures and then take a weighted average, with each weight being the posterior probability that the current position belongs to that mixture (the mixture's density at the position times its weighting factor, normalized over all mixtures).
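As a sketch of what this looks like at evaluation time (all function names are mine, and I assume diagonal Gaussians over a small feature vector for simplicity):

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal Gaussian at feature vector x."""
    p = 1.0
    for xi, mu, v in zip(x, mean, var):
        p *= math.exp(-(xi - mu) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    return p

def conditional_score(features, mixtures):
    """mixtures: list of (weight, mean, var, score_fn) tuples,
    one per mixture, where score_fn holds that mixture's own
    copy of the evaluation parameters."""
    # Unnormalized posterior of each mixture for this position
    densities = [w * gaussian_pdf(features, mu, var)
                 for w, mu, var, _ in mixtures]
    total = sum(densities) or 1e-12  # guard against underflow
    # Probability-weighted average of the per-mixture scores
    return sum(d / total * score_fn(features)
               for d, (_, _, _, score_fn) in zip(densities, mixtures))
```

A position that lies clearly inside one mixture gets essentially that mixture's score; a position between two mixtures gets a blend, which generalizes the opening/endgame interpolation of tapered evaluation.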

We could interpret chess as having just two kinds of positions, opening and endgame, as tapered evaluation functions do, but that is not actually the case. For example, chess players typically learn that in positions with opposite-colored bishops, material is less valuable. With a GMM we can have arbitrarily many different kinds of positions.
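Note that the GMM can only separate position types that are visible in the features we feed it, so something like an opposite-colored-bishops flag has to be computed explicitly. A minimal sketch (the function name and the (file, rank) square encoding are my own choices):

```python
def has_opposite_colored_bishops(white_bishop_squares, black_bishop_squares):
    """Squares are (file, rank) pairs with 0-based indices;
    (file + rank) % 2 gives the square color."""
    white_colors = {(f + r) % 2 for f, r in white_bishop_squares}
    black_colors = {(f + r) % 2 for f, r in black_bishop_squares}
    # True only if each side has bishops on exactly one color,
    # and those colors differ
    return (len(white_colors) == 1 and len(black_colors) == 1
            and white_colors != black_colors)
```

Such a flag would simply become one more dimension of the feature vector the GMM is fitted on.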

What the optimal number of mixtures is depends on our data, our choice of features and how much time we are willing to spend in our evaluation function at run time. The more mixtures we have, the more data (or tuning iterations) we will need to tune our parameters, as each mixture has its own copy of our evaluation parameters. Depending on our choice of features, a GMM may or may not be able to recognize different types of positions; e.g. we have to explicitly calculate whether there are opposite-colored bishops if we want our GMM to assign such positions different weights. Finally, the more mixtures we have, the more scores we have to calculate in order to form the weighted average. Since our time at run time is limited, spending more of it calculating scores leaves less of it for search.

In order to implement a GMM I recommend googling and reading up on it, but here is a quick rundown. The standard way to fit a GMM is the expectation-maximization (EM) algorithm. Essentially we do the following:

1. Load the data.

2. Choose distinct random data points to initialize the means of our Gaussians.

3. Assign each point in our data a likelihood of belonging to each of our Gaussians. For example, we could initially assign the Gaussian with the closest mean a probability of 100% for each data point.

4. For each Gaussian, recalculate the mean and standard deviation from the weighted data points. We also set the weighting factor of each Gaussian to the sum of the weights of its data points, divided by the total number of points.

5. Recalculate the relative probabilities of each data point belonging to each weighted Gaussian.

6. Repeat steps 4 and 5 until we are happy (e.g. until the parameters stop changing).
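The steps above can be sketched in pure Python for one-dimensional data (a toy fit to illustrate the loop, not a production implementation; all names are mine):

```python
import math
import random

def em_gmm(data, k, iters=100, seed=0):
    """Fit a one-dimensional GMM with k components by EM."""
    rng = random.Random(seed)
    means = rng.sample(data, k)      # step 2: distinct random points
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step (steps 3/5): responsibility of each component for each point
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v))
                    / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            if total == 0.0:
                resp.append([1.0 / k] * k)  # underflow guard
            else:
                resp.append([d / total for d in dens])
        # M-step (step 4): re-estimate means, variances and weighting factors
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, data)) / nj, 1e-6)
            weights[j] = nj / len(data)
    return weights, means, variances
```

For engine use the fit would run offline over a large set of position feature vectors, and only the resulting means, variances and weighting factors would be shipped with the engine.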

What do you guys think? Has this been recommended before? Perhaps someone could recommend a library, so that not everyone who wants to try this needs to implement a GMM on their own?