naive bayes classifier

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

naive bayes classifier

Post by Don » Tue Jul 21, 2009 7:53 pm

Has anyone experimented with naive bayes classifiers for computer chess?

I was thinking in the context of endgame databases, specifically the win/loss/draw databases.

One could combine a set of basic observations about the position and get the classifier's opinion on the "probability" that a position is a win, loss, or draw.

Such a thing might form a pretty good evaluation function for a specific ending, even if it's not perfect.

In recent years there has been a resurgence of interest in these classifiers, as they are really simple to code up, do not use as much memory as neural networks and many other classifiers, are very fast and are surprisingly powerful.

What appeals to me about them is that for chess endings you can probably make it as good as you want by helping it out with hand-crafted attributes. For instance in KPvsK, if one were building a classifier, you might code up as an attribute whether the white king has the opposition - surely this attribute would contribute to the accuracy of the classifier.

I am thinking about grabbing the KRPKR database and experimenting and was wondering what if anything has already been tried.
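For anyone curious what this looks like in code, here is a minimal sketch of a naive Bayes win/draw/loss classifier over binary position attributes. Everything here - the class names, the toy features, the training counts - is invented for illustration; real counts would come from a tablebase such as KRPKR.

```python
# Minimal naive Bayes sketch for win/loss/draw classification from
# hand-crafted binary attributes. Feature names and training data are
# hypothetical; real counts would come from an endgame database.
from collections import defaultdict
import math

CLASSES = ("win", "draw", "loss")

class NaiveBayesWDL:
    def __init__(self, n_features):
        self.n_features = n_features
        self.class_count = defaultdict(int)
        # feature_count[c][i] = how many class-c examples had feature i set
        self.feature_count = defaultdict(lambda: [0] * n_features)

    def train(self, features, label):
        self.class_count[label] += 1
        for i, f in enumerate(features):
            if f:
                self.feature_count[label][i] += 1

    def predict(self, features):
        total = sum(self.class_count.values())
        log_posterior = {}
        for c in CLASSES:
            nc = self.class_count[c]
            # Laplace smoothing so unseen feature/class pairs don't zero out
            lp = math.log((nc + 1) / (total + len(CLASSES)))
            for i, f in enumerate(features):
                p_set = (self.feature_count[c][i] + 1) / (nc + 2)
                lp += math.log(p_set if f else 1.0 - p_set)
            log_posterior[c] = lp
        # normalize the log posteriors into probabilities
        m = max(log_posterior.values())
        exp = {c: math.exp(v - m) for c, v in log_posterior.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

# toy usage: features = (has_opposition, pawn_past_midline) - both invented
nb = NaiveBayesWDL(2)
nb.train((1, 1), "win")
nb.train((1, 0), "win")
nb.train((0, 0), "draw")
nb.train((0, 1), "draw")
probs = nb.predict((1, 1))
```

The appeal is exactly as described above: each hand-crafted attribute (like the opposition) just adds one more counted column, and training is a single pass over the database.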

Dann Corbit
Posts: 10124
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA
Contact:

Re: naive bayes classifier

Post by Dann Corbit » Tue Jul 21, 2009 8:34 pm

Don wrote:Has anyone experimented with naive bayes classifiers for computer chess?

I was thinking in the context of endgame databases, specifically the win/loss/draw databases.

One could combine a set of basic observations about the position and get the classifier's opinion on the "probability" that a position is a win, loss, or draw.

Such a thing might form a pretty good evaluation function for a specific ending, even if it's not perfect.

In recent years there has been a resurgence of interest in these classifiers, as they are really simple to code up, do not use as much memory as neural networks and many other classifiers, are very fast and are surprisingly powerful.

What appeals to me about them is that for chess endings you can probably make it as good as you want by helping it out with hand-crafted attributes. For instance in KPvsK, if one were building a classifier, you might code up as an attribute whether the white king has the opposition - surely this attribute would contribute to the accuracy of the classifier.

I am thinking about grabbing the KRPKR database and experimenting and was wondering what if anything has already been tried.
For something like KPK, you only need one bit per position if you store the absolute answer. After all, it is impossible to lose if you are the KP side, so a 1 will say 'win' and a 0 means draw.

So I think it won't have any utility with a very simple problem like KPK (which is small enough even in a full tablebase anyway).

The more complicated systems like KRPKR might have more utility, but with RAM dropping to $10/GB or so, it seems to me that a bitbase would give certain results. We already know that 'KRP' is usually better than KR, so how much value would the Bayesian results be? Will we be happy with 80% certainty?

I do think that a Bayes classifier would be very useful in the following sort of circumstances:
A quick check to see if you win the pawn race. If a precalculated race shows 'won' then there is less need to consult the tablebase. I suspect that such a precalculated race could be formed in terms of simple geometry and might fit in 10K. So you would only need to consult the bitbase or tablebase if it were unclear whether the race was won or lost.

It could also be a hint to try the tablebase:
Bayes says that pawn race looks won --> check tablebase to be sure.
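The "precalculated race formed in terms of simple geometry" can be as cheap as the classic rule of the square. A sketch under those assumptions (square numbering a1 = 0, helper names invented; this ignores the white king and any interference, so it is a pre-filter, not a proof):

```python
# A cheap geometric pawn-race test in the spirit of the "precalculated
# race" above: the rule of the square. Squares are numbered 0..63 with
# a1 = 0; function names are illustrative, not from any engine.
def file_of(sq): return sq % 8
def rank_of(sq): return sq // 8

def white_pawn_outruns_king(pawn_sq, black_king_sq, black_to_move):
    """True if the black king provably cannot catch the white pawn.
    Ignores the white king and any interference - a filter, not a proof."""
    # a pawn on its start square may advance two, so treat it as on rank 3
    pawn_rank = max(rank_of(pawn_sq), 2)
    steps_to_promote = 7 - pawn_rank
    # Chebyshev distance from the king to the promotion square
    king_dist = max(abs(file_of(black_king_sq) - file_of(pawn_sq)),
                    abs(rank_of(black_king_sq) - 7))
    if black_to_move:
        king_dist -= 1
    return king_dist > steps_to_promote
```

Only when this quick test is inconclusive (or flags a win that needs confirming) would one pay for the bitbase or tablebase probe.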

It seems to me that Bayesian logic might prove valuable in evaluation.

For instance, suppose that some features combine in a non-linear fashion. It could be worthwhile to see if {for example} having a particular sort of material imbalance together with a mobility advantage leads to a particular probability of winning. It might be possible to engineer a set of rules that combine things in a non-linear way to arrive at better decisions than by simple additions.
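As a concrete sketch of that non-linear idea: rather than adding an independent bonus for the imbalance and another for mobility, one can bucket the pair together and look up an observed win rate for the combination. The bucket names and numbers below are invented placeholders, not measured data.

```python
# Sketch of combining features non-linearly: look up the win rate for
# the *pair* (material imbalance, mobility) instead of summing two
# independent bonuses. All labels and counts here are hypothetical.
from collections import defaultdict

class JointWinRate:
    def __init__(self):
        self.wins = defaultdict(int)
        self.games = defaultdict(int)

    def observe(self, imbalance_bucket, mobility_bucket, won):
        key = (imbalance_bucket, mobility_bucket)
        self.games[key] += 1
        if won:
            self.wins[key] += 1

    def win_rate(self, imbalance_bucket, mobility_bucket, prior=0.5):
        key = (imbalance_bucket, mobility_bucket)
        # blend toward the prior when the bucket has few observations
        return (self.wins[key] + 2 * prior) / (self.games[key] + 2)

jw = JointWinRate()
# e.g. down the exchange but with a big mobility edge (invented results)
for _ in range(8):
    jw.observe("minor_for_rook", "high_mobility", won=True)
for _ in range(2):
    jw.observe("minor_for_rook", "high_mobility", won=False)
```

A simple additive evaluation cannot express "this imbalance is only good *when* mobility is high"; the joint table can, at the cost of needing enough data per bucket.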

Gerd Isenberg
Posts: 2127
Joined: Wed Mar 08, 2006 7:47 pm
Location: Hattingen, Germany

Re: naive bayes classifier

Post by Gerd Isenberg » Tue Jul 21, 2009 9:01 pm

Don wrote:Has anyone experimented with naive bayes classifiers for computer chess?

I was thinking in the context of endgame databases, specifically the win/loss/draw databases.

One could combine a set of basic observations about the position and get the classifier's opinion on the "probability" that a position is a win, loss, or draw.

Such a thing might form a pretty good evaluation function for a specific ending, even if it's not perfect.

In recent years there has been a resurgence of interest in these classifiers, as they are really simple to code up, do not use as much memory as neural networks and many other classifiers, are very fast and are surprisingly powerful.

What appeals to me about them is that for chess endings you can probably make it as good as you want by helping it out with hand-crafted attributes. For instance in KPvsK, if one were building a classifier, you might code up as an attribute whether the white king has the opposition - surely this attribute would contribute to the accuracy of the classifier.

I am thinking about grabbing the KRPKR database and experimenting and was wondering what if anything has already been tried.
I have no experience with that stuff, but it sounds like an application of a bloom filter.

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: naive bayes classifier

Post by Don » Tue Jul 21, 2009 9:19 pm

Dann Corbit wrote:
Don wrote:Has anyone experimented with naive bayes classifiers for computer chess?

I was thinking in the context of endgame databases, specifically the win/loss/draw databases.

One could combine a set of basic observations about the position and get the classifier's opinion on the "probability" that a position is a win, loss, or draw.

Such a thing might form a pretty good evaluation function for a specific ending, even if it's not perfect.

In recent years there has been a resurgence of interest in these classifiers, as they are really simple to code up, do not use as much memory as neural networks and many other classifiers, are very fast and are surprisingly powerful.

What appeals to me about them is that for chess endings you can probably make it as good as you want by helping it out with hand-crafted attributes. For instance in KPvsK, if one were building a classifier, you might code up as an attribute whether the white king has the opposition - surely this attribute would contribute to the accuracy of the classifier.

I am thinking about grabbing the KRPKR database and experimenting and was wondering what if anything has already been tried.
For something like KPK, you only need one bit per position if you store the absolute answer. After all, it is impossible to lose if you are the KP side, so a 1 will say 'win' and a 0 means draw.

So I think it won't have any utility with a very simple problem like KPK (which is small enough even in a full tablebase anyway).
I only used the KPK database as a simple-to-understand example - I would not consider it for this in actual practice. As a matter of fact, I built my own KPK database over 20 years ago and have been using it ever since - it's 24K with no compression of any kind and fits neatly into memory. (I have since discovered that other programs use a database of the exact same size, but it's not my database, as the bits are laid out in different patterns.)
The more complicated systems like KRPKR might have more utility, but with RAM dropping to $10/GB or so, it seems to me that a bitbase would give certain results. We already know that 'KRP' is usually better than KR, so how much value would the Bayesian results be? Will we be happy with 80% certainty?
I was thinking that we would need much higher certainty, something above 95%, or it would not be worth considering to me - unless it turns out that it works as an excellent evaluation function. We don't fully understand how mini-max works, but I wonder if a Bayes calculation might make an excellent evaluation function. If you know about Bayes classifiers, they return a "percentage" of membership in a certain class. That could be translated to a score trivially. Used this way, it never really returns a win/loss/draw score, just how likely each outcome is.
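That probability-to-score translation could be done in a couple of ways - a hedged sketch of two options; the centipawn scales are arbitrary illustrative choices, not tuned values:

```python
# Two hedged ways to turn win/draw/loss probabilities into one score.
# The 400 and 100 centipawn scales are arbitrary illustrative choices.
import math

def wdl_to_score(p_win, p_draw, p_loss, scale_cp=400):
    """Expected value in [-1, 1], rescaled; a draw contributes zero."""
    return (p_win - p_loss) * scale_cp

def wdl_to_logodds_cp(p_win, p_loss, scale_cp=100):
    """Log-odds of winning vs losing, epsilon-guarded against infinities."""
    eps = 1e-6
    return scale_cp * math.log((p_win + eps) / (p_loss + eps))
```

The expected-value form is bounded and smooth; the log-odds form spreads out near-certain positions more, which may or may not help the search.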

My thinking is that it would be a very good evaluation function if it was rarely "way off." For instance it might think a draw is slightly more likely to be a win when in fact it's just a draw. This may be enough to make it a huge win in the absence of a real database.

But there are other considerations too. You mentioned that we might as well just use a real database for this interesting ending, with RAM sizes constantly going up. But I'm thinking of this as a problem that will ALWAYS be with us. At some point cache and RAM may be a non-issue for this particular database, but what about the next one that comes along that is a couple of orders of magnitude larger?

So what I'm trying to do is find a lossy compression algorithm that can super-compress perfect knowledge. Think of this as trying to store images on a tiny storage device using a lossy compression format such as JPEG.

I do think that a Bayes classifier would be very useful in the following sort of circumstances:
A quick check to see if you win the pawn race. If a precalculated race shows 'won' then there is less need to consult the tablebase. I suspect that such a precalculated race could be formed in terms of simple geometry and might fit in 10K. So you would only need to consult the bitbase or tablebase if it were unclear whether the race was won or lost.

It could also be a hint to try the tablebase:
Bayes says that pawn race looks won --> check tablebase to be sure.
Yes, I agree there may be many opportunities to exploit this. But if you can precalculate a pawn race, it should just become an attribute for your Bayes net, no doubt drastically increasing its predictive ability.

The real appeal when you talk about using it as a substitute for a bitbase is that you KNOW in advance exactly how good or bad it is. You know from offline testing exactly which positions are right and wrong. So you can accurately evaluate and tune it.

If you can get the accuracy really high, you can use a bloom filter or some other data structure to note the exceptions - and you might end up with a lossless procedure that is far more memory-efficient than a real database.

I don't pretend that it won't be challenging! There are some endings where it's probably very difficult to classify accurately.

It seems to me that Bayesian logic might prove valuable in evaluation.
I think so too. I was even going to present a separate post on how it might be used to build an evaluation function.

For instance, suppose that some features combine in a non-linear fashion. It could be worthwhile to see if {for example} having a particular sort of material imbalance together with a mobility advantage leads to a particular probability of winning. It might be possible to engineer a set of rules that combine things in a non-linear way to arrive at better decisions than by simple additions.
Yes, this might be possible. There are tons of papers on naive Bayes and on hybrid approaches that combine it with something else, which are pretty interesting.

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: naive bayes classifier

Post by Don » Tue Jul 21, 2009 9:33 pm

Gerd Isenberg wrote:
Don wrote:Has anyone experimented with naive bayes classifiers for computer chess?

I was thinking in the context of endgame databases, specifically the win/loss/draw databases.

One could combine a set of basic observations about the position and get the classifier's opinion on the "probability" that a position is a win, loss, or draw.

Such a thing might form a pretty good evaluation function for a specific ending, even if it's not perfect.

In recent years there has been a resurgence of interest in these classifiers, as they are really simple to code up, do not use as much memory as neural networks and many other classifiers, are very fast and are surprisingly powerful.

What appeals to me about them is that for chess endings you can probably make it as good as you want by helping it out with hand-crafted attributes. For instance in KPvsK, if one were building a classifier, you might code up as an attribute whether the white king has the opposition - surely this attribute would contribute to the accuracy of the classifier.

I am thinking about grabbing the KRPKR database and experimenting and was wondering what if anything has already been tried.
I have no experience with that stuff, but it sounds like an application of a bloom filter.
This is different from a bloom filter, but I have considered using bloom filters to identify positions that are wrongly classified. A bloom filter can give you a false positive with respect to set membership but we can even test for that since we have a known data set. If the bloom filter gives a false match we can store the entries that have false matches in another much smaller bloom filter!

You could use a bloom filter instead of a bitbase, but in this case a bitbase will be much more compact. I think one would use a bloom filter if you had a good classifier that was almost always right - and the bloom filter would simply mark the exceptions.
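A sketch of that classifier-plus-exceptions scheme, with a toy Bloom filter. The position keys and filter sizes below are invented; a Bloom miss means the classifier's answer is definitely right, while a hit means "maybe an exception - consult the real tablebase."

```python
# Classifier + Bloom filter of known exceptions, as described above.
# Keys, sizes, and the fallback convention are illustrative choices.
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1 << 16, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, key):
        # derive n_hashes independent bit positions from one hash family
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # no false negatives; false positives possible but rare
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

# offline: record every position the classifier misclassifies
exceptions = BloomFilter()
exceptions.add("krpkr:pos:12345")   # hypothetical position key

# online: trust the classifier unless the filter flags the position
def lookup(pos_key, classifier_answer):
    if exceptions.might_contain(pos_key):
        return None   # possible exception: fall back to the real tablebase
    return classifier_answer
```

The cascade idea from the previous post fits naturally here: the keys that false-positive against this filter could themselves go into a second, much smaller filter.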

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: naive bayes classifier

Post by Don » Tue Jul 21, 2009 9:43 pm

Don wrote:You could use a bloom filter instead of a bitbase, but in this case a bitbase will be much more compact. I think one would use a bloom filter if you had a good classifier that was almost always right - and the bloom filter would simply mark the exceptions.
There are many databases where only a small percentage of the entire space is a win (or a draw). In such cases a bloom filter might be used to identify the smaller class - unfortunately it is unlikely to help, because in these cases the compression scheme for the databases is extremely effective. But it's a nice thought!

shiv
Posts: 351
Joined: Sat Apr 01, 2006 12:03 am

Re: naive bayes classifier

Post by shiv » Fri Aug 07, 2009 7:23 am

I have dabbled with applied machine learning and playing chess but sadly never really with writing chess engines.

Using opposition as a feature to the naive bayes classifier intuitively seems to make sense.

Of course, it would be nice to automatically generate multiple features such as isolated pawns, doubled pawns, etc. without human intervention, and have the naive Bayes (NB) network alert us to the more important features. Perhaps a pure Bayesian network might be better, as there might be parent-child relationships that naive Bayes cannot exploit. It would be interesting to compare these features with those commonly known by humans. The actual process of feature extraction probably has parallels with graph mining.

The output of the NB classifier can be used in conjunction with the material evaluation.

Of course, a limitation of NB is that each feature needs enough training examples to be useful. So one would need a lot of games with the opposition and many without as well. This just means more data collection is needed, and it would be interesting to try with more features.

As a chess player, I doubt this would have much utility without a lot of hand optimization. For example, the opposition is only applicable in the small percentage of endgames where the material balance is very close. Thus a lot of the grunt work of isolating these types of cases has to be done by hand.

Nevertheless, it would be a great start to see an NB-based chess evaluation engine.

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: naive bayes classifier

Post by Don » Fri Aug 07, 2009 1:18 pm

shiv wrote:I have dabbled with applied machine learning and playing chess but sadly never really with writing chess engines.

Using opposition as a feature to the naive bayes classifier intuitively seems to make sense.
Indeed, using opposition in king and pawn vs king can give you a program that plays that ending almost perfectly (once it gets into it), even though it may not actually know if the position is a win or draw.

Years ago someone constructed a set of rules that classifies this particular ending 100% correctly. However, many people use a small 24K bit table that correctly identifies each possible position as won or drawn, and this is much faster than parsing several rules.
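For reference, one plausible layout for a 24K KPK bit table like the one described (the exact bit ordering is illustrative - as noted earlier in the thread, different programs lay the bits out differently). Mirroring the pawn to files a-d leaves 24 pawn squares, and 24 × 64 × 64 × 2 = 196,608 positions, i.e. exactly 24 KiB of bits.

```python
# One possible (illustrative) layout for a 24K KPK win/draw bit table.
# With the pawn mirrored to files a-d there are 24 pawn squares
# (files a-d, ranks 2-7), 64 squares per king, and 2 sides to move:
# 24 * 64 * 64 * 2 = 196,608 one-bit entries = 24,576 bytes.

NUM_POSITIONS = 24 * 64 * 64 * 2          # 196,608
bitbase = bytearray(NUM_POSITIONS // 8)   # 24,576 bytes = 24K

def pawn_index(pawn_sq):
    """Map a white pawn on files a-d, ranks 2-7 (a1 = 0) to 0..23."""
    file, rank = pawn_sq % 8, pawn_sq // 8
    assert file < 4 and 1 <= rank <= 6
    return file * 6 + (rank - 1)

def kpk_index(wk_sq, bk_sq, pawn_sq, stm):
    # mirror positions with the pawn on files e-h before calling this
    idx = pawn_index(pawn_sq)
    idx = idx * 64 + wk_sq
    idx = idx * 64 + bk_sq
    return idx * 2 + stm      # stm: 0 = white to move, 1 = black

def is_win(index):
    return bool(bitbase[index // 8] & (1 << (index % 8)))
```

A single shift-and-mask lookup like this is why the bit table beats rule parsing on speed, even though both classify the ending perfectly.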
Of course, it would be nice to automatically generate multiple features such as isolated pawns, doubled pawns, etc. without human intervention, and have the naive Bayes (NB) network alert us to the more important features. Perhaps a pure Bayesian network might be better, as there might be parent-child relationships that naive Bayes cannot exploit. It would be interesting to compare these features with those commonly known by humans. The actual process of feature extraction probably has parallels with graph mining.

The output of the NB classifier can be used in conjunction with the material evaluation.

Of course, a limitation of NB is that each feature needs enough training examples to be useful. So one would need a lot of games with the opposition and many without as well. This just means more data collection is needed, and it would be interesting to try with more features.

As a chess player, I doubt this would have much utility without a lot of hand optimization. For example, the opposition is only applicable in the small percentage of endgames where the material balance is very close. Thus a lot of the grunt work of isolating these types of cases has to be done by hand.

Nevertheless, it would be a great start to see an NB-based chess evaluation engine.
Most programs use a set of weights that are not correlated much with other weights - unless by definition. Naive Bayes assumes that features are not correlated, so it seems like a natural fit. Of course, the devil is in the details.
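The "natural fit" can be made explicit: the naive Bayes log-odds decision is just a sum of one fixed term per feature, exactly like a linear evaluation function summing weights. The feature names and probabilities below are invented for illustration.

```python
# Naive Bayes as an additive evaluation: each binary feature
# contributes a fixed log-likelihood-ratio "weight", and scoring a
# position is just summing the weights of the features present.
# Feature names and per-feature probabilities are hypothetical.
import math

def feature_weight(p_feature_given_win, p_feature_given_loss):
    """Log-likelihood ratio contributed by one binary feature."""
    return math.log(p_feature_given_win / p_feature_given_loss)

# hypothetical statistics (e.g. as might be measured from a tablebase)
WEIGHTS = {
    "has_opposition": feature_weight(0.70, 0.30),
    "passed_pawn":    feature_weight(0.60, 0.40),
}

def log_odds_win(features, prior_log_odds=0.0):
    # additive, exactly like summing evaluation terms
    return prior_log_odds + sum(WEIGHTS[f] for f in features if f in WEIGHTS)
```

The independence assumption is what makes the sum legal - which is also why correlated features (say, two measures of king safety) double-count unless they are merged, the same pitfall a hand-tuned evaluation has.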
