fuzzy endgame databases

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mike_bike_kite
Posts: 98
Joined: Tue Jul 26, 2011 12:18 am
Location: London

fuzzy endgame databases

Post by mike_bike_kite »

I'm trying to improve on the end game database in my checkers (English/international draughts) program Diamond Draughts. I could obviously store every position possible and it's corresponding score but I haven't got the inclination to gather that much data and there's no way my web based java program could hold that amount of data.

I currently store everything from the perspective of the player to move rather than holding the same position for black and again for white - that obviously reduces the database by 1/2. I then use reflections etc to reduce the amount of data - by a 1/4 if there's no men on the board. I assume something similar is done in chess endgame databases.

A simple 2 kings vs 1 king ending might translate to 100 (for the sake of argument) separate positions. The problem is if the winning side has an extra man then I now need to store all these positions again for each place on the board where the man might be. This adds to the database considerably.

My new idea (at least to me) is to look at the position and simply remove any pieces which are not required by the attacking side and remove them before searching the database. This might just be for one extra man or any number of extra men and kings. This reduces the size of the database considerably. Obviously the issue is how to decide which pieces are extra to the position and which can be removed.

I suppose the chess analogy is to have one rook and king vs king database and avoiding the need to have a separate pawn, rook and king vs king database as it's handled by the first instance.

I'm in the middle of testing but wondered if this has been done before? and did it work?

Mike
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: fuzzy endgame databases

Post by Daniel Shawul »

mike_bike_kite wrote:I'm trying to improve on the end game database in my checkers (English/international draughts) program Diamond Draughts. I could obviously store every position possible and it's corresponding score but I haven't got the inclination to gather that much data and there's no way my web based java program could hold that amount of data.

I currently store everything from the perspective of the player to move rather than holding the same position for black and again for white - that obviously reduces the database by 1/2. I then use reflections etc to reduce the amount of data - by a 1/4 if there's no men on the board. I assume something similar is done in chess endgame databases.
Yes you can store only one side to move tables if you are willing to do a one ply search during probing. For my bitbases, I generate both sides and drop which ever is bigger. The bitbase sizes are very asymmetric and I usually end up with much less than half the size.
A simple 2 kings vs 1 king ending might translate to 100 (for the sake of argument) separate positions. The problem is if the winning side has an extra man then I now need to store all these positions again for each place on the board where the man might be. This adds to the database considerably.

My new idea (at least to me) is to look at the position and simply remove any pieces which are not required by the attacking side and remove them before searching the database. This might just be for one extra man or any number of extra men and kings. This reduces the size of the database considerably. Obviously the issue is how to decide which pieces are extra to the position and which can be removed.

I suppose the chess analogy is to have one rook and king vs king database and avoiding the need to have a separate pawn, rook and king vs king database as it's handled by the first instance.

I'm in the middle of testing but wondered if this has been done before? and did it work?

Mike
Well you can always decide to have only KRK bitbase and not KRPK. You say P is not needed for winning but how would you know? If you find that one of the pieces is not needed for winning, then the endgame is probably easily won anyway.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: fuzzy endgame databases

Post by syzygy »

mike_bike_kite wrote:I currently store everything from the perspective of the player to move rather than holding the same position for black and again for white - that obviously reduces the database by 1/2. I then use reflections etc to reduce the amount of data - by a 1/4 if there's no men on the board. I assume something similar is done in chess endgame databases.
Yes, these are the basic reductions. What is usually done is storing the positions with white to move and with black to move, but giving white the majority of pieces or heavier pieces. So KQvKR (white K+Q and black K+R) positions are stored both for white to move and black to move, but KRvKQ not. This loses no information, since KRvKQ positions can be derived from KQvKR positions by mirroring the board vertically and inverting the colors of the pieces. If material is equal for both sides such as KQvKQ, then only positions with white to move are stored.

Board symmetries are even more important in chess than in checkers. Without pawns, there are 8 symmetries leading to a reduction factor of about 8 (it's a bit less due to symmetric positions). With pawns there is only the horizontal symmetry leading to a reduction factor of usually 2 (again less if there are horizontally symmetric positions).
A simple 2 kings vs 1 king ending might translate to 100 (for the sake of argument) separate positions. The problem is if the winning side has an extra man then I now need to store all these positions again for each place on the board where the man might be. This adds to the database considerably.

My new idea (at least to me) is to look at the position and simply remove any pieces which are not required by the attacking side and remove them before searching the database. This might just be for one extra man or any number of extra men and kings. This reduces the size of the database considerably. Obviously the issue is how to decide which pieces are extra to the position and which can be removed.
If what you mean is that you're not going to store information for positions with the extra man at all but instead try to "guess" the value based on the values for positions obtained by removing a man, then you're not generating the bigger database at all and your idea is basically an evaluation heuristic.

If you mean something else, then maybe you could explain your idea in some more detail.

What might be possible is to build the bigger database, and somehow use the values for positions obtained by removing one of the man to "predict" the real value of a position in the bigger database. You then store the difference between the real value and the predicted value instead of storing the real value. This might lead to an improved compression ratio and therefore a smaller table. However, I doubt that this will work well, because I'm guessing that the correlation is not too high. (There will be endgames where the correlation is high, e.g. if removing a man brings you in a generally won endgame so the position with the removed man will be generally won as well, but those will compress well anyway and the positions with the man removed will probably not predict the exceptions to the general win very well.)

Some things that you "should" do: don't store the values of positions where a capture is possible (and therefore mandatory). Instead, probe those positions by performing the capture(s) and probing the smaller tables. If removing capture positions from your index function is too hard (which seems pretty likely), then you can instead replace the values for those positions by whatever value gives good compression, e.g. the most frequent value in the database (but what is best depends on your compression method).

There are many papers on building endgame databases for checkers by the Chinook team.
mike_bike_kite
Posts: 98
Joined: Tue Jul 26, 2011 12:18 am
Location: London

Re: fuzzy endgame databases

Post by mike_bike_kite »

Daniel Shawul wrote:Yes you can store only one side to move tables if you are willing to do a one ply search during probing.
I actually check the "database" at the leaf nodes. If the position is stored then I'll return the score associated with it otherwise I do my normal scoring methods. I do actually check to see if the position is one piece against another and, if it is, then I don't continue the search as the database would already hold the result for that position. The database might allow me to see wins that are 40 ply away. Checking the database after a 10 ply search means I'm getting a 50 ply search. Unfortunately I don't store all positions so the last statement isn't wholly true.
Daniel Shawul wrote:Well you can always decide to have only KRK bitbase and not KRPK. You say P is not needed for winning but how would you know?
If KRK ending is a win then having the extra P might at best provide a faster checkmate or at worst generate a stalemate. To store all the KRPK endings would require replicating the KRK endings but with a P in all the valid positions ie 8*6 = 48 times. That's a huge growth in the database for very little gain. If you can see that the P is a certain distance from the enemy King then you could just ignore it and use a smaller database.

Take the KRK chess position (8/3R4/8/8/8/1K6/8/1k w KQkq - 0 1). This "could" be stored as 8 different reflections and for both sides. If we included a P on the winning side then this might produce approximately 48 similar positions. If there were 2 extra P then obviously this makes the database grow exponentially. If we just removed the extra pawns if they weren't going to affect the position then we can get away with just storing the one position.
Daniel Shawul wrote:If you find that one of the pieces is not needed for winning, then the endgame is probably easily won anyway.
The problem I was finding was that my program would get a winning position often with winning material but if the opponent was sufficiently canny then the program would have a hard time converting the position into a final win. There are many positions in checkers where it might require 50 moves (ply) to be able to convert a winning position into winning material. I couldn't program this using a clever scoring method but I could using a database. The problem was the size of the database that gets generated.
syzygy wrote:If what you mean is that you're not going to store information for positions with the extra man at all but instead try to "guess" the value based on the values for positions obtained by removing a man, then you're not generating the bigger database at all and your idea is basically an evaluation heuristic. If you mean something else, then maybe you could explain your idea in some more detail.
I'm definitely generating a database but it's more the essence of each position that I'm storing. If there are pieces that are extra to the win then I remove these before searching the database.
syzygy wrote:What might be possible is to build the bigger database, and somehow use the values for positions obtained by removing one of the man to "predict" the real value of a position in the bigger database.
An interesting idea but not really possible on my little computer. A 4 piece database requires 7m positions, a 6 piece database requires 2.5 billion positions (figures from the Chinook book "One Jump Ahead"). I currently store around 50k positions and this allows it to play most 6 piece positions well.
syzygy wrote:Some things that you "should" do: don't store the values of positions where a capture is possible (and therefore mandatory). Instead, probe those positions by performing the capture(s) and probing the smaller tables.
I agree and this is exactly what I do.
syzygy wrote:There are many papers on building endgame databases for checkers by the Chinook team.
I've had a look at a few papers and I've read their book. The problem for me is that I can't use a large database on a web based application (plus I haven't got the resources to generate one). I also prefer the idea of using a smaller database in a more flexible manner.

Note: I'm not saying this should be done in Chess (or in checkers) but it's something I hadn't seen done before and I do seem to be getting some good results in testing even when I have a relatively small database. A small advantage of not having a complete database is that the program plays more like a human rather than like a computer that has the benefit of being able to see vast distances ahead.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fuzzy endgame databases

Post by bob »

mike_bike_kite wrote:I'm trying to improve on the end game database in my checkers (English/international draughts) program Diamond Draughts. I could obviously store every position possible and it's corresponding score but I haven't got the inclination to gather that much data and there's no way my web based java program could hold that amount of data.

I currently store everything from the perspective of the player to move rather than holding the same position for black and again for white - that obviously reduces the database by 1/2. I then use reflections etc to reduce the amount of data - by a 1/4 if there's no men on the board. I assume something similar is done in chess endgame databases.

A simple 2 kings vs 1 king ending might translate to 100 (for the sake of argument) separate positions. The problem is if the winning side has an extra man then I now need to store all these positions again for each place on the board where the man might be. This adds to the database considerably.

My new idea (at least to me) is to look at the position and simply remove any pieces which are not required by the attacking side and remove them before searching the database. This might just be for one extra man or any number of extra men and kings. This reduces the size of the database considerably. Obviously the issue is how to decide which pieces are extra to the position and which can be removed.

I suppose the chess analogy is to have one rook and king vs king database and avoiding the need to have a separate pawn, rook and king vs king database as it's handled by the first instance.

I'm in the middle of testing but wondered if this has been done before? and did it work?

Mike
I suspect trouble would soon follow that idea. For a simple example, what if the pawn you remove in KRP vs K happens to control the only escape square for the enemy king on the current move? This becomes a stalemate where you think it is winning. For example, black king at e8, white pawn at e7, white king at e6, white rook at e5. You have to shuffle the king and rook around to reach a forced mate, but if you pass through THIS position, you just turned this into a draw. The same thing can happen with any extra piece white has, and you certainly don't want to think you are winning and uncork a stalemate.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fuzzy endgame databases

Post by bob »

mike_bike_kite wrote:
Daniel Shawul wrote:Yes you can store only one side to move tables if you are willing to do a one ply search during probing.
I actually check the "database" at the leaf nodes. If the position is stored then I'll return the score associated with it otherwise I do my normal scoring methods. I do actually check to see if the position is one piece against another and, if it is, then I don't continue the search as the database would already hold the result for that position. The database might allow me to see wins that are 40 ply away. Checking the database after a 10 ply search means I'm getting a 50 ply search. Unfortunately I don't store all positions so the last statement isn't wholly true.
Daniel Shawul wrote:Well you can always decide to have only KRK bitbase and not KRPK. You say P is not needed for winning but how would you know?
If KRK ending is a win then having the extra P might at best provide a faster checkmate or at worst generate a stalemate. To store all the KRPK endings would require replicating the KRK endings but with a P in all the valid positions ie 8*6 = 48 times. That's a huge growth in the database for very little gain. If you can see that the P is a certain distance from the enemy King then you could just ignore it and use a smaller database.

Take the KRK chess position (8/3R4/8/8/8/1K6/8/1k w KQkq - 0 1). This "could" be stored as 8 different reflections and for both sides. If we included a P on the winning side then this might produce approximately 48 similar positions. If there were 2 extra P then obviously this makes the database grow exponentially. If we just removed the extra pawns if they weren't going to affect the position then we can get away with just storing the one position.
Daniel Shawul wrote:If you find that one of the pieces is not needed for winning, then the endgame is probably easily won anyway.
The problem I was finding was that my program would get a winning position often with winning material but if the opponent was sufficiently canny then the program would have a hard time converting the position into a final win. There are many positions in checkers where it might require 50 moves (ply) to be able to convert a winning position into winning material. I couldn't program this using a clever scoring method but I could using a database. The problem was the size of the database that gets generated.
syzygy wrote:If what you mean is that you're not going to store information for positions with the extra man at all but instead try to "guess" the value based on the values for positions obtained by removing a man, then you're not generating the bigger database at all and your idea is basically an evaluation heuristic. If you mean something else, then maybe you could explain your idea in some more detail.
I'm definitely generating a database but it's more the essence of each position that I'm storing. If there are pieces that are extra to the win then I remove these before searching the database.
syzygy wrote:What might be possible is to build the bigger database, and somehow use the values for positions obtained by removing one of the man to "predict" the real value of a position in the bigger database.
An interesting idea but not really possible on my little computer. A 4 piece database requires 7m positions, a 6 piece database requires 2.5 billion positions (figures from the Chinook book "One Jump Ahead"). I currently store around 50k positions and this allows it to play most 6 piece positions well.
syzygy wrote:Some things that you "should" do: don't store the values of positions where a capture is possible (and therefore mandatory). Instead, probe those positions by performing the capture(s) and probing the smaller tables.
I agree and this is exactly what I do.
syzygy wrote:There are many papers on building endgame databases for checkers by the Chinook team.
I've had a look at a few papers and I've read their book. The problem for me is that I can't use a large database on a web based application (plus I haven't got the resources to generate one). I also prefer the idea of using a smaller database in a more flexible manner.

Note: I'm not saying this should be done in Chess (or in checkers) but it's something I hadn't seen done before and I do seem to be getting some good results in testing even when I have a relatively small database. A small advantage of not having a complete database is that the program plays more like a human rather than like a computer that has the benefit of being able to see vast distances ahead.
One note. Why not check the database ONLY after captures, and specifically, only after captures take you down to the number of pieces in your largest endgame database? That cuts the tree off much sooner on a hit, and greatly reduces the number of probes/hits. All at no cost in accuracy.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: fuzzy endgame databases

Post by syzygy »

mike_bike_kite wrote:If KRK ending is a win then having the extra P might at best provide a faster checkmate or at worst generate a stalemate.
But this is not at all necessary.

Of course in this example KRK is a general win and KRPK even more so, but if you base yourself on that you may as well leave out KRK and KRPK altogether.
To store all the KRPK endings would require replicating the KRK endings but with a P in all the valid positions ie 8*6 = 48 times. That's a huge growth in the database for very little gain. If you can see that the P is a certain distance from the enemy King then you could just ignore it and use a smaller database.
Generally won databases such as KRPK compress very well. My win/draw/loss databases for KRPK, KRPPK and KRPPPK have 5136, 165072 and 2506512 bytes. Of course for game play KRPPK and KRPPPK can simply be replaced by a heuristic evaluation that evaluates it as a win, and it is not much different for KRPK and KRK.
Take the KRK chess position (8/3R4/8/8/8/1K6/8/1k w KQkq - 0 1). This "could" be stored as 8 different reflections and for both sides. If we included a P on the winning side then this might produce approximately 48 similar positions. If there were 2 extra P then obviously this makes the database grow exponentially. If we just removed the extra pawns if they weren't going to affect the position then we can get away with just storing the one position.
Without maybe realizing it, you are arguing against the use of big endgame databases, unless I completely misunderstand you. You want to replace them by heuristics. That is of course fine. However, you will have to accept that your engine will make mistakes in more evenly balanced endgames.
syzygy wrote:If what you mean is that you're not going to store information for positions with the extra man at all but instead try to "guess" the value based on the values for positions obtained by removing a man, then you're not generating the bigger database at all and your idea is basically an evaluation heuristic. If you mean something else, then maybe you could explain your idea in some more detail.
I'm definitely generating a database but it's more the essence of each position that I'm storing. If there are pieces that are extra to the win then I remove these before searching the database.
What do you get if you remove the P from KRPK? You get KRK. Instead of generating KRPK, you want to generate KRK. Since in order to generate KRPK you anyway need KRK, you are effectively deciding not to generate KRPK. You can't remove the P from KRPK and maintain that you generate KRPK.

Of course you can come up with heuristics to deal with KRPK instead of using a KRPK database. The heuristic "it's a win, just move your pawn or even lose your pawn" will work fine for this endgame.
syzygy wrote:What might be possible is to build the bigger database, and somehow use the values for positions obtained by removing one of the man to "predict" the real value of a position in the bigger database.
An interesting idea but not really possible on my little computer. A 4 piece database requires 7m positions, a 6 piece database requires 2.5 billion positions (figures from the Chinook book "One Jump Ahead"). I currently store around 50k positions and this allows it to play most 6 piece positions well.
I suppose that just using search and a decent evaluation function a checkers engine will play most 6 piece positions well. The problem is playing the positions many plies removed from the 6 piece positions, where the search can get you into the 6-piece databases.

I didn't check the Chinook papers for this, but I'm sure that the individual tables required for the full 6 piece database are far smaller than 2.5 billion positions and would probably fit in the amount of RAM you have in your computer. So generating them is not a problem of (relatively) modern hardware. To really use them you'd have to get them into a good compressed format, though.
The problem for me is that I can't use a large database on a web based application (plus I haven't got the resources to generate one).
Ok, that makes things different.
Note: I'm not saying this should be done in Chess (or in checkers) but it's something I hadn't seen done before and I do seem to be getting some good results in testing even when I have a relatively small database.
Well, what is well known is to use special evaluation functions for certain endgame positions. That also obviates the need for endgame databases in the sense that when the game reaches e.g. KBNK, the engine is able to win it as white.

Maybe there is some small misunderstanding here arising from the fact that most chess engines do fine without endgame databases, while these databases are very important for probably all modern checkers engines. So when you propose to essentially not use them, then that's what chess engines already do almost by default.
mike_bike_kite
Posts: 98
Joined: Tue Jul 26, 2011 12:18 am
Location: London

Re: fuzzy endgame databases

Post by mike_bike_kite »

I'll quickly restate that I'm not using this in a chess program - my own little chess program has no end game database at all and instead just switches in different scoring algorithms depending on the position. For checkers it's different because you need to search much deeper to see that a given move in an end game is better than another which makes an end game database much more useful. I'm not saying this idea can't be used in chess, I think it can, but personally I'm using it in checkers. There are some chess end games that are difficult to win ie KRB vs KNN. I've read that this requires over 200 perfect moves to guarantee a win. I suspect this type of end game would benefit from a database approach but obviously this situation isn't going to appear that often. The same can be said for KRN vs KNN.

To be useful in checkers this database needs to be huge. The database grows exponentially for each extra piece you want to hold in the database. Even if you ignore reflections and make various other savings you still end up with a huge database. The clever bit is working out which pieces can be ignored in a given position. As others have stated it's easy to think you've got a win but then find your extra pieces actually stop you from winning but I actually check that the pieces concerned aren't "in the action" or are distant enough from the defending pieces. Another alternative might be to store the minority cases where a "stalemate" occurs and then these positions would then be spotted and avoided.

My old checkers program would often get into an end game where it was a number of pieces up (say 4 kings vs 2 kings). These positions weren't stored in my database so the program would try and work out how to win. If there wasn't an immediate win then it would just "pointlessly" sacrifice a piece to reach a 3k vs 2k ending which would be in the database. This sacrificing a piece made sense to me (the programmer) but to an average user it looked like the program had suddenly lost it's marbles and spoiled the illusion that they were playing against a master player. I didn't like this.

Realising that some pieces can be ignored and then searching the database for what's essentially just the essence of the position means I can use a much smaller database. It also means that each position in the database can apply itself to hundreds/thousands of real world positions. Lastly, as I mentioned above, the style of play becomes more logical and human like rather than "cold" style of most advanced games programs. I'm currently using a 50k position database which approximates to the 2.5b position database used by Chinook (current world champ program) for 6 piece endings. The two databases aren't equivalent though - Chinook's database might well cover positions that my own database doesn't hold but whether all those positions are worth storing is another question - is it really worth storing positions where you have 5 kings vs 1 king? My own database will also apply to positions where there are far more than just 6 pieces on the board.

Mike

PS Chinook will beat my program but then it is the world champ and it's current database means it can no longer be beaten by any human or program. It does however have the worst interface known to man :)