How much are bitbases worth?

Uri Blass · Post by **Uri Blass** » Thu May 22, 2008 9:43 pm

Don wrote:
Tord Romstad wrote:I've started thinking about adding 4-piece and 5-piece bitbases to my program, but I hesitate a little because it seems to be an awful lot of work, especially for something that doesn't scale down to handheld devices.

Therefore, before I start: Has there been any experiments investigating how much you gain by using the complete 4-piece and 5-piece bitbases on modern hardware?

Tord
Hi Tord,

I'll give you my sense of this, which is just an opinion of course and should be taken as such (no flames please

My sense of this is that there is not that much benefit except to one of them in particular, the RP vs R database. I would say 90% of the benefit of 5 piece databases comes from this single database.

There are small databases that are important but they are easily contained in a small amount of memory. KPk is probably the most important of these. I built my own kpk in the late 80's and still use it and I assume most programs have this one. Also, someone devised a set of rules to score this perfectly.

In 1986 I got into an argument with a world famous chess programmer about the importance of KPK and this programmer believed that it wasn't needed because a simple rule of king opposition combined with keeping the king in front enables a win. Although it's true that you can play this ending correctly with a couple of trivial rules, the much bigger issue is KNOWING whether it's a win or not.

Stuff like BN vs K are not important, they are always wins and all your program needs to know is to drive the king to corner of the bishop, although if your program doesn't know that, it can flounder and lose the win.

The problem with 5 piece databases are that despite herculean efforts, they do impact the speed of your program. I have no doubt that if you could have them all in your program and they were FREE, you would probably increase the ELO by 20-30 or something modest like that, but since they are not free they probably degrade the performance by at least that much or more.

It may very well be the case that with modern programs these databases become much more important (and thus my guesses could be off.) I believe that at really high ELO levels you need to have some knowledge that is more commensurate with your ability - for instance it's not very important whether a raw beginner understands how to win with RP vs P when he hangs pieces all the time, but it's very important for a master to understand this. So that concept may be valid here.

To help answer that question, what do the very top guys do in tournaments? Does Rybka use these databases in tournaments and if so does he use them in the tree search, or only to play the ending correctly once he reaches it?

I disagree about the importance of 5 piece database.
I think that they are less important when the level is high because when the level is high the probability of the program to find the right move without tablebases based on search and knowledge is also higher(an extreme case that does not happen is when search gives perfect play even without tablebases and in this case tablebases gives 0 elo).

My guess is that 20-30 elo for free database can be correct for weak programs or for fast time control when the difference is smaller for slower time control or for stronger programs.

Uri

Don · Post by **Don** » Thu May 22, 2008 10:08 pm

Uri Blass wrote:
Don wrote:
Tord Romstad wrote:I've started thinking about adding 4-piece and 5-piece bitbases to my program, but I hesitate a little because it seems to be an awful lot of work, especially for something that doesn't scale down to handheld devices.

Therefore, before I start: Has there been any experiments investigating how much you gain by using the complete 4-piece and 5-piece bitbases on modern hardware?

Tord
Hi Tord,

I'll give you my sense of this, which is just an opinion of course and should be taken as such (no flames please

My sense of this is that there is not that much benefit except to one of them in particular, the RP vs R database. I would say 90% of the benefit of 5 piece databases comes from this single database.

There are small databases that are important but they are easily contained in a small amount of memory. KPk is probably the most important of these. I built my own kpk in the late 80's and still use it and I assume most programs have this one. Also, someone devised a set of rules to score this perfectly.

In 1986 I got into an argument with a world famous chess programmer about the importance of KPK and this programmer believed that it wasn't needed because a simple rule of king opposition combined with keeping the king in front enables a win. Although it's true that you can play this ending correctly with a couple of trivial rules, the much bigger issue is KNOWING whether it's a win or not.

Stuff like BN vs K are not important, they are always wins and all your program needs to know is to drive the king to corner of the bishop, although if your program doesn't know that, it can flounder and lose the win.

The problem with 5 piece databases are that despite herculean efforts, they do impact the speed of your program. I have no doubt that if you could have them all in your program and they were FREE, you would probably increase the ELO by 20-30 or something modest like that, but since they are not free they probably degrade the performance by at least that much or more.

It may very well be the case that with modern programs these databases become much more important (and thus my guesses could be off.) I believe that at really high ELO levels you need to have some knowledge that is more commensurate with your ability - for instance it's not very important whether a raw beginner understands how to win with RP vs P when he hangs pieces all the time, but it's very important for a master to understand this. So that concept may be valid here.

To help answer that question, what do the very top guys do in tournaments? Does Rybka use these databases in tournaments and if so does he use them in the tree search, or only to play the ending correctly once he reaches it?
I disagree about the importance of 5 piece database.
I think that they are less important when the level is high because when the level is high the probability of the program to find the right move without tablebases based on search and knowledge is also higher(an extreme case that does not happen is when search gives perfect play even without tablebases and in this case tablebases gives 0 elo).

My guess is that 20-30 elo for free database can be correct for weak programs or for fast time control when the difference is smaller for slower time control or for stronger programs.

Uri

You could be right, I don't pretend to understand this completely. However, I would like to point out the fact that I'm trying to "prove" that extra knowledge is more important at greater search depths and if I'm right about this, then it would indicate that these databases are at least slightly more important for stronger (i.e. deeper searching modern day) programs.

My study seems to indicate that this is so with knowledge. However, the effect seems to fall off rather suddenly beyond a few ply of depth.

I'm still playing these games, and we have almost 20,000 of them now.

http://www.greencheeks.homelinux.org:8015/

wgarvin · Post by **wgarvin** » Thu May 22, 2008 10:09 pm

Uri Blass wrote:I disagree about the importance of 5 piece database.
I think that they are less important when the level is high because when the level is high the probability of the program to find the right move without tablebases based on search and knowledge is also higher(an extreme case that does not happen is when search gives perfect play even without tablebases and in this case tablebases gives 0 elo).

My guess is that 20-30 elo for free database can be correct for weak programs or for fast time control when the difference is smaller for slower time control or for stronger programs.

Uri

But your program has to search deep in order to find everything, and then it has to evaluate the nodes it reaches. Will it evaluate them correctly? If it tries to evaluate a node with 5 pieces on the board and you have a bitbase, you can evaluate it with 100% accuracy to a GTV of win, loss or draw.

If you have interior node recognizers that are fast enough, and you can afford to keep the bitbases in RAM, then now you can cut off entire branches of the tree at any depth as soon as they are found in the bitbase, and spend that time searching deeper in other parts of the tree. If I recall correctly, DarkThought benefited a lot from this sort of thing in late middlegame in the early 90's. It had bitbases or near-complete knowledge for all 4-piece endings, which took up 15 MB of ram in an uncompressed form (though "knowledgeably coded" so that many of them were 1 bit per position rather than 2 bits per position).

I would love to see the same thing extended to 3-vs-2 piece endings, but honestly it might take too much RAM and the benefits might not be large enough to justify it. Hence my long-lasting interest in better compression/representation schemes for the info in bitbases.

And I wonder if good heuristics could steer the search towards "probably winnable" endings and avoid "probably lost" ones, using some representation of the bitbase contents that was extremely sparse or lossy? I.e. how "complete" does the knowledge have to be, to be worth having?

Uri Blass · Post by **Uri Blass** » Thu May 22, 2008 10:27 pm

Don wrote:
Uri Blass wrote:
Don wrote:
Tord Romstad wrote:I've started thinking about adding 4-piece and 5-piece bitbases to my program, but I hesitate a little because it seems to be an awful lot of work, especially for something that doesn't scale down to handheld devices.

Therefore, before I start: Has there been any experiments investigating how much you gain by using the complete 4-piece and 5-piece bitbases on modern hardware?

Tord
Hi Tord,

I'll give you my sense of this, which is just an opinion of course and should be taken as such (no flames please

My sense of this is that there is not that much benefit except to one of them in particular, the RP vs R database. I would say 90% of the benefit of 5 piece databases comes from this single database.

There are small databases that are important but they are easily contained in a small amount of memory. KPk is probably the most important of these. I built my own kpk in the late 80's and still use it and I assume most programs have this one. Also, someone devised a set of rules to score this perfectly.

In 1986 I got into an argument with a world famous chess programmer about the importance of KPK and this programmer believed that it wasn't needed because a simple rule of king opposition combined with keeping the king in front enables a win. Although it's true that you can play this ending correctly with a couple of trivial rules, the much bigger issue is KNOWING whether it's a win or not.

Stuff like BN vs K are not important, they are always wins and all your program needs to know is to drive the king to corner of the bishop, although if your program doesn't know that, it can flounder and lose the win.

The problem with 5 piece databases are that despite herculean efforts, they do impact the speed of your program. I have no doubt that if you could have them all in your program and they were FREE, you would probably increase the ELO by 20-30 or something modest like that, but since they are not free they probably degrade the performance by at least that much or more.

It may very well be the case that with modern programs these databases become much more important (and thus my guesses could be off.) I believe that at really high ELO levels you need to have some knowledge that is more commensurate with your ability - for instance it's not very important whether a raw beginner understands how to win with RP vs P when he hangs pieces all the time, but it's very important for a master to understand this. So that concept may be valid here.

To help answer that question, what do the very top guys do in tournaments? Does Rybka use these databases in tournaments and if so does he use them in the tree search, or only to play the ending correctly once he reaches it?
I disagree about the importance of 5 piece database.
I think that they are less important when the level is high because when the level is high the probability of the program to find the right move without tablebases based on search and knowledge is also higher(an extreme case that does not happen is when search gives perfect play even without tablebases and in this case tablebases gives 0 elo).

My guess is that 20-30 elo for free database can be correct for weak programs or for fast time control when the difference is smaller for slower time control or for stronger programs.

Uri
You could be right, I don't pretend to understand this completely. However, I would like to point out the fact that I'm trying to "prove" that extra knowledge is more important at greater search depths and if I'm right about this, then it would indicate that these databases are at least slightly more important for stronger (i.e. deeper searching modern day) programs.

My study seems to indicate that this is so with knowledge. However, the effect seems to fall off rather suddenly beyond a few ply of depth.

I'm still playing these games, and we have almost 20,000 of them now.

http://www.greencheeks.homelinux.org:8015/

I think that knowledge in the evaluation is different here because it is not
knowledge about winning won position or drawing inferior positions.

I believe that with bad evaluation in the opening practically the difference is going to be bigger at least at the depths that we search today because there are many cases when we need very deep search to find the correct moves with bad evaluation.

Situation with tablebases seems different to me and I believe that in most cases deeper search that we practically can get is enough to avoid mistakes in the endgame(I do not deny that there are cases when there is some long mate that deep search is not going to find without tablebases but I believe that they happen in relatively small part of the games and the main problem of not having knowledge is only with weak programs or at small depths when programs may not be able even to win KR vs K and practically when I play games at depth 1 between programs without tablebases I can see cases when KR vs K ends in a draw).

Uri

Uri Blass · Post by **Uri Blass** » Thu May 22, 2008 10:30 pm

wgarvin wrote:
Uri Blass wrote:I disagree about the importance of 5 piece database.
I think that they are less important when the level is high because when the level is high the probability of the program to find the right move without tablebases based on search and knowledge is also higher(an extreme case that does not happen is when search gives perfect play even without tablebases and in this case tablebases gives 0 elo).

My guess is that 20-30 elo for free database can be correct for weak programs or for fast time control when the difference is smaller for slower time control or for stronger programs.

Uri
But your program has to search deep in order to find everything, and then it has to evaluate the nodes it reaches. Will it evaluate them correctly? If it tries to evaluate a node with 5 pieces on the board and you have a bitbase, you can evaluate it with 100% accuracy to a GTV of win, loss or draw.

If you have interior node recognizers that are fast enough, and you can afford to keep the bitbases in RAM, then now you can cut off entire branches of the tree at any depth as soon as they are found in the bitbase, and spend that time searching deeper in other parts of the tree. If I recall correctly, DarkThought benefited a lot from this sort of thing in late middlegame in the early 90's. It had bitbases or near-complete knowledge for all 4-piece endings, which took up 15 MB of ram in an uncompressed form (though "knowledgeably coded" so that many of them were 1 bit per position rather than 2 bits per position).

I would love to see the same thing extended to 3-vs-2 piece endings, but honestly it might take too much RAM and the benefits might not be large enough to justify it. Hence my long-lasting interest in better compression/representation schemes for the info in bitbases.

And I wonder if good heuristics could steer the search towards "probably winnable" endings and avoid "probably lost" ones, using some representation of the bitbase contents that was extremely sparse or lossy? I.e. how "complete" does the knowledge have to be, to be worth having?

Dark thought benefited in the early 90
My point is that I expect the benefit to be smaller when the hardware is faster and hardware today is faster than the early 90.

Uri

Don · Post by **Don** » Thu May 22, 2008 11:15 pm

wgarvin wrote:
Uri Blass wrote:I disagree about the importance of 5 piece database.
I think that they are less important when the level is high because when the level is high the probability of the program to find the right move without tablebases based on search and knowledge is also higher(an extreme case that does not happen is when search gives perfect play even without tablebases and in this case tablebases gives 0 elo).

My guess is that 20-30 elo for free database can be correct for weak programs or for fast time control when the difference is smaller for slower time control or for stronger programs.

Uri
But your program has to search deep in order to find everything, and then it has to evaluate the nodes it reaches. Will it evaluate them correctly? If it tries to evaluate a node with 5 pieces on the board and you have a bitbase, you can evaluate it with 100% accuracy to a GTV of win, loss or draw.

If you have interior node recognizers that are fast enough, and you can afford to keep the bitbases in RAM, then now you can cut off entire branches of the tree at any depth as soon as they are found in the bitbase, and spend that time searching deeper in other parts of the tree. If I recall correctly, DarkThought benefited a lot from this sort of thing in late middlegame in the early 90's. It had bitbases or near-complete knowledge for all 4-piece endings, which took up 15 MB of ram in an uncompressed form (though "knowledgeably coded" so that many of them were 1 bit per position rather than 2 bits per position).

I would love to see the same thing extended to 3-vs-2 piece endings, but honestly it might take too much RAM and the benefits might not be large enough to justify it. Hence my long-lasting interest in better compression/representation schemes for the info in bitbases.

And I wonder if good heuristics could steer the search towards "probably winnable" endings and avoid "probably lost" ones, using some representation of the bitbase contents that was extremely sparse or lossy? I.e. how "complete" does the knowledge have to be, to be worth having?

I've put much thought into that one. Several years ago I tried to produce a decision tree-like scheme to produce the win/loss/draw results in a much smaller form. I never had real success but I might try again.

Most databases are one-sided with a few minor exceptions. I believe a few simply rules may help a lot if you are satisfied with an imperfect solution. It would still be better than than not having ANY databases of course.

Has anyone every considered using bloom filters for an imperfect database substitute? Bloom filters store set membership in a compressed form. My guess is that RLE compression with standard databases is more compact that Bloom filters (and they are exact where bloom filters can give false positives although not false negatives.)

But I thought I would throw the idea out there of using bloom filters and I wonder if it's every been analyzed or even tried.

michiguel · Post by **michiguel** » Thu May 22, 2008 11:47 pm

Tord Romstad wrote:
Dirt wrote:
Tord Romstad wrote:The difficulty of winning KBNK is a myth.
Not entirely. I'm almost certain I couldn't win it, and HGM has just told us Joker might not. Are you claiming he's wrong about that?
I'm almost certain I couldn't win it either.

It is not that difficult. I could always do it consistently with less than 3 min on the clock. When I practiced a little for it (many years ago...), I could do it with a minute on the clock.
BTW, is is a very good exercise for a chess player.

Miguel

What I meant is that it was an easy endgame to program.

Perhaps there are no theoretically won five piece bitbase positions that Glaurung couldn't win with only win/lose/draw information.
I'm quite sure there must be some. Some of the KQP vs KQ and KQN vs KQ wins are phenomenally difficult. There are probably some KRP vs KR wins which are too hard, too. There is therefore little doubt that having tablebases in addition to bitbases would be better for my program than only bitbases, but in practice adding tablebase support would be a waste of time, because noone would ever download gigabytes of data just to make my program one or two Elo points stronger.

In that case I see no down side to using bitbases, but If so I doubt that only having drawn/not-drawn would result in any missed win.
I think a drawn/not drawn bitbase would be just as good as a full bitbase in practice, but it doesn't really offer any advantages. I doubt that you would save a significant amount of space: In almost all the endgame families in 4 and 5-piece endgames, there are at most two frequently occuring results anyway. After compression, the full bitbase will have almost exactly the same size as the drawn/not drawn bitbase.

Tord

wgarvin · Post by **wgarvin** » Fri May 23, 2008 2:24 am

Don wrote:I've put much thought into that one. Several years ago I tried to produce a decision tree-like scheme to produce the win/loss/draw results in a much smaller form. I never had real success but I might try again.

Most databases are one-sided with a few minor exceptions. I believe a few simply rules may help a lot if you are satisfied with an imperfect solution. It would still be better than than not having ANY databases of course.

Has anyone every considered using bloom filters for an imperfect database substitute? Bloom filters store set membership in a compressed form. My guess is that RLE compression with standard databases is more compact that Bloom filters (and they are exact where bloom filters can give false positives although not false negatives.)

But I thought I would throw the idea out there of using bloom filters and I wonder if it's every been analyzed or even tried.

Bloom filters are an interesting idea.

The trick is to find a way of representing the info in the bitbases so that it can fit in RAM, but still be accessed randomly with a decent efficiency. I think that might be a problem for decision trees. For very lopsided bitbases, sparse array representations would be pretty good. But then again, those seem like the least interesting bitbases.

I've been thinking in terms of using a heuristic function to predict the result, and storing a bitbase of the differences. So to do a probe, you have to evaluate the heuristic function, access the "diffbase" and then combine the two results. If the heuristic function were good enough, this might make the "diffbase" mostly zeros, and then a sparse 2D array representation should be able to shrink it quite a bit. Unfortunately it means you need heuristic functions for each ending (with rules like "if the enemy king is within two squares of my king, and its my move, and ...") The same kinds of things you would put in your decision trees, actually.

I've been thinking about and talking about doing exactly this for more than 5 years now, which means I am probably too lazy to ever actually do it. (Part of my hang-up was that I wanted to start by building my own tablebases and bitbases that respected castling and the 50-move rule. It just doesn't seem worth doing it at all, unless I know that the bitbases I'm working with are 100% correct.) But if someone else got motivated enough to try this stuff, I would be very interested in hearing how it turns out! =)

Dann Corbit · Post by **Dann Corbit** » Fri May 23, 2008 2:48 am

hgm wrote:That seems a lot to me. It corresponds to a 7% better score, or 1 in 7 games where half a point was lost because of bungling a won or drawn position.

It must depend a lot on the level of the engines; Joker almost never gets into a 5-men position before the game is totally decided.

What bitbases could be responsible for such a large effect? f the 3+1, KBNK is the only one that Jker might bungle. But bitbases are no help there. With 2+2 it might bungle KQKR and KQKP, but also there bitbases are zero help.

Is it all due to KRPKR?

I think that 50 Elo was probably way overblown. I thought I recalled that number from an experiment long ago, but I can't find the post to verify it. None of the experiments I found showed significant strength.

In this link:
http://www.horizonchess.com/FAQ/Winboar ... ebase.html
There seems to be plenty of evidence of tablebase files weakening the search.
They did show +115 Elo for Yace using bitbase [but only when endgames were reached].

So I guess that it is probably a very minor contribution and my memory deceived me.

I would be interested to see a fresh experiment with modern equipment to see if the whole idea is worth any significant advantage at all.

I would think that with 64 bit memories and (perhaps) 6 man bitbase files fully in memory there would simply have to be some benefit, but then again, maybe the benefit of larger hash would be more.

Don · Post by **Don** » Fri May 23, 2008 3:14 am

wgarvin wrote:
Don wrote:I've put much thought into that one. Several years ago I tried to produce a decision tree-like scheme to produce the win/loss/draw results in a much smaller form. I never had real success but I might try again.

Most databases are one-sided with a few minor exceptions. I believe a few simply rules may help a lot if you are satisfied with an imperfect solution. It would still be better than than not having ANY databases of course.

Has anyone every considered using bloom filters for an imperfect database substitute? Bloom filters store set membership in a compressed form. My guess is that RLE compression with standard databases is more compact that Bloom filters (and they are exact where bloom filters can give false positives although not false negatives.)

But I thought I would throw the idea out there of using bloom filters and I wonder if it's every been analyzed or even tried.
Bloom filters are an interesting idea.

The trick is to find a way of representing the info in the bitbases so that it can fit in RAM, but still be accessed randomly with a decent efficiency. I think that might be a problem for decision trees. For very lopsided bitbases, sparse array representations would be pretty good. But then again, those seem like the least interesting bitbases.

I've been thinking in terms of using a heuristic function to predict the result, and storing a bitbase of the differences. So to do a probe, you have to evaluate the heuristic function, access the "diffbase" and then combine the two results. If the heuristic function were good enough, this might make the "diffbase" mostly zeros, and then a sparse 2D array representation should be able to shrink it quite a bit. Unfortunately it means you need heuristic functions for each ending (with rules like "if the enemy king is within two squares of my king, and its my move, and ...") The same kinds of things you would put in your decision trees, actually.

I've been thinking about and talking about doing exactly this for more than 5 years now, which means I am probably too lazy to ever actually do it. (Part of my hang-up was that I wanted to start by building my own tablebases and bitbases that respected castling and the 50-move rule. It just doesn't seem worth doing it at all, unless I know that the bitbases I'm working with are 100% correct.) But if someone else got motivated enough to try this stuff, I would be very interested in hearing how it turns out! =)

Actually, you could combine the two approaches, both yours and mine!

Suppose you have a heuristic function that is pretty good at returning the correct answer most of the time. All you need to know is when it's wrong. A bloom filter is a good choice for this. Bloom filters occasionally return a false positive but an exception list can be constructed to deal with that. A trick is to put the exceptions themselves in a much smaller bloom filter which can be an order of magnitude smaller.

The appeal of the bloom filter approach is that you can probably use your zobrist hash keys directly.

My hidden doubts concern Run Length Encoding which is typically used to compress end game databases. RLE can be very effective because positions tend to cluster into large chains of wins, losses or draws and thus may be more effective than bloom filters.

Bloom filters have the advantage of utter simplicity and are compact since you don't have to store the key. But it's not as compact as directly indexing an endgame database that returns only 2 states, for instance win or draw.

I'm sorely tempted to try it. As a feasibility test, I could try it on KpK just to see if I can get 90% of the positions correct with just 2 or 3 lines of code. I am estimating that this would reduce a 24K database to about 5k.

What is appealing about this idea is that it is a very simple methodology for making the time vs space trade off. You write the best evaluator you can, and the bloom filter does the rest of the work for you!

How much are bitbases worth?

Re: How much are bitbases worth?

Re: How much are bitbases worth?

Re: How much are bitbases worth?

Re: How much are bitbases worth?

Re: How much are bitbases worth?

Re: How much are bitbases worth?

Re: How much are bitbases worth?

Re: How much are bitbases worth?

Re: How much are bitbases worth?

Re: How much are bitbases worth?