Programming issue: engine speed slowing over time

Discussion of chess software programming and technical issues.

Moderator: Ras

Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Re: Programming issue: engine speed slowing over time

Post by Chessnut1071 »

hgm wrote: Sat Jul 02, 2022 3:39 pm Since, as you say, the hash table cannot be made large enough to hold every position in a long search, one only keeps the positions that are most important, and overwrites those that are less important to make room for them. It turns out that the most important positions are recently first-visited positions, because these have the highest chance to be visited again in the very near future. A secondary priority is to keep entries resulting from a deep search, as these would save you a great deal of effort when they are visited again by supplying the result of the search immediately.

With a large table the 'always-replace' scheme already works satisfactorily: you resolve any collision by just overwriting what is already there. So never rehash, never stack. Only probe one entry in the table; it either has the same key (hit), or it has a different one, and you overwrite it with the new result. This tends to keep the most recent results in the table.

A slightly refined scheme tries to give some protection to entries from a high-depth search, so they are not as easily overwritten as those from a d=0 or d=1 search. In this scheme you have buckets of 2, so you can resolve the first collision without overwriting anything. When a third position collides, you overwrite the entry with the lowest depth of those that already were there (no matter whether that depth is larger than that of the new entry). This way the highest depth that ever mapped into that bucket would be preserved forever.

Note that it is no big loss if occasionally two entries of high depth collide. This happens very rarely (as such entries themselves are already rare). When it does happen, one of the two gets 'quickly' lost by overwriting with a d=0 or d=1. Say you lose the result of a d=10 search that way. So during the search you don't get a hash hit that you would otherwise have gotten, and have to repeat the d=10 search. But most likely all d=9 daughters are still in the hash table, and will now give hash hits. So in fact you are only doing a d=1 search.
Very interesting description. I never spent a lot of time optimizing the hash table because it never became an issue until 18-ply and above. I thought I was done with that. Now, I'm looking at weeks and months of revisions <%#$!/%$#
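
For reference, the bucket-of-2, depth-preferred replacement hgm describes above might look like the following C# sketch (always-replace is the same idea with a single-entry bucket). All names here are illustrative; this is not code from any engine in the thread.

Code: Select all

// Hedged sketch of depth-preferred replacement in 2-entry buckets.
struct HashEntry
{
    public ulong Key;    // full hash key, so a foreign position in the slot is detected
    public int   Score;
    public int   Depth;  // remaining depth of the search that produced the entry
}

static class HashTable
{
    // Number of entries must be even so every bucket has exactly 2 slots.
    static readonly HashEntry[] Table = new HashEntry[1 << 22];

    public static void Store(ulong key, int score, int depth)
    {
        long i = (long)(key % (ulong)Table.Length) & ~1L;    // even slot = start of the bucket
        // Sacrifice the shallower of the two entries, so a deep result is
        // never wiped out by a d=0 or d=1 store.
        long victim = Table[i].Depth <= Table[i + 1].Depth ? i : i + 1;
        if (Table[i].Key == key) victim = i;                 // same position: refresh it instead
        else if (Table[i + 1].Key == key) victim = i + 1;
        Table[victim] = new HashEntry { Key = key, Score = score, Depth = depth };
    }

    public static bool Probe(ulong key, out HashEntry entry)
    {
        long i = (long)(key % (ulong)Table.Length) & ~1L;
        if (Table[i].Key == key)     { entry = Table[i];     return true; }
        if (Table[i + 1].Key == key) { entry = Table[i + 1]; return true; }
        entry = default(HashEntry);                          // miss
        return false;
    }
}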
Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Re: Programming issue: engine speed slowing over time

Post by Chessnut1071 »

Chessnut1071 wrote: Sat Jul 02, 2022 4:00 pm
hgm wrote: Sat Jul 02, 2022 3:39 pm Since, as you say, the hash table cannot be made large enough to hold every position in a long search, one only keeps the positions that are most important, and overwrites those that are less important to make room for them. It turns out that the most important positions are recently first-visited positions, because these have the highest chance to be visited again in the very near future. A secondary priority is to keep entries resulting from a deep search, as these would save you a great deal of effort when they are visited again by supplying the result of the search immediately.

With a large table the 'always-replace' scheme already works satisfactorily: you resolve any collision by just overwriting what is already there. So never rehash, never stack. Only probe one entry in the table; it either has the same key (hit), or it has a different one, and you overwrite it with the new result. This tends to keep the most recent results in the table.

A slightly refined scheme tries to give some protection to entries from a high-depth search, so they are not as easily overwritten as those from a d=0 or d=1 search. In this scheme you have buckets of 2, so you can resolve the first collision without overwriting anything. When a third position collides, you overwrite the entry with the lowest depth of those that already were there (no matter whether that depth is larger than that of the new entry). This way the highest depth that ever mapped into that bucket would be preserved forever.

Note that it is no big loss if occasionally two entries of high depth collide. This happens very rarely (as such entries themselves are already rare). When it does happen, one of the two gets 'quickly' lost by overwriting with a d=0 or d=1. Say you lose the result of a d=10 search that way. So during the search you don't get a hash hit that you would otherwise have gotten, and have to repeat the d=10 search. But most likely all d=9 daughters are still in the hash table, and will now give hash hits. So in fact you are only doing a d=1 search.
Very interesting description. I never spent a lot of time optimizing the hash table because it never became an issue until 18-ply and above. I thought I was done with that. Now, I'm looking at weeks and months of revisions <%#$!/%$#
I revised my hash table per your suggestion of replace and zero stack. Although it still found every solution, the nodes searched, and consequently the time, increased for every ply tested. Data as follows:

Zero stack with replacement:

Ply level, time/sec, nodes searched
6-ply,  0.751,  291,357
8-ply,  12.75,  4,300,917
10-ply, 15.227, 4,730,358
12-ply, 67.5,   13,348,474

Optimum stack size = 10 + replacement of 1st element if full:

Ply level, time/sec, nodes searched
6-ply,  0.72,  250,168
8-ply,  7.21,  2,249,254
10-ply, 7.23,  2,279,810
12-ply, 16.71, 3,886,790

You do benefit from a stack, although it's much smaller than I thought. With no stack and just replacement, the nodes searched explode as ply level increases; in fact, the 18-ply case is still running. I did include replacement of the 1st position, and it does improve performance at higher ply but has little effect at lower ply.
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Programming issue: engine speed slowing over time

Post by hgm »

I have no idea how your code looks, but I get the impression that the old hash design had a table with a small number of buckets, each bucket potentially containing a very large (possibly unlimited) number of entries. For the comparison with the new scheme, where you have only 1 or 2 entries per bucket, you must make sure the amount of memory used is about the same. So if before you had on average 150 entries per bucket in the stack, you must now make the number of buckets 150 or 75 times larger. Did you do that?
Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Re: Programming issue: engine speed slowing over time

Post by Chessnut1071 »

hgm wrote: Sun Jul 03, 2022 7:45 am I have no idea how your code looks, but I get the impression that the old hash design had a table with a small number of buckets, each bucket potentially containing a very large (possibly unlimited) number of entries. For the comparison with the new scheme, where you have only 1 or 2 entries per bucket, you must make sure the amount of memory used is about the same. So if before you had on average 150 entries per bucket in the stack, you must now make the number of buckets 150 or 75 times larger. Did you do that?
First, I want to thank you for the small stack and replacement idea. At ply levels less than 16-ply I can find the solution looking at 1/2 the nodes. At higher ply I'm getting as much as an 85% reduction in nodes and finding solutions that took too much time before. I didn't change my table size, but modified it to a LIFO queue of 4 - 10 for under 18-ply and 20 - 30 for 18 - 22 ply. The queue length goes up slightly with ply. Code as follows:

Code: Select all

StackMax = 10;
long[,] ZTable = new long[120000000, 2];

// PushZ: store hash key L and its score in the first free slot of the
// StackMax-slot stretch starting at index 'key'; if the stretch is full,
// shift it up one slot and insert the new entry at the front.
void PushZ(byte ply, byte j, byte score)
{
    long L, key;
    int i;

    // ZTable[key,0] = key
    // ZTable[key,1] = score  1 = white wins; 2 = black wins

    L = UpdateHashKey(ply, j);
    key = L / 400;

    push_hash_key = key;

    i = 0;
L1: if (i > StackMax)
    {
        // stretch full: shift the entries up one slot (dropping the last) and insert at the front
        for (int k = StackMax; k >= 1; k--) { ZTable[key + k, 0] = ZTable[key + k - 1, 0]; ZTable[key + k, 1] = ZTable[key + k - 1, 1]; }
        ZTable[key, 0] = L; ZTable[key, 1] = score;
        return;
    }

    if (ZTable[key + i, 0] == 0)
    {
        ZTable[key + i, 0] = L;
        ZTable[key + i, 1] = score;
        return;
    }
    if (score == ZTable[key, 1]) { ++i; goto L1; }

    // Collision: clear the stretch
    for (byte m = 1; m <= StackMax; m++) { ZTable[key + m, 0] = 0; ZTable[key + m, 1] = 0; }

    return;
}

// PopZ: look for hash key L in the StackMax-slot stretch starting at 'key';
// return its score, or 0 if it is not found.
long PopZ(byte ply, byte j)
{
    // ZTable[key,0] = key
    // ZTable[key,1] = score  1 = white wins; 2 = black wins

    long L, key, score;
    int i;
    score = 0;

    L = UpdateHashKey(ply, j);
    key = L / 400;

    pop_hash_key = key;

    i = 0;
L1: if (ZTable[key + i, 0] == 0 || ZTable[key + i, 1] == 3) { return 0; }
    score = ZTable[key + i, 1];
    if (ZTable[key + i, 0] == L && score == ZTable[key, 1])
    {
        return score;
    }
    ++i; if (i <= StackMax) { goto L1; }  // keep searching the stretch

    // not found
    return 0;
}
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Programming issue: engine speed slowing over time

Post by hgm »

It still looks a bit strange to me. Why would the verdict of having a collision depend on the score? You store many positions (each with L differing by a multiple of 400) in a row, in the table slots behind the slot where they actually map to ('key', for which the more commonly used term would be 'index'). This could make you run into positions that originally mapped to such a 'downstream' slot. You then have the choice of just continuing hunting for an empty slot behind those, or declaring a collision and erasing a bunch of entries.

Why would it matter what the score of the position you encounter is? I don't see any harm in continuing the hunt for an empty slot behind it if it has the opposite score.

You also react very allergically to having a 'collision': you erase an entire stretch of entries from the table. And then you don't even bother to store the one you just made room for. (I suppose this is a bug?) But even if you had stored it, why would you erase anything more than just the one needed to make room for it? Why would you ever erase anything from the table before there is any need to do so?

In case hunting for an empty slot takes too long, you shift the entire stretch you searched through up one slot, to insert the new one at the beginning and lose the last one. What is the point of that? The last position in the stretch will almost always be one that originally mapped to a different slot than the one you are storing now; by the time you would get 10 hits on the same slot, the chances that you would not have had any hits at all on all 9 slots just behind it are virtually zero. Why kick out that position rather than just overwriting the first one in the stretch? (Which might also be a position that did not originally map there, but to an 'upstream' slot, and bounced off to a later slot.)

I am not sure how your scoring works (e.g. what 3 means). But if you find the position (same L) but it isn't a win for either player, your PopZ routine returns the same as what it would do when the position is not in the table. Don't you store positions that are not resolved into a mate yet in the table? I think that would be a mistake, because before you solved the problem most positions would be like that, and many of those would occur multiple times in the tree, reached through transpositions. So you could save tons of work by not searching their sub-trees for a second time, again without finding a forced mate.

I also don't understand how you can get away with just storing a win/loss score. Normally one stores a measure for the distance to mate. So that you can still determine how long the mate took in a tree with hash hits. E.g. if a 10-ply search would not give you a forced mate, but an 11-ply search would find one, how would you know that it is a mate in 6 (= 11 ply), and not, say, a mate in 9? "Because I did not search that deep" is NOT the correct answer in case you are using a transposition table, because results from deep searches (i.e. made close to the root) could be stored there, and be successfully probed near a leaf of your search tree, effectively grafting the deep search tree on it. Engines regularly find checkmates way longer than they search.
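
As an aside, the usual way to keep the distance to mate correct across hash hits is to store mate scores relative to the node where they are stored, and convert them back by ply when probing. A hedged C# sketch of that convention, with illustrative constants and names (not this engine's scoring):

Code: Select all

// Mate scores are encoded as MateScore - pliesToMate from the root's point of
// view. Before storing they are made relative to the current node; on a probe
// at another ply they are converted back, so a grafted hash hit still reports
// the correct mate length.
static class MateScores
{
    public const int MateScore = 100000;             // magnitude of "mate right now"
    public const int MateBound = MateScore - 1000;   // scores beyond this bound are mate scores

    // Convert a root-relative mate score to a node-relative one before storing.
    public static int ToTT(int score, int plyFromRoot)
    {
        if (score >= MateBound)  return score + plyFromRoot;
        if (score <= -MateBound) return score - plyFromRoot;
        return score;
    }

    // Convert a stored node-relative mate score back to root-relative on a probe.
    public static int FromTT(int score, int plyFromRoot)
    {
        if (score >= MateBound)  return score - plyFromRoot;
        if (score <= -MateBound) return score + plyFromRoot;
        return score;
    }
}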
Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Re: Programming issue: engine speed slowing over time

Post by Chessnut1071 »

hgm wrote: Sun Jul 03, 2022 2:51 pm It still looks a bit strange to me. Why would the verdict of having a collision depend on the score? You store many positions (each with L differing by a multiple of 400) in a row, in the table slots behind the slot where they actually map to ('key', for which the more commonly used term would be 'index'). This could make you run into positions that originally mapped to such a 'downstream' slot. You then have the choice of just continuing hunting for an empty slot behind those, or declaring a collision and erasing a bunch of entries.

Why would it matter what the score of the position you encounter is? I don't see any harm in continuing the hunt for an empty slot behind it if it has the opposite score.

You also react very allergically to having a 'collision': you erase an entire stretch of entries from the table. And then you don't even bother to store the one you just made room for. (I suppose this is a bug?) But even if you had stored it, why would you erase anything more than just the one needed to make room for it? Why would you ever erase anything from the table before there is any need to do so?

In case hunting for an empty slot takes too long, you shift the entire stretch you searched through up one slot, to insert the new one at the beginning and lose the last one. What is the point of that? The last position in the stretch will almost always be one that originally mapped to a different slot than the one you are storing now; by the time you would get 10 hits on the same slot, the chances that you would not have had any hits at all on all 9 slots just behind it are virtually zero. Why kick out that position rather than just overwriting the first one in the stretch? (Which might also be a position that did not originally map there, but to an 'upstream' slot, and bounced off to a later slot.)

I am not sure how your scoring works (e.g. what 3 means). But if you find the position (same L) but it isn't a win for either player, your PopZ routine returns the same as what it would do when the position is not in the table. Don't you store positions that are not resolved into a mate yet in the table? I think that would be a mistake, because before you solved the problem most positions would be like that, and many of those would occur multiple times in the tree, reached through transpositions. So you could save tons of work by not searching their sub-trees for a second time, again without finding a forced mate.

I also don't understand how you can get away with just storing a win/loss score. Normally one stores a measure for the distance to mate. So that you can still determine how long the mate took in a tree with hash hits. E.g. if a 10-ply search would not give you a forced mate, but an 11-ply search would find one, how would you know that it is a mate in 6 (= 11 ply), and not, say, a mate in 9? "Because I did not search that deep" is NOT the correct answer in case you are using a transposition table, because results from deep searches (i.e. made close to the root) could be stored there, and be successfully probed near a leaf of your search tree, effectively grafting the deep search tree on it. Engines regularly find checkmates way longer than they search.
Your 1st point: good eye on that one. That code was a leftover from a problem I had with 2 positions mapping to the exact same key, an en passant where the last move enabled a capture. Taking that out even as we write. Also, the 3 was used to id a collision, which also isn't used any more but was left over from issues I had with this table growing before. I'm eliminating that code as well.

As far as the 6 - 30 length LIFO queue goes, that is needed. The most recent entries are usually the most probable hits. That's what really made the biggest improvement in my search, significantly reducing the number of nodes searched. I found you only need a small queue to be effective. Previously, I let that length grow enormously at 18 - 24 ply, and that was what caused my degradation issue. The queue length should grow with deep ply searches to be more effective.

Also, I really don't need distance to mate, I already have it. When the key [index] that indicates a mate is found, the ply of the hit already indicates the distance.

I'm sure the ZTable can be optimized further. This is what I needed to test the deep ply solutions I couldn't do before. thx for the comments!
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Programming issue: engine speed slowing over time

Post by hgm »

The most recent entries are the most-probable hits. But the problem is that your queues (which are FIFOs, btw, not LIFOs) overlap. So the one you kick out from (say) key+9 need not be the one that mapped to key 10 such hits ago. It can very well be an entry that is the most recent one that mapped to key+9. So far downstream from key it is actually very likely that what is in there did not map to key at all. So you are just randomly deleting entries, knowing nothing about their age.

What is also bad is that you might keep many entries that can no longer be found, just wasting space. Suppose the table is completely filled (as in a long search it will eventually be for most of the search). An entry mapping to key could eventually migrate to key+9. But then new arrivals on key+1 to key+8 would shift it up further without deleting, to key+10, key+11 etc. Since on a probe you would only test for the presence up to key+9, you would never find it.

It would be better not to have the queues overlap. E.g. with queue length 6, make sure that key always has a value that is a multiple of 6 (e.g. key = (L/(6*400))*6).
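In code that could look something like the sketch below (BucketSize, NumBuckets and BucketStart are illustrative names; the wrap by the number of buckets, to keep the index inside the table, is added here on top of the formula above):

Code: Select all

static class NonOverlappingBuckets
{
    public const int  BucketSize = 6;
    public const long NumBuckets = 120000000 / BucketSize;   // table of 120,000,000 slots, as in the code earlier in the thread

    // Start slot of the bucket for raw key L; always a multiple of BucketSize,
    // so probing slots [start, start + BucketSize) never walks into another bucket.
    public static long BucketStart(long L)
    {
        long bucket = L / (BucketSize * 400) % NumBuckets;    // hgm's (L/(6*400)), wrapped to the table size
        if (bucket < 0) bucket += NumBuckets;                 // guard in case the raw key is negative
        return bucket * BucketSize;
    }
}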
Also, I really don't need distance to mate, I already have it. When the key [index] that indicates a mate is found, the ply of the hit already indicates the distance.
I already explained why this is not true. Read it again. And try to find a deep mate in the KRK end-game if you don't believe it.
Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Re: Programming issue: engine speed slowing over time

Post by Chessnut1071 »

hgm wrote: Sun Jul 03, 2022 4:51 pm The most recent entries are the most-probable hits. But the problem is that your queues (which are FIFOs, btw, not LIFOs) overlap. So the one you kick out from (say) key+9 need not be the one that mapped to key 10 such hits ago. It can very well be an entry that is the most recent one that mapped to key+9. So far downstream from key it is actually very likely that what is in there did not map to key at all. So you are just randomly deleting entries, knowing nothing about their age.

What is also bad is that you might keep many entries that can no longer be found, just wasting space. Suppose the table is completely filled (as in a long search it will eventually be for most of the search). An entry mapping to key could eventually migrate to key+9. But then new arrivals on key+1 to key+8 would shift it up further without deleting, to key+10, key+11 etc. Since on a probe you would only test for the presence up to key+9, you would never find it.

It would be better not to have the queues overlap. E.g. with queue length 6, make sure that key always has a value that is a multiple of 6 (e.g. key = (L/(6*400))*6).
Also, I really don't need distance to mate, I already have it. When the key [index] that indicates a mate is found, the ply of the hit already indicates the distance.
I already explained why this is not true. Read it again. And try to find a deep mate in the KRK end-game if you don't believe it.
Woooops, it is FIFO, thx for pointing that out. I was rushing to write that code. Now it's LIFO. Modifying to some of your suggestions and testing the effect on nodes searched. thx for the comments, I'm finally taking the search to a new level.
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Programming issue: engine speed slowing over time

Post by hgm »

One more thought:

I think it really hurts when you don't store the search depth in the hash table. For the won positions this depth doesn't matter, because when you have a forced win it should stay a forced win no matter how much deeper you search. So forced wins can be used at any requested depth.

But this is not true for positions where you have not found the forced checkmate yet. I assume you do not store these in the table, because even if you did, you would not be able to use them for anything. You would always have to search them anyway, because you have no idea whether they were obtained by a search that is equally deep or even deeper than the one you try to do now, or whether it is from a shallower search. If you knew that, you could refrain from searching when the current search would not go deeper than the one that produced the table entry, and immediately accept that you would not find a mate.

Since virtually all nodes in the tree will be such 'undecided' nodes before you first find the checkmate, this basically means that you have to find the checkmate without the benefit of a hash table.
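
A minimal C# sketch of that idea, with illustrative names (TTEntry, Verdict, CanSkipSearch) rather than the actual data layout of the engine discussed here: an 'undecided' result is only trusted when it came from a search at least as deep as the one about to be done, while a forced win or loss is trusted at any depth.

Code: Select all

enum Verdict : byte { Empty, NoMateFound, WhiteWins, BlackWins }

struct TTEntry
{
    public long    Key;     // full hash key of the stored position
    public Verdict Result;  // outcome of the search that created the entry
    public byte    Depth;   // remaining search depth of that search
}

static class TTProbe
{
    // True if the stored entry lets us skip searching this node to 'remainingDepth' plies.
    public static bool CanSkipSearch(TTEntry e, long key, int remainingDepth)
    {
        if (e.Key != key) return false;                               // empty slot or a different position
        if (e.Result == Verdict.WhiteWins || e.Result == Verdict.BlackWins)
            return true;                                              // a forced result holds at any depth
        return e.Result == Verdict.NoMateFound && e.Depth >= remainingDepth;
    }
}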