For or against the transposition table probe in quiet search?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Kotlov
Posts: 266
Joined: Fri Jul 10, 2015 9:23 pm
Location: Russia

For or against the transposition table probe in quiet search?

Post by Kotlov »

For or against the transposition table probe in quiet search?
Eugene Kotlov
Hedgehog 2.1 64-bit coming soon...
User avatar
flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: For or against the transposition table probe in quiet search?

Post by flok »

Kotlov wrote: Tue May 11, 2021 10:05 am For or against the transposition table probe in quiet search?
I believe it is best simply to try it for yourself.
For my programs it never helped; Elo actually decreased.
niel5946
Posts: 174
Joined: Thu Nov 26, 2020 10:06 am
Full name: Niels Abildskov

Re: For or against the transposition table probe in quiet search?

Post by niel5946 »

Kotlov wrote: Tue May 11, 2021 10:05 am For or against the transposition table probe in quiet search?
Quiescence search is usually so fast that a transposition table probe can slow it down significantly. One thing I tried was to probe in alphabeta before diving into quiescence. That way we won't risk futile probes further down in quiescence, but we can still skip some quiescence searches entirely when the transposition table already has an entry for the position.
I didn't measure any noticeable difference, but it is worth a try.
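
A minimal sketch of that ordering, with placeholder types and function names (Position, tt_probe, quiescence and the bound logic here are illustrative, not any engine's actual API):

#include <cstdint>

// Minimal placeholder types -- a real engine supplies its own.
struct Position { uint64_t key() const; /* ... */ };
enum Bound : uint8_t { BOUND_EXACT, BOUND_LOWER, BOUND_UPPER };
struct TTEntry { uint64_t key; int16_t score; int8_t depth; Bound bound; };

bool tt_probe(uint64_t key, TTEntry& out);            // main-TT lookup (placeholder)
int  quiescence(Position& pos, int alpha, int beta);  // plain QS, no TT access

int alphabeta(Position& pos, int depth, int alpha, int beta)
{
    // Probe once here, before the horizon, so a hit can cut off the whole
    // quiescence search below this node.
    TTEntry e;
    if (tt_probe(pos.key(), e) && e.depth >= depth) {
        if (e.bound == BOUND_EXACT)                     return e.score;
        if (e.bound == BOUND_LOWER && e.score >= beta)  return e.score;
        if (e.bound == BOUND_UPPER && e.score <= alpha) return e.score;
    }

    if (depth <= 0)
        return quiescence(pos, alpha, beta);   // QS itself never touches the TT

    // ... generate moves, recurse, tt_store(...) as usual ...
    return alpha;
}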
Author of Loki, a C++ work in progress.
Code | Releases | Progress Log |
User avatar
Kotlov
Posts: 266
Joined: Fri Jul 10, 2015 9:23 pm
Location: Russia

Re: For or against the transposition table probe in quiet search?

Post by Kotlov »

The idea is that a hit is good anyway, but the probability of a hit is extremely low compared to the cost of accessing RAM.
Eugene Kotlov
Hedgehog 2.1 64-bit coming soon...
niel5946
Posts: 174
Joined: Thu Nov 26, 2020 10:06 am
Full name: Niels Abildskov

Re: For or against the transposition table probe in quiet search?

Post by niel5946 »

Kotlov wrote: Tue May 11, 2021 10:41 am The idea is that a hit is good anyway, but the probability of a hit is extremely low compared to the cost of accessing RAM.
Yes, and that is why it is a good idea to try probing before even entering quiescence. But don't probe inside quiescence itself: once you are there, the remaining search is so fast that the probe costs more than it can save.
Author of Loki, a C++ work in progress.
Code | Releases | Progress Log |
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: For or against the transposition table probe in quiet search?

Post by hgm »

I know that Crafty did not probe the TT in QS. So I tried that in my engines too, and it was always a significant Elo loss. But I suppose it depends a lot on CPU speed and memory bandwidth. My engines were all single-threaded, but nowadays multiple cores usually share a single memory interface, so it is much easier to become memory bound.

Of course you can make an extra TT dedicated for QS, which is small enough to always fit in cache. TT stores in the main search could then go both into the main TT and in this dedicated (always-replace) table, while QS only probes and stores there. That would enable you to catch transpositions within the same QS, or perhaps in the sub-tree of the last few plies near the horizon.
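
A minimal sketch of such a dedicated, cache-sized always-replace table (the size, entry layout and names are illustrative assumptions):

#include <cstdint>

// Hypothetical tiny always-replace table used only by quiescence.
// 4096 entries x 16 bytes = 64 KiB, small enough to stay largely cache-resident.
struct QSEntry { uint64_t key; int16_t score; uint8_t bound; uint8_t pad[5]; };

constexpr uint64_t QS_TT_SIZE = 4096;                // must be a power of two
static QSEntry qs_tt[QS_TT_SIZE];

inline QSEntry* qs_slot(uint64_t key) { return &qs_tt[key & (QS_TT_SIZE - 1)]; }

inline bool qs_probe(uint64_t key, int& score, uint8_t& bound)
{
    QSEntry* e = qs_slot(key);
    if (e->key != key) return false;
    score = e->score; bound = e->bound;
    return true;
}

// Always replace, no depth test. The main search could call this in addition
// to its normal main-TT store; QS probes and stores only here.
inline void qs_store(uint64_t key, int score, uint8_t bound)
{
    QSEntry* e = qs_slot(key);
    e->key = key; e->score = (int16_t)score; e->bound = bound;
}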

I have been thinking about techniques for speeding up the hash probe; it hardly helps to make a search that is 10 times faster than the usual techniques when that means it is now waiting idle for the TT probe to come in 99% of the time instead of 90%... One possible technique is 'TT focusing', where you try to map positions into the TT in such a way that there is more chance for a cache hit. Memory data is always read per cache line of 64 bytes, and it is common practice to make that coincide with a 'hash bucket' of 4-7 TT entries, and to use a replacement scheme that works entirely within the bucket. But even when you get a hit there, the other positions stored in the bucket are completely unrelated to the sought one, so the chance that they will be useful is no better than average.

This could be improved by not distributing positions randomly through the TT, but by preferably mapping similar positions to the same buckets. (Without overdoing that, or a huge sub-tree of similar positions would all map to the same entry.) One aspect that usually changes only slowly within the tree is the Pawn constellation. So you could index the main TT by Pawn key, and only use the main hash key as signature. Then a large sub-tree would use only a small part of the hash table, as the number of different Pawn constellations in it is not nearly as large as the number of nodes. This would probably be over-doing it, as the factor between those would be too large: the total number of entries available for the sub-tree would then be too small to hold it. But you can derive the index partly from the Pawn key and partly from the main hash key (some bits from the one, some bits from the other).
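
As an illustration of that mixed indexing, a hypothetical sketch in which the upper bits of the bucket index come from the Pawn key and the lower bits from the main key (the bit counts are arbitrary examples):

#include <cstdint>

// The upper part of the bucket index comes from the Pawn key (which changes
// slowly within a sub-tree), the lower part from the main key, so similar
// positions cluster into nearby buckets without all colliding.
constexpr int      INDEX_BITS      = 24;   // 16M buckets, for example
constexpr int      PAWN_INDEX_BITS = 10;   // how much 'focusing' to apply
constexpr uint64_t INDEX_MASK      = (1ULL << INDEX_BITS) - 1;

inline uint64_t tt_bucket_index(uint64_t main_key, uint64_t pawn_key)
{
    uint64_t pawn_part = pawn_key & ((1ULL << PAWN_INDEX_BITS) - 1);
    uint64_t main_part = main_key & ((1ULL << (INDEX_BITS - PAWN_INDEX_BITS)) - 1);
    return ((pawn_part << (INDEX_BITS - PAWN_INDEX_BITS)) | main_part) & INDEX_MASK;
}
// The main hash key (or its unused high bits) is then stored as the signature.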

Another fact that could be exploited is that the overwhelming majority of hash hits is on the always-replace entry of the bucket (i.e. the place where everything goes that is not more valuable than anything already in the bucket). And if the bucket has only one of those, this means that all the other entries brought along with it into cache have an extra low probability of ever causing a hit before the bucket is flushed out of the cache again. It would be better if a bucket contained as many always-replace slots as possible; then a much larger fraction of the cache would be populated with entries that have a high hit rate.

E.g. if an entry requires 9 bytes, you could put 6 of those as always-replace entries in a bucket (54 bytes). The key decides which of those will be used for a given position, so you would only have to probe one of them. The remaining 10 bytes could be used to store one byte of the hash signature for each of 7 depth-preferred entries which are in another bucket, plus the depth of the least-deep entry in that bucket. When the always-replace entry did not contain the sought position, you could test whether any of the entries in that other bucket could hold it, without actually accessing that bucket, by checking whether those 8 bits of the key match. Only if you have a match would you compare the remainder of the hash signature in the other bucket; there is only a 2.7% probability (7 entries x 1/256) that you access the other bucket in vain. (And if this is unacceptably high, you could store more key bits of the depth-preferred entries in the always-replace bucket.) The depth of the least-deep entry would tell you whether the sought position qualifies for storage in the other bucket, which it usually would not. This way the cache would remain populated by always-replace buckets, which have a higher probability of being hit by a probe.
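
A hypothetical way this 64-byte bucket could be laid out (field sizes and names are illustrative, not a definitive implementation):

#include <cstdint>

#pragma pack(push, 1)
struct ARSlot {                  // 9 bytes: one always-replace entry
    uint32_t signature;          // part of the main hash key
    uint16_t move;
    int16_t  score;
    uint8_t  depth_flags;
};

struct ARBucket {                // exactly one cache line
    ARSlot  slot[6];             // 54 bytes: six always-replace entries;
                                 //  the key selects which single slot to probe
    uint8_t dp_sig[7];           // one key byte per entry of the sibling
                                 //  depth-preferred bucket
    uint8_t dp_min_depth;        // least depth stored in that sibling bucket,
                                 //  i.e. the threshold for storing there
    uint8_t pad[2];              // unused
};
#pragma pack(pop)

static_assert(sizeof(ARBucket) == 64, "bucket must fill one 64-byte cache line");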

These techniques only help when you can get the cache-hit rate sufficiently close to 1, though. If 99.9% of the hash probes miss the cache, improving the hit rate by a factor of 10 would still leave 99% that need a memory access.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: For or against the transposition table probe in quiet search?

Post by chrisw »

Kotlov wrote: Tue May 11, 2021 10:41 am The idea is that a hit is good anyway, but the probability of a hit is extremely low compared to the cost of accessing RAM.
Did you test using pre-fetch?
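
For reference, a minimal sketch of what such a prefetch typically looks like with the SSE intrinsic (tt, tt_mask and child_key are hypothetical names):

#include <xmmintrin.h>   // _mm_prefetch; __builtin_prefetch also works on GCC/Clang

// Issue the prefetch as soon as the child's hash key is known (right after
// make_move), so the TT cache line arrives while move generation etc. runs.
inline void tt_prefetch(const void* tt_entry_addr)
{
    _mm_prefetch((const char*)tt_entry_addr, _MM_HINT_T0);
}

// usage sketch: tt_prefetch(&tt[child_key & tt_mask]);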
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: For or against the transposition table probe in quiet search?

Post by mar »

Kotlov wrote: Tue May 11, 2021 10:05 am For or against the transposition table probe in quiet search?
I both probe and store in QS; disabling this always leads to a measurable Elo loss for me.
Martin Sedlak
User avatar
Kotlov
Posts: 266
Joined: Fri Jul 10, 2015 9:23 pm
Location: Russia

Re: For or against the transposition table probe in quiet search?

Post by Kotlov »

chrisw wrote: Tue May 11, 2021 1:51 pm
Kotlov wrote: Tue May 11, 2021 10:41 am The idea is that a hit is good anyway, but the probability of a hit is extremely low compared to the cost of accessing RAM.
Did you test using pre-fetch?
No
Eugene Kotlov
Hedgehog 2.1 64-bit coming soon...
User avatar
Kotlov
Posts: 266
Joined: Fri Jul 10, 2015 9:23 pm
Location: Russia

Re: For or against the transposition table probe in quiet search?

Post by Kotlov »

hgm wrote: Tue May 11, 2021 12:58 pm I know that Crafty did not probe the TT in QS. So I tried that in my engines too, and it was always a significant Elo loss. [...]
As always, thank you for such a detailed answer!
Eugene Kotlov
Hedgehog 2.1 64-bit coming soon...