Cache pollution when reading/writing hash table

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Cache pollution when reading/writing hash table

Post by mcostalba »

When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.

So it could be an idea to avoid polluting the cache by using the MOVNTQ SIMD instruction or something similar.
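For illustration, what I have in mind is something like this minimal sketch (made-up names, assuming a 16-byte, 16-byte-aligned TT entry; MOVNTQ itself is the 8-byte MMX store, the 16-byte SSE2 equivalent is MOVNTDQ):

Code: Select all

#include <emmintrin.h>   // SSE2: _mm_load_si128, _mm_stream_si128 (MOVNTDQ)
#include <cstdint>

// Hypothetical 16-byte TT entry, aligned so it fits a single SSE2 store.
struct alignas(16) TTEntry {
    uint64_t key;
    uint64_t data;   // move, value, depth, bound packed together
};

// Store an entry without pulling the destination cache line into the cache.
inline void store_nontemporal(TTEntry* dst, const TTEntry& src) {
    __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(&src));
    _mm_stream_si128(reinterpret_cast<__m128i*>(dst), v);
}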

Does anyone have experience with this subject?

I have found nothing browsing around.

Thanks
Marco
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Cache pollution when reading/writing hash table

Post by Gian-Carlo Pascutto »

mcostalba wrote:When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.
I don't think so. At the very least, I wouldn't believe this claim without evidence :)
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Cache pollution when reading/writing hash table

Post by hgm »

I am not sure this would be beneficial. Certainly not in my engines, which rely heavily on IID (and thus repeat the same hash probe quite often) and have a cache footprint so small that all infrequent accesses like hash probes simply map to a cache way that would otherwise be unused.

But even for engines that don't do that, there is the issue of true transpositions. A depth-first tree walk in chess is such that when a position is visited for the first time, it is very quickly revisited through a path that transposes the moves just before it. E.g. the previous 4 plies could potentially be reached in up to 4 different move orders (the two white moves and the two black moves can each be swapped), if the moves are independent. If this occurs in (say) the last 6 plies of the search, the number of nodes visited in between these probes is small enough that they all fit in cache, so these quick revisits would all be cache hits. Only revisits through transpositions much closer to the root involve sub-trees that are too large to fit in cache.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Cache pollution when reading/writing hash table

Post by mcostalba »

Thanks for your answer. Actually I am also not sure that it happens. I had this idea because profiling with VTune shows that accessing the hash table is almost always very costly, which means a RAM access is needed most of the time.

But I have never actually measured the percentage of RAM accesses vs. L1/L2 accesses. The only thing I can say is that this single lookup is the most time-consuming function in Stockfish, due to the fact that the hash probe is also done in qsearch().

Unfortunately, removing hash probing from qsearch seems to lower the strength of SF. This is because evaluation is very costly, and empty TT slots are also reused to cache evaluation values.
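In rough pseudo-form the mechanism is something like this (a simplified sketch with made-up names, not the real Stockfish code):

Code: Select all

#include <cstdint>
#include <vector>

// Sketch: a TT slot that doubles as an evaluation cache for qsearch.
struct TTEntry {
    uint64_t key   = 0;
    int16_t  value = 0;     // search score, if a real search result is stored
    int16_t  eval  = 0;     // cached static evaluation
    int8_t   depth = -128;  // -128 = "eval only", no search result yet
};

struct TranspositionTable {
    std::vector<TTEntry> table;                       // size must be a power of two
    explicit TranspositionTable(size_t n) : table(n) {}
    TTEntry& slot(uint64_t key) { return table[key & (table.size() - 1)]; }
};

int evaluate_position() { return 0; }                 // stands in for the costly eval

// Probing the TT in qsearch lets us skip evaluate_position() when the same
// position (or a transposition of it) was already visited, even if no
// search score was ever stored in that slot.
int qsearch_static_eval(TranspositionTable& tt, uint64_t key) {
    TTEntry& e = tt.slot(key);
    if (e.key == key)
        return e.eval;                                // reuse cached evaluation

    int eval = evaluate_position();                   // costly
    e.key   = key;                                    // store even without a score
    e.eval  = static_cast<int16_t>(eval);
    e.depth = -128;
    return eval;
}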
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Cache pollution when reading/writing hash table

Post by Gerd Isenberg »

mcostalba wrote:When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.

So it could be an idea to avoid polluting the cache by using the MOVNTQ SIMD instruction or something similar.

Does anyone have experience with this subject?

I have found nothing browsing around.

Thanks
Marco
I actually use SSE2 SIMD instructions for loading and storing hash entries. Bypassing the cache was not beneficial. What works with my approach is prefetching to L1 and then hiding the latency with register-intensive, on-the-fly SSE2 Kogge-Stone attack generation before actually probing the TT (at all nodes, even in qsearch).
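The pattern is roughly the following (a simplified scalar sketch with made-up names, not my actual SSE2 code):

Code: Select all

#include <xmmintrin.h>   // _mm_prefetch, _MM_HINT_T0
#include <cstdint>

struct TTEntry { uint64_t key, data; };

extern TTEntry* ttTable;
extern uint64_t ttMask;         // table size - 1, power of two

// Plain 64-bit Kogge-Stone occluded fill (south direction) as an example of
// register-only work; the SSE2 version runs several directions in parallel.
inline uint64_t southOccludedFill(uint64_t sliders, uint64_t empty) {
    sliders |= empty & (sliders >>  8);
    empty   &=          empty   >>  8;
    sliders |= empty & (sliders >> 16);
    empty   &=          empty   >> 16;
    sliders |= empty & (sliders >> 32);
    return sliders;
}

int probe_with_hidden_latency(uint64_t hashKey, uint64_t rooks, uint64_t empty) {
    // 1. Start pulling the TT entry toward L1 as early as possible.
    TTEntry* entry = &ttTable[hashKey & ttMask];
    _mm_prefetch(reinterpret_cast<const char*>(entry), _MM_HINT_T0);

    // 2. Do register-intensive work while the cache line is in flight.
    uint64_t southAttacks = southOccludedFill(rooks, empty) >> 8;

    // 3. Only now read the entry; ideally the memory latency is already paid.
    int hit = (entry->key == hashKey);
    (void)southAttacks;          // would be used by move generation later
    return hit;
}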
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Cache pollution when reading/writing hash table

Post by mcostalba »

Gerd Isenberg wrote:
mcostalba wrote:When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.

So it could be an idea to avoid polluting the cache by using the MOVNTQ SIMD instruction or something similar.

Does anyone have experience with this subject?

I have found nothing browsing around.

Thanks
Marco
I actually use SSE2 SIMD instructions for loading and storing hash entries. Bypassing the cache was not beneficial. What works with my approach is prefetching to L1 and then hiding the latency with register-intensive, on-the-fly SSE2 Kogge-Stone attack generation before actually probing the TT (at all nodes, even in qsearch).
Thanks!!! This is really useful.

You mean that while you are waiting for the RAM data to be loaded, you do the attack calculation?

At a first quick glance it is not clear to me what I _could_ do during a TT.retrieve(), but the idea is very good.

Thanks
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Cache pollution when reading/writing hash table

Post by Gerd Isenberg »

mcostalba wrote:
Gerd Isenberg wrote:
mcostalba wrote:When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.

So it could be an idea to avoid polluting the cache by using the MOVNTQ SIMD instruction or something similar.

Does anyone have experience with this subject?

I have found nothing browsing around.

Thanks
Marco
I actually use SSE2 SIMD instructions for loading and storing hash entries. Bypassing the cache was not beneficial. What works with my approach is prefetching to L1 and then hiding the latency with register-intensive, on-the-fly SSE2 Kogge-Stone attack generation before actually probing the TT (at all nodes, even in qsearch).
Thanks!!! This is really useful.

You mean that while you are waiting for the RAM data to be loaded, you do the attack calculation?

At a first quick glance it is not clear to me what I _could_ do during a TT.retrieve(), but the idea is very good.

Thanks
Yes, on my K8. It only works if you don't do too many other memory operations while "waiting" for the cacheline.
Allard Siemelink
Posts: 297
Joined: Fri Jun 30, 2006 9:30 pm
Location: Netherlands

Re: Cache pollution when reading/writing hash table

Post by Allard Siemelink »

Incidentally, I tried this last week in my engine Spark.
movntq resulted in a slight slowdown vs movdqu/a.
Like Gerd, I have been using prefetches (issued right after a move has been made); this resulted in a speedup of a few percent, IIRC.
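Concretely, the prefetch placement looks roughly like this (a sketch with made-up names, not the actual Spark code):

Code: Select all

#include <xmmintrin.h>   // _mm_prefetch, _MM_HINT_T0
#include <cstdint>

struct TTEntry { uint64_t key, data; };

extern TTEntry* ttTable;
extern uint64_t ttMask;              // table size - 1, power of two
extern uint64_t zobrist[12][64];     // piece/square hash keys

// As soon as do_move() knows the child's hash key, start loading its TT
// slot; by the time the child node actually probes, the line should be in.
uint64_t do_move_and_prefetch(uint64_t hashKey, int piece, int from, int to) {
    // ... update the board representation (omitted) ...
    hashKey ^= zobrist[piece][from] ^ zobrist[piece][to];

    _mm_prefetch(reinterpret_cast<const char*>(&ttTable[hashKey & ttMask]),
                 _MM_HINT_T0);
    return hashKey;
}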
mcostalba wrote:When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.

So it could be an idea to avoid polluting the cache by using the MOVNTQ SIMD instruction or something similar.

Does anyone have experience with this subject?

I have found nothing browsing around.

Thanks
Marco
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Cache pollution when reading/writing hash table

Post by mcostalba »

Gerd Isenberg wrote: Yes, on my K8. It only works if you don't do too many other memory operations while "waiting" for the cacheline.
Could you please post some portable prefetching code, or links to such code, so that I can see an example of how to integrate prefetching in C++?

Thanks
Marco
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Cache pollution when reading/writing hash table

Post by Gerd Isenberg »

mcostalba wrote:
Gerd Isenberg wrote: Yes, on my K8. It only works if you don't do too many other memory operations while "waiting" for the cacheline.
Could you please post some portable prefetching code, or links to such code, so that I can see an example of how to integrate prefetching in C++?

Thanks
Marco
http://msdn.microsoft.com/en-us/library/84szxsww.aspx
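A portable wrapper could look roughly like this (just a sketch; MSVC has _mm_prefetch as in the link above, and GCC provides __builtin_prefetch):

Code: Select all

#if defined(_MSC_VER)
#  include <xmmintrin.h>
   inline void prefetch(const void* addr) {
       _mm_prefetch(static_cast<const char*>(addr), _MM_HINT_T0);
   }
#elif defined(__GNUC__)
   inline void prefetch(const void* addr) {
       __builtin_prefetch(addr);
   }
#else
   inline void prefetch(const void*) {}   // harmless no-op fallback
#endif

// Usage: right after the hash key of the node is known, e.g.
//   prefetch(&tt[key & mask]);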