Cache pollution when reading/writing hash table

Discussion of chess software programming and technical issues.


Cache pollution when reading/writing hash table

Post by mcostalba » Sun Aug 09, 2009 9:27 am

When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.

So it could be an idea to avoid polluting the cache by using the MOVNTQ SIMD instruction (a non-temporal store) or something similar.
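
For example (just a sketch of the idea, with made-up names like TTEntry and tt_store, not actual engine code), the store of a 16-byte entry could use the SSE2 form of the non-temporal store, MOVNTDQ, via _mm_stream_si128:

Code: Select all

// Sketch only: non-temporal store of a 16-byte TT entry (SSE2 MOVNTDQ).
// TTEntry, table and tt_store are invented names for illustration.
#include <emmintrin.h>   // SSE2 intrinsics
#include <stdint.h>

struct alignas(16) TTEntry {
    uint64_t key;
    uint64_t data;       // packed move, value, depth, bound, ...
};

static TTEntry table[1 << 20];

void tt_store(size_t idx, const TTEntry& e)
{
    __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(&e));
    // A normal cached store would be _mm_store_si128(); the streaming store
    // below writes the line without keeping it in the cache hierarchy.
    _mm_stream_si128(reinterpret_cast<__m128i*>(&table[idx]), v);
}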

Does anyone have experience with this subject?

I have found nothing while browsing around.

Thanks
Marco


Re: Cache pollution when reading/writing hash table

Post by Gian-Carlo Pascutto » Sun Aug 09, 2009 10:04 am

mcostalba wrote: When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.
I don't think so. At the very least, I wouldn't believe this claim without evidence :)


Re: Cache pollution when reading/writing hash table

Post by hgm » Sun Aug 09, 2009 10:07 am

I am not sure this would be beneficial. Certainly not in my engines, which rely heavily on IID (and thus repeat the same hash probe quite often) and have a cache footprint so small that infrequent accesses like hash probes simply map to a cache way that would otherwise be unused.

But even for engines that don't do that, there is the issue of true transpositions. A depth-first tree walk in chess is such that when a position is visited for the first time, it is very quickly revisited through a path that transposes the moves just before it. E.g. the previous 4 plies could potentially be transposed in 4 ways (2 white x 2 black) if the moves are independent. If this occurs in (say) the last 6 plies of the search, the number of nodes visited in between these probes is small enough that they all fit in cache, so these quick revisits would all be cache hits. Only revisits by transposing moves much closer to the root involve sub-trees that are too large to fit in cache.


Re: Cache pollution when reading/writing hash table

Post by mcostalba » Sun Aug 09, 2009 11:12 am

Thanks for your answer. Actually, I am also not sure that it happens. I had this idea because profiling with VTune shows that accessing the hash table is almost always very costly, meaning a RAM access is needed most of the time.

But I have never actually measured the percentage of RAM accesses vs. L1/L2 accesses. The only thing I can say is that that single lookup is the most time-consuming function in Stockfish, because the hash probe is also done in qsearch().

Unfortunately, removing hash probing from qsearch seems to lower the strength of SF. This is because evaluation is very costly and empty TT slots are also reused to cache evaluation values.


Re: Cache pollution when reading/writing hash table

Post by Gerd Isenberg » Sun Aug 09, 2009 11:15 am

mcostalba wrote: When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.

So it could be an idea to avoid polluting the cache by using the MOVNTQ SIMD instruction (a non-temporal store) or something similar.
I actually use SSE2 SIMD instructions for loading/storing hash entries. Bypassing the cache was not beneficial. What works with my approach is prefetching to L1, and then hiding the latency with register-intensive, on-the-fly SSE2 Kogge-Stone attack generation before actually probing the TT (at all nodes, even in qsearch).
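
Roughly the pattern is the one below (only a scalar sketch with invented names such as probe_tt and north_occluded_fill, not my real code; in my engine the Kogge-Stone fills run in SSE2 registers and cover all directions):

Code: Select all

// Prefetch the TT line early, then do register-only work while it loads.
// probe_tt, TTEntry, table and table_mask are invented names for this sketch.
#include <xmmintrin.h>   // _mm_prefetch
#include <stdint.h>

struct TTEntry { uint64_t key; uint64_t data; };
extern TTEntry table[];
extern size_t  table_mask;

// Kogge-Stone occluded fill towards north: no memory traffic at all.
static inline uint64_t north_occluded_fill(uint64_t gen, uint64_t empty)
{
    gen   |= empty & (gen << 8);
    empty &= empty << 8;
    gen   |= empty & (gen << 16);
    empty &= empty << 16;
    gen   |= empty & (gen << 32);
    return gen;
}

TTEntry* probe_tt(uint64_t key, uint64_t rooks, uint64_t empty,
                  uint64_t* north_attacks)
{
    TTEntry* slot = &table[key & table_mask];

    // 1. Start pulling the cache line towards L1 as early as possible.
    _mm_prefetch(reinterpret_cast<const char*>(slot), _MM_HINT_T0);

    // 2. Useful register-only work while the line is in flight.
    *north_attacks = north_occluded_fill(rooks, empty) << 8;

    // 3. By now the entry should be in L1, so the real access is cheap.
    return slot->key == key ? slot : 0;
}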


Re: Cache pollution when reading/writing hash table

Post by mcostalba » Sun Aug 09, 2009 11:20 am

Gerd Isenberg wrote: I actually use SSE2 SIMD instructions for loading/storing hash entries. Bypassing the cache was not beneficial. What works with my approach is prefetching to L1, and then hiding the latency with register-intensive, on-the-fly SSE2 Kogge-Stone attack generation before actually probing the TT (at all nodes, even in qsearch).
Thanks!!! This is really useful.

You mean that while you are waiting for the data to be loaded from RAM you do the attack calculation?

At a first quick glance it is not clear to me what I _could_ do during a TT.retrieve(), but the idea is very good.

Thanks


Re: Cache pollution when reading/writing hash table

Post by Gerd Isenberg » Sun Aug 09, 2009 11:27 am

mcostalba wrote: You mean that while you are waiting for the data to be loaded from RAM you do the attack calculation?

At a first quick glance it is not clear to me what I _could_ do during a TT.retrieve(), but the idea is very good.
Yes, on my K8. It only works if you don't do too many other memory operations while "waiting" for the cacheline.


Re: Cache pollution when reading/writing hash table

Post by Allard Siemelink » Sun Aug 09, 2009 11:41 am

Incidentally, I tried this last week in my engine Spark.
movntq resulted in a slight slowdown vs. movdqu/a.
Like Gerd, I have been using prefetches (right after a move has been made); this resulted in a speedup of a few percent, IIRC.
mcostalba wrote: When we store/load an entry in the hash table (TT), it is uncommon that we access it again in the near future.

So it could be an idea to avoid polluting the cache by using the MOVNTQ SIMD instruction (a non-temporal store) or something similar.


Re: Cache pollution when reading/writing hash table

Post by mcostalba » Sun Aug 09, 2009 11:54 am

Gerd Isenberg wrote: Yes, on my K8. It only works if you don't do too many other memory operations while "waiting" for the cacheline.
Could you please post some portable prefetching code, or links to code, so that I can see an example of how to integrate prefetching in C++?

Thanks
Marco


Re: Cache pollution when reading/writing hash table

Post by Gerd Isenberg » Sun Aug 09, 2009 12:01 pm

mcostalba wrote: Could you please post some portable prefetching code, or links to code, so that I can see an example of how to integrate prefetching in C++?
http://msdn.microsoft.com/en-us/library/84szxsww.aspx
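
For something that builds on both MSVC and GCC, a thin wrapper along these lines should do (untested sketch, adapt to your compilers; MSVC exposes _mm_prefetch, GCC/Clang have __builtin_prefetch):

Code: Select all

// Portable-ish prefetch wrapper: sketch only.
#if defined(_MSC_VER)
#  include <xmmintrin.h>
inline void prefetch(const void* addr)
{
    _mm_prefetch(static_cast<const char*>(addr), _MM_HINT_T0);
}
#elif defined(__GNUC__)
inline void prefetch(const void* addr)
{
    __builtin_prefetch(addr, 0 /* read */, 3 /* keep in all cache levels */);
}
#else
inline void prefetch(const void*) {}   // harmless no-op fallback
#endif

// Call prefetch(&tt[key & mask]) as soon as the new hash key is known,
// well before the entry is actually read.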
