Cache pollution when reading/writing hash table

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Cache pollution when reading/writing hash table

Post by mcostalba »

Gerd Isenberg wrote:
mcostalba wrote:
Gerd Isenberg wrote: Yes, on my K8. It only works if you don't do too many other memory operations while "waiting" for the cacheline.
Could you please post some portable prefetching code or post links to code, so that I can see an example of how to integrate prefetch in C++ ?

Thanks
Marco
http://msdn.microsoft.com/en-us/library/84szxsww.aspx
Thanks :)
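
A minimal sketch of what such portable prefetch code might look like, assuming an x86 MSVC-style compiler exposes _mm_prefetch through <xmmintrin.h> and gcc-compatible compilers expose __builtin_prefetch. The names prefetch_read, Entry and Table are hypothetical. Only first_entry appears in the thread, and its modulo indexing here is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <xmmintrin.h>   // _mm_prefetch, _MM_HINT_T0
#endif

// Hint that the cache line containing addr will be read soon.
// A prefetch is only a hint: it never changes program results.
inline void prefetch_read(const void* addr) {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    _mm_prefetch(static_cast<const char*>(addr), _MM_HINT_T0);
#elif defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr, 0, 3);   // 0 = read access, 3 = high temporal locality
#else
    (void)addr;                       // no known prefetch primitive: do nothing
#endif
}

// Hypothetical table layout, for illustration only.
struct Entry {
    std::uint64_t key;
    std::int32_t  value;
};

struct Table {
    std::vector<Entry> entries;

    explicit Table(std::size_t n) : entries(n) { assert(n > 0); }

    // Assumed indexing scheme: key modulo table size.
    Entry* first_entry(std::uint64_t key) {
        return &entries[key % entries.size()];
    }
};
```

The intended pattern is to call prefetch_read(table.first_entry(key)) as soon as the key is known, do unrelated work, and only then read the entry, so that the memory latency overlaps with that work.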
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Cache pollution when reading/writing hash table

Post by mcostalba »

Wow !!! it worked !!!

It cut hash table access times in half. Here is the VTune profile:

Before

retrieve() 6,88 % (CPU_CLK_UNHALTED) 1,84 % (INST_RETIRED)

After

retrieve() 3,10 % (CPU_CLK_UNHALTED) 1,91 % (INST_RETIRED)


What counts is the first value, which takes memory accesses into account; the second simply counts the instructions needed, which are few anyway and not a problem.

Thanks again
Marco
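
One rough way to read these figures: dividing a function's share of cycles by its share of retired instructions gives its cost per instruction relative to the program-wide average. With the numbers above this is about 3.7 before and about 1.6 after. The helper name relative_cost is hypothetical, and the two runs have different totals, so the comparison is only indicative.

```cpp
#include <cassert>

// Share of cycles divided by share of retired instructions: how expensive
// the function's instructions are relative to the whole-program average.
inline double relative_cost(double cycles_pct, double instr_pct) {
    assert(instr_pct > 0.0);
    return cycles_pct / instr_pct;
}

// Figures from the VTune profile of retrieve():
//   before: relative_cost(6.88, 1.84)
//   after:  relative_cost(3.10, 1.91)
```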
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Cache pollution when reading/writing hash table

Post by Gerd Isenberg »

mcostalba wrote: Wow !!! it worked !!!

It cut hash table access times in half. Here is the VTune profile:

Before

retrieve() 6,88 % (CPU_CLK_UNHALTED) 1,84 % (INST_RETIRED)

After

retrieve() 3,10 % (CPU_CLK_UNHALTED) 1,91 % (INST_RETIRED)


What counts is the first value, which takes memory accesses into account; the second simply counts the instructions needed, which are few anyway and not a problem.

Thanks again
Marco
Fine! How does it translate to % nps increase?
And Elo improvement ;-)
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Cache pollution when reading/writing hash table

Post by mcostalba »

Gerd Isenberg wrote: Fine! How does it translate to % nps increase?
And Elo improvement ;-)
Hi Gerd,

After some beginner's luck, it seems reality came back and asked for its tribute :-(

These are my findings after two days of struggling... and still struggling

1) Under Windows Vista it is impossible to measure speed precisely, because the OS throttles the CPU as it wishes and I haven't found a way to prevent it.

2) Prefetching should fetch 64 bytes (a cache line), but in reality it seems to fetch only 32 bytes, because putting two prefetches in a row increases speed under VTune.

Code:

  char* addr = (char*)first_entry(posKey);
  _mm_prefetch(addr, _MM_HINT_T0);
  _mm_prefetch(addr+32, _MM_HINT_T0);
But the funniest things happened under Linux:

3) The _mm_prefetch() intrinsic is recognized by the Intel compiler and sometimes by gcc, but only if you have the Intel library installed; otherwise you need __builtin_prefetch(), which is recognized by both gcc and Intel under Linux.

4) But here comes the surprise! __builtin_prefetch() is happily optimized away by the Intel compiler. Running

objdump -C -S stockfish

shows that the prefetcht0 asm instruction has disappeared.

5) OK, so force it with a volatile asm, something like

Code:

  char* addr = (char*)first_entry(posKey);
  asm volatile("prefetcht0 %0" :: "m" (addr));

This works for both gcc and Intel, which now finally produce the prefetcht0 in the asm.

6) The prefetch code does not seem to yield a speed increase, but again speed measurements are not very reliable even under Linux, although better than under Windows.
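
One detail of the inline-assembly form is worth checking. With the "m" constraint, the instruction receives the address of the lvalue given as the operand. Writing "m" (addr) therefore names the pointer variable itself, so prefetcht0 would touch the storage holding addr rather than the table entry, whereas "m" (*addr) names the byte the pointer refers to. Whether this explains the missing speedup is only a guess. The sketch below uses a hypothetical wrapper prefetch_t0 and falls back to __builtin_prefetch on non-x86 gcc-compatible targets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: prefetch the cache line that p points into.
inline void prefetch_t0(const void* p) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // "m"(*ptr): the memory operand is the byte ptr points at, so the
    // instruction encodes the entry's address, not the address of `ptr`.
    const char* ptr = static_cast<const char*>(p);
    asm volatile("prefetcht0 %0" : : "m"(*ptr));
#elif defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, 0, 3);
#else
    (void)p;   // no known prefetch primitive: do nothing
#endif
}

// Prefetching is only a hint, so the value read afterwards is unchanged.
inline std::uint64_t read_after_prefetch(const std::uint64_t* p) {
    assert(p != nullptr);
    prefetch_t0(p);
    return *p;
}
```

Disassembling the result with objdump, as in step 4 above, should show prefetcht0 taking its address from a register that holds the pointer value rather than from a stack slot.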


So we are still fighting......
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Cache pollution when reading/writing hash table

Post by Gian-Carlo Pascutto »

mcostalba wrote: 2) Prefetching should fetch 64 bytes (a cache line), but in reality it seems to fetch only 32 bytes, because putting two prefetches in a row increases speed under VTune.
Gerd is right, you are wrong.

You forgot to align your hashtable entries to 64 byte boundaries.
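
A small sketch of the arithmetic behind this claim, assuming 64-byte cache lines and a 64-byte block of entries: if the block does not start on a 64-byte boundary it spans two lines, so a single prefetch covers only part of it and a second prefetch at addr+32 lands in the next line. The names kCacheLine, align_up and lines_touched are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;   // assumed cache line size in bytes

// Round addr up to the next multiple of alignment (must be a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
}

// Number of cache lines covered by the byte range [addr, addr + size).
inline std::size_t lines_touched(std::uintptr_t addr, std::size_t size) {
    assert(size > 0);
    const std::uintptr_t first = addr / kCacheLine;
    const std::uintptr_t last  = (addr + size - 1) / kCacheLine;
    return static_cast<std::size_t>(last - first + 1);
}
```

A 64-byte block at offset 32 touches two lines, while the same block at an offset produced by align_up(offset, 64) touches one. In practice this would mean over-allocating the table by 63 bytes and rounding the base address up, or using an aligned allocation routine, so that every block starts on a line boundary.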
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Cache pollution when reading/writing hash table

Post by hgm »

Which in itself already produces a significant speedup...
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Cache pollution when reading/writing hash table

Post by Zach Wegner »

Or he is using a P4. EDIT: I thought I remembered the P4 cache line being 32 bytes, but it appears it's actually 64. Never mind.
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Cache pollution when reading/writing hash table

Post by Gian-Carlo Pascutto »

I thought it was 128 :)
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Cache pollution when reading/writing hash table

Post by Zach Wegner »

According to this: http://ftinetti.googlepages.com/art_2.pdf

L1 was 64, L2 was 128. I guess I was thinking about PIIs which are 32. At least we can agree that P4 was a really shitty CPU. :)
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Cache pollution when reading/writing hash table

Post by mcostalba »

Gian-Carlo Pascutto wrote: You forgot to align your hashtable entries to 64 byte boundaries.
I don't think so. At the very least, I wouldn't believe this claim without evidence :-)