Gerd Isenberg wrote: Yes, on my K8. It only works if you don't do too many other memory operations while "waiting" for the cacheline.
mcostalba wrote: Could you please post some portable prefetching code or post links to code, so that I can see an example of how to integrate prefetch in C++?
Gerd Isenberg wrote: http://msdn.microsoft.com/en-us/library/84szxsww.aspx
Thanks
Marco
Cache pollution when reading/writing hash table
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Cache pollution when reading/writing hash table
Wow!!! It worked!!!
It cut hash table access times in half. VTune profiling follows:
Before
retrieve() 6,88 % (CPU_CLK_UNHALTED) 1,84 % (INST_RETIRED)
After
retrieve() 3,10 % (CPU_CLK_UNHALTED) 1,91 % (INST_RETIRED)
What counts is the first value, which takes memory accesses into account; the second simply counts the instructions needed, which are few anyway and not a problem.
Thanks again
Marco
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Cache pollution when reading/writing hash table
mcostalba wrote: Wow!!! It worked!!!
It cut hash table access times in half. VTune profiling follows:
Before
retrieve() 6,88 % (CPU_CLK_UNHALTED) 1,84 % (INST_RETIRED)
After
retrieve() 3,10 % (CPU_CLK_UNHALTED) 1,91 % (INST_RETIRED)
What counts is the first value, which takes memory accesses into account; the second simply counts the instructions needed, which are few anyway and not a problem.
Thanks again
Marco
Fine! How does it translate to % nps increase?
And Elo improvement?
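A rough way to turn the VTune shares into an nps figure, as a sketch only: assume that everything outside retrieve() costs the same number of cycles before and after the change, and that nps scales inversely with total cycles. Then the total run time is proportional to 1 / (1 - share), and the speedup is (1 - share_after) / (1 - share_before). Neither assumption is confirmed by the profile, and the function name below is made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: estimates the overall speedup from the share of
// CPU_CLK_UNHALTED spent in one function before and after a change.
// Shares are fractions (0.0688 for 6,88 %).
// Assumes all other code takes exactly the same number of cycles in both runs.
inline double estimated_speedup(double share_before, double share_after) {
    assert(share_before >= 0.0 && share_before < 1.0);
    assert(share_after >= 0.0 && share_after < 1.0);
    // total_before / total_after, with total = other_cycles / (1 - share)
    return (1.0 - share_after) / (1.0 - share_before);
}
```

With the numbers above, estimated_speedup(0.0688, 0.0310) comes out at about 1.04, so roughly 4 % more nps at best. Cycles that merely move from retrieve() to the place where the prefetch is issued would lower this.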
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Cache pollution when reading/writing hash table
Gerd Isenberg wrote: Fine! How does it translate to % nps increase?
And Elo improvement?
Hi Gerd,
after some beginner's luck it seems that reality has come back to claim its tribute.
These are my findings after two days of struggling... and still struggling:
1) Under Windows Vista it is impossible to measure speed precisely, because the OS throttles the CPU as it wishes and I didn't find a way to prevent it from doing so.
2) Prefetching should cover 64 bytes (a cache line), but in practice it seems to cover only 32 bytes, because issuing two prefetches in a row increases speed under VTune.
Code: Select all
char* addr = (char*)first_entry(posKey);
_mm_prefetch(addr, _MM_HINT_T0);      // prefetch the line holding the first entry
_mm_prefetch(addr+32, _MM_HINT_T0);   // second prefetch, 32 bytes further on
3) The _mm_prefetch() intrinsic is recognized by the Intel compiler and sometimes by gcc, but only if you have the Intel library installed; otherwise you need __builtin_prefetch(), which is recognized by both gcc and Intel under Linux.
4) But here comes the surprise! __builtin_prefetch() is happily optimized away by the Intel compiler. Running
objdump -C -S stockfish
shows that the prefetcht0 asm instruction has disappeared.
5) OK, then force it with a volatile asm statement, something like
Code: Select all
char* addr = (char*)first_entry(posKey);
// as written, the "m" operand names the pointer variable addr itself, not the entry it points to
asm volatile("prefetcht0 %0" :: "m" (addr));
6) The prefetch code does not seem to yield a speed increase, but again speed measurements are not very reliable even under Linux, although better than under Windows.
So we are still fighting...
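For reference, the compiler differences listed above could be hidden behind one small wrapper. This is only a sketch: prefetch_line, prefetch_line_asm and probe_after_prefetch are made-up names, and the table of 64-bit words stands in for a real hash table. Note that in the inline asm variant the "m" operand is the pointed-to byte (*addr); with "m"(addr) the instruction would prefetch the memory that holds the pointer variable instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

#if defined(_MSC_VER)
#  include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#endif

// Hypothetical wrapper: asks the CPU to start loading the cache line
// that contains addr. It is only a hint and returns nothing.
inline void prefetch_line(const void* addr) {
#if defined(_MSC_VER)
    _mm_prefetch(static_cast<const char*>(addr), _MM_HINT_T0);
#elif defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr, 0, 3);  // 0 = read access, 3 = keep in all cache levels
#else
    (void)addr;                      // unknown compiler: do nothing
#endif
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Inline asm variant for compilers that drop the builtin.
// The memory operand is *addr, i.e. the data being pointed to.
inline void prefetch_line_asm(const void* addr) {
    __asm__ __volatile__("prefetcht0 %0" : : "m"(*static_cast<const char*>(addr)));
}
#endif

// Usage pattern: issue the prefetch as early as the key is known,
// do unrelated work, and only then read the slot.
inline std::uint64_t probe_after_prefetch(const std::vector<std::uint64_t>& table,
                                          std::uint64_t key) {
    assert(!table.empty());
    const std::uint64_t* slot = &table[key % table.size()];
    prefetch_line(slot);
    // ... work that does not touch much other memory would go here ...
    return *slot;
}
```

As Gerd pointed out, the hint only pays off if there is enough independent work between the prefetch and the actual read, and not too many other memory operations in between.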
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
Re: Cache pollution when reading/writing hash table
mcostalba wrote: 2) Prefetching should cover 64 bytes (a cache line), but in practice it seems to cover only 32 bytes, because issuing two prefetches in a row increases speed under VTune.
Gerd is right, you are wrong.
You forgot to align your hashtable entries to 64 byte boundaries.
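One possible reading of this: if a group of entries starts in the middle of a 64-byte line, it straddles two lines, and a second prefetch at addr+32 then happens to touch the next line. A minimal sketch of an aligned table follows, assuming a 64-byte line and a made-up 16-byte entry layout; Entry, Cluster, HashTable and fits_in_one_line are illustrative names, and only first_entry comes from the code posted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size

struct Entry { std::uint64_t key; std::uint64_t data; };  // hypothetical 16-byte entry

// Four entries fill exactly one cache line.
struct alignas(kCacheLine) Cluster { Entry entries[4]; };
static_assert(sizeof(Cluster) == kCacheLine, "a cluster must fill one cache line");

class HashTable {
public:
    explicit HashTable(std::size_t clusterCount)
        : raw_(clusterCount * sizeof(Cluster) + kCacheLine - 1), count_(clusterCount) {
        assert(clusterCount > 0);
        // Round the start of the raw buffer up to the next 64-byte boundary.
        auto p = reinterpret_cast<std::uintptr_t>(raw_.data());
        p = (p + kCacheLine - 1) & ~static_cast<std::uintptr_t>(kCacheLine - 1);
        clusters_ = reinterpret_cast<Cluster*>(p);
        for (std::size_t i = 0; i < count_; ++i)
            new (clusters_ + i) Cluster{};  // construct zeroed clusters in place
    }
    HashTable(const HashTable&) = delete;             // clusters_ points into raw_
    HashTable& operator=(const HashTable&) = delete;

    Cluster* first_entry(std::uint64_t key) { return clusters_ + (key % count_); }

private:
    std::vector<unsigned char> raw_;  // over-allocated backing storage
    std::size_t count_;
    Cluster* clusters_;
};

// True if [p, p + size) lies within a single cache line.
inline bool fits_in_one_line(const void* p, std::size_t size) {
    auto a = reinterpret_cast<std::uintptr_t>(p);
    return a / kCacheLine == (a + size - 1) / kCacheLine;
}
```

With this layout every probe touches exactly one line, so a single prefetch of first_entry(key) should be enough.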
-
- Posts: 27796
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Cache pollution when reading/writing hash table
Which in itself already produces a significant speedup...
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: Cache pollution when reading/writing hash table
Or he is using a P4. EDIT: I thought I remembered the P4 cache line being 32 bytes, but it appears it's actually 64. Never mind.
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
Re: Cache pollution when reading/writing hash table
I thought it was 128
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: Cache pollution when reading/writing hash table
According to this: http://ftinetti.googlepages.com/art_2.pdf
L1 was 64, L2 was 128. I guess I was thinking about PIIs which are 32. At least we can agree that P4 was a really shitty CPU.
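Since the line size differs between CPUs, it can also be queried at run time instead of remembered. A small sketch, assuming Linux with glibc, where sysconf() accepts _SC_LEVEL1_DCACHE_LINESIZE; elsewhere, or if the query reports nothing, it falls back to an assumed 64 bytes. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__linux__)
#  include <unistd.h>  // sysconf
#endif

// Hypothetical helper: L1 data cache line size in bytes.
inline std::size_t cache_line_size() {
#if defined(__linux__) && defined(_SC_LEVEL1_DCACHE_LINESIZE)
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);  // glibc extension, may return 0 or -1
    if (n > 0) return static_cast<std::size_t>(n);
#endif
    return 64;  // assumed fallback when the size cannot be queried
}

// Rounds an address down to the start of its cache line.
inline std::uintptr_t line_start(std::uintptr_t addr) {
    const std::size_t line = cache_line_size();
    assert((line & (line - 1)) == 0);  // line sizes are powers of two
    return addr & ~static_cast<std::uintptr_t>(line - 1);
}
```

Note that this reports the L1 line only; a CPU whose L2 works with larger lines or sectors, as described for the P4 above, would need a separate query.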
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Cache pollution when reading/writing hash table
Gian-Carlo Pascutto wrote: You forgot to align your hashtable entries to 64 byte boundaries.
I don't think so. At the very least, I wouldn't believe this claim without evidence.