The effect of larger hash size

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jacobbl
Posts: 80
Joined: Wed Feb 17, 2010 3:57 pm

The effect of larger hash size

Post by jacobbl »

Hi, I've been testing the effect of increasing the hash size, and it doesn't seem to help my engine Sjakk. I use a double hash table of 1 mill entries, and I tried to increase this to 5 mill entries. The engine does about 1 mill nps.

When I tested this with the CCRL time control adjusted to about 40/20, I got the following result:

Sjakk1Mill 54,4%, Sjakk5Mill 45,6%

So the old version played better. I only played about 500 games, which of course is not many, but testing at these time limits takes time... Still, 500 games with this score should be enough to tell that the 5Mill version is not significantly better.

Using 5 mill entries slows the engine marginally, but not by enough to reduce the score in any significant way.

Do these results indicate a bug with the hashing, or shouldn't I expect any gain from increasing the hash size?

Regards
Jacob
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: The effect of larger hash size

Post by Evert »

If you haven't already I suggest making the number of entries in the table a power of two. That way you can calculate the index from the hash key using a bit-wise and (key & (nelem-1)), which is (much) faster than the modulo (key%nelem) you have to use if it isn't.
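
For what it's worth, the difference looks something like this in C (a minimal sketch; tt_index, key and nelem are placeholder names, and nelem is assumed to be non-zero):

Code: Select all

#include <stdint.h>
#include <stddef.h>

/* Illustrative only: map a 64-bit hash key to a table index. */
static size_t tt_index(uint64_t key, size_t nelem)
{
    if ((nelem & (nelem - 1)) == 0)          /* nelem is a power of two   */
        return (size_t)(key & (nelem - 1));  /* cheap bit-wise and        */
    else
        return (size_t)(key % nelem);        /* general but slow division */
}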

Don't know that that would explain your difference though. What replacement strategy are you using? I think I used to see worse performance for larger hash sizes with Jazz, but that effect went away after bug fixing (don't remember what the bug I fixed was though; I think I was trying to be too clever with the replacement scheme).
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The effect of larger hash size

Post by hgm »

With a good replacement scheme the search speed is only very weakly dependent on the hash size (and of course the tree must be much larger than the hash size to see any effect at all). With equidistributed-depth replacement I found that making the table 4096 times smaller only slowed down the search (time-to-depth) by a factor of two. That suggests it scales as a twelfth root. (Which again would correspond to about 8 Elo per doubling of the hash size.)
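
(Spelled out, the arithmetic seems to be: 4096 = 2^12, so twelve halvings of the table together cost a factor two in time-to-depth, i.e. about 2^(1/12) ≈ 1.06, or roughly 6% per halving. Each doubling of the hash thus buys back about 6% of search time, which by the usual speed-to-Elo rules of thumb is presumably where the ~8 Elo per doubling figure comes from.)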
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: The effect of larger hash size

Post by Rebel »

jacobbl wrote: Using 5 mill entries slows the engine marginally....
Shouldn't be the case indeed.
Do these results indicate a bug with the hashing, or shouldn't I expect any gain from increasing the hash size?
Likely a bug, yes. Another possibility is that you are also using the TT for other purposes.
jacobbl
Posts: 80
Joined: Wed Feb 17, 2010 3:57 pm

Re: The effect of larger hash size

Post by jacobbl »

Thanks for your help; changing from modulo to the bit-wise and gave a marginal increase in nps.

I've looked through my replacement strategy, and I might have a bug in my aging variable. I will take a deeper look into it.

I also never replace a PV node with a non-PV node, but in my second hash table I always replace, so I don't think old PV nodes will be a problem.
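
For what it's worth, the kind of two-table store described above might look roughly like this (a sketch only; the struct layout, the names and the aging test are all invented for illustration):

Code: Select all

#include <stdint.h>

/* Sketch of a two-table store; everything here is illustrative. */
typedef struct {
    uint64_t key;
    int16_t  score;
    int8_t   depth;
    uint8_t  is_pv;   /* entry came from a PV node             */
    uint8_t  age;     /* search counter at the time of storing */
} TTEntry;

static void tt_store(TTEntry *primary, TTEntry *secondary,
                     const TTEntry *fresh, uint8_t current_age)
{
    /* Table 1: never overwrite a PV entry with a non-PV one, unless
       the old entry is left over from an earlier search (aged out). */
    if (!primary->is_pv || fresh->is_pv || primary->age != current_age)
        *primary = *fresh;

    /* Table 2: always replace, so a retained PV entry in table 1
       can never block new information completely.                   */
    *secondary = *fresh;
}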

Jacob
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: The effect of larger hash size

Post by Volker Annuss »

Evert wrote:If you haven't already I suggest making the number of entries in the table a power of two. That way you can calculate the index from the hash key using a bit-wise and (key & (nelem-1)), which is (much) faster than the modulo (key%nelem) you have to use if it isn't.
You can use hash tables of any size without modulo. It is not as fast as bitwise-and but still much faster than modulo even in 32 bit code.

Code: Select all

((key & 0xffffffffull) * nelem) >> 32
or

Code: Select all

((key >> 32) * nelem) >> 32
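
Wrapped up as a function, the idea might look like this (a minimal C sketch; the function and parameter names are just made up here):

Code: Select all

#include <stdint.h>

/* Map a 64-bit key to an index in 0 .. nelem-1 without any division.
   nelem may be any value that fits in 32 bits; illustrative only.    */
static uint32_t tt_index_any_size(uint64_t key, uint32_t nelem)
{
    /* Take 32 bits of the key, scale by nelem, keep the high half.
       The upper 32 bits (key >> 32) would work equally well.         */
    return (uint32_t)(((key & 0xffffffffull) * nelem) >> 32);
}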
mar
Posts: 2555
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: The effect of larger hash size

Post by mar »

Volker Annuss wrote: You can use hash tables of any size without modulo. It is not as fast as bitwise-and but still much faster than modulo even in 32 bit code.

Code: Select all

((key & 0xffffffffull) * nelem) >> 32
or

Code: Select all

((key >> 32) * nelem) >> 32
Simple and clever, Volker :)
On x86 this access should boil down to a single mul, using edx as the index, so it should be very fast.
Avoiding divisions is especially advisable on ARM, which (at least on older cores; I'm not sure whether this still holds for modern ARMs) has no integer divide instruction, so div/mod has to be emulated iteratively and is extremely slow. Of course divisions by a constant will be optimized by the compiler.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: The effect of larger hash size

Post by Sven »

Volker Annuss wrote:
Evert wrote:If you haven't already I suggest making the number of entries in the table a power of two. That way you can calculate the index from the hash key using a bit-wise and (key & (nelem-1)), which is (much) faster than the modulo (key%nelem) you have to use if it isn't.
You can use hash tables of any size without modulo. It is not as fast as bitwise-and but still much faster than modulo even in 32 bit code.

Code: Select all

((key & 0xffffffffull) * nelem) >> 32
or

Code: Select all

((key >> 32) * nelem) >> 32
I don't understand your remark. "key & 0xffffffffull" and "key >> 32" are not the same, so it seems you want to imply that you could use either the lower or the upper 32 bits of the hash key in that way. But can you explain why and how that works, in either of the two cases, to multiply by "nelem" (which I assume to be the total number of hash table entries) and then to use the upper 32 bits of the result? To me it seems this operation will often result in "0", at least for smaller hash tables.

Also, why is a multiplication by "nelem" significantly faster than a "modulo nelem"?

The "traditional" way is to use a "modulo nelems" operation, which can be optimized (manually or by the compiler) with a bitwise operation if "nelems" is a power of 2.

Sven
mar
Posts: 2555
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: The effect of larger hash size

Post by mar »

Sven Schüle wrote: I don't understand your remark. "key & 0xffffffffull" and "key >> 32" are not the same, so it seems you want to imply that you could use either the lower or the upper 32 bits of the hash key in that way. But can you explain why and how that works, in either of the two cases, to multiply by "nelem" (which I assume to be the total number of hash table entries) and then to use the upper 32 bits of the result? To me it seems this operation will often result in "0", at least for smaller hash tables.

Also, why is a multiplication by "nelem" significantly faster than a "modulo nelem"?

The "traditional" way is to use a "modulo nelems" operation, which can be optimized (manually or by the compiler) with a bitwise operation if "nelems" is a power of 2.

Sven
Ok I will try, hope you don't mind :)
I think Volker's point is that it doesn't really matter whether you use the lower or the upper 32 bits of the key to compute the entry index. The two are not functionally equivalent, but either should work, because both halves of a 64-bit Zobrist key should contain good pseudorandom bits.
Of course once you have more than 2G entries the distribution will probably not be very good.
How it works: think of the 32-bit value 0..0xffffffff as a real number in [0, 1); this trick was called fixed point in the good old days.
So you multiply that fraction by the number of entries and keep only the integer part.
Since zobrist hashes should have a good distribution of bits, it should work just fine.
As for performance: integer division is always slow. If you know in advance that entry_count is a power of two, you can of course just mask.
But the point is that forcing the TT size to a power of two means you can't utilize memory efficiently. Say you have 16 gigs of RAM; some of it goes to the system and so on, so with a power-of-two table you could only use up to 8 gigs for the hash.
Using Volker's method you should be able to use, say, 12 gigs without problems.
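
To make the fixed-point picture concrete, here is a tiny made-up example (12 million entries, deliberately not a power of two):

Code: Select all

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t nelem = 12000000u;   /* hypothetical table size */

    /* Treat the low 32 bits of each key as a fraction in [0, 1) and
       scale it to [0, nelem): index = floor(fraction * nelem).       */
    uint64_t keys[3] = { 0x0123456789abcdefull,
                         0xfedcba9876543210ull,
                         0xffffffffffffffffull };
    for (int i = 0; i < 3; i++) {
        uint32_t lo    = (uint32_t)keys[i];
        uint32_t index = (uint32_t)(((uint64_t)lo * nelem) >> 32);
        printf("low 32 bits 0x%08x -> index %u (always < %u)\n",
               (unsigned)lo, (unsigned)index, (unsigned)nelem);
    }
    return 0;
}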
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The effect of larger hash size

Post by hgm »

The point is that taking the 32 low-order bits of a longer variable can be done by simply moving the variable (e.g. from memory) into a 32-bit register. And the i386 architecture has an instruction that multiplies two 32-bit registers to produce a 64-bit result, stored in two registers: one holding the low-order 32 bits, the other the high-order 32 bits. You then simply use the latter to get the same effect as doing >>= 32.

Integer division (used to implement the modulo operator, as it produces the quotient in one register and the remainder in another) is an extremely costly instruction. Bob once reported here a decrease of ~30% in Crafty's NPS just from replacing the & masking with a more general % operator to get the hash index.