Hash table division


Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Hash table division

Post by Rebel »

Houdini wrote:
Rebel wrote:
Houdini wrote:
Rebel wrote:
Houdini wrote:
Rebel wrote: Of course I agree with you that a faster search should be the end result, and for me it is; I am just trying to explain the logic behind the approach.
Why does it result in a faster search?
Because of testing, what else is the final measurement?
You didn't answer the question.
WHY would splitting up the hash table accelerate the search?
When the HT becomes full, the branch factor goes up. It then helps if the HT that is used most has a bigger size.
It helps if the HT has a bigger size.
That's actually a very good argument for NOT splitting the HT.
If your code on a PC with 1GB RAM can allocate 768Mb in one HT, then by all means keep it. If you can only allocate 512Mb of hash in a single table, then splitting the HT into 2 parts of 512Mb and 256Mb gives you a winner. And my second argument was to give the odd-ply HT the bigger part, the 512Mb.
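To make the idea being debated concrete, here is a minimal C sketch of such an odd/even-ply split. The names (odd_table, even_table, probe_split), the entry layout and the sizes are illustrative assumptions, not Rebel's actual code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch of the odd/even-ply split: two separate tables, the odd-ply one
   twice as large, so that e.g. 512Mb + 256Mb = 768Mb of hash can be used
   where a single table would have been limited to 512Mb. */

typedef struct {
    uint64_t key;    /* full hash signature for verification */
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;  /* bound type, age, ... */
    uint32_t move;
} HashEntry;         /* 16 bytes */

static HashEntry *odd_table, *even_table;
static size_t     odd_mask,   even_mask;    /* entry counts are powers of two */

static int alloc_split_tables(size_t odd_entries, size_t even_entries) {
    odd_table  = calloc(odd_entries,  sizeof(HashEntry));  /* e.g. 32M entries = 512Mb */
    even_table = calloc(even_entries, sizeof(HashEntry));  /* e.g. 16M entries = 256Mb */
    if (!odd_table || !even_table) return 0;
    odd_mask  = odd_entries  - 1;
    even_mask = even_entries - 1;
    return 1;
}

/* The ply parity selects the table, so the table that is probed most
   (the odd plies, in Rebel's argument) gets the bigger share of memory. */
static HashEntry *probe_split(uint64_t key, int ply) {
    return (ply & 1) ? &odd_table[key & odd_mask]
                     : &even_table[key & even_mask];
}
```

Whether this actually buys anything is exactly what the thread is arguing about; the sketch only shows that the two allocations need not be equal or contiguous.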
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hash table division

Post by bob »

diep wrote:
bob wrote:
diep wrote:
bob wrote:
sje wrote: Symbolic has always done this with one table for WTM and one for BTM.
I simply xor in an STM value. I'm not sure I'd want two different tables. What happens in positions where one side has lots of mobility and one side is very restricted? One table fills up, the other is barely used...

I've never thought, in any scenario, that two separate tables work, unless you talk about two tables in the Belle sense, where one is a depth-preferred replacement strategy while the other is an always-store table. But for Belle, both store both WTM/BTM entries, so there is no "side-to-move boundary".
Actually, Bob, you were doing 2 totally different lookups until the start of this century. Even some years after I noticed it and wrote about it several times in the forum, Crafty kept losing system time to 2 random probes.
I already mentioned that as the "Belle approach" in my post above. But the table was not divided into a black and a white part. Never done that. What I do today, I did in Cray Blitz back in the 80's as anyone can check (we used 8-entry buckets on the Cray since vector loads made that a "for free" operation, as opposed to 4-entry buckets today). And we never had a separate table for black and white, because nothing makes those positions balance evenly.


Now I'll say this in my own words: when it was convenient for wintel programmer Eugene Nalimov to change that, he did so.
Of course not for his employer; it was done well in time before AMD had faster latency to RAM with the Opteron and its built-in memory controller in the K8, so we can't prove any relationship there. Since then you're also doing sequential probes, Bob :)
And not a day before Eugene modified that... :)

Vincent

p.s. Normally speaking, SPECint2004 would have been released in 2004, and Crafty was also a candidate for SPECint2004. Intel didn't have an on-die memory controller until years later, so that Nalimov changed that, despite me writing about it for years in the forum, was a most lucky change for wintel; in the years before, despite me writing about it regularly in the CCC, it wasn't convenient for them, as before the on-die memory controllers Intel had the fastest random latency. That changed to AMD's advantage when in April 2003 they released the Opteron, which had an on-die memory controller.
Interesting to see how $100 billion companies are still dependent upon lucky coincidences... delaying this change that long really was lucky for them...
Of course all you write is correct - you were doing 2 random probes. 2 sequential probes are a lot faster though. That's the change I am speaking about.
The Belle approach had two advantages:

(1) two tables, one could be 2x the other, letting you use 3/4 of memory for hash, as opposed to just 1/2.

(2) replacement is simple to do: depth-preferred in one, just overwrite the other.
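As an illustration of those two points, a minimal sketch of a Belle-style store, assuming a simple 16-byte entry; the names (tt_store, depth_table, always_table) are mine, not Belle's or Crafty's actual code.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the Belle-style pair of tables: one depth-preferred, one
   always-replace.  The two tables need not be the same size (advantage 1),
   and replacement needs no extra bookkeeping (advantage 2). */

typedef struct {
    uint64_t key;
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;
    uint32_t move;
} TTEntry;

static TTEntry *depth_table, *always_table;   /* e.g. sized 2:1 */
static size_t   depth_mask,   always_mask;    /* entry count minus one */

static void tt_store(uint64_t key, int depth, int score, int flags, int move) {
    TTEntry *d = &depth_table[key & depth_mask];

    if (depth >= d->depth || d->key == key) {
        /* depth-preferred table: keep the deeper of the two searches */
        d->key = key;  d->depth = (uint8_t)depth;  d->score = (int16_t)score;
        d->flags = (uint8_t)flags;  d->move = (uint32_t)move;
    } else {
        /* everything else goes into the always-replace table, unconditionally */
        TTEntry *a = &always_table[key & always_mask];
        a->key = key;  a->depth = (uint8_t)depth;  a->score = (int16_t)score;
        a->flags = (uint8_t)flags;  a->move = (uint32_t)move;
    }
}
```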

When the PC went to 64 byte cache lines, the 4-bucket approach made more sense. This would be two cache-line fills on a 32-byte cache-line machine, which is about as bad as two random probes. Today I get the entire bucket in one line and avoid that.
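And for the bucket layout, a hedged sketch under the same assumptions (the alignment attribute is GCC/Clang-style, and the names are mine, not Crafty's): four 16-byte entries fill exactly one 64-byte cache line, so a probe costs a single line fill, and the side to move is simply xored into the Zobrist key so one table serves both colours, as described above.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of a cache-line-sized bucket: 4 x 16-byte entries = 64 bytes,
   aligned so the whole bucket arrives in one cache-line fill. */

typedef struct {
    uint64_t key;
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;
    uint32_t move;
} TTEntry;                                     /* 16 bytes */

typedef struct {
    TTEntry entry[4];
} __attribute__((aligned(64))) TTBucket;       /* one 64-byte cache line */

static TTBucket *table;
static size_t    bucket_mask;                  /* bucket count minus one */
static uint64_t  stm_key;                      /* Zobrist value for side to move */

/* One table for both colours: the side to move is xored into the key
   rather than keeping separate white and black tables. */
static TTEntry *tt_probe(uint64_t zobrist, int black_to_move) {
    uint64_t  key = black_to_move ? (zobrist ^ stm_key) : zobrist;
    TTBucket *b   = &table[key & bucket_mask];

    for (int i = 0; i < 4; i++)                /* all 4 candidates share one line */
        if (b->entry[i].key == key)
            return &b->entry[i];
    return NULL;                               /* miss */
}
```

The table allocation would also need to be 64-byte aligned (e.g. with C11 aligned_alloc), otherwise the alignment attribute buys nothing.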
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Hash table division

Post by diep »

bob wrote:
When the PC went to 64 byte cache lines, the 4-bucket approach made more sense. This would be two cache-line fills on a 32-byte cache-line machine, which is about as bad as two random probes. Today I get the entire bucket in one line and avoid that.
Actually, starting with the i7 with DDR3 you get 3 lines of 64 bytes, making a grand total of 192 bytes.

It can abort prematurely if you just need under 32 bytes though, delivering that faster. So when doing a few probes you get the 192 bytes anyway.

The DDR5 of the GPUs starts at 256 bytes though; that's far more bandwidth-optimized and obviously no premature abort is possible.

When we did some latency tests there around 2007, we were horrified :)

It's interesting now to see how it is on today's Tesla hardware though, as I'm intending to try to get a lossy sieve working (quick and dirty, so not a real sieve, as it's delivering PRPs) for factoring purposes... (google for Diepeveen and Wagstaff and Trial Factoring, see also Mersenne there).

p.s. How does a hardware chip use hashtables? SRAM, if it already existed, probably was pretty expensive back then...