Crafty Transpostion Table Question

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
phhnguyen
Posts: 1434
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Crafty Transpostion Table Question

Post by phhnguyen »

bob wrote:
phhnguyen wrote:
bob wrote: SImple. This is the "lockless hashing" idea for parallel search.

The issue is that different threads can store into the same hash table address at approximately the same time. There are two 64 bit values, the signature (A) and the score/etc (B). If two processors try to store at the same time, you want to get eihere A1,B1 or A2,B2 stored in the table, but due to timing you can get A1,B2 or A2, B1 which is a problem.

If, when you store A1, B1, you xor them to modify the stored signature, and when you probe that entry, you xor again to recover the original signature, you eliminate the possibility of getting a match on A1, but getting data B2 which would be wrong.

The lockless hashing paper is on my web page at www.cis.uab.edu, where you click on faculty then my name...

If you don't do this, you have to use a normal Lock()/Unlock() which is way excessive overhead due to all the hash storing and probing going on.

Hi Bob,

I am wondering if it is another solution for "lockless" if we set the hash key in the middle of the hash data (e. g: 32 bit data + 64 bit hash key + 32 bit data). Any wrong pair of A and B will simply destroy the hash key and we can avoid using the Xor.

What is your opinion / experience? Many thanks.
Pham

In thinking about it, it is not really a given that this would work on the PC. Nothing to prevent one thread from writing the first 4 bytes, then the next 8, then the next 4, but after the first 4, that thread gets suspended for an instant, leaving wrong data combined with right data. The processor is certainly free to complete all writes done by the active process at any point in time after a context-switch, which could be an issue.

Good idea, and it might work most of the time.
Thank Bob,

I have been continueing to develop the idea:

- When we see a correct hash key (in the middle of data), it always makes sure that the whole data is from only one position
- In the worse case, that data includes two half data (of the same position), but from two different search situations so they are different type, move, depth, value...

For the above case and for your current implementation, Crafty cannot use and has to throw the whole hash data away.

But if we set hash key in the middle and reoganize data, we can still use hash data. Solutions may be:

1) Use your 6 bit left (unused) as a verify value for the unique of the whole hash data

2) Re-oganize data so we can use data even they are not belong together:
- 6 unused, 3 bit age, 21 bit move (30 bit)
- 64 hash key
- 2 bit type, 15 bit draft, 17 bit value (34 bit) (If you could "compress" 15 bit draft into 13 bit, you could set the hash key in the "right" middle for easily accessing).

Just 2 cent :)
Pham
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty Transpostion Table Question

Post by bob »

phhnguyen wrote:
bob wrote:
phhnguyen wrote:
bob wrote: SImple. This is the "lockless hashing" idea for parallel search.

The issue is that different threads can store into the same hash table address at approximately the same time. There are two 64 bit values, the signature (A) and the score/etc (B). If two processors try to store at the same time, you want to get eihere A1,B1 or A2,B2 stored in the table, but due to timing you can get A1,B2 or A2, B1 which is a problem.

If, when you store A1, B1, you xor them to modify the stored signature, and when you probe that entry, you xor again to recover the original signature, you eliminate the possibility of getting a match on A1, but getting data B2 which would be wrong.

The lockless hashing paper is on my web page at www.cis.uab.edu, where you click on faculty then my name...

If you don't do this, you have to use a normal Lock()/Unlock() which is way excessive overhead due to all the hash storing and probing going on.

Hi Bob,

I am wondering if it is another solution for "lockless" if we set the hash key in the middle of the hash data (e. g: 32 bit data + 64 bit hash key + 32 bit data). Any wrong pair of A and B will simply destroy the hash key and we can avoid using the Xor.

What is your opinion / experience? Many thanks.
Pham

In thinking about it, it is not really a given that this would work on the PC. Nothing to prevent one thread from writing the first 4 bytes, then the next 8, then the next 4, but after the first 4, that thread gets suspended for an instant, leaving wrong data combined with right data. The processor is certainly free to complete all writes done by the active process at any point in time after a context-switch, which could be an issue.

Good idea, and it might work most of the time.
Thank Bob,

I have been continueing to develop the idea:

- When we see a correct hash key (in the middle of data), it always makes sure that the whole data is from only one position
- In the worse case, that data includes two half data (of the same position), but from two different search situations so they are different type, move, depth, value...

For the above case and for your current implementation, Crafty cannot use and has to throw the whole hash data away.

But if we set hash key in the middle and reoganize data, we can still use hash data. Solutions may be:

1) Use your 6 bit left (unused) as a verify value for the unique of the whole hash data

2) Re-oganize data so we can use data even they are not belong together:
- 6 unused, 3 bit age, 21 bit move (30 bit)
- 64 hash key
- 2 bit type, 15 bit draft, 17 bit value (34 bit) (If you could "compress" 15 bit draft into 13 bit, you could set the hash key in the "right" middle for easily accessing).

Just 2 cent :)
Pham
The thing you have to resolve is that there is a key (signature) and data. And they _must_ match... And you must handle the case where different parts of that thing can be stored in arbitrary order due to context switches and/or hardware memory write reordering. If you use the inner 8 bytes for the signature, with 4 bytes of data on high order end and 4 bytes on lower end, there is no guarantee that if you store the high order 4 bytes of data and 4 bytes of signature, that they will be stored in one indivisible (atomic) operation. And that is a problem.
User avatar
phhnguyen
Posts: 1434
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Crafty Transpostion Table Question

Post by phhnguyen »

Bob,

That is why I suggest that we should reorganize data. In one side of data, type, depth and value come together so you can use them safely. In the other side is the move of the correct position - not problem for using too. The 6-bit unuse you may simply ignore again.

The benefit is clear: we can avoid to use some Xor and can use hash data in some case which current Crafty can't.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty Transpostion Table Question

Post by bob »

phhnguyen wrote:Bob,

That is why I suggest that we should reorganize data. In one side of data, type, depth and value come together so you can use them safely. In the other side is the move of the correct position - not problem for using too. The 6-bit unuse you may simply ignore again.

The benefit is clear: we can avoid to use some Xor and can use hash data in some case which current Crafty can't.
We are not on the same page.

We have two parts. data and key/signature/lock/whatever you want to call it. Before we can use any of the data, we must be sure that the data goes with the "signature". So, given 8 bytes of "data" and 8 bytes of "signature" how are you going to organize them so that if anything gets clobbered the entire position is unusable? You could try interleaving bytes, 1 byte of sig, one byte of data, etc. But that seems far worse than the XOR trick. Interleaving the bytes is a pain, and then the signature match process becomes a pain.

The PC can store _any_ single byte of that 128 bit "entry" at any instant of time, it depends on how the compiler produces the asm to store the stuff, and how the scheduler runs the process that stores the stuff, etc. We can't guarantee 8 byte stores at a chunk unless you force things at the machine code level, which is beyond what can be done directly in C.

The challenge becomes guaranteeing that the 64 bits of data go with the 64 bits of signature, and that _all_ of the data goes together in one entry. The XOR trick doesn't guarantee that will happen, since it is impossible to guarantee that without asm code, but it does guarantee that if the signature doesn't go with any single bit of the data, that the signature will fail to match and all is well...
User avatar
phhnguyen
Posts: 1434
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Crafty Transpostion Table Question

Post by phhnguyen »

I have re-read your post and your technical paper "Lockless Hashing" in which you divided 128 bit hash data into 2 parts (64 bit each) as "atomic" data. But as the previous post you means that "atomic" data may be 32 bit only? Or did I miss something? If yes (32 bit), I think I should surrender that idea.
User avatar
phhnguyen
Posts: 1434
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Crafty Transpostion Table Question

Post by phhnguyen »

Opps, you have posted already when I have been typing so my previous post seems be out of date.

You are right, if "atomic" data is less than 64 bit, your Xor is the best method. I have got wrong understand the size of atomic data when read your paper :(

Thank you very much for making clearly everything,
Pham