On Zobrist keys

steffan · Post by **steffan** » Mon Jun 22, 2009 10:37 am

I too have had success using rotated Zobrist keys. As an experiment (a long time ago!) I modified Crafty to use rotated keys and had no collisions in the small number of self-play games I tried.

Cheers,
Steffan

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Jun 22, 2009 8:00 pm

steffan wrote: I too have had success using rotated Zobrist keys. As an experiment (a long time ago!) I modified Crafty to use rotated keys and had no collisions in the small number of self-play games I tried.

Cheers,
Steffan

Hi Steffan,
I wonder whether random base keys or de Bruijn sequences work better. Two cachelines instead of more than 100 for a very frequent access will likely relax L1 a bit. Is that worth the extra rotate instruction, whose variable count on x86 must be in CL? Likely.

Code:

U64 CACHEALIGN basekey[16]; // 2 cachelines instead of > 100

ror(basekey[piece], square); // or rol

instead of

Code:

U64 zobristkey[16][64];

zobristkey[piece][square];

Cheers,
Gerd
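A self-contained sketch of the rotated-key lookup might look like this. It assumes 64-byte cache lines, uses C11 `_Alignas(64)` in place of the CACHEALIGN macro, and fills the base keys with splitmix64; the names `basekey`, `ror64`, `init_basekeys` and `rotated_key` are illustrative only, not taken from Crafty or any other engine. Rotating right by the square number is one choice; rotating left works the same way.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U64;

/* 16 base keys * 8 bytes = 128 bytes = 2 cache lines (64-byte lines assumed). */
_Alignas(64) static U64 basekey[16];

/* splitmix64: a small, well-known generator, used here only to fill the base keys. */
static U64 splitmix64(U64 *state) {
    U64 z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static void init_basekeys(U64 seed) {
    for (int p = 0; p < 16; p++)
        basekey[p] = splitmix64(&seed);
}

/* Rotate right; masking both counts keeps the shifts in 0..63, so n == 0 is defined. */
static U64 ror64(U64 x, unsigned n) {
    return (x >> (n & 63)) | (x << (-n & 63));
}

/* Stands in for zobristkey[piece][square]: one load from a 2-line table plus a rotate. */
static U64 rotated_key(int piece, int square) {
    assert(piece >= 0 && piece < 16 && square >= 0 && square < 64);
    return ror64(basekey[piece], (unsigned)square);
}
```

GCC and Clang generally recognize this shift/or pattern and emit a single rotate instruction for it.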

steffan · Post by **steffan** » Mon Jun 22, 2009 9:58 pm

Gerd Isenberg wrote: I wonder whether random base keys or de Bruijn sequences work better.

I think the ideal keys have on average half their bits set, and all bits are mutually independent. Random keys satisfy both criteria, but de Bruijn keys fail the independence criterion: for example, all de Bruijn sequences have exactly half their bits set.

Cheers,
Steffan
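To make the weight argument concrete, the sketch below checks one well-known 64-bit de Bruijn constant, 0x03f79d71b4cb0a89 (commonly used for bitscan by multiplication and chosen here only as an example): its 64 rotations yield 64 distinct 6-bit windows, and its popcount is exactly 32. The function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U64;

/* A B(2,6) de Bruijn sequence; any other one would do for this check. */
static const U64 DEBRUIJN64 = 0x03f79d71b4cb0a89ULL;

static int popcount64(U64 x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }   /* clear the lowest set bit each round */
    return n;
}

/* Rotate left; masking both counts keeps the shifts in 0..63. */
static U64 rol64(U64 x, unsigned n) {
    return (x << (n & 63)) | (x >> (-n & 63));
}

/* 1 if the top 6 bits of the 64 rotations of x (its cyclic 6-bit windows) are all distinct. */
static int is_debruijn_2_6(U64 x) {
    U64 seen = 0;                        /* bit w is set once window value w has occurred */
    for (unsigned r = 0; r < 64; r++) {
        unsigned w = (unsigned)(rol64(x, r) >> 58);
        if (seen & (1ULL << w))
            return 0;
        seen |= 1ULL << w;
    }
    return 1;
}
```

The weight of 32 is forced rather than accidental: every bit position starts exactly one window, and exactly 32 of the 64 windows start with a 1.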

Edmund · Post by **Edmund** » Mon Jun 22, 2009 10:01 pm

Gerd Isenberg wrote:
Code:
U64 CACHEALIGN basekey[16]; // 2 cachelines instead of > 100

ror(basekey[piece], square); // or rol
instead of
Code:
U64 zobristkey[16][64];

zobristkey[piece][square];

or you go for

Code:

U8 zobristkey[16*64+7]; // +7 keeps the 8-byte load for piece 15, square 63 in bounds
return *(U64 *) (zobristkey + piece*64 + square);

1031 bytes = 16.11 cachelines
and no additional ror/rol instruction
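A portable version of the overlapping-byte lookup could use memcpy for the unaligned 8-byte read, which avoids the alignment and strict-aliasing problems of the `*(U64 *)` cast. The xorshift64 fill routine and the names `init_zobrist_bytes` and `overlapped_key` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t U64;
typedef uint8_t U8;

/* 16*64 start offsets, one byte apart; +7 keeps the load for piece 15, square 63 in bounds. */
static U8 zobristkey[16 * 64 + 7];

/* Fill every byte from a xorshift64 generator (seed must be nonzero). */
static void init_zobrist_bytes(U64 seed) {
    for (size_t i = 0; i < sizeof zobristkey; i++) {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        zobristkey[i] = (U8)(seed >> 56);
    }
}

/* memcpy sidesteps the undefined behaviour of an unaligned *(U64 *) access;
   compilers typically emit a single unaligned 8-byte load for it on x86. */
static U64 overlapped_key(int piece, int square) {
    assert(piece >= 0 && piece < 16 && square >= 0 && square < 64);
    U64 k;
    memcpy(&k, zobristkey + piece * 64 + square, sizeof k);
    return k;
}
```

Adjacent keys share seven of their eight bytes, which is what makes the table this small.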

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Jun 22, 2009 11:02 pm

Codeman wrote: or you go for
Code:
U8 zobristkey[16*64+7];
return *(U64 *) (zobristkey + piece*64 + square);
1031 bytes = 16.11 cachelines
and no additional ror/rol instruction

Might be quite expensive on x86, especially if you cross cacheline borders. Maybe with some SSE4 unaligned load instruction.
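The line-crossing concern can be put in numbers. Assuming 64-byte cache lines and a byte table that starts on a line boundary, an 8-byte load starting at offset o reaches into the next line whenever o mod 64 is 57 or more, so 7 of every 64 start offsets split. The helper below, a hypothetical name, simply counts them.

```c
#include <assert.h>

/* Counts how many 8-byte loads from an overlapping byte table would straddle a
   cache line, assuming the table starts on a line boundary. The load for
   (piece p, square s) starts at byte p * squares + s, as in zobristkey + piece*64 + square. */
static int count_line_crossings(int pieces, int squares, int line_size) {
    int crossings = 0;
    for (int p = 0; p < pieces; p++)
        for (int s = 0; s < squares; s++) {
            int first = p * squares + s;   /* first byte of the load */
            int last = first + 7;          /* last byte of the load */
            if (first / line_size != last / line_size)
                crossings++;
        }
    return crossings;
}
```

For the full 16*64 table this gives 112 of the 1024 loads (about 11%), and 84 of 768 if only pieces 0-11 are used.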

Aleks Peshkov · Post by **Aleks Peshkov** » Tue Jun 23, 2009 12:26 am

steffan wrote:
Gerd Isenberg wrote: I wonder whether random base keys or de Bruijn sequences work better.
I think the ideal keys have on average half their bits set, and all bits are mutually independent. Random keys satisfy both criteria, but de Bruijn keys fail the independence criterion: for example, all de Bruijn sequences have exactly half their bits set.

Random keys have random dependencies. A set of only 12 de Bruijn keys (instead of 768 pseudo-random ones) can be easier to test, since they are proven to be good enough under rotation of themselves.

wgarvin · Post by **wgarvin** » Tue Jun 23, 2009 8:00 am

Gerd Isenberg wrote:
Might be quite expensive on x86, especially if you cross cacheline borders. Maybe with some SSE4 unaligned load instruction.

You could, however, overlap the 12 keys for each square at 2-byte intervals and fit them all within an aligned 32-byte block. That gives you a 2 KB table (32*64 bytes) that requires no rotate and never does a read that crosses a cacheline boundary. It's not as compact as the rotated keys, but it's better than the usual 6 KB (12*8*64).
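A sketch of this 2-byte-interval layout: one 32-byte block per square, with the key for piece p starting at byte 2*p of its block. The 64-byte alignment (via C11 `_Alignas`), the array name `blocks` and the helper names are assumptions. The last key starts at byte 22 and ends at byte 29, so an 8-byte load never leaves its 32-byte block, and an aligned 32-byte block never straddles a 64-byte line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t U64;
typedef uint8_t U8;

/* One 32-byte block per square: 64 * 32 = 2048 bytes. The key for piece p
   (0..11) starts at byte 2*p of the square's block, so the last key occupies
   bytes 22..29 and no load leaves its block. */
_Alignas(64) static U8 blocks[64 * 32];

static const U8 *key_ptr(int piece, int square) {
    assert(piece >= 0 && piece < 12 && square >= 0 && square < 64);
    return blocks + square * 32 + piece * 2;
}

/* Unaligned 8-byte read via memcpy; fill blocks with random bytes before real use. */
static U64 key2(int piece, int square) {
    U64 k;
    memcpy(&k, key_ptr(piece, square), sizeof k);
    return k;
}
```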

Can anyone think of a way to use 1-byte intervals and make the table smaller without causing some accesses to cross a cache line boundary? I tried to figure out a way, but have come up with nothing so far.

Zach Wegner · Post by **Zach Wegner** » Tue Jun 23, 2009 8:55 am

You can do it if you muck with the address a bit to get the unused pieces (12-15) to cluster at the top of a cache line:

Code:

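unsigned char data[64*16];   // 16 cache lines, assumed 64-byte aligned
s1=square&3;                 // which of the 4 squares sharing a cache line
s2=square>>2;                // which cache line
hashkey = *(u64*)(data+s2*64+s1+piece*4);   // pieces 0-11 start at bytes 0..47 of the line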

...or more bit-twiddlingly:

Code:

unsigned char data[64*16];
hashkey = *(u64*)(data+(square&~3)*15+square+piece*4);
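The sketch below checks the two address formulas against each other and confirms that, for pieces 0-11, every 8-byte load stays inside one 64-byte line and every (piece, square) pair gets its own start byte. The `_Alignas(64)` on the table and the helper names are assumptions; memcpy replaces the `*(u64*)` cast for portability.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;

/* 16 cache lines of 64 bytes; each line serves 4 squares. */
_Alignas(64) static unsigned char data[64 * 16];

static unsigned offset_a(unsigned piece, unsigned square) {
    unsigned s1 = square & 3;    /* which of the 4 squares sharing a line */
    unsigned s2 = square >> 2;   /* which line */
    return s2 * 64 + s1 + piece * 4;
}

/* (square & ~3) * 15 + square == 60*s2 + 4*s2 + s1 == 64*s2 + s1 */
static unsigned offset_b(unsigned piece, unsigned square) {
    return (square & ~3u) * 15 + square + piece * 4;
}

static u64 load_key(unsigned piece, unsigned square) {
    assert(piece < 12 && square < 64);   /* pieces 12..15 would cross a line */
    u64 k;
    memcpy(&k, data + offset_a(piece, square), sizeof k);
    return k;
}
```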

hgm · Post by **hgm** » Tue Jun 23, 2009 1:00 pm

wgarvin wrote: Can anyone think of a way to use 1-byte intervals and make the table smaller without causing some accesses to cross a cache line boundary? I tried to figure out a way, but have come up with nothing so far.

I am used to 0x88 boards and 32-bit code, and there this is almost automatic. One cache line contains 4 board ranks, but the high addresses in each rank, from which a 4-byte load (or even an 8-byte load) could cross a cache-line boundary, are not used. For the other half-key I swap the use of the black and white tables.
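A hypothetical rendering of this scheme: one 128-byte table per piece, indexed by the 0x88 square, with the two 32-bit halves of the key read from the same square of a piece's table and of the opposite colour's table. The layout (pieces 0-5 white, 6-11 black, 64-byte aligned rows) and all names are assumptions, not hgm's code. Within a 64-byte line a valid 0x88 square sits at offset (rank mod 4)*16 + file, at most 55, so even an 8-byte load ends by offset 62.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one 128-byte table per piece, indexed by a 0x88 square
   (rank * 16 + file). Bytes with file 8..15 are never the start of a load. */
enum { PIECES = 12 };
_Alignas(64) static uint8_t table88[PIECES][128];

static int on_board(unsigned sq88) { return (sq88 & 0x88) == 0; }

/* 4-byte read via memcpy, as 32-bit code would do with a plain load. */
static uint32_t load32(unsigned piece, unsigned sq88) {
    uint32_t v;
    memcpy(&v, &table88[piece][sq88], sizeof v);
    return v;
}

/* Hypothetical pairing: the other half-key comes from the opposite colour's table. */
static uint64_t key88(unsigned piece, unsigned sq88) {
    assert(piece < PIECES && on_board(sq88));
    unsigned other = piece < 6 ? piece + 6 : piece - 6;
    return ((uint64_t)load32(other, sq88) << 32) | load32(piece, sq88);
}
```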
