color-flipped perft

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

color-flipped perft

Post by Gerd Isenberg »

My quad-bitboard - sse2-fill-based and color-flip approach starts perfting.

Board structure is a quad-bitboard with "vertical" nibbles as piece codes.
Branchless Make/Unmake is almost xor/bswap of the quad-bitboard and hashkeys by two pre-calculated tables with all from-to aspects of all moves, as well as and/add of ep, castle-rights, material-signature and move50 counter.

Motivation of flipping sides after each make/unmake was to have always white to move, with makes move-generation black/white-symmetric and other stuff like egtb simpler. As a side effect I also get some additional hash-hits in symmetric positions.

So far I made all zobrist keys of black pieces depending on white pieces:

Code: Select all

zobrist[piece][sq] == bswap(zobrist[piece^color][sq^56])
with the consequence that pairs of black/white pieces on vertical mirrored squares became effectivly 32-bit keys instead of 64-bit. 64-bit Hashkeys of symmetrical position reflect this by same values of the flipped bytes {0,7}, {1,6}, {2,5}, {3,4} like 0x1234567878563412.

Of course I am concerned about collisions due to this dependeny of zobrist keys.

That is why i tried hashing of move-counts in perft - to compare perft results with/without hashing.

I use four 128-bit slots per index to eventually replace the entry with lowest perft-count. I store the upper 48-bit of the hashkey for verification, depth, ply&1 and perft-count. I only replace other entries, if the new perft-count to store is at least 16 times bigger than the already stored one - which somehow seems to result in best performance so far.

The reason I store odd-even ply is to determine whether a hash-hit with same depth and key is from a flipped position. That is why I do some kind of "iterative" perft-deepening e.g. perft(1..8) without clearing the hashtable inbetween. So doing e3-e5-e4 from the initial position transposes to the flipped position e4-e5 from perft(n-1) which is probably still available in the hash with same depth. So far I tried a lot of positions, specially all kind of symmetrical positions without any difference with/without hashing.

The perft is not the fastest one with pure legal movegen and popcounting 16 disjoint directionwise movetarget-sets at the leaves. sse2 is still quite slow on K8 and I already do a lot of stuff needed for eval and plausible move-gen aka sorting.

Some sample output of initial and four-knights symmetrical position:

Code: Select all

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
bb[0] 0xffff000000000000
bb[1] 0x2cff00000000ff2c
bb[2] 0x7600000000000076
bb[3] 0x9900000000000099
occup 0xffff00000000ffff
castleRights: 0x0f00000f
hashkey:      0xdcd7fa0d0dfad7dc
matSignature: 0x2822012323012228
perft( 1)             20        0.000 sec
perft( 2)            400        0.000 sec
perft( 3)           8902        0.000 sec
perft( 4)         197281        0.000 sec
perft( 5)        4865609        0.078 sec
perft( 6)      119060324        1.219 sec
perft( 7)     3195901860       15.203 sec
perft( 8)    84998978956      222.890 sec
HashStats:      33554432 entries (512MB)
Stored         201075730 100% 599% of entries
NewEntries      33554210  17% of Stored
ReplOthers     159837973  79% of Stored
ReplSames        7683547   4% of Stored
Hits           218870782 109% of Stored
RevHits          8477931   4% of Hits

Code: Select all

r1bqkb1r/pppp1ppp/2n2n2/4p3/4P3/2N2N2/PPPP1PPP/R1BQKB1R w KQkq - 0 1
bb[0] 0xbdef241000000000
bb[1] 0x2cef00101000ef2c
bb[2] 0x3400240000240034
bb[3] 0x9900000000000099
occup 0xbdef24101024efbd
castleRights: 0x0f00000f
hashkey:      0x2972061212067229
matSignature: 0x2822013030012228
perft( 1)             29        0.000 sec
perft( 2)            837        0.000 sec
perft( 3)          25563        0.000 sec
perft( 4)         773691        0.016 sec
perft( 5)       24503979        0.266 sec
perft( 6)      770077186        5.047 sec
perft( 7)    25206631197       84.734 sec
perft( 8)   818977102124     1745.953 sec
HashStats:      33554432 entries (512MB)
Stored        1586142876 100% 4727% of entries
NewEntries      33554432   2% of Stored
ReplOthers    1542572118  97% of Stored
ReplSames       10016326   1% of Stored
Hits          1324309226  83% of Stored
RevHits          4669107   0% of Hits
I like to ask whether the hash-statistic sounds reasonable to you or what other replacement policy you think might be better for perft. Any ideas for a better approach to color-flip the zobrist-keys appreciated.

Thanks,
Gerd
User avatar
smrf
Posts: 484
Joined: Mon Mar 13, 2006 11:08 am
Location: Klein-Gerau, Germany

Re: color-flipped perft

Post by smrf »

Hi Gerd,

about 10 years ago I made a postbox based approach to avoid colors be maintaining two boards from each side's perspective using merely seven piece types: own P, own N, own B, own R, own Q, own K and a general enemy piece. All routines have been unique for Black and White. But that approach has been significantly slower than my current board representation, because of the flipping overhead.

For your keys: why not finally complementing them if it is Black's move?

For your statistics: following details shown here might be interesting:

Code: Select all

FEN: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

  +-*--b--c--d--*--f--g--*-+ MS Vis.Studio C++ 32-Bit-Vers. 13.10
8 |[r][n][b][q][k][b][n][r]| (Compilation: Nov  4 2007)
7 |[p][p][p][p][p][p][p][p]|
6 |   :::   :::   :::   :::| Perft Testseries
5 |:::   :::   :::   :::   | (with TT caching +512.0 MB / 4-fold)
4 |   :::   :::   :::   :::| TT Access Success +52.0%
3 |:::   :::   :::   :::   |
2 |<P><P><P><P><P><P><P><P>| Smirf Test No.&#58;  0
1 |<R><N><B><Q><K><B><N><R>|
=>+-*--b--c--d--*--f--g--*-+ Break Time&#58; +30.00 Sec.

Ply      Nodes     all &#40;x&#41;     &#40;ep&#41;    all (+)      (#) Prom.     Cstl.    Sec.
-------------------------------------------------------------------------------
1           20           0        0          0        0     0         0       0
2          400           0        0          0        0     0         0       0
3         8902          34        0         12        0     0         0       0
4       197281        1576        0        469        8     0         0       0
5      4865609       82719      258      27351      347     0         0   0.093
6    119060324     2812008     5248     809099    10828     0         0   1.203
7   3195901860   108329926   319617   33103848   435767     0    883453   15.13
8  84998978956  3523740106  7187977  968981593  9852036     0  23605205   196.7
-------------------------------------------------------------------------------

FEN&#58; r1bqkb1r/pppp1ppp/2n2n2/4p3/4P3/2N2N2/PPPP1PPP/R1BQKB1R w KQkq - 0 1

  +-*--b--c--d--*--f--g--*-+ MS Vis.Studio C++ 32-Bit-Vers. 13.10
8 |&#91;r&#93;&#58;&#58;&#58;&#91;b&#93;&#91;q&#93;&#91;k&#93;&#91;b&#93;   &#91;r&#93;| &#40;Compilation&#58; Nov  4 2007&#41;
7 |&#91;p&#93;&#91;p&#93;&#91;p&#93;&#91;p&#93;&#58;&#58;&#58;&#91;p&#93;&#91;p&#93;&#91;p&#93;|
6 |   &#58;&#58;&#58;&#91;n&#93;&#58;&#58;&#58;   &#91;n&#93;   &#58;&#58;&#58;| Perft Testseries
5 |&#58;&#58;&#58;   &#58;&#58;&#58;   &#91;p&#93;   &#58;&#58;&#58;   | &#40;with TT caching +512.0 MB / 4-fold&#41;
4 |   &#58;&#58;&#58;   &#58;&#58;&#58;<P>&#58;&#58;&#58;   &#58;&#58;&#58;| TT Access Success +56.0%
3 |&#58;&#58;&#58;   <N>   &#58;&#58;&#58;<N>&#58;&#58;&#58;   |
2 |<P><P><P><P>   <P><P><P>| Smirf Test No.&#58;  0
1 |<R>   <B><Q><K><B>&#58;&#58;&#58;<R>|
=>+-*--b--c--d--*--f--g--*-+ Break Time&#58; +30.00 Sec.

Ply      Nodes     all &#40;x&#41;    &#40;ep&#41;    all (+)      (#) Prom.      Cstl.    Sec.
-------------------------------------------------------------------------------
1           29           1       0          0        0     0          0       0
2          837          40       0          1        0     0          0       0
3        25563        1394       0        170        0     0        143       0
4       773691       51623       0       6440        8     0       4298   0.016
5     24503979     1763821     149     304731     2204     0     185855   0.422
6    770077186    63575202    5562   10851840    69155     0    5767114   6.625
7  25206631197  2197010126  422385  445000762  4733325     0  204067910   99.59
-------------------------------------------------------------------------------
Regards, Reinhard.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: color-flipped perft

Post by Gerd Isenberg »

Hi Reinhard,

thanks for your suggestions. It seems my hashing or replacement-strategy is not optimal, since your perft 8 time is much lower than mine, while perft 7 was almost equal time. It might be that "plausible" move-sorting is an issue with hash-performance. I see that it makes sense to count also how often the table was probed to get your TT Success to compare with.

Flipping the board vertically plus swapping colors of all pieces is actually quite cheap with quad-bitboards, but of course only works for 8*8 boards.

Complement or negate of "mirrored" zobrist keys does not work.
If I make zobrist[whitePawn][a2] == ~zobrist[blackPawn][a7] and so on, the xor-result would be -1. Therefor the little-big-endian conversion aka bswap, where I keep at least 32 significant bits in the combined hashkey. Otherwise there are no positions with black to move and I like to take the immanent extra hits in symmetrical positions in a real alpha-beta search ;-)

Only odd or even ply-index indicates whether the position is actually flipped or not.

Regards,
Gerd
User avatar
smrf
Posts: 484
Joined: Mon Mar 13, 2006 11:08 am
Location: Klein-Gerau, Germany

Re: color-flipped perft

Post by smrf »

Hi Gerd,
Gerd Isenberg wrote:... It seems my hashing or replacement-strategy is not optimal, since your perft 8 time is much lower than mine, while perft 7 was almost equal time. ...
well, I think, those results could not be compared easily. Because I am working at a iMac Boot Camp XP partition, thus my processor is not the fastest, and I am gathering several different sums, not merely the node count. The TT-success percentage is the last stage of an average value of stored data accesses. Indeed, move sorting is slightly increasing the performance, because it seems that then related positions are handled timely at more dense moments.

Regards, Reinhard.