CCR board representation

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

CCR board representation

Post by maksimKorzh »

A couple of days ago I've found this paper http://www.doiserbia.nb.rs/img/doi/0354 ... 00011V.pdf describing so called "Compact Chessboard Representation". The key concept is to use an array of 8 elements to represent the entire board. Each 32-bit unsigned integer in the array represents a single rank. Pieces are encoded within 4 bits so each rank integer contains 32 / 4 = 8 squares. The author claims the benefits over both
array and bitboard (regardless of x32/x64 arch) representation assuming assembly language is used(but also high level languages with optimization flags are fine according to author). The bound control is done as follows - to check the rank overflow we can check if rank > 8 && rank < 0 (author also mentions that the second case is abandoned in assembly automatically) and the most interesting - checking file overflow is not needed at all because if after shifting the piece left or right along the rank(32-bit unsigned int) if the rank mask value for current piece (like this 0x00000F00) is greater than 0 then it's legal, otherwise it would be 0x00000000.

I've tried to implement move generator in this manner, following the ideas of implementations given by the author, but can't see any benefits, only much more pain in addressing the particular square... I didn't perform many tests, but I really doubt HOW can this system be faster compare to say 0x88? I mean the expression like (board[rank] << file * 4) & 0xF takes two bitwise operations plus one multiplying time more compare to simple memory addressing like board[square].

Did anybody ever face this board representation? or used it? or at least read the paper given above? What am I missing?

P.S. two days before finding that paper the idea to represent entire rank with 4bit piece coding in single unsigned integer came to my mind believe it or not, but I considered that idea to be insane for it takes ages to address each square, so I didn't see any benefits... Two days after I've found that paper...
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: CCR board representation

Post by mar »

maksimKorzh wrote: Sun Nov 25, 2018 12:15 pm The author claims the benefits over both
array and bitboard (regardless of x32/x64 arch) representation assuming assembly language is used(but also high level languages with optimization flags are fine according to author).
I very much doubt this can beat bitboards. Today almost everyone uses magic bitboards, basically every CPU is 64-bit (even mobile CPUs are 64-bit these days).
So the author claims it's faster than bitboards but presents no data for comparison? That's very scientific....
Like faster than bitboards on 64-bit CPUs? I'd love to see that!
Or perhaps he meant another benefit? But what's there to benefit from apart from performance?
Martin Sedlak
grahamj
Posts: 43
Joined: Thu Oct 11, 2018 2:26 pm
Full name: Graham Jones

Re: CCR board representation

Post by grahamj »

mar wrote: Sun Nov 25, 2018 1:33 pm
maksimKorzh wrote: Sun Nov 25, 2018 12:15 pm The author claims the benefits over both
array and bitboard (regardless of x32/x64 arch) representation assuming assembly language is used(but also high level languages with optimization flags are fine according to author).
I very much doubt this can beat bitboards. Today almost everyone uses magic bitboards, basically every CPU is 64-bit (even mobile CPUs are 64-bit these days).
So the author claims it's faster than bitboards but presents no data for comparison? That's very scientific....
Like faster than bitboards on 64-bit CPUs? I'd love to see that!
Or perhaps he meant another benefit? But what's there to benefit from apart from performance?
He was targetting 32-bit CPUs in 2008. I think CCR might suit GPUs in 2018...
Graham Jones, www.indriid.com
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: CCR board representation

Post by maksimKorzh »

after two days of intensive research I've suddenly realized the answer to my own question - the key trick is NOT USE RAM, but USE PROCESSOR REGISTERS instead to keep track of entire board state. You might argue - the registers often need to take some different values, like the multiplication is possible only using eax and ebx registers(on x86 arch) - that's what the stack is needed for. So we can push the rank value stored in a single register to stack and pop it after. This definitely make an engine arch dependent, if only not using "register" keyword, which doesn't guarantee the variable to be stored in register rather if only been declared explicitly... Well, all this looks like a weird and insane stuff, but that is already pretty close to what I am looking for...

And the idea is very simple actually. As far as there two major approaches in chess programming - bitboards and mailboxes, my original wonder was about speeding up "on the fly" calculation. It seems that this is probably impossible to write something faster than 0x88 so far, BUT the register processing is faster compare to memory access, so if the board state is completely stored in registers we can save some amount of time, there's still some slow down for pushing/popping rank values to stack, but in theory it might be still a bit faster. Well, at least that's the field of my interest and I'm going to check that out myself.
DustyMonkey
Posts: 61
Joined: Wed Feb 19, 2014 10:11 pm

Re: CCR board representation

Post by DustyMonkey »

maksimKorzh wrote: Sun Nov 25, 2018 12:15 pm The bound control is done as follows - to check the rank overflow we can check if rank > 8 && rank < 0 (author also mentions that the second case is abandoned in assembly automatically)
Doesnt require assembly language.

The rank before 0 in the unsigned world is also greater than 7. Testing for the range [0..n] should never require two comparisons.

w.r.t the other efficiencies - this is a hybrid - some things will be faster than 8x8, others faster than bitboards. Bitboards wont be faster for everything until machines are using native 4096-bit integers, finally allowing for efficient arbitrary swizzles of full 64-bit values. With 64-bit integers we can only arbitrarily swizzle 8-bit bytes efficiently.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: CCR board representation

Post by Gerd Isenberg »

Good luck with CCR, you may try using 256-bit ymm register of x86 AVX2 for one register boards, to make/unmake by
board ^= tbl[enumMove] ;-)

related cpw stuff:
https://www.chessprogramming.org/Nibble#ArrayOfNibbles

CCR with "vertical" nibbles:
https://www.chessprogramming.org/Quad-Bitboards
https://www.chessprogramming.org/DirGolem
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: CCR board representation

Post by maksimKorzh »

Gerd Isenberg wrote: Mon Nov 26, 2018 7:33 pm Good luck with CCR, you may try using 256-bit ymm register of x86 AVX2 for one register boards, to make/unmake by
board ^= tbl[enumMove] ;-)

related cpw stuff:
https://www.chessprogramming.org/Nibble#ArrayOfNibbles

CCR with "vertical" nibbles:
https://www.chessprogramming.org/Quad-Bitboards
https://www.chessprogramming.org/DirGolem
That's fantastic idea to use floating point registers! I didn't think about it. My 32-bit CPU doesn't seem to recognize ymm register so far, but 7 128-bit xmm registers are available and that's just great! It's not that easy to deal with them but really worth trying, thanks a lot!
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: CCR board representation

Post by Gerd Isenberg »

Yes, 128-bit xmm registers with SSE may be treated as vector of four floats or two doubles, as well since SSE2 as integer vectors of 16 bytes, eight words, four double words or two quad words. AVX or AVX2 two times more. There are 8 of these SIMD-registers in 32-bit mode, and 16 in 64-bit mode.

https://www.chessprogramming.org/SSE2
https://www.chessprogramming.org/AVX2

But they are difficult to utilize specially for mailbox boards I guess, L1-cache memory access is not that expensive - and the board array is very likely in L1, and it is a pain to move general purpose to xmm/ymm and vice versa. With quad-bitboards you may generate 16 direction-wise (8 rays + 8 knight directions) strictly legal move target bitboards almost branch-less, intended to hide the hash read latency with see2/avx2 register computation.