A new kind of rotated Bitboard

Discussion of chess software programming and technical issues.

Moderator: Ras

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

A new kind of rotated Bitboard

Post by dangi12012 »

I have been playing around with color agnostic chess. It has some implications for faster inference since both viewpoints are cheaper and always there - and with AST sifting I found a new form of rotated bitboards.
Instead of maintaing 4 Bitboards we just maintain 2. One for the normal occupancy and one for the rotated occupancy meaning sq = sq ^ 63.

That is we have the normal occ and we have the bitrotated rev_occ at all times.

Now I gave my sifter these symbols:

Code: Select all

//Scalars
1ull;
2ull;
56ull;
sq;
occ;
rev_occ;
//Unary Func
std::popcount(arg0);
std::countl_zero(arg0);
std::countr_zero(arg0);
bit_reverse(arg0);
//Binary func
(arg0 | arg1);
(arg0 & ~arg1);
(arg0 ^ arg1);
(arg0 - arg1);
(arg0 << arg1);
(arg0 >> arg1);
And we solve the positive rays of chess ONLY. The idea being that the negative rays are free too since we have occ' too.

Here is all the solutions for the 4 rays in chess:

Code: Select all

SOLVE HO:
     Solution Found : (occ ^ (occ - 1ull))  #19788
     Solution Found : (occ ^ (occ - 2ull))  #85324
     Solution Found : ((occ - 1ull) ^ occ)  #263388
     Solution Found : ((occ - 2ull) ^ occ)  #267484
     Solution Found : (occ ^ ((occ &~ 1ull) - 1ull))  #310604
     Solution Found : ((occ ^ (occ - 1ull)) | 1ull)  #316618
     Solution Found : ((occ ^ (occ - 1ull)) &~ 1ull)  #316619
     Solution Found : ((occ ^ (occ - 1ull)) ^ 1ull)  #316620
     Solution Found : ((occ ^ (occ - 1ull)) - 1ull)  #316621
     Solution Found : (bit_reverse(rev_occ) ^ (occ - 1ull))  #316828
     Solution Found : (occ ^ ((occ - 1ull) | 1ull))  #318028
     Solution Found : (occ ^ ((occ - 1ull) &~ 1ull))  #318284
     Solution Found : (occ ^ ((occ - 1ull) ^ 1ull))  #318540
     Solution Found : (occ ^ ((occ - 1ull) - 1ull))  #318796
     Solution Found : (occ ^ (bit_reverse(rev_occ) - 1ull))  #367948
     Solution Found : (occ ^ (occ - popcount(1ull)))  #413004
     Solution Found : (occ ^ (occ - (1ull | 1ull)))  #675148
     Solution Found : (occ ^ (occ - (1ull << 1ull)))  #937292
     Solution Found : (occ ^ (occ - 1ull))  #1068364
     Solution Found : (occ ^ (occ - 2ull))  #1133900
     Solution Found : ((occ - 1ull) ^ occ)  #1311964
     Solution Found : ((occ - 2ull) ^ occ)  #1316060
     Solution Found : ((occ ^ (occ - 2ull)) | 1ull)  #1365194
     Solution Found : ((occ ^ (occ - 2ull)) &~ 1ull)  #1365195
     Solution Found : ((occ ^ (occ - 2ull)) ^ 1ull)  #1365196
     Solution Found : (bit_reverse(rev_occ) ^ (occ - 2ull))  #1365404
     Solution Found : (occ ^ ((occ - 2ull) | 1ull))  #1366604
     Solution Found : (occ ^ ((occ - 2ull) &~ 1ull))  #1366860
     Solution Found : (occ ^ ((occ - 2ull) ^ 1ull))  #1367116
     Solution Found : (occ ^ (bit_reverse(rev_occ) - 2ull))  #1416524
     Solution Found : (occ ^ (occ - popcount(2ull)))  #1461580
     Solution Found : (occ ^ (occ - countr_zero(2ull)))  #1592652
SOLVE VE:
     Solution Found : ((occ - 1ull) << 1ull)  #1246
     Solution Found : ((occ << 1ull) - 1ull)  #1261
     Solution Found : ((occ - 2ull) << 1ull)  #5342
     Solution Found : ((occ << 2ull) - 1ull)  #5357
     Solution Found : ((occ - 56ull) << 1ull)  #9438
     Solution Found : (((occ &~ 1ull) - 1ull) << 1ull)  #19422
     Solution Found : (((occ &~ 1ull) << 1ull) - 1ull)  #19437
     Solution Found : (occ ^ (occ - 1ull))  #19788
     Solution Found : (((occ - 1ull) | 1ull) << 1ull)  #19886
     Solution Found : (((occ - 1ull) &~ 1ull) << 1ull)  #19902
     Solution Found : (((occ - 1ull) ^ 1ull) << 1ull)  #19918
     Solution Found : (((occ - 1ull) - 1ull) << 1ull)  #19934
     Solution Found : (((occ - 1ull) << 1ull) | 1ull)  #19946
     Solution Found : (((occ - 1ull) << 1ull) &~ 1ull)  #19947
     Solution Found : (((occ - 1ull) << 1ull) ^ 1ull)  #19948
     Solution Found : (((occ - 1ull) << 1ull) - 1ull)  #19949
     Solution Found : (((occ - 1ull) << 1ull) << 1ull)  #19950
     Solution Found : (((occ << 1ull) &~ 1ull) - 1ull)  #20157
     Solution Found : (((occ << 1ull) - 1ull) | 1ull)  #20186
     Solution Found : (((occ << 1ull) - 1ull) &~ 1ull)  #20187
     Solution Found : (((occ << 1ull) - 1ull) ^ 1ull)  #20188
     Solution Found : (((occ << 1ull) - 1ull) - 1ull)  #20189
     Solution Found : (((occ << 1ull) - 1ull) << 1ull)  #20190
     Solution Found : (((occ << 1ull) << 1ull) - 1ull)  #20205
     Solution Found : ((bit_reverse(rev_occ) - 1ull) << 1ull)  #23006
     Solution Found : ((bit_reverse(rev_occ) << 1ull) - 1ull)  #23021
     Solution Found : (bit_reverse((rev_occ >> 1ull)) - 1ull)  #24477
     Solution Found : ((occ - popcount(1ull)) << 1ull)  #25822
     Solution Found : ((occ << popcount(1ull)) - 1ull)  #25837
     Solution Found : ((occ - countl_zero(1ull)) << 1ull)  #29918
     Solution Found : ((occ - (1ull | 1ull)) << 1ull)  #42206
     Solution Found : ((occ << (1ull | 1ull)) - 1ull)  #42221
SOLVE D1:
     Solution Found : ((occ - 1ull) << 1ull)  #1246
     Solution Found : ((occ << 1ull) - 1ull)  #1261
     Solution Found : ((occ - 2ull) << 1ull)  #5342
     Solution Found : ((occ << 2ull) - 1ull)  #5357
     Solution Found : ((occ - 56ull) << 1ull)  #9438
     Solution Found : (((occ &~ 1ull) - 1ull) << 1ull)  #19422
     Solution Found : (((occ &~ 1ull) << 1ull) - 1ull)  #19437
     Solution Found : (occ ^ (occ - 1ull))  #19788
     Solution Found : (((occ - 1ull) | 1ull) << 1ull)  #19886
     Solution Found : (((occ - 1ull) &~ 1ull) << 1ull)  #19902
     Solution Found : (((occ - 1ull) ^ 1ull) << 1ull)  #19918
     Solution Found : (((occ - 1ull) - 1ull) << 1ull)  #19934
     Solution Found : (((occ - 1ull) << 1ull) | 1ull)  #19946
     Solution Found : (((occ - 1ull) << 1ull) &~ 1ull)  #19947
     Solution Found : (((occ - 1ull) << 1ull) ^ 1ull)  #19948
     Solution Found : (((occ - 1ull) << 1ull) - 1ull)  #19949
     Solution Found : (((occ - 1ull) << 1ull) << 1ull)  #19950
     Solution Found : (((occ << 1ull) &~ 1ull) - 1ull)  #20157
     Solution Found : (((occ << 1ull) - 1ull) | 1ull)  #20186
     Solution Found : (((occ << 1ull) - 1ull) &~ 1ull)  #20187
     Solution Found : (((occ << 1ull) - 1ull) ^ 1ull)  #20188
     Solution Found : (((occ << 1ull) - 1ull) - 1ull)  #20189
     Solution Found : (((occ << 1ull) - 1ull) << 1ull)  #20190
     Solution Found : (((occ << 1ull) << 1ull) - 1ull)  #20205
     Solution Found : ((bit_reverse(rev_occ) - 1ull) << 1ull)  #23006
     Solution Found : ((bit_reverse(rev_occ) << 1ull) - 1ull)  #23021
     Solution Found : (bit_reverse((rev_occ >> 1ull)) - 1ull)  #24477
     Solution Found : ((occ - popcount(1ull)) << 1ull)  #25822
     Solution Found : ((occ << popcount(1ull)) - 1ull)  #25837
     Solution Found : ((occ - countl_zero(1ull)) << 1ull)  #29918
     Solution Found : ((occ - (1ull | 1ull)) << 1ull)  #42206
     Solution Found : ((occ << (1ull | 1ull)) - 1ull)  #42221
SOLVE D2:
     Solution Found : ((occ - 1ull) << 1ull)  #1246
     Solution Found : ((occ << 1ull) - 1ull)  #1261
     Solution Found : ((occ - 2ull) << 1ull)  #5342
     Solution Found : ((occ << 2ull) - 1ull)  #5357
     Solution Found : ((occ - 56ull) << 1ull)  #9438
     Solution Found : ((occ - sq) << 1ull)  #13534
     Solution Found : (((occ &~ 1ull) - 1ull) << 1ull)  #19422
     Solution Found : (((occ &~ 1ull) << 1ull) - 1ull)  #19437
     Solution Found : (occ ^ (occ - 1ull))  #19788
     Solution Found : (((occ - 1ull) | 1ull) << 1ull)  #19886
     Solution Found : (((occ - 1ull) &~ 1ull) << 1ull)  #19902
     Solution Found : (((occ - 1ull) ^ 1ull) << 1ull)  #19918
     Solution Found : (((occ - 1ull) - 1ull) << 1ull)  #19934
     Solution Found : (((occ - 1ull) << 1ull) | 1ull)  #19946
     Solution Found : (((occ - 1ull) << 1ull) &~ 1ull)  #19947
     Solution Found : (((occ - 1ull) << 1ull) ^ 1ull)  #19948
     Solution Found : (((occ - 1ull) << 1ull) - 1ull)  #19949
     Solution Found : (((occ - 1ull) << 1ull) << 1ull)  #19950
     Solution Found : (((occ << 1ull) &~ 1ull) - 1ull)  #20157
     Solution Found : (((occ << 1ull) - 1ull) | 1ull)  #20186
     Solution Found : (((occ << 1ull) - 1ull) &~ 1ull)  #20187
     Solution Found : (((occ << 1ull) - 1ull) ^ 1ull)  #20188
     Solution Found : (((occ << 1ull) - 1ull) - 1ull)  #20189
     Solution Found : (((occ << 1ull) - 1ull) << 1ull)  #20190
     Solution Found : (((occ << 1ull) << 1ull) - 1ull)  #20205
     Solution Found : ((bit_reverse(rev_occ) - 1ull) << 1ull)  #23006
     Solution Found : ((bit_reverse(rev_occ) << 1ull) - 1ull)  #23021
     Solution Found : (bit_reverse((rev_occ >> 1ull)) - 1ull)  #24477
     Solution Found : ((occ - popcount(1ull)) << 1ull)  #25822
     Solution Found : ((occ << popcount(1ull)) - 1ull)  #25837
     Solution Found : ((occ - countl_zero(1ull)) << 1ull)  #29918
     Solution Found : ((occ - (1ull | 1ull)) << 1ull)  #42206
Now most of these are redundant but there is a beauty in there. Putting a distinct set of all 4 rays shows this solution. (mask is implicit I will write it out)

occ := occ & mask
rev_occ := occ' & mask

General Solution (set intersection of all possible solutions): POSITIVE
(occ ^ (occ - 1ull)) & mask

For good measure here is the negative ray:
General Solution (set intersection of all possible solutions): NEGATIVE
bit_reverse((rev_occ ^ (rev_occ - 1ull)) & mask)

Now you can imagine that the bit reversal is not needed because we need this perspective as well. For the cost of a single OR and XOR per move get something unique. What that means in terms of nps compared to other approaches I cannot say yet.


Self analysis:
This is Hyperbola qsc for a single line:

Code: Select all

uint64_t o = pieces & mask;
return ((o - (1ull << x)) ^ bit_bswap(bit_bswap(o) - (1ull << (x ^ 56)))) & mask;
This is Bitmirror:

Code: Select all

uint64_t h = occ & r.ho; uint64_t H = occ & r.HO;
return ((h ^ (h - 1ull)) & r.ho) | bit_rotate((H ^ (H - 1ull)) & r.HO);
Other than the mask there is no dependency on the square (x) in the calculation anymore as well as both rays being truly independent. In hyperqsc the xor in the middle cannot be a OR. Also depending on how the rest of the architecture is layed out the second bitrotation can be made obsolete as well.

This is actually a trend I see continuing. When making chesscode exponentially faster the first thing that goes is conditional code, the second thing that goes is the work on single uint64_t's either we work on them in bulk via YMM or we maintain a bigger board structure that enables additional shortcuts. Either way maintining color agnosticity has more advantages than people realize yet and above code becomes INSANE when we work with YMM registers since all 4 rays in 8 directions of a queen can be solved forward and reverse order like this:

Code: Select all

    //Bulk of the work with AVX2
    u64x4 shvxy = (hvxy ^ (hvxy - one)) & dirs;   
    u64x4 SHVXY = (HVXY ^ (HVXY - one)) & DIRS; //to be reversed
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: A new kind of rotated Bitboard

Post by dangi12012 »

The corresponding board structure:

Code: Select all

#include <algorithm>
#include <stdint.h>
struct brd{
    uint64_t a,b,c,d;
    uint64_t A,B,C,D;

    void swapcolor(){
        std::swap(a,A);
        std::swap(b,B);
        std::swap(c,C);
        std::swap(d,D);
    }
};

void movebrd(brd& b){
    b.move();
}
https://godbolt.org/z/TcnPdEEoP
Of course the swap is not even needed when the backend can handle 2 colors (like the usual approach)

Anyways when being color agnostic you would like a,b,c,d to be the own perspective always so it may be needed and for that
Zen4 is a gamechanger for computerchess in so many ways - while Intel is removing AVX512 from all consumer chips. sad
For instance it can run above code like this:

Code: Select all

movebrd(brd&):                        # @movebrd(brd&)
        vmovdqu64       zmm0, zmmword ptr [rdi]
        vshufi64x2         zmm0, zmm0, zmm0, 78    # zmm0 = zmm0[4,5,6,7,0,1,2,3]
        vmovdqu64       zmmword ptr [rdi], zmm0
        vzeroupper
        ret
English: Load - shuffle - store - embedded in actual code the store would not even happen. It would be compiled into a dependency chain of the corresponding snippet.

Zen 3:

Code: Select all

movebrd(brd&):                        # @movebrd(brd&)
        vmovups ymm0, ymmword ptr [rdi]
        vmovups ymm1, ymmword ptr [rdi + 32]
        vmovups ymmword ptr [rdi], ymm1
        vmovups ymmword ptr [rdi + 32], ymm0
        vzeroupper
        ret
English: load, load, store, store

The AST Sifter that can produce and find arbitrary equivalent code by supplying 8 or 16 instances of scalar, unary and binary functions will be made available soon (TM). It runs up to 11 symbols in a reasonable time. So any permutation of 16 different C++ tokens you can type out and say for instance find a shorter form of the snippet you wrote.

I love bulkwise C++. For instance a "pop2bits" which literally gives the results of std::countr_zero for 2 set bits or 1 or 0 bits also already in the testing phase.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer