Split Index Super Set Yielding (SISSY) Bitboards

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by Michael Sherwin »

mar wrote: Mon Feb 17, 2020 7:15 pm
Michael Sherwin wrote: Mon Feb 17, 2020 5:40 pm @Martin - If you haven't given up yet on SISSY then I have the following code I'd like you to try. If you have given up then what is your conclusion? I'm kinda hanging out in the wind here.
I haven't given up Michael, I'll look at it in the coming days, no worries :)
I'm glad that you have not given up! :) About the move eax, eax I do not see right off how it would be useful but I will give it some thought. Thanks!
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by mar »

Michael Sherwin wrote: Mon Feb 17, 2020 8:07 pm About the move eax, eax I do not see right off how it would be useful but I will give it some thought. Thanks!
Of course, that was a bad example :) What I meant was that add eax,edx will clear upper 32 bits of rax, the same applies to mov eax,edx and so on
Martin Sedlak
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by Michael Sherwin »

What's a Creel? on his YouTube channel featured a benchmark that he wrote to test various assembly language instructions. Bottom line for this thread is avoid shift instructions on intel as much as possible. His base machine was an 11 year old Phenom II which he normalized to 1.0. An i9-9900k running at 5ghz on all 16 threads only did 0.7731 Phenom II 810 quad core with 4 threads.

The benchmark: https://github.com/TheRainDoodle/Phenom-II-Benchmark
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by mar »

So what's the current status?
I'm not going to try any assembly, I think your last improvement was to reuse part of qss for the rnk table?
Also I think you were talking about reducing the number of lookups for bishops as well?
Martin Sedlak
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by Michael Sherwin »

mar wrote: Thu Feb 20, 2020 7:59 pm So what's the current status?
I'm not going to try any assembly, I think your last improvement was to reuse part of qss for the rnk table?
Also I think you were talking about reducing the number of lookups for bishops as well?
I have been a bit under the weather for the last few days so I have not done much. Was -22 degrees fahrenheit (-30c)here this morning! My thoughts about folding the bishop is to shift down the upper 32 bits by 24 after masking out the non relative bits. That will make 3 ranks of relevant bits instead of 6. Then the initialization will have to continue with merging the upper suppersets with the lower supersets. All that would save a little time but how much idk.

You mentioned cache aligning so I am very interested in those figures combined with the rnk included improvement. That would tell me a lot. Also the Clang timings. I don't think the timings you gave previously included the better Clang code.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by Michael Sherwin »

NVM. I have decided to implement SISSY bitboards within a SearchOnePlyQ() function written entirely in x64 assembler. That method of move generation would have several advantages. It allows the efficiency of pseudo legal move generation that nonetheless produces a legal list of moves each of which has a score from the Eval() function. Assembler would allow the use of true jump table code which C/C++ just can't generate. The most time consuming part of the search at the leaves would totally be in one very efficient function. It can be programmed as a DLL (.dll) or an SLL (.lib) file that others could use in their projects. This is what I want to do. This is what makes sense to me at this time. :idea:
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Mike Sherwin
Posts: 860
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by Mike Sherwin »

I'd like to revisit SISSY bitboards. I think that they can be highly optimised just using C without resorting to assembly. Just considering the bishop for now. Code from the Chess Programming Wiki.

Code: Select all

U64 bss[6][64][64]; // 192 K

U64 bishopAttacks(U64 occ, enumSquare sq) {
  return
    bss[0][sq][(occ >>  9) & 63] &  // 2nd rank
    bss[1][sq][(occ >> 17) & 63] &  
    bss[2][sq][(occ >> 25) & 63] &  
    bss[3][sq][(occ >> 33) & 63] &  
    bss[4][sq][(occ >> 41) & 63] &  
    bss[5][sq][(occ >> 49) & 63] ;  // 7th rank
}    
I agree that is a little expensive. HOWEVER, that can be reduced to only two lookups! First shift the upper 32 bits of the occupancy up by 8 bits. Then and it with the lower 32 bits and then shift the 4th rank down by either 8 or 16 bits depending on the from square. That will produce a split index between the 2nd and third ranks. That is a whole lot of saved shifting and anding. The problem I'm having is the initializing everything into two ranks. I have correct working code that initializes for 6 ranks. Now I just need to do a second level of combining those values into 2 ranks.

Code: Select all

void InitializeB() {
  u08 sq, sqr, i, j, k, l;
  s08 x, dx, y, dy;
  u64 b, bb;

  for (sq = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    for (i = 0; i < 128; i++) {
      if (i ^ 1) {
        j = i >> 1;
        for (k = 8, l = 0; k <= 48; k += 8, l++) {
          bb = 0;
          b = (u64)i << k;
          for (dx = +1, dy = +1; x + dx < +8 && y + dy < +8; dx++, dy++) {
            sqr = (((y + dy) << 3) + x + dx);
            bb |= ONE << sqr;
            if ((ONE << sqr) & b) break;
          }
          for (dx = -1, dy = +1; x + dx > -1 && y + dy < +8; dx--, dy++) {
            sqr = (((y + dy) << 3) + x + dx);
            bb |= ONE << sqr;
            if ((ONE << sqr) & b) break;
          }
          for (dx = +1, dy = -1; x + dx < +8 && y + dy > -1; dx++, dy--) {
            sqr = (((y + dy) << 3) + x + dx);
            bb |= ONE << sqr;
            if ((ONE << sqr) & b) break;
          }
          for (dx = -1, dy = -1; x + dx > -1 && y + dy > -1; dx--, dy--) {
            sqr = (((y + dy) << 3) + x + dx);
            bb |= ONE << sqr;
            if ((ONE << sqr) & b) break;
          }
          bss[sq][j][l] = bb;
        }
      }
    }
  }
}
And as can be seen the initialisation to 6 ranks is not that long nor very difficult. But further reducing them down to two ranks is giving me too much trouble. Mar showed that SISSY was not that far away from magic to begin with. Isn't it worth further investigation?

The queen is also extremely promising. Here is my code from Bricabrac. Perft numbers are accurate so the accuracy has been verified in Bricabrac and by Mar.

Code: Select all

   case WQ:
    case BQ:
      bb = qss[fs][(aPieces >> (fs & 56)) & 127][0]
        & qss[fs][(aPieces >> 8) & 255][1]
        & qss[fs][(aPieces >> 16) & 255][2]
        & qss[fs][(aPieces >> 24) & 255][3]
        & qss[fs][(aPieces >> 32) & 255][4]
        & qss[fs][(aPieces >> 40) & 255][5]
        & qss[fs][(aPieces >> 48) & 255][6]
        & notme;
As can be seen the code has already been optimised a little by embedding a rank lookup into the existing queen table thus reducing the lookups to 7 instead of the wiki's 8. THEN if the rank is removed and the file is shifted to the a file like in the wiki all that is left is the diagonals that can be manipulated into 2 ranks. That reduces the 7 kind of expensive lookups down to only 3 plus the rank lookup similar to Gerd's Kindergarten bitboards. The Queen was already competitive with magic and maybe even better. The rook also seems promising if handled like the wiki suggest.
https://www.chessprogramming.org/SISSY_Bitboards
Can someone explain to me why there has been no interest in seeing what SISSY bitboards could become? I have to say, I just don't understand.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by mar »

Hi Michael, that's very interesting indeed. As usual you try to come up with new and fresh ideas.

I guess SISSY didn't spark enough interest because they didn't beat magic (yet).
I think that beating queen attacks isn't so interesting either, because mixing magics for bishops/rooks and SISSY for queen would require even more memory for lookups to already a lot (relatively) for magics. I belive Volker (or someone else?) made an effort to compress the LUTs for magics IIRC so there can be some performance gain as well.

So - if you can make SISSY on par (or faster) than magics, I'm pretty sure everybody will switch to SISSY. If anyone can do it, then it's you.

What I like about SISSY is straightforward initialization with smaller LUT. Unfortunately what matters at the end of the day is performance - the only thing SISSY needs is to catch up to fancy magics. Fingers crossed.

Actually if Dany in the other thread has problems with bitboards (performance) - I guess SISSY might be a good starting point for him?
Martin Sedlak
Mike Sherwin
Posts: 860
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by Mike Sherwin »

mar wrote: Sat Feb 13, 2021 3:42 pm Hi Michael, that's very interesting indeed. As usual you try to come up with new and fresh ideas.

I guess SISSY didn't spark enough interest because they didn't beat magic (yet).
I think that beating queen attacks isn't so interesting either, because mixing magics for bishops/rooks and SISSY for queen would require even more memory for lookups to already a lot (relatively) for magics. I belive Volker (or someone else?) made an effort to compress the LUTs for magics IIRC so there can be some performance gain as well.

So - if you can make SISSY on par (or faster) than magics, I'm pretty sure everybody will switch to SISSY. If anyone can do it, then it's you.

What I like about SISSY is straightforward initialization with smaller LUT. Unfortunately what matters at the end of the day is performance - the only thing SISSY needs is to catch up to fancy magics. Fingers crossed.

Actually if Dany in the other thread has problems with bitboards (performance) - I guess SISSY might be a good starting point for him?
I don't think I have it in me to be able to do the initialization for 2 rank storage. It is not as straightforward and my memory problem is getting worse. I just end up in confusion when I try. But, thanks for the vote of confidence. It makes me want to try harder. I was looking at my notes for the rook and once the rank is removed it becomes trivial to fold the file down to one lookup with only 4 shifts to only 6 bits per square. That reduces the rook to only 2 lookups and 1 &&. All I need is the initialization done. I can still do the rest, hopefully.

I would like it if Danny teamed up with me on this. The last couple of times that I responded to him I did not get a reply. He might be upset with me for some reason, idk. However, whomever does the initialization first whether it is me or someone else does it best and will earn full credit for their work. :)
Mike Sherwin
Posts: 860
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Split Index Super Set Yielding (SISSY) Bitboards

Post by Mike Sherwin »

Okay, I just took a short meditative knap. What came to me to go from 6 ranks for the bishop down to three is for every index in rank 2 that has a superset stored find all indexes in rank 5 that also have a superset stored whose bits do not collide with rank 2 bits and combine them with & and store that superset at the new index. Then repeat for ranks (3, 6) and (4, 7). I think that sounds correct? I'll give it a try. :)