Some questions from a beginner

Gerd Isenberg · Post by **Gerd Isenberg** » Tue Apr 05, 2016 4:45 pm

Ok, first there are n = 5..12 relevant occupancy bits pers square for bishops / rooks, thus 2^n (1 << n), max 4096 possible relevant occupancy states. The first step per square in Tord's find magic routine is to initialize the b[] and a[] arrays with all possible occupancies (b[]) and slowly computed attacks (a[], batt, ratt) by square and occupancy b.

Code: Select all

  mask = bishop? bmask&#40;sq&#41; &#58; rmask&#40;sq&#41;;
  n = count_1s&#40;mask&#41;;
 
  for&#40;i = 0; i < &#40;1 << n&#41;; i++) &#123;
    b&#91;i&#93; = index_to_uint64&#40;i, n, mask&#41;;
    a&#91;i&#93; = bishop? batt&#40;sq, b&#91;i&#93;) &#58; ratt&#40;sq, b&#91;i&#93;);
  &#125;

index_to_uint64 (may be a better name is index_to_occupancy) is responsible to compute all occupancies for that square by mask, and n (popcnt of mask), and an enumerator index, from 0 to 2^n -1.

Code: Select all

uint64 index_to_uint64&#40;int index, int bits, uint64 m&#41; &#123;
  int i, j;
  uint64 result = 0ULL;
  for&#40;i = 0; i < bits; i++) &#123;
    j = pop_1st_bit&#40;&m&#41;;
    if&#40;index & &#40;1 << i&#41;) result |= &#40;1ULL << j&#41;;
  &#125;
  return result;
&#125;

The routine starts with occupancy result of zero.

Code: Select all

  uint64 result = 0ULL;

And loops over all possible occupancy bits (5 ... 12) ...

Code: Select all

  for&#40;i = 0; i < bits; i++) &#123;

and gets the next set bit-index from the mask (bitscan forward with reset)

Code: Select all

    j = pop_1st_bit&#40;&m&#41;;

It then sets the corresponding bit (2^j, 1 << j) inside the result, if the index (0..2^n-1, max 4095) has bit i set.

Code: Select all

    if&#40;index & &#40;1 << i&#41;) result |= &#40;1ULL << j&#41;;

This is how all possible occupancies are computed by all possible indices. Note that there are far less attack sets than occupancies, since bits behind the first blocker don't affect the attacks. Thus, while all b are different, many a are equal for different i.

===

The next step is to find a magic factor, chosen randomly by trial and error (up to 100000000 trials).

Code: Select all

  for&#40;k = 0; k < 100000000; k++) &#123;
    magic = random_uint64_fewbits&#40;);

The next line is some empirical optimization (I guess), since it turned out that "good" magics to try require at least six bit set in the upper byte of a 64-bit product:

Code: Select all

    if&#40;count_1s&#40;&#40;mask * magic&#41; & 0xFF00000000000000ULL&#41; < 6&#41; continue;

The used[] is cleared to check for collisions ...

Code: Select all

    for&#40;i = 0; i < 4096; i++) used&#91;i&#93; = 0ULL;

Looping over all occupancies again so long there is no destructive collission ...

Code: Select all

    for&#40;i = 0, fail = 0; !fail && i < &#40;1 << n&#41;; i++) &#123;

calculate the occupancy index, using precalculated b[] again (see above) ...

Code: Select all

      j = transform&#40;b&#91;i&#93;, magic, m&#41;;

if not used yet, store attacks

Code: Select all

      if&#40;used&#91;j&#93; == 0ULL&#41; used&#91;j&#93; = a&#91;i&#93;;

if used, we have a collission, but may be a (desired) constructive one, if different occupancies share same attacks due to different occupancy bits "behind" first blocker. That is why we store attacks in used, and magic only fails, if attacks are different too.

Code: Select all

      else if&#40;used&#91;j&#93; != a&#91;i&#93;) fail = 1;

otherwise we have found a destructive collission free magic factor ...

Code: Select all

    if&#40;!fail&#41; return magic;

And yes, you can compute the magics on the fly while startup ...

Sven · Post by **Sven** » Tue Apr 05, 2016 5:40 pm

Sven Schüle wrote:
The_Algebraist wrote:One question I came up with when thinking about it:

In the article: http://chessprogramming.wikispaces.com/ ... for+Magics
I think I understand most of the algorithm that is suggested. But one thing is unclear to me, what is the point of this function:
Code: Select all
uint64 index_to_uint64&#40;int index, int bits, uint64 m&#41; &#123;
  int i, j;
  uint64 result = 0ULL;
  for&#40;i = 0; i < bits; i++) &#123;
    j = pop_1st_bit&#40;&m&#41;;
    if&#40;index & &#40;1 << i&#41;) result |= &#40;1ULL << j&#41;;
  &#125;
  return result;
&#125;
It is called here:
Code: Select all
uint64 find_magic&#40;int sq, int m, int bishop&#41; &#123;
  uint64 mask, b&#91;4096&#93;, a&#91;4096&#93;, used&#91;4096&#93;, magic;
  int i, j, k, n, fail;
 
  mask = bishop? bmask&#40;sq&#41; &#58; rmask&#40;sq&#41;;
  n = count_1s&#40;mask&#41;;
 
  for&#40;i = 0; i < &#40;1 << n&#41;; i++) &#123;
    b&#91;i&#93; = index_to_uint64&#40;i, n, mask&#41;;
    a&#91;i&#93; = bishop? batt&#40;sq, b&#91;i&#93;) &#58; ratt&#40;sq, b&#91;i&#93;);
  &#125;
  for&#40;k = 0; k < 100000000; k++) &#123;
    magic = random_uint64_fewbits&#40;);
    if&#40;count_1s&#40;&#40;mask * magic&#41; & 0xFF00000000000000ULL&#41; < 6&#41; continue;
    for&#40;i = 0; i < 4096; i++) used&#91;i&#93; = 0ULL;
    for&#40;i = 0, fail = 0; !fail && i < &#40;1 << n&#41;; i++) &#123;
      j = transform&#40;b&#91;i&#93;, magic, m&#41;;
      if&#40;used&#91;j&#93; == 0ULL&#41; used&#91;j&#93; = a&#91;i&#93;;
      else if&#40;used&#91;j&#93; != a&#91;i&#93;) fail = 1;
    &#125;
    if&#40;!fail&#41; return magic;
  &#125;
  printf&#40;"***Failed***\n");
  return 0ULL;
&#125;
and I assume describes the blockers. Anyways I don't quite understand how the function 'index_to_uint64' is connected with the computation of the blockers or the rest of the algorithm?
index_to_uint64() calculates the "blockers". For a given square and piece type (bishop/rook) there may be two or more different occupancies sharing the same set of blockers. E.g. a bishop on d1 does not attack g4 if there is a piece on e2, regardless whether f3 is occupied as well or not. Same blockers must imply same attack set. The check that is built-in into the generation algorithm verifies that the current magic number fulfills this last condition by checking that the transformation (i.e., magic multiplication + shift) of the obtained "blockers" bitboard returns a unique attack set index. If this is not the case then the current magic number is rejected.

Sorry, I got something wrong, forget this post above, please. Gerd is right, b[] is the occupancy bitboard, not the "blockers" bitboard. Therefore I also deleted my short reply to Gerd's post which had been visible for a while.

So a better wording would be:

index_to_uint64() calculates the occupancy bitboard. The check that is built-in into the generation algorithm verifies that the current magic number fulfills the condition that the transformation (i.e., magic multiplication + shift) of the occupancy bitboard returns a unique attack set index. If this is not the case then the current magic number is rejected.

The_Algebraist · Post by **The_Algebraist** » Tue Apr 05, 2016 11:07 pm

Thanks for your detailed explanations, Gerd! And also to you Sven!
This really helps me a lot! I'm looking forward to reimplementing the slider attacks with the magics

I just fed my move generation with the 218-move-position and the engine found them all

Gerd Isenberg wrote: And yes, you can compute the magics on the fly while startup ...

Isn't that what most engines do, that use magic bitboards?

wgarvin · Post by **wgarvin** » Wed Apr 06, 2016 3:49 am

The_Algebraist wrote:Thanks for your detailed explanations, Gerd! And also to you Sven!
This really helps me a lot! I'm looking forward to reimplementing the slider attacks with the magics

I just fed my move generation with the 218-move-position and the engine found them all

Gerd Isenberg wrote: And yes, you can compute the magics on the fly while startup ...
Isn't that what most engines do, that use magic bitboards?

I think some engines use a pre-computed list of magics, but then generate the actual tables from them at startup time.

Mincho Georgiev · Post by **Mincho Georgiev** » Wed Apr 06, 2016 7:10 am

The_Algebraist wrote:Thanks for your detailed explanations, Gerd! And also to you Sven!
This really helps me a lot! I'm looking forward to reimplementing the slider attacks with the magics

I just fed my move generation with the 218-move-position and the engine found them all

Gerd Isenberg wrote: And yes, you can compute the magics on the fly while startup ...
Isn't that what most engines do, that use magic bitboards?

I use them statically, but I intentionally left my code in the branch, even not connected to the cmd line interface, so I can show how the numbers have been generated. I will strongly recommend you read Kannan's papers first, so you can clear everything out for yourself. Also Reul's thesis would be great for you.

Code: Select all

const unsigned int b_bits&#91;64&#93; =
&#123; 6, 5, 5, 5, 5, 5, 5, 6, 
  5, 5, 5, 5, 5, 5, 5, 5, 
  5, 5, 7, 7, 7, 7, 5, 5, 
  5, 5, 7, 9, 9, 7, 5, 5, 
  5, 5, 7, 9, 9, 7, 5, 5, 
  5, 5, 7, 7, 7, 7, 5, 5, 
  5, 5, 5, 5, 5, 5, 5, 5, 
  6, 5, 5, 5, 5, 5, 5, 6
&#125;;

const unsigned int r_bits&#91;64&#93; =
&#123; 12, 11, 11, 11, 11, 11, 11, 12, 
  11, 10, 10, 10, 10, 10, 10, 11, 
  11, 10, 10, 10, 10, 10, 10, 11, 
  11, 10, 10, 10, 10, 10, 10, 11, 
  11, 10, 10, 10, 10, 10, 10, 11, 
  11, 10, 10, 10, 10, 10, 10, 11, 
  11, 10, 10, 10, 10, 10, 10, 11, 
  12, 11, 11, 11, 11, 11, 11, 12
&#125;;

//magic multipliers generation
#define MAX_TRIALS 10000

void print_magics&#40;uint64 *m&#41;
&#123;
  int i,j;
  FILE *f;
	
  f = fopen&#40;"magics.txt", "a");
  j = 1;
  for&#40;i = 0; i < 64; i++)
  &#123; fprintf&#40;f, "\t%#.16llx%s, ", m&#91;i&#93;,"ULL");
    if&#40;++j > 4&#41;
    &#123; j = 1;
      fprintf&#40;f, "%s", "\n");
    &#125;
  &#125;	
  fprintf&#40;f, "%s", "\n\n");
  fclose&#40;f&#41;;
&#125;

uint64 rand64_1bits&#40;int bitcount&#41;
&#123;
  int i;
  uint64 pos, rnum = 0ULL;

  for&#40;i = 0; i < bitcount; i++)
  &#123; for&#40;;;)
    &#123; pos = 1ULL << &#40;rand&#40;) % 63&#41;;
      if&#40;!&#40;pos & rnum&#41;)
      &#123; rnum |= pos;
        break;
      &#125;
    &#125;
  &#125;
  return &#40;rnum&#41;;
&#125;

uint64 gen_multiplier&#40;int sq, bitboard_t mask, int bits, int *mbits, int *ibits, int rook&#41;
&#123;
  int fail,b;
  int trials;
  int bit_trials;
  uint64 magic;
  uint64 index;
  bitboard_t x, blockers&#91;4096&#93;, solution&#91;4096&#93;, used&#91;4096&#93;;
	
  for&#40;x = 0ULL; x < &#40;1ULL << bits&#41;; x++)
  &#123; blockers&#91;x&#93; = get_blockers&#40;x, mask&#41;;
    solution&#91;x&#93; = &#40;rook&#41; ? &#40;get_rook_atk&#40;sq,blockers&#91;x&#93;)) &#58; 
      &#40;get_bishop_atk&#40;sq, blockers&#91;x&#93;));
  &#125;
	
  trials = 0; bit_trials = 1;
  for&#40;;;)
  &#123; ibits&#91;sq&#93; = 0;
    magic = rand64_1bits&#40;bit_trials&#41;;
    for&#40;x = 0ULL; x < &#40;1ULL << bits&#41;; x++) used&#91;x&#93; = 0;
    fail = 0;
    for&#40;x = 0ULL; x < &#40;1ULL << bits&#41;; x++)
    &#123; index = &#40;blockers&#91;x&#93; * magic&#41; >> &#40;64 - bits&#41;;
      if&#40;used&#91;index&#93; == 0&#41; 
      &#123; used&#91;index&#93; = solution&#91;x&#93;;
        b = popcnt&#40;blockers&#91;x&#93; * magic&#41;;
        if&#40;ibits&#91;sq&#93; < b&#41; ibits&#91;sq&#93; = b;
      &#125;
      else if&#40;used&#91;index&#93; != solution&#91;x&#93;)
      &#123; fail = 1;
        break;
      &#125;
    &#125;
    if&#40;!fail&#41;
    &#123; mbits&#91;sq&#93; = bit_trials;
      return magic;
    &#125;
    trials++;
    if&#40;&#40;trials > MAX_TRIALS&#41; && &#40;bit_trials < bits&#41;)
    &#123; bit_trials++;
      trials = 0;
    &#125;
  &#125;
&#125;

void generate_magics&#40;)
&#123;
  int i,j,h,m,s;
  uint64 magics&#91;64&#93;;
  int magic_bits&#91;64&#93;;
  int index_bits&#91;64&#93;;
  time_t t1,t2;

  srand&#40;&#40;uint32&#41;time&#40;0&#41;);
  time&#40;&t1&#41;;
	
  printf&#40;"generating magic numbers\n\nprogress&#58;\n");
  for&#40;i = 0; i < 64; i++)
  &#123; magics&#91;i&#93; = gen_multiplier&#40;i, b_mask&#91;i&#93;, b_bits&#91;i&#93;, magic_bits, index_bits, 0&#41;;
    printf&#40;".");
  &#125;
  print_magics&#40;magics&#41;;
  printf&#40;"\n\nused bits in bishop magic numbers by %d trials&#58;\n\n",MAX_TRIALS&#41;;
  for&#40;i = 0,j = 1; i < 64; i++,j++)
  &#123; printf&#40;"%d, ",magic_bits&#91;i&#93;);
    if&#40;j >= 8&#41; 
    &#123; printf&#40;"\n");
      j = 0;
    &#125;
  &#125;
  printf&#40;"\n\nmax. multiplication bits for bishop by %d trials&#58;\n\n",MAX_TRIALS&#41;;
  for&#40;i = 0,j = 1; i < 64; i++,j++)
  &#123; printf&#40;"%d, ",index_bits&#91;i&#93;);
    if&#40;j >= 8&#41; 
    &#123; printf&#40;"\n");
      j = 0;
    &#125;
  &#125;	
  printf&#40;"\n\nprogress&#58;\n");
  for&#40;i = 0; i < 64; i++)
  &#123; magics&#91;i&#93; = gen_multiplier&#40;i, r_mask&#91;i&#93;, r_bits&#91;i&#93;, magic_bits, index_bits, 1&#41;;
    printf&#40;".");
  &#125;
  print_magics&#40;magics&#41;;
  printf&#40;"\n\nused bits in rook magic numbers by %d trials&#58;\n\n",MAX_TRIALS&#41;;
  for&#40;i = 0,j = 1; i < 64; i++,j++)
  &#123; printf&#40;"%d, ",magic_bits&#91;i&#93;);
    if&#40;j >= 8&#41; 
    &#123; printf&#40;"\n");
      j = 0;
    &#125;
  &#125;
  printf&#40;"\n\nmax. multiplication bits for rook by %d trials&#58;\n\n",MAX_TRIALS&#41;;
  for&#40;i = 0,j = 1; i < 64; i++,j++)
  &#123; printf&#40;"%d, ",index_bits&#91;i&#93;);
    if&#40;j >= 8&#41; 
    &#123; printf&#40;"\n");
      j = 0;
    &#125;
  &#125;
  printf&#40;"\n\n");
	
  time&#40;&t2&#41;;
  s = &#40;int&#41;&#40;t2-t1&#41;; 
  h = &#40;s / 3600&#41;; m = &#40;s / 60&#41; % 60; s %= 60;
  printf&#40;"time&#58; %dh&#58;%dm&#58;%ds\n\n", h, m, s&#41;;
&#125;

generate_magics() is the "interface" function.

The_Algebraist · Post by **The_Algebraist** » Thu Apr 07, 2016 6:12 pm

Thanks for all your help and tips!
You guys a really great and make it much easier to get into programming a chess engine.

Thanks Mincho for your code sample, I will definitely look into that closely once I come back to replace the classical approach with the magic bitboards.

I finished the move generator of my engine now. Then I ran all the positions suggested on the perft test page of the cpw. After fixing some minor bugs, I get correct results for all of them now.

At least until like depth 4 or 5. If I go deeper, it not only takes veeeery long, but it starts messing up my computer. Everything slows down and after a while the program crashes.

I looked over the code and didn't find anything that catched my eye, but my time of c++ programming is a little while ago and I've become a bit rusty with c++. So I guess there is something wrong somewhere, maybe a memory leak or anyting similar.

Have any of you had a similar problem?
How did you try to find the root of the problem?
I have many asserts for pretty much everything all over the code and it is running in debug mode.

The_Algebraist · Post by **The_Algebraist** » Thu Apr 07, 2016 9:34 pm

I found it! 1 missing delete... That got out of hand really fast though.
But now it runs smoothly, yet not very fast.

I only get 7M nodes per sec. Is this only due to the classical approach approach I used instead of the magic bitboards, or must there be some sloppy technique somewhere?

kbhearn · Post by **kbhearn** » Thu Apr 07, 2016 10:47 pm

The_Algebraist wrote:I found it! 1 missing delete... That got out of hand really fast though.
But now it runs smoothly, yet not very fast.

I only get 7M nodes per sec. Is this only due to the classical approach approach I used instead of the magic bitboards, or must there be some sloppy technique somewhere?

well the fact that there was a delete missing indicates you're engaged in one cardinal sin: dynamic memory allocation. You should be able to get a fair bit of speed by reusing preallocated objects or creating objects on the stack (local variables) instead of new and delete.

That said, if 7M nodes per second is 7M makes and unmakes you're doing pretty good. If you're bulk counting at the leaves (rather than make and unmake the final ply just tallying the number of legal moves) it's a bit slow, but the algorithm shouldn't be the problem.

Also regardless it's not so slow as to prevent you from continuing to make your engine play chess and then come back to it later.

The_Algebraist · Post by **The_Algebraist** » Thu Apr 07, 2016 10:53 pm

kbhearn wrote:well the fact that there was a delete missing indicates you're engaged in one cardinal sin: dynamic memory allocation. You should be able to get a fair bit of speed by reusing preallocated objects or creating objects on the stack (local variables) instead of new and delete.

There aren't too many new and delete in the code, but maybe I can try to eliminate them. I will have a look at that tomorrow! Thanks for your suggestion Kevin!

kbhearn wrote:That said, if 7M nodes per second is 7M makes and unmakes you're doing pretty good. If you're bulk counting at the leaves (rather than make and unmake the final ply just tallying the number of legal moves) it's a bit slow, but the algorithm shouldn't be the problem.

It is indeed 7M make and unmake.
It just seems so slow. I was running stockfish in another console to compare the results, and they were so much faster, at over 100M nodes on my laptop.

Don't they make/unmake the moves? So are they filtering the illegal moves already during move generation?

kbhearn · Post by **kbhearn** » Thu Apr 07, 2016 11:01 pm

They do bulk counting of the leaves in perft (just looked at their routine) which is much faster (and would allow you to do higher perft ply numbers as such) but less indicative of how much your move generation/make/unmake will affect the search (that said perft isn't a great way to test that anyways).

Some questions from a beginner

Re: Some questions from a beginner

Re: Some questions from a beginner

Re: Some questions from a beginner

Re: Some questions from a beginner

Re: Some questions from a beginner

Re: Some questions from a beginner

Re: Some questions from a beginner

Re: Some questions from a beginner

Re: Some questions from a beginner

Re: Some questions from a beginner