Further progress on new move generator

Michael Sherwin · Post by **Michael Sherwin** » Wed Mar 13, 2019 1:13 am

It is a lookup style mg. So let's say we have a bishop on e4 which is square index 28. I look in the bishop index array (int b[64]) to find the first cell in the giant move table. That is 240. The move table starting at 240 looks like this.

cell:
239 ts--ns---nd ; to square -- next square -- next direction
240 35 241 244 ; d5,,if d5 is empty next square is in cell 241 otherwise it is in cell 244
241 42 242 244
242 49 243 244
243 56 244 244
244 37 245 247 ; f5 starts the next direction
245 46 246 247
246 55 247 247
247 19 248 250 ; d3
248 10 249 250
249 01 250 250
250 21 251 000 ; f3 0 here means no more directions
251 14 252 000
252 07 000 000 ; the first 0 means no next square

Just for clarity here is the bishop on a1.
000 09 001 000 ; b2; only one direction so 0
001 18 002 000 ; c3
002 27 003 000 ; d4
003 36 004 000 ; e5
004 45 005 000 ; f6
005 54 006 000 ; g7
006 63 000 000 ; h8

I have finished both the bishop and the rook initialization. Ran the debugger and checked every cell manually to make sure all entries are correct. The bishop and rook require 1455 cells. Here is the code so far.

Code: Select all

typedef struct {
  int ts;
  int ns;
  int nd;
} mt;

mt m[3000];
 
int b[64];
int r[64];
int q[64];
int n[64];
int k[64];
int p[64];

void InitMovTbl();
void Initialize();
int main();

void InitMovTbl() {
  int sq, ns, y, x, dy, dx, o, p, i;

  for (sq = ns = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    b[sq] = ns;

    for (p = ns, dy = 1, dx = -1; y + dy < 8 && x + dx > -1; dy++, dx--) {
      o = p;
      m[ns].ts = sq + (dy << 3) + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = 1, dx = 1; y + dy < 8 && x + dx < 8; dy++, dx++) {
      o = p;
      m[ns].ts = sq + (dy << 3) + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = -1, dx = -1; y + dy > -1 && x + dx > -1; dy--, dx--) {
      o = p;
      m[ns].ts = sq + (dy << 3) + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = -1, dx = 1; y + dy > -1 && x + dx < 8; dy--, dx++) {
      o = p;
      m[ns].ts = sq + (dy << 3) + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    m[ns - 1].ns = 0;
    for (i = o; i < ns; i++) m[i].nd = 0;
  }
  
  for (sq = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    r[sq] = ns;

    for (p = ns, dx = -1; x + dx > -1; dx--) {
      o = p;
      m[ns].ts = sq + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = 1; y + dy < 8; dy++) {
      o = p;
      m[ns].ts = sq + (dy << 3);
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dx = 1; x + dx < 8; dx++) {
      o = p;
      m[ns].ts = sq + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = -1; y + dy > -1; dy--) {
      o = p;
      m[ns].ts = sq + (dy << 3);
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    m[ns - 1].ns = 0;
    for (i = o; i < ns; i++) m[i].nd = 0;
  }
}

void Initialize() {
  InitMovTbl();
}

int main() {
  Initialize();
}

hgm · Post by **hgm** » Wed Mar 13, 2019 9:22 am

I think that move generators of this type will always be slow. Because the overwhelming majority of nodes in a search tree will be QS nodes, where you are only need captures. And the lookup tables do not provide a way to directly skip to the end of a slider ray; you would still need to test the board on every square along the ray for occupancy.

So I think it can never be competitive with a method that directly tells you the target square at the end of a ray. Such as a neighbor table, or bitmaps of the ray occupancy.

Michael Sherwin · Post by **Michael Sherwin** » Wed Mar 13, 2019 7:35 pm

I do have my engine Carnivor that has a similar GNUChess 4 style table look up move generator that on my 2013 i7-3930k 4.2 GHz processor single thread searches 20 million positions per second. And in the middle game searches 33 million positions per second. And I mean during a game and not just in bench. Although the evaluation function is only a piece square table. I do not know of any bitboard engine than can even get close to that in their perft. What am I missing?

mar · Post by **mar** » Wed Mar 13, 2019 8:19 pm

Michael Sherwin wrote: ↑Wed Mar 13, 2019 7:35 pm I do have my engine Carnivor that has a similar GNUChess 4 style table look up move generator that on my 2013 i7-3930k 4.2 GHz processor single thread searches 20 million positions per second. And in the middle game searches 33 million positions per second. And I mean during a game and not just in bench. Although the evaluation function is only a piece square table. I do not know of any bitboard engine than can even get close to that in their perft. What am I missing?

I did a quick test in my engine (I search ~2Mn/s on single core) and using PSQ-only eval only gave 33% speedup. Simplifying search a lot however gave almost 3x speedup (I only tried startpos).
I was actually a bit surprised because I expected psq-only eval to do better.
I went from 2 to to 6.5Mn/s by disabling eval and most search features, losing (guessing) some 300 elo along the way.
So raw nps is nice (actually 33Mn/s is really impressive), but better search more than compensates for the loss of raw speed (better eval too).
You must be doing minimal amount of work per node.
Now SF searches ~1.6Mn/s on my machine, but is over 500 elo stronger than my engine

It also depends on how you count nodes. I count everything including QS and TT cutoffs, which exaggerates a bit.

odomobo · Post by **odomobo** » Wed Mar 13, 2019 8:45 pm

This approach looks 1 step away from having a big table of instructions. I'm not sure which approach would be faster. Your table would be much smaller, leading to less cache pressure, but it also has a level of indirection.

I suppose to create a jump table like this, you'd need to write a helper program which outputs a giant function in c (or your language of choice). In my mind, you could do it like this:

Code: Select all

function_pointer mg_bishop_table[64] = {
    ...
    mg_bishop_24, // index 24
    ...
};

void movegen_bishop(int index, move_list m) {
    mg_bishop_table[index](m);
}

...

void mg_bishop_24(move_list m) {
d5:
    m.add("d5");
    if (occupied("d5"))
        goto f5;
    m.add("c6");
    if (occupied("c6"))
        goto f5;
    ...
f5:
    m.add("f5");
    ...
}

...

Of course the logic in the jump table would be a bit more complex, which makes me think that the instruction count might be enormous, hurting any benefit you might otherwise get. I'm curious what you think of this approach. Also, let me know if I'm missing anything.

Michael Sherwin · Post by **Michael Sherwin** » Wed Mar 13, 2019 8:46 pm

mar wrote: ↑Wed Mar 13, 2019 8:19 pm
Michael Sherwin wrote: ↑Wed Mar 13, 2019 7:35 pm I do have my engine Carnivor that has a similar GNUChess 4 style table look up move generator that on my 2013 i7-3930k 4.2 GHz processor single thread searches 20 million positions per second. And in the middle game searches 33 million positions per second. And I mean during a game and not just in bench. Although the evaluation function is only a piece square table. I do not know of any bitboard engine than can even get close to that in their perft. What am I missing?
I did a quick test in my engine (I search ~2Mn/s on single core) and using PSQ-only eval only gave 33% speedup. Simplifying search a lot however gave almost 3x speedup (I only tried startpos).
I was actually a bit surprised because I expected psq-only eval to do better.
I went from 2 to to 6.5Mn/s by disabling eval and most search features, losing (guessing) some 300 elo along the way.
So raw nps is nice (actually 33Mn/s is really impressive), but better search more than compensates for the loss of raw speed (better eval too).
You must be doing minimal amount of work per node.
Now SF searches ~1.6Mn/s on my machine, but is over 500 elo stronger than my engine
It also depends on how you count nodes. I count everything including QS and TT cutoffs, which exaggerates a bit.

Yes, it is very minimal and yet it plays legal chess and used to compete at Olivier Deville's Open War and Chess War events. I count moves made. If a move is made and unmaid it counts one node.

Michael Sherwin · Post by **Michael Sherwin** » Wed Mar 13, 2019 9:16 pm

odomobo wrote: ↑Wed Mar 13, 2019 8:45 pm This approach looks 1 step away from having a big table of instructions. I'm not sure which approach would be faster. Your table would be much smaller, leading to less cache pressure, but it also has a level of indirection.

I suppose to create a jump table like this, you'd need to write a helper program which outputs a giant function in c (or your language of choice). In my mind, you could do it like this:
Code: Select all
function_pointer mg_bishop_table[64] = {
    ...
    mg_bishop_24, // index 24
    ...
};

void movegen_bishop(int index, move_list m) {
    mg_bishop_table[index](m);
}

...

void mg_bishop_24(move_list m) {
d5:
    m.add("d5");
    if (occupied("d5"))
        goto f5;
    m.add("c6");
    if (occupied("c6"))
        goto f5;
    ...
f5:
    m.add("f5");
    ...
}

...
Of course the logic in the jump table would be a bit more complex, which makes me think that the instruction count might be enormous, hurting any benefit you might otherwise get. I'm curious what you think of this approach. Also, let me know if I'm missing anything.

Seems a bit computationally heavy compared with this jump table method in assembler. The pieces on the board are by index. Not by type. A piece structure is used to hold each pieces info.

Code: Select all

; Initial piece type move function
iptmf       dd          0
            dd          wpm,wpm,wpm,wpm,wpm,wpm,wpm,wpm
            dd          wnm,wnm,wbm,wbm,wrm,wrm,wqm,wkm
            dd          wck,wcq,wmter
            dd          cbcam
            dd          0
            dd          bpm,bpm,bpm,bpm,bpm,bpm,bpm,bpm
            dd          bnm,bnm,bbm,bbm,brm,brm,bqm,bkm
            dd          bck,bcq,bmter
            dd          cwcam

; Initial piece type capture function
iptcf       dd          0
            dd          wpc,wpc,wpc,wpc,wpc,wpc,wpc,wpc
            dd          wnc,wnc,wbc,wbc,wrc,wrc,wqc,wkc
            dd          wcter,wcter,wcter
            dd          cbcac
            dd          0
            dd          bpc,bpc,bpc,bpc,bpc,bpc,bpc,bpc
            dd          bnc,bnc,bbc,bbc,brc,brc,bqc,bkc
            dd          bcter,bcter,bcter
            dd          cwcac

; White bishop move function
wbmf        dd          0
            dd          wbmnd,wbmnd,wbmnd,wbmnd,wbmnd,wbmnd,wbmnd,wbmnd
            dd          wbmnd,wbmnd,wbmnd,wbmnd,wbmnd,wbmnd,wbmnd,wbmnd
            dd          0,0,0
            dd          wbmrm
            dd          0
            dd          wbmrc,wbmrc,wbmrc,wbmrc,wbmrc,wbmrc,wbmrc,wbmrc
            dd          wbmrc,wbmrc,wbmrc,wbmrc,wbmrc,wbmrc,wbmrc,wmcbki
            dd          wnxtm
            
; White bishop  capture function            
wbcf        dd          0
            dd          wbcnd,wbcnd,wbcnd,wbcnd,wbcnd,wbcnd,wbcnd,wbcnd
            dd          wbcnd,wbcnd,wbcnd,wbcnd,wbcnd,wbcnd,wbcnd,wbcnd
            dd          0,0,0
            dd          wbcns
            dd          0
            dd          wbcrc,wbcrc,wbcrc,wbcrc,wbcrc,wbcrc,wbcrc,wbcrc
            dd          wbcrc,wbcrc,wbcrc,wbcrc,wbcrc,wbcrc,wbcrc,wccbki
            dd          wnxtc

; White move generator            
wmg:        push        ebp
            push        edi
            push        esi
            mov         ebp,[ply]
            mov         eax,[first+ebp*4]
            mov         [lis+ebp*4],eax
            mov         edi,[nxt]
            jmp         [ptmf+edi*4] 
            
; White Next Move            
wnxtm:      mov         edi,[nxt+edi*4]
            jmp         [ptmf+edi*4]            

; White bishop move generator            
wbm:        mov         ecx,[ps+edi*4]
            mov         esi,[bol+ecx*4]
            movsx       ebx,[bns+esi+ecx]
            mov         edx,[brd+ebx*4]
            jmp         [wbmf+edx*4]
        
White bishop move record move        
wbmrm:      mov         [tree.fsq+eax*8],cl
            mov         [tree.tsq+eax*8],bl
            mov         [tree.typ+eax*8],BMOV
            inc         eax
            movsx       ebx,[bns+esi+ebx]
            mov         edx,[brd+ebx*4]
            jmp         [wbmf+edx*4]

; White bishop move record capture
wbmrc:      mov         [tree.fsq+eax*8],cl
            mov         [tree.tsq+eax*8],dl
            mov         [tree.typ+eax*8],BCAP
            inc         eax
            
Whit bishop move next direction            
wbmnd:      movsx       ebx,[bnd+esi+ebx]
            mov         edx,[brd+ebx*4]
            jmp         [wbmf+edx*4]

Look at the bishop move generator, it only has 4 move instruction and a jump to either record move record capture or generate the next move or get the next piece. It is all controlled by the jump tables.

Michael Sherwin · Post by **Michael Sherwin** » Thu Mar 14, 2019 1:33 am

The move table initialization is finished!

Code: Select all

typedef struct {
  int ts;
  int ns;
  int nd;
} mt;

mt m[1959];
 
int b[64];
int r[64];
int n[64];
int wp[64];
int bp[64];

void InitMovTbl();
void Initialize();
int main();

void InitMovTbl() {
  int sq, ns, y, x, dy, dx, o, p, i;

  for (sq = ns = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    b[sq] = ns;

    for (p = ns, dy = 1, dx = -1; y + dy < 8 && x + dx > -1; dy++, dx--) {
      o = p;
      m[ns].ts = sq + (dy << 3) + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = 1, dx = 1; y + dy < 8 && x + dx < 8; dy++, dx++) {
      o = p;
      m[ns].ts = sq + (dy << 3) + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = -1, dx = -1; y + dy > -1 && x + dx > -1; dy--, dx--) {
      o = p;
      m[ns].ts = sq + (dy << 3) + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = -1, dx = 1; y + dy > -1 && x + dx < 8; dy--, dx++) {
      o = p;
      m[ns].ts = sq + (dy << 3) + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    m[ns - 1].ns = 0;
    for (i = o; i < ns; i++) m[i].nd = 0;
  }
  
  for (sq = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    r[sq] = ns;

    for (p = ns, dx = -1; x + dx > -1; dx--) {
      o = p;
      m[ns].ts = sq + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = 1; y + dy < 8; dy++) {
      o = p;
      m[ns].ts = sq + (dy << 3);
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dx = 1; x + dx < 8; dx++) {
      o = p;
      m[ns].ts = sq + dx;
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    for (p = ns, dy = -1; y + dy > -1; dy--) {
      o = p;
      m[ns].ts = sq + (dy << 3);
      m[ns].ns = ns + 1; ns++;
    } for (i = p; i < ns; i++) m[i].nd = ns;

    m[ns - 1].ns = 0;
    for (i = o; i < ns; i++) m[i].nd = 0;
  }

  for (sq = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    n[sq] = ns;

    if (y + 1 <  8 && x - 2 > -1) { m[ns].ts = sq +  6; m[ns].ns = ns + 1; ns++; }
    if (y + 1 <  8 && x + 2 <  8) { m[ns].ts = sq + 10; m[ns].ns = ns + 1; ns++; }
    if (y + 2 <  8 && x - 1 > -1) { m[ns].ts = sq + 15; m[ns].ns = ns + 1; ns++; }
    if (y + 2 <  8 && x + 1 <  8) { m[ns].ts = sq + 17; m[ns].ns = ns + 1; ns++; }
    if (y - 1 > -1 && x - 2 > -1) { m[ns].ts = sq - 10; m[ns].ns = ns + 1; ns++; }
    if (y - 1 > -1 && x + 2 <  8) { m[ns].ts = sq -  6; m[ns].ns = ns + 1; ns++; }
    if (y - 2 > -1 && x - 1 > -1) { m[ns].ts = sq - 17; m[ns].ns = ns + 1; ns++; }
    if (y - 2 > -1 && x + 1 <  8) { m[ns].ts = sq - 15; m[ns].ns = ns + 1; ns++; }

    m[ns - 1].ns = 0;
  }

  for (sq = 8; sq < 56; sq++) {
    y = sq >> 3;
    x = sq & 7;
    wp[sq] = ns;

    if (x - 1 > -1) { m[ns].ts = sq + 7; m[ns].ns = ns + 1; ns++; }
    if (x + 1 <  8) { m[ns].ts = sq + 9; m[ns].ns = ns + 1; ns++; }

    m[ns - 1].ns = 0;
  }

  for (sq = 56; sq > 7; sq--) {
    y = sq >> 3;
    x = sq & 7;
    bp[sq] = ns;

    if (x - 1 > -1) { m[ns].ts = sq - 9; m[ns].ns = ns + 1; ns++; }
    if (x + 1 <  8) { m[ns].ts = sq - 7; m[ns].ns = ns + 1; ns++; }

    m[ns - 1].ns = 0;
  }
}

void Initialize() {
  InitMovTbl();
}

int main() {
  Initialize();
}

hgm · Post by **hgm** » Thu Mar 14, 2019 8:37 am

Michael Sherwin wrote: ↑Wed Mar 13, 2019 7:35 pm I do have my engine Carnivor that has a similar GNUChess 4 style table look up move generator that on my 2013 i7-3930k 4.2 GHz processor single thread searches 20 million positions per second. And in the middle game searches 33 million positions per second. And I mean during a game and not just in bench. Although the evaluation function is only a piece square table. I do not know of any bitboard engine than can even get close to that in their perft. What am I missing?

This sounds like a case of suspect counting. When you compare with other engines, you must be sure you count nodes in the same way. Counting moves to nodes where you do nothing at all (stand-pat cutoff on the PST eval that was already calculated incrementally in MakeMove) is a sure way to highly inflate the node count. Micro-Max also has only an incremental eval, but this allows it to judge whether the daughter will have a stand-pat cutoff with 100% accuracy after just calculating the updated eval, without really making the move. So it refrains from making the move in those cases (futility pruning). But it then also doesn't count the node. This effectively speeds up the engine, but with your counting method it would lower its NPS!

Of course micro-Max doesn't have a sophisticated move generator at all. But I noticed that hashing has a huge impact on its speed. Where micro-Max 4.8 (which does use a hash table) runs typically at 2MNPS, micro-Max 1.6 (which does not have hashing) runs at over 6MNPS.

If you want to judge the speed of a move generator, you should really count the number of move generations. Not MakeMoves. And when you do that in an engine that does not use a hash table, also compare it with other engines that run without hash table.

BTW, in my engine HaQiKi D I also use per-square tables for getting the starting index in a large array of move steps. The main reason for this was to implement the confinement of pieces to their zones (HaQiKi D is a Xiangqi engine, so the King must not be allowed to step out of the Palace, etc.) But it also gives some speedup by not wasting time on trying to step pieces near the board edge in directions that can only land off-board.

To speed up capture generation for sliders, I think using a 'neighbor table' neighbor[sqr][dir], containing square numbers of the nearest piece in the given direction for all 8 principal directions, for every occupied square, would easily beat what you are doing here. In this case you could still use a master index table idx[type][sqr], but it would point to a list of directions rather than a list of to-squares. In each direction you then only have to look up the to-square of the neighbor in that direction. Without repetitively testing the board on all intermediate squares. Like:

Code: Select all

int dirNr = idx[sliderType][fromSqr];
while((dir = dirTable[dirNr++])) {
  int toSqr = neighbor[fromSqr][dir];
  int victim = board[toSqr];
  if(victim & xstm) GenOneCapture(fromSqr, toSqr, victim);
}

This assumes the neighbor table contains the number of some dummy off-board cell (e.g. board[64]) when there is no neighbor in the corresponding direction, and that this off-board square is empty. The point is that for leapers you can omit moves that capture off-board, but for sliders that are not on an edge square, you can never know whether their moves will hit something before they run off board. So this test seems necessary overhead for sliders. It is probably faster to just let the test for an enemy target take care of this than to throw in an extra test on the toSqr number itself (which would save you a board access, but would be done in vain on every toSqr that is on board). On crowded boards most slider moves would hit a piece rather than the edge (when the moves with no squares between slider and edge have already been suppressed).

For generation of leaper captures you can simply use lists of to-squares for each from-square. There would be no need to tabulate a next-index for use in case the square was occupied, as for leapers you would always have to go to the next leap, whether the current leap target was occupied or not.

Code: Select all

int dirNr = idx[leaperType][fromSqr];
while((toSqr = dirTable[dirNr++]) >= 0) {
  int victim = board[toSqr];
  if(victim & xstm) GenOneCapture(fromSqr, toSqr, victim);
}

(Note I assumed the lists to be terminated by a -1 sentinel here, as 0 might be a valid square number appearing as toSqr. The same might hold for directions, btw, depending on how you encode those.)

Also note that you might trade off a small inefficiency in the number of instructions by tabulating leap steps, rather than to-squares (requiring an extra addition toSqr = fromSqr + step) for the possibility to highly condense the dirTable[]. Because every from-square not disturbed by the board edge would then get the same list of steps (and edge or corner lists could often be implemented as the final part of an already present larger list). This might gain you more speed by reducing the level-1 cache pressure than the extra addition costs. Note that the lists of slider directions already naturally condenses this way, as for any non-edge square a slider would have the same set of possible move directions.

Of course there still is the question of how much overhead it takes to update the neighbor table. As (due to the preponderance of QS nodes) most moves will be captures, the typical update would only have to alter one neighbor in each of the 8 neighbors of the evacuated square, for the direction that was looking back towards that square. It might be possible to save on this by deferring the update until after move generation, so that you only have to do it after you are sure that you generated moves that you actually want to search. (After all, in true leaf nodes, of which there are many, you don't search anything, and it would be a pity if you already had updated the table to only discover you don't need that info at all, and have to restore it to its old state before you can return alpha.) This would lead to erroneous generation of slider moves that pass over the evacuated square, which now would seem to end on that square instead (and then be rejected, as that erroneous to-square would not contain an opponent, but be empty instead). This can be solved by testing on a per-move basis.

Code: Select all

int dirNr = idx[sliderType][fromSqr];
while((dir = dirTable[dirNr++])) {
  int toSqr = neighbor[fromSqr][dir];
  if(toSqr == lastFromSqr) toSqr = neighbor[toSqr][dir]; // extend discovered moves
  int victim = board[toSqr];
  if(victim & xstm) GenOneCapture(fromSqr, toSqr, victim);
}

As almost all slider moves would not pass over the evacuated square, the extra work would usually be limited to an easily predictable branch instruction. The GenOneCapture could keep track of the attacked pieces or piece type, e.g. by setting bits corresponding to those in an integer ('attacked |= bitCode[victim];'). Then after the generation you can trivially test if a non-futile victim is under attack (i.e. mvvValue[attacked] > alpha - curEval - MARGIN). If not, you can return alpha without having updated or having to restore anything. If the MVV is non-futile (meaning you are going to search the capture of it), you can then update the neighbor table.

So the modest work of updating the neighbor table after a capture in QS

Code: Select all

for(dir=0; dir<4; dir++) { // for all ray orientations
  int up = neighbor[lastFromSqr][dir];   // get up- and downstream neighbor along ray
  int dn = neighbor[lastFromSqr][dir+4]; // assumes opposite directions encoded as dir, dir+4
  neighbor[up][dir+4] = dn; // point them to each other
  neighbor[dn][dir] = up;
}

and the restoring:

Code: Select all

for(dir=0; dir<4; dir++) {
  int up = neighbor[lastFromSqr][dir];
  int dn = neighbor[lastFromSqr][dir+4];
  neighbor[up][dir+4] = lastFromSqr; // point them back to 
  neighbor[dn][dir] = lastFromSqr;
}

would be shared by all daughter nodes (which is at least one).

Steve Maughan · Post by **Steve Maughan** » Thu Mar 14, 2019 12:20 pm

hgm wrote: ↑Wed Mar 13, 2019 9:22 am I think that move generators of this type will always be slow. Because the overwhelming majority of nodes in a search tree will be QS nodes, where you are only need captures. And the lookup tables do not provide a way to directly skip to the end of a slider ray; you would still need to test the board on every square along the ray for occupancy.

Maverick uses a hybrid magic bitboard and lookup-table method. The bitboards are used to generate the moves, which are then stored as a list of pointers into the move table. This has the advantage of bitboard move generation of captures etc, and the lookup table advantage of make-unmake. I'm sure it could be optimized more but Maverick was doing single core 8 million nps on a 2015 i7 when it had a basic piece-square table.

Steve

Further progress on new move generator

Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator