Michael Sherwin wrote: ↑Wed Mar 13, 2019 7:35 pm
I do have my engine Carnivor that has a similar GNUChess 4 style table look up move generator that on my 2013 i7-3930k 4.2 GHz processor single thread searches 20 million positions per second. And in the middle game searches 33 million positions per second. And I mean during a game and not just in bench. Although the evaluation function is only a piece square table. I do not know of any bitboard engine than can even get close to that in their perft. What am I missing?
This sounds like a case of suspect counting. When you compare with other engines, you must be sure you count nodes in the same way. Counting moves to nodes where you do nothing at all (stand-pat cutoff on the PST eval that was already calculated incrementally in MakeMove) is a sure way to highly inflate the node count. Micro-Max also has only an incremental eval, but this allows it to judge whether the daughter will have a stand-pat cutoff with 100% accuracy after just calculating the updated eval, without really making the move. So it refrains from making the move in those cases (futility pruning). But it then also doesn't count the node. This effectively speeds up the engine, but with your counting method it would lower its NPS!
Of course micro-Max doesn't have a sophisticated move generator at all. But I noticed that hashing has a huge impact on its speed. Where micro-Max 4.8 (which does use a hash table) runs typically at 2MNPS, micro-Max 1.6 (which does not have hashing) runs at over 6MNPS.
If you want to judge the speed of a move generator, you should really count the number of move generations. Not MakeMoves. And when you do that in an engine that does not use a hash table, also compare it with other engines that run without hash table.
BTW, in my engine HaQiKi D I also use per-square tables for getting the starting index in a large array of move steps. The main reason for this was to implement the confinement of pieces to their zones (HaQiKi D is a Xiangqi engine, so the King must not be allowed to step out of the Palace, etc.) But it also gives some speedup by not wasting time on trying to step pieces near the board edge in directions that can only land off-board.
To speed up capture generation for sliders, I think using a 'neighbor table' neighbor[sqr][dir], containing square numbers of the nearest piece in the given direction for all 8 principal directions, for every occupied square, would easily beat what you are doing here. In this case you could still use a master index table idx[type][sqr], but it would point to a list of
directions rather than a list of to-squares. In each direction you then only have to look up the to-square of the neighbor in that direction. Without repetitively testing the board on all intermediate squares. Like:
Code: Select all
int dirNr = idx[sliderType][fromSqr];
while((dir = dirTable[dirNr++])) {
int toSqr = neighbor[fromSqr][dir];
int victim = board[toSqr];
if(victim & xstm) GenOneCapture(fromSqr, toSqr, victim);
}
This assumes the neighbor table contains the number of some dummy off-board cell (e.g. board[64]) when there is no neighbor in the corresponding direction, and that this off-board square is empty. The point is that for leapers you can omit moves that capture off-board, but for sliders that are not on an edge square, you can never know whether their moves will hit something before they run off board. So this test seems necessary overhead for sliders. It is probably faster to just let the test for an enemy target take care of this than to throw in an extra test on the toSqr number itself (which would save you a board access, but would be done in vain on every toSqr that is on board). On crowded boards most slider moves would hit a piece rather than the edge (when the moves with no squares between slider and edge have already been suppressed).
For generation of leaper captures you can simply use lists of to-squares for each from-square. There would be no need to tabulate a next-index for use in case the square was occupied, as for leapers you would always have to go to the next leap, whether the current leap target was occupied or not.
Code: Select all
int dirNr = idx[leaperType][fromSqr];
while((toSqr = dirTable[dirNr++]) >= 0) {
int victim = board[toSqr];
if(victim & xstm) GenOneCapture(fromSqr, toSqr, victim);
}
(Note I assumed the lists to be terminated by a -1 sentinel here, as 0 might be a valid square number appearing as toSqr. The same might hold for directions, btw, depending on how you encode those.)
Also note that you might trade off a small inefficiency in the number of instructions by tabulating leap
steps, rather than to-squares (requiring an extra addition toSqr = fromSqr + step) for the possibility to highly condense the dirTable[]. Because every from-square not disturbed by the board edge would then get the same list of steps (and edge or corner lists could often be implemented as the final part of an already present larger list). This might gain you more speed by reducing the level-1 cache pressure than the extra addition costs. Note that the lists of slider directions already naturally condenses this way, as for any non-edge square a slider would have the same set of possible move directions.
Of course there still is the question of how much overhead it takes to update the neighbor table. As (due to the preponderance of QS nodes) most moves will be captures, the typical update would only have to alter one neighbor in each of the 8 neighbors of the evacuated square, for the direction that was looking back towards that square. It might be possible to save on this by deferring the update until
after move generation, so that you only have to do it after you are sure that you generated moves that you actually want to search. (After all, in true leaf nodes, of which there are many, you don't search anything, and it would be a pity if you already had updated the table to only discover you don't need that info at all, and have to restore it to its old state before you can return alpha.) This would lead to erroneous generation of slider moves that pass over the evacuated square, which now would seem to end on that square instead (and then be rejected, as that erroneous to-square would not contain an opponent, but be empty instead). This can be solved by testing on a per-move basis.
Code: Select all
int dirNr = idx[sliderType][fromSqr];
while((dir = dirTable[dirNr++])) {
int toSqr = neighbor[fromSqr][dir];
if(toSqr == lastFromSqr) toSqr = neighbor[toSqr][dir]; // extend discovered moves
int victim = board[toSqr];
if(victim & xstm) GenOneCapture(fromSqr, toSqr, victim);
}
As almost all slider moves would not pass over the evacuated square, the extra work would usually be limited to an easily predictable branch instruction. The GenOneCapture could keep track of the attacked pieces or piece type, e.g. by setting bits corresponding to those in an integer ('attacked |= bitCode[victim];'). Then after the generation you can trivially test if a non-futile victim is under attack (i.e. mvvValue[attacked] > alpha - curEval - MARGIN). If not, you can return alpha without having updated or having to restore anything. If the MVV is non-futile (meaning you are going to search the capture of it), you can then update the neighbor table.
So the modest work of updating the neighbor table after a capture in QS
Code: Select all
for(dir=0; dir<4; dir++) { // for all ray orientations
int up = neighbor[lastFromSqr][dir]; // get up- and downstream neighbor along ray
int dn = neighbor[lastFromSqr][dir+4]; // assumes opposite directions encoded as dir, dir+4
neighbor[up][dir+4] = dn; // point them to each other
neighbor[dn][dir] = up;
}
and the restoring:
Code: Select all
for(dir=0; dir<4; dir++) {
int up = neighbor[lastFromSqr][dir];
int dn = neighbor[lastFromSqr][dir+4];
neighbor[up][dir+4] = lastFromSqr; // point them back to
neighbor[dn][dir] = lastFromSqr;
}
would be shared by all daughter nodes (which is at least one).