Perft search speed bottleneck

JarvisXerxes · Post by **JarvisXerxes** » Fri Jun 07, 2013 4:09 am

Hi everyone,

It's my first post here ... Recently I've started my chess engine project in C++ (from scratch).
I just completed the very basics - the magic bitboards, movegen and make/unmake moves.
I choose to generate only pseudo-legal moves in movegen(), and leave the legality check to an external bool function.

Therefore my perft is something like: (U64 is unsigned long long)

Code: Select all

U64 perft(depth)
{
    U64 nodes = 0;
    if (depth == 0)
        return 1;
    moveGen();  // to an external buffer, all pseudo moves
    for (int i = currentPointer; i < nexPointer; i++)
    {
        Move m = moveBuf[i];
        makeMove(m);

        if (!isOppositeKingAttacked())  // external legality check
            nodes += perft(depth - 1);

        unmakeMove(m);  // restore everything exactly
    }
    return nodes;
}

Checking the perft tables on chess-pedia, I find that my code works accurately.
But the speed is extremely disappointing. Only around 3500 kn/s on my i7 Dell Inspiron with 8G RAM. I also compiled the qperft.c by H.G. Muller: his perft runs at a stunning 55000 kn/s, almost 1500% my speed.

I tried for two days in vain. I'd appreciate any advice the experts on this forum might give me.
Thanks in advance.

Steve Maughan · Post by **Steve Maughan** » Fri Jun 07, 2013 5:38 am

Hi Jim,

You may find my blog of interest:

http://www.chessprogramming.net

I have discussed perft quite a bit.

Best regards,

Steve

lucasart · Post by **lucasart** » Fri Jun 07, 2013 6:18 am

JarvisXerxes wrote:Hi everyone,

It's my first post here ... Recently I've started my chess engine project in C++ (from scratch).
I just completed the very basics - the magic bitboards, movegen and make/unmake moves.
I choose to generate only pseudo-legal moves in movegen(), and leave the legality check to an external bool function.

Therefore my perft is something like: (U64 is unsigned long long)
Code: Select all
U64 perft(depth)
{
    U64 nodes = 0;
    if (depth == 0)
        return 1;
    moveGen();  // to an external buffer, all pseudo moves
    for (int i = currentPointer; i < nexPointer; i++)
    {
        Move m = moveBuf[i];
        makeMove(m);

        if (!isOppositeKingAttacked())  // external legality check
            nodes += perft(depth - 1);

        unmakeMove(m);  // restore everything exactly
    }
    return nodes;
}
Checking the perft tables on chess-pedia, I find that my code works accurately.
But the speed is extremely disappointing. Only around 3500 kn/s on my i7 Dell Inspiron with 8G RAM. I also compiled the qperft.c by H.G. Muller: his perft runs at a stunning 55000 kn/s, almost 1500% my speed.

I tried for two days in vain. I'd appreciate any advice the experts on this forum might give me.
Thanks in advance.

From my experience, I can give you the following advice:

1/ be organized. do not just modify the code like that. use a source code management system like git, and use patches and branches, as well as unit tests (anytime you modify the code check that your perft results *stay* correct). This may seem like overkill at first, but really pays off massively.

2/ do not try to guess, but always measure. this is true in general, but in the specific case of speed optimization, it means that you need to use a code profiler, and identify the bottlenecks. Only by doing that you will understand where the inefficiencies are.

3/ remember what matters, and do not spend too much time on unimportant things. you can always come back to those later. in the present case, you may not realize it yet, but perft raw speed has little impact on program strength and even speed in NPS. At the beginning it will, but the more things your engine does at each node (eval + complex search tricks) the less the perft raw speed will matter, and it eventually becomes a drop in the ocean. That being said, 3500 nodes per second is really too low. There must be some serious inefficiencies, and only 1/ will help you fix them (my crystal cannot tell what you're doing wrong).

PS: keep it clean and keep things independant and with no side effects (as much as possible). think of it this way: if I remove this line of code, do I have to read the whole program to understand what side effect it will have and what else will break? For example, writing code like this is a shootable offense

Code: Select all

moveGen();  // to an external buffer, all pseudo moves

For now, you know that this function has a side effect on a global array. But once you start to multiply things like that, even you will get lost in your own code, and make mistakes. Not to mention the day when you want to implement SMP, and global variables and side effects will be your worst nightmare.

JarvisXerxes · Post by **JarvisXerxes** » Fri Jun 07, 2013 6:19 am

Hi Steve,

Thanks for your reply!
I'm reading through your articles on perft. Quick question:
Do you perform the legality check before making the move, or after? In my implementation, I just make a pseudo-legal move and check if my king is attacked (by sliders and nonsliders alike) - so I never computed any pins as described in your article.

I'm kinda lost on when and how I should perform the legality check.

Thanks again.

JarvisXerxes · Post by **JarvisXerxes** » Fri Jun 07, 2013 7:06 am

Hi Lucas,

These are very sound advice. Thank you for your time.

Yes, I did use the visual studio profiler. In fact, it already helped me boost the speed from around 1200 kn/s to 3500 kn/s, though still horribly slow as you've pointed out.

Currently, the profiler analysis report is not so useful. It simply tells me that most of the time is spent on make/unmake move and movegen, but they are intended to be major time consumers in the first place. Stepping into these functions isn't helpful as well, because they're just a sequence of fast bitboard incremental updates.

That's why I guess I've flawed at a higher level. I might ask you the same question I just posted to Steve.

Thanks again.

Chan Rasjid · Post by **Chan Rasjid** » Fri Jun 07, 2013 7:50 am

JarvisXerxes wrote:Hi Steve,

Thanks for your reply!
I'm reading through your articles on perft. Quick question:
Do you perform the legality check before making the move, or after? In my implementation, I just make a pseudo-legal move and check if my king is attacked (by sliders and nonsliders alike) - so I never computed any pins as described in your article.

I'm kinda lost on when and how I should perform the legality check.

Thanks again.

If, as you said, Muller's perft speed is faster by 1500%, then definitely there is something very badly done in your movegen()/make/unmake() or validation routine. No one can do that much better in speeding up a chess program. Or are you ruuning your debug version with assert() turned on?

My engine's NPS is close to that of the top engines. My move validation is done before make() - so I think there should be no reason why validating a move before making it poses any special difficulty.

Rasjid.

hgm · Post by **hgm** » Fri Jun 07, 2013 7:50 am

JarvisXerxes wrote:I also compiled the qperft.c by H.G. Muller: his perft runs at a stunning 55000 kn/s, almost 1500% my speed.

Note that the 55000 is actually moves/sec, not nodes/sec. When qperft reaches a node it generates the moves there, and counts them. It does not make them, like you do. (Or at least, almost never.) When you count how many MakeMoves qperft does, you would have to divide by the average moves / leave node (about 22 for a perft from the opening position), so it does only 2500 kn/ps. (But in this case the counting would be biased in favor of your program, because qperft would do a full move gen after each MakeMove, while you do something simpler.)

So the secret of qperft's speed is that it doesn't use MakeMove() in the last ply on moves where it serves no purpose. Your code needs to make every move to determine its legality. Qperft only makes King moves (which there aren't very many of), to test if the King wandered into check (to make sure it would not try to 'step in its own shadow', although probably this could be tested in a more clever way without fully making the move, for a further speedup). And (even rarer) e.p. captures, to be sure it wouldn't miss the infamous 'double-discovered' check.

Because the moves are generated piece-by-piece, and e.p. captures in a dedicated code section, it can easily determine which section of the move list has to be tested for legality. It then loops only over those, similar to what you do for all moves (Make - CheckTest - UnMake). It then subtracts the number of illegals it finds there from the size of the move list, so it does not even have to loop through the list for counting.

The reason it can do this, is that it does not have to test every move for exposing its own King by unblocking an enemy slider, because it would not generate such moves in the first place. Move generation starts by detection of its own pinned pieces, starting with detection of enemy slider alignment with its own King. This is pretty cheap, as there are only 5 enemy sliders, and most of the time they are not even aligned, so you can stop there. Only if an alignment is detected, it checks if one of its own pieces is actually pinned (and detects if he is in distant check as a free spin-off). Such pieces would be exempted from the regular move-gen process (by temporarily removing them from the piece list), and generate their moves immediately using the details on the pin (only moves that stay on the ray or capture pinner, if the piece has moves in the direction of that ray). So in stead of adding time to every move for having to test it for legality, it saves time by making sure it only generates legal moves in the first place (as far as pins are concerned).

JarvisXerxes · Post by **JarvisXerxes** » Fri Jun 07, 2013 8:15 am

Chan Rasjid wrote: Or are you ruuning your debug version with assert() turned on?

I discovered the terrible blunder right before Chan posted. You're absolutely right. I built my project in visual studio under "Debug mode" instead of "Release", which definitely slows things down by an order of magnitude.

But even a correct build gives me only 20000 kn/s, still way below the standard.

Thanks a lot to HGM for your detailed explanation. This clarifies a lot of mysteries. I'm glad to listen to advice directly from qperft's author.
I'd try out the proposed optimizations and restructure my movegen.

lucasart · Post by **lucasart** » Fri Jun 07, 2013 8:41 am

JarvisXerxes wrote:Hi Lucas,

These are very sound advice. Thank you for your time.

Yes, I did use the visual studio profiler. In fact, it already helped me boost the speed from around 1200 kn/s to 3500 kn/s, though still horribly slow as you've pointed out.

Currently, the profiler analysis report is not so useful. It simply tells me that most of the time is spent on make/unmake move and movegen, but they are intended to be major time consumers in the first place. Stepping into these functions isn't helpful as well, because they're just a sequence of fast bitboard incremental updates.

That's why I guess I've flawed at a higher level. I might ask you the same question I just posted to Steve.

Thanks again.

You should have a look at my code:
https://github.com/lucasart/chess/blob/ ... movegen.cc

It may sound arrogant, but I'm quite confident that it blows away _almost_ anyone's perft here by a margin. About 5 million nodes/second: no hash table, no SMP, no cheating by counting leaves (I count interior nodes and do bulk counting at the leaves, if I coult the leaves, then it's 200 million nodes per second).

The key idea is that I never have to play/undo to check for legality or even to filter pseudo-legal moves by a function like

Code: Select all

move_is_self_check(pseudo_legal_move)

such a function is inevitable atrocious and full of branches.

I always manage to precalculate the bitboard of target squares to as not to generate illegal moves. I generate legal moves directly, and with nosignificant performance loss compared to a pseudo-legal generator, but with the huge gain of never having to check anything afterwards.

For example, to generate Knight moves (pseudo-simplified code to give you the idea)

Code: Select all

from_squares = Knight bitboard
for from_square in from_squares
  to_squares = knight_attacks[from_square]
  if from square belongs to pinned // pinned is a precalculated biboard
    to_squares &= ray[our_king_square][from_square]  // only possible target squares are to stay on the pin-ray

There are lots of annoying cases, such as king evasion, castling moves, or en-passant self-check detection through the en passant captures pawn. It would be a very long post to explain what I did and why I did it that way. It's best to let the code speak for itself.

tpetzke · Post by **tpetzke** » Fri Jun 07, 2013 8:55 am

Hi Jim,

welcome to the forum.

for (int i = currentPointer; i < nexPointer; i++)

Where are current pointer and nex pointer initialized ? Do you maintain them outside of your function ? I would recommend to change that.

Let your move generator return you a pointer to the list of generated moves.

What is your move representation? Most engines have it as an plain int so it is passed around rather quickly and easy to store in the transposition table later.

Where you do the legality check does not influence the speed. For perft a significant speedup however is to perform a legal check at depth 1 and return here the number of legal moves instead of executing them all and return 1. For later search performance this does not help much.

And make sure your are compiling your code in Visual Studio - Release Mode. I'm sure you are doing that, but I wasn't with my first release.

Thomas...

Perft search speed bottleneck

Perft search speed bottleneck

Re: Perft search speed bottleneck

Re: Perft search speed bottleneck

Re: Perft search speed bottleneck

Re: Perft search speed bottleneck

Re: Perft search speed bottleneck

Re: Perft search speed bottleneck

Re: Perft search speed bottleneck

Re: Perft search speed bottleneck

Re: Perft search speed bottleneck