Thought bitboards was faster :-)

Daniel Anulliero · Post by **Daniel Anulliero** » Fri Feb 12, 2021 12:37 am

Hi
Thanks to whose who answered to my question , especially thanks to Bo Persson who take his time to look at my (so bad

) code and his more detailled answer

Seems I have a lot of work to do !
Best
Dany

Rein Halbersma · Post by **Rein Halbersma** » Fri Feb 12, 2021 8:31 am

Patrice Duhamel wrote: ↑Thu Feb 11, 2021 6:38 pm Using compiler explorer (I don't have latest MSVC installed), it seems MSVC add a test for popcount, and other <bit> functions.
Code: Select all
int bitcount(uint64_t b)
{
    return std::popcount<uint64_t>(b);
}

Small nitpick: it's almost never necessary to help the compiler by providing explicit template arguments. You can and should rely on template argument deduction by writing

Code: Select all

int bitcount(uint64_t b)
{
    return std::popcount(b); // b is of type uint64_t and the compiler will select the right template instantiation
}

Sven · Post by **Sven** » Fri Feb 12, 2021 9:16 am

mvanthoor wrote: ↑Thu Feb 11, 2021 8:55 pm
hgm wrote: ↑Thu Feb 11, 2021 8:33 pm I must admit that Qperft doesn't do pure bulk counting, because it doesn't use a fully-legal move generator.
That is just the ting; my move generator is also pseudo-legal. To be able to bulk-count, I would therefore have to move the is-in-check function from make_move to the move generator; then it will become fully legal, but this is bad for chess performance.

I could add it into the move generator, and then make both of them mutually exclusive; use the is-in-check in the move generator when running perft, and use the one in make_move when playing games. That feels like a hack though.

Creating a fully legal move generator by not generating illegal moves (by detecting pinned pieces and such) is an enhancement I wish to keep for a (much) later date.

There have been a gazillion of threads about this topic in the past. In the most recent one I remember, from few weeks ago, I have explained once more why writing an "almost fully legal move generator" is simple and does not decrease overall performance.

Henk · Post by **Henk** » Fri Feb 12, 2021 10:28 am

Ras wrote: ↑Thu Feb 11, 2021 9:15 pm
mvanthoor wrote: ↑Thu Feb 11, 2021 7:21 pmDoubling the speed your perft runs at, should gain about 50 Elo;
No way unless you also double everything else in speed because move generation isn't where engines spend the majority of their time.

Maybe If you do only piece square evaluation then movegeneration slows down performance.
Also when computing perfts you don't like to wait for minutes to test if movegeneration is working fine.
Bugs encountered during perft computation are so time consuming and tedious to fix. Like a rat walking in a mill.
So watch out for each small change in your code affecting move generation. Maybe put all code used by move generation in a package so that it is isolated and should never change.

federico · Post by **federico** » Fri Feb 12, 2021 3:06 pm

Daniel Anulliero wrote: ↑Fri Feb 12, 2021 12:37 am Hi
Thanks to whose who answered to my question , especially thanks to Bo Persson who take his time to look at my (so bad) code and his more detailled answer
Seems I have a lot of work to do !
Best
Dany

Hey Dany,

Here are my 2 cents.

it would seem to me there is something else on your movegenerator that is making it slow. Even your 0x88 perft is extremely slow. For reference, I would not believe Ceibo's movegen is particularly fast, and would believe it to be a little more than suboptimal. Even then, Ceibo 0.6 reaches perft 6 in 6.64 secs on an i6700k (no hash, no bulk counting). That's nearly 5 times faster than Isa's. Don't know which hardware you used, but can't imagine there could be such difference other than related to the movegen implementation itself.

Which leads to me the following... Does your move generator do anything else aside from actually generating the moves ? Like scoring , ordering, complex move encoding or anything like that ? , which may contribute to making it slower than expected, other than the actual move generator?

Also, i dont know anything about bitboards, but here is just a thought on how to find the issue. Perhaps you could try setting a board with 1 piece at a time, do perft and compare with normal movegen. Eg.Start with only a king on each side. do perft and compare. Then add a rook , and so on so forth for each piece type. Perhaps that could help you isolate any big differences and if there is anything wrong with implementation of pawns, or sliding pieces, etc.

Good luck Dany! I hope you get this resolved. Can't wait to see a bitboard ISA vs 0x88 Ceibo match !!

JVMerlino · Post by **JVMerlino** » Fri Feb 12, 2021 7:08 pm

I don't have much help to offer, other than to say that the perft speed for Myrddin more than doubled when I converted from 0x88 to bitboards.

Myrddin v0.85
perft 6 = 119060324 in time 6422

Myrddin v.086
perft 6 = 119060324 in time 2672

To be fair, this is using bulk counting, but it's the ratio that's important. Also, I used Pradyu Kannan's excellent "magicmoves" code for the sliders, but I wrote the movegen for knights/pawns/kings myself.

emadsen · Post by **emadsen** » Fri Feb 12, 2021 7:21 pm

This discussion prompted me to adjust how I count nodes per sec in my engine to a more honest measurement (not incrementing node count when verifying legality).

Also note that in many engines the large majority of time is spent in evaluation. So even a bulk-counting perft, which is a good measure for how fast the move generation + move making in the engine takes, can be sped up a factor 2, while the engine then only benefits 5%. Because 90% of the time it was doing other things. Like move sorting or evaluation.

I agree with H.G.

federico wrote: ↑Fri Feb 12, 2021 3:06 pm Which leads to me the following... Does your move generator do anything else aside from actually generating the moves ? Like scoring , ordering, complex move encoding or anything like that ? , which may contribute to making it slower than expected, other than the actual move generator?

I agree with federico, this is a key question. If you are mimicking search logic then I'd say your perft speed is a little slow but not too bad. If you're not mimicking search logic- meaning you've optimized perft solely to generate and count legal moves- then your perft speed is slow.

I advise using the same move generator in perft as you do in your search function. My engine's perft function runs the same code as my search function, except a) it doesn't call the evaluation function and b) it doesn't take beta cutoffs- meaning it plays every legal move in every position.

For the sake of comparision, my engine MadChess (2550 Elo) uses bitboards and its perft function mimics search. Even in perft it does all of the following:

Staged Pseudolegal Move Generation
Finds potentially pinned pieces (pinned to king)
Prioritizes (history score lookup always 0 for perft) and sorts moves
Verifies move legality
Marks move as played (used by history heuristic for quiet moves that fail to cause beta cutoff when later quiet move does)
Plays and undoes move
No bulk counting

Perft runs at 4.7 million moves per sec. So it completes perft 6 in 25.2 seconds.

Running the same code in perft as search has one drawback:

You lose the perft race compared to other engines.

But has two very important benefits:

You ensure your search function is examining the correct number of moves at every position.
You can accurately measure the speed of your evaluation function.

If you have separate move generators (one optimized for perft, another optimized for search), correct perft tests (in terms of node count) do not guarantee your search examines the same nodes. This opens the possibility of insidious move generation bugs that weaken your engine and cannot be found by perft tests.

When I add a call to the evaluation function (meaning, do a real search), MadChess slows down to 860K moves per sec. This confirms what H.G. said. Chess engines spend most of their time in the evaluation function.

I have ideas for optimizing my move generator, specifically the code that verifies move legality. With this framework in place I know if I measure a speed improvement in perft it's present in search also, and therefore will improve engine playing strength.

Mike Sherwin · Post by **Mike Sherwin** » Sat Feb 13, 2021 1:48 am

The subject of perft speed always leaves me confused. I suggest forming a committee, formalizing a definition and holding an ongoing competition in which contestants submit source code for verification and compilation on a freely available compiler. I suggest that one person host the competition on a fixed Intel system and another person host the competition on a fixed AMD system.

Mike Sherwin · Post by **Mike Sherwin** » Sat Feb 13, 2021 2:42 am

I have an AMD 3950x overclocked to 4.2 GHz. There are 5 single threaded perft examples I can contribute. No bulk counting. No use of hash. Make and unmake every move. MakeMove() and TakeBack() are like in a real engine keeping track of material and other things, but not standardized.

CARNIVOR - Only real chess engine. Enhanced GNUChess4 style move generator. Written in C. Compiled by Turbo C 1.0
bench 6 2.266 seconds
bench 7 60.695 seconds

GODZILLA - Same type mg as Carnivor. Mostly written in x86 32 bit assembler. Perft() written in TC 1.0.
bench 6 1.748 seconds
bench 7 47.239 seconds

CONUNDRUM - I barely remember how it works, sort of. It is incremental move generation only generating moves for pieces affected by previous moves.
perft 6 2.258 seconds
perft 7 61.401 seconds

HALFWIT - Classical bitboards similar to Chess4.5. TC 1.0. Identical to mg in RomiChess.
perft 6 3.496 seconds
perft 7 90.937 seconds

BRICABRAC - SISSY bitboards. MSVC 2019. C code. Some differences from above. Uses printf() to print number for each root move. And calls InCheck() after making each move. No attempt to optimize anything. Just a raw first write with shortcuts. Was going to optimize later but ran out of brain power.
perft 6 slightly less than 4 seconds. (never added timing code)
perft 7 124 seconds

Mike Sherwin · Post by **Mike Sherwin** » Sat Feb 13, 2021 3:40 am

Also why isn't Qsearch included in perft()? Isn't the ability of Qsearch to discover Illegal positions a big part of move generation efficiency?

Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)