Hi
Thanks to whose who answered to my question , especially thanks to Bo Persson who take his time to look at my (so bad) code and his more detailled answer
Seems I have a lot of work to do !
Best
Dany
Thought bitboards was faster :-)
Moderators: hgm, Rebel, chrisw
-
- Posts: 759
- Joined: Fri Jan 04, 2013 4:55 pm
- Location: Nice
Re: Thought bitboards was faster :-)
Isa download :
-
- Posts: 741
- Joined: Tue May 22, 2007 11:13 am
Re: Thought bitboards was faster :-)
Small nitpick: it's almost never necessary to help the compiler by providing explicit template arguments. You can and should rely on template argument deduction by writingPatrice Duhamel wrote: ↑Thu Feb 11, 2021 6:38 pm Using compiler explorer (I don't have latest MSVC installed), it seems MSVC add a test for popcount, and other <bit> functions.
Code: Select all
int bitcount(uint64_t b) { return std::popcount<uint64_t>(b); }
Code: Select all
int bitcount(uint64_t b)
{
return std::popcount(b); // b is of type uint64_t and the compiler will select the right template instantiation
}
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Thought bitboards was faster :-)
There have been a gazillion of threads about this topic in the past. In the most recent one I remember, from few weeks ago, I have explained once more why writing an "almost fully legal move generator" is simple and does not decrease overall performance.mvanthoor wrote: ↑Thu Feb 11, 2021 8:55 pmThat is just the ting; my move generator is also pseudo-legal. To be able to bulk-count, I would therefore have to move the is-in-check function from make_move to the move generator; then it will become fully legal, but this is bad for chess performance.
I could add it into the move generator, and then make both of them mutually exclusive; use the is-in-check in the move generator when running perft, and use the one in make_move when playing games. That feels like a hack though.
Creating a fully legal move generator by not generating illegal moves (by detecting pinned pieces and such) is an enhancement I wish to keep for a (much) later date.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
-
- Posts: 7218
- Joined: Mon May 27, 2013 10:31 am
Re: Thought bitboards was faster :-)
Maybe If you do only piece square evaluation then movegeneration slows down performance.
Also when computing perfts you don't like to wait for minutes to test if movegeneration is working fine.
Bugs encountered during perft computation are so time consuming and tedious to fix. Like a rat walking in a mill.
So watch out for each small change in your code affecting move generation. Maybe put all code used by move generation in a package so that it is isolated and should never change.
-
- Posts: 32
- Joined: Sun Oct 22, 2017 4:36 am
- Location: Canada
- Full name: Federico Rojo
Re: Thought bitboards was faster :-)
Daniel Anulliero wrote: ↑Fri Feb 12, 2021 12:37 am Hi
Thanks to whose who answered to my question , especially thanks to Bo Persson who take his time to look at my (so bad) code and his more detailled answer
Seems I have a lot of work to do !
Best
Dany
Hey Dany,
Here are my 2 cents.
it would seem to me there is something else on your movegenerator that is making it slow. Even your 0x88 perft is extremely slow. For reference, I would not believe Ceibo's movegen is particularly fast, and would believe it to be a little more than suboptimal. Even then, Ceibo 0.6 reaches perft 6 in 6.64 secs on an i6700k (no hash, no bulk counting). That's nearly 5 times faster than Isa's. Don't know which hardware you used, but can't imagine there could be such difference other than related to the movegen implementation itself.
Which leads to me the following... Does your move generator do anything else aside from actually generating the moves ? Like scoring , ordering, complex move encoding or anything like that ? , which may contribute to making it slower than expected, other than the actual move generator?
Also, i dont know anything about bitboards, but here is just a thought on how to find the issue. Perhaps you could try setting a board with 1 piece at a time, do perft and compare with normal movegen. Eg.Start with only a king on each side. do perft and compare. Then add a rook , and so on so forth for each piece type. Perhaps that could help you isolate any big differences and if there is anything wrong with implementation of pawns, or sliding pieces, etc.
Good luck Dany! I hope you get this resolved. Can't wait to see a bitboard ISA vs 0x88 Ceibo match !!
-
- Posts: 1357
- Joined: Wed Mar 08, 2006 10:15 pm
- Location: San Francisco, California
Re: Thought bitboards was faster :-)
I don't have much help to offer, other than to say that the perft speed for Myrddin more than doubled when I converted from 0x88 to bitboards.
Myrddin v0.85
perft 6 = 119060324 in time 6422
Myrddin v.086
perft 6 = 119060324 in time 2672
To be fair, this is using bulk counting, but it's the ratio that's important. Also, I used Pradyu Kannan's excellent "magicmoves" code for the sliders, but I wrote the movegen for knights/pawns/kings myself.
Myrddin v0.85
perft 6 = 119060324 in time 6422
Myrddin v.086
perft 6 = 119060324 in time 2672
To be fair, this is using bulk counting, but it's the ratio that's important. Also, I used Pradyu Kannan's excellent "magicmoves" code for the sliders, but I wrote the movegen for knights/pawns/kings myself.
-
- Posts: 434
- Joined: Thu Apr 26, 2012 1:51 am
- Location: Oak Park, IL, USA
- Full name: Erik Madsen
Re: Thought bitboards was faster :-)
This discussion prompted me to adjust how I count nodes per sec in my engine to a more honest measurement (not incrementing node count when verifying legality).
I advise using the same move generator in perft as you do in your search function. My engine's perft function runs the same code as my search function, except a) it doesn't call the evaluation function and b) it doesn't take beta cutoffs- meaning it plays every legal move in every position.
For the sake of comparision, my engine MadChess (2550 Elo) uses bitboards and its perft function mimics search. Even in perft it does all of the following:
Running the same code in perft as search has one drawback:
When I add a call to the evaluation function (meaning, do a real search), MadChess slows down to 860K moves per sec. This confirms what H.G. said. Chess engines spend most of their time in the evaluation function.
I have ideas for optimizing my move generator, specifically the code that verifies move legality. With this framework in place I know if I measure a speed improvement in perft it's present in search also, and therefore will improve engine playing strength.
I agree with H.G.Also note that in many engines the large majority of time is spent in evaluation. So even a bulk-counting perft, which is a good measure for how fast the move generation + move making in the engine takes, can be sped up a factor 2, while the engine then only benefits 5%. Because 90% of the time it was doing other things. Like move sorting or evaluation.
I agree with federico, this is a key question. If you are mimicking search logic then I'd say your perft speed is a little slow but not too bad. If you're not mimicking search logic- meaning you've optimized perft solely to generate and count legal moves- then your perft speed is slow.federico wrote: ↑Fri Feb 12, 2021 3:06 pm Which leads to me the following... Does your move generator do anything else aside from actually generating the moves ? Like scoring , ordering, complex move encoding or anything like that ? , which may contribute to making it slower than expected, other than the actual move generator?
I advise using the same move generator in perft as you do in your search function. My engine's perft function runs the same code as my search function, except a) it doesn't call the evaluation function and b) it doesn't take beta cutoffs- meaning it plays every legal move in every position.
For the sake of comparision, my engine MadChess (2550 Elo) uses bitboards and its perft function mimics search. Even in perft it does all of the following:
- Staged Pseudolegal Move Generation
- Finds potentially pinned pieces (pinned to king)
- Prioritizes (history score lookup always 0 for perft) and sorts moves
- Verifies move legality
- Marks move as played (used by history heuristic for quiet moves that fail to cause beta cutoff when later quiet move does)
- Plays and undoes move
- No bulk counting
Running the same code in perft as search has one drawback:
- You lose the perft race compared to other engines.
- You ensure your search function is examining the correct number of moves at every position.
- You can accurately measure the speed of your evaluation function.
When I add a call to the evaluation function (meaning, do a real search), MadChess slows down to 860K moves per sec. This confirms what H.G. said. Chess engines spend most of their time in the evaluation function.
I have ideas for optimizing my move generator, specifically the code that verifies move legality. With this framework in place I know if I measure a speed improvement in perft it's present in search also, and therefore will improve engine playing strength.
My C# chess engine: https://www.madchess.net
-
- Posts: 860
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Thought bitboards was faster :-)
The subject of perft speed always leaves me confused. I suggest forming a committee, formalizing a definition and holding an ongoing competition in which contestants submit source code for verification and compilation on a freely available compiler. I suggest that one person host the competition on a fixed Intel system and another person host the competition on a fixed AMD system.
-
- Posts: 860
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Thought bitboards was faster :-)
I have an AMD 3950x overclocked to 4.2 GHz. There are 5 single threaded perft examples I can contribute. No bulk counting. No use of hash. Make and unmake every move. MakeMove() and TakeBack() are like in a real engine keeping track of material and other things, but not standardized.
CARNIVOR - Only real chess engine. Enhanced GNUChess4 style move generator. Written in C. Compiled by Turbo C 1.0
bench 6 2.266 seconds
bench 7 60.695 seconds
GODZILLA - Same type mg as Carnivor. Mostly written in x86 32 bit assembler. Perft() written in TC 1.0.
bench 6 1.748 seconds
bench 7 47.239 seconds
CONUNDRUM - I barely remember how it works, sort of. It is incremental move generation only generating moves for pieces affected by previous moves.
perft 6 2.258 seconds
perft 7 61.401 seconds
HALFWIT - Classical bitboards similar to Chess4.5. TC 1.0. Identical to mg in RomiChess.
perft 6 3.496 seconds
perft 7 90.937 seconds
BRICABRAC - SISSY bitboards. MSVC 2019. C code. Some differences from above. Uses printf() to print number for each root move. And calls InCheck() after making each move. No attempt to optimize anything. Just a raw first write with shortcuts. Was going to optimize later but ran out of brain power.
perft 6 slightly less than 4 seconds. (never added timing code)
perft 7 124 seconds
CARNIVOR - Only real chess engine. Enhanced GNUChess4 style move generator. Written in C. Compiled by Turbo C 1.0
bench 6 2.266 seconds
bench 7 60.695 seconds
GODZILLA - Same type mg as Carnivor. Mostly written in x86 32 bit assembler. Perft() written in TC 1.0.
bench 6 1.748 seconds
bench 7 47.239 seconds
CONUNDRUM - I barely remember how it works, sort of. It is incremental move generation only generating moves for pieces affected by previous moves.
perft 6 2.258 seconds
perft 7 61.401 seconds
HALFWIT - Classical bitboards similar to Chess4.5. TC 1.0. Identical to mg in RomiChess.
perft 6 3.496 seconds
perft 7 90.937 seconds
BRICABRAC - SISSY bitboards. MSVC 2019. C code. Some differences from above. Uses printf() to print number for each root move. And calls InCheck() after making each move. No attempt to optimize anything. Just a raw first write with shortcuts. Was going to optimize later but ran out of brain power.
perft 6 slightly less than 4 seconds. (never added timing code)
perft 7 124 seconds
-
- Posts: 860
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Thought bitboards was faster :-)
Also why isn't Qsearch included in perft()? Isn't the ability of Qsearch to discover Illegal positions a big part of move generation efficiency?