Thought bitboards was faster :-)

mvanthoor · Post by **mvanthoor** » Thu Feb 11, 2021 7:10 pm

About 33 seconds for perft 6? That is -really- slow. I think there's something wrong somewhere; making allocations that you don't have to make, that sort of thing.

Rustic @ i7-6700K. No bulk-counting, 1 thread. First set is without TT, second is with a TT of 1 MB. Even a 1 MB TT improves the result by 30%.

Speed without hash is 3.28 seconds for perft 6, which is 10 times faster than your engine. Even if you are not using magic bitboards, the speed difference shouldn't be that big. And, because of implementing incremental evaluation, the engine took a speed hit of about 10%; it could have been even faster. At the moment though, I'm not optimizing for perft speed anymore. I did that to remove bottlenecks from make/unmake, and the move generator, and those haven't changed for a long time. The "only" thing left in that regard is to add enhancements to the move generator.

I'd be looking for things like unnecessary allocations or copies in loops within your move generator, make/unmake, and maybe the perft-function itself.

Code: Select all

Hash table: disabled.

Perft 1: 20 (0 ms, inf leaves/sec)
Perft 2: 400 (0 ms, inf leaves/sec)
Perft 3: 8902 (0 ms, inf leaves/sec)
Perft 4: 197281 (5 ms, 39456200 leaves/sec)
Perft 5: 4865609 (140 ms, 34754350 leaves/sec)
Perft 6: 119060324 (3283 ms, 36265709 leaves/sec)
Perft 7: 3195901860 (94987 ms, 33645676 leaves/sec)
Total time spent: 98415 ms
Execution speed: 33735044 leaves/second

Code: Select all

Hash table: 1 MB

Perft 1: 20 (0 ms, inf leaves/sec)
Perft 2: 400 (0 ms, inf leaves/sec)
Perft 3: 8902 (0 ms, inf leaves/sec)
Perft 4: 197281 (5 ms, 39456200 leaves/sec)
Perft 5: 4865609 (97 ms, 50160917 leaves/sec)
Perft 6: 119060324 (2131 ms, 55870635 leaves/sec)
Perft 7: 3195901860 (60747 ms, 52610036 leaves/sec)
Total time spent: 62980 ms
Execution speed: 52715693 leaves/second

mvanthoor · Post by **mvanthoor** » Thu Feb 11, 2021 7:21 pm

lithander wrote: ↑Thu Feb 11, 2021 5:17 pm Just to make sure: That's how fast you can generate legal moves for tree search, too? It's not just spitting out a perft count but actually does generate all the 119,060,324 moves?

If you do NOT use a hash table and do NOT use bulk counting, you will have to search down to depth == 0 (which are the leaf nodes) and count every move made one by one. The speed tells you the number of leaves/second the program can count, and thus how fast it can generate, make, and unmake moves. So yes, if you can increase that number, it means your engine is faster in generating/making/unmaking moves.

Doubling the speed your perft runs at, should gain about 50 Elo; in effect, it would be the same as if your processor became twice as fast.

Your Ryzen 5 3600 is about twice as fast as my i7-6700K. (Passmark benchmark 17.868 vs 8.957). Therefore, if I run your perft 6 on my machine, it would take about 32 seconds. That would be 10 times slower than Rustic. All else in the engine being the same (such as: same optimizations in the search, similar material+PST evaluation, both playing without a hash table and a limited opening book), that alone could account for a 200 Elo difference.

lithander · Post by **lithander** » Thu Feb 11, 2021 7:51 pm

Joost Buijs wrote: ↑Thu Feb 11, 2021 6:55 pm You probably mean how fast the move-picker in my tree search is, it's been a while since I timed this, if I remember it well it does around 25 mnps.

That's impressive! My simple implementation is at ~1 mnps just counting material!

Seems like the best practices in chess programming like undo/redo moves, bitboards and hashtables are used by literally everyone (except me, yet) not out of a sense of fashion. They really do make a large difference.

mvanthoor wrote: ↑Thu Feb 11, 2021 7:21 pm Doubling the speed your perft runs at, should gain about 50 Elo; in effect, it would be the same as if your processor became twice as fast.

That's a very interesting rule of thumb. Good to know!

mvanthoor wrote: ↑Thu Feb 11, 2021 7:21 pm Your Ryzen 5 3600 is about twice as fast as my i7-6700K. (Passmark benchmark 17.868 vs 8.957).

Does your perft function use all logical cores? If yes: mine doesn't which explains the 10x performance difference. If no: you should compare the single threaded performance of our CPUs. Which I'm pretty sure isn't twice as good. (My CPU got 50% more cores but not twice the single-core performance)

But that's just nitpicking. Your advice is of course sound: After my modest goals with Minimal Chess I'll try again and aim higher!

mvanthoor · Post by **mvanthoor** » Thu Feb 11, 2021 8:01 pm

lithander wrote: ↑Thu Feb 11, 2021 7:51 pm Does your perft function use all logical cores? If yes: mine doesn't which explains the 10x performance difference. If no: you should compare the single threaded performance of our CPUs. Which I'm pretty sure isn't twice as good. (My CPU got 50% more cores but not twice the single-core performance)

I forgot to take core-count into account. Now I'll have to go and look up the numbers again.

My perft (and engine) only use 1 core. I do have a (very simple) multi-threaded perft implemented as a test, and it works, but I have to redo that again someday. I'm going to look into that again when I'm going to implement Lazy SMP, far into the future.

For now, perft is single-threaded (1 core), has no bulk-counting (might add it some day), but can use a hash table.

But that's just nitpicking. Your advice is of course sound: After my modest goals with Minimal Chess I'll try again and aim higher!

I look forward to it

I'm just at the point where I'm using perft to test the hash table I've implemented (= recovered and adapted from Rustic before it even could play a full game of chess, about a year ago), and it works. I just need to extend it so it can be used in-game. I'm curious how much faster the engine will become because of the saving of transpositions and hash move sorting.

But with regard to perft... it's the ultimate unit test for a chess engine. If you add something and the perft suite still works, you probably didn't break your engine

hgm · Post by **hgm** » Thu Feb 11, 2021 8:17 pm

lithander wrote: ↑Thu Feb 11, 2021 5:17 pm [quote="Joost Buijs" post_id=882137 time=1613051129 user_id=44Just to make sure: That's how fast you can generate legal moves for tree search, too? It's not just spitting out a perft count but actually does generate all the 119,060,324 moves?

Qperft is mailbox, and does actually generate all 119M moves in 0.72 sec. (3.2GHz i7, single thread, no hashing.)

What it doesn't do is actually make these moves, or even count them by looping through the move list and incrementing the counter by 1 for each move. Instead it just adds the length of the move list to the perft count ('bulk counting').

Perfts that make the last move correlate very poorly with the speed of the corresponding engine, because engines do not make the last move either. In the leaf nodes they generate moves, to discover that they don't want to search (and thus make) any of them. Without bulk counting, the perft basically measueres the speed of MakeMove/UnMake, which is not what your engine will be spending its time on. Then alterations that speed up the perft can easily slow down the engine.

Bitboard is not especially fast for perft. Its main virtue is that it can be fast for evaluation, compared to a mailbox implementation that doesn't use any advanced techniques.

Also note that in many engines the large majority of time is spent in evaluation. So even a bulk-counting perft, which is a good measure for how fast the move generation + move making in the engine takes, can be sped up a factor 2, while the engine then only benefits 5%. Because 90% of the time it was doing other things. Like move sorting or evaluation.

mvanthoor · Post by **mvanthoor** » Thu Feb 11, 2021 8:28 pm

hgm wrote: ↑Thu Feb 11, 2021 8:17 pm Perfts that make the last move correlate very poorly with the speed of the corresponding engine, because engines do not make the last move either. In the leaf nodes they generate moves, to discover that they don't want to search (and thus make) any of them. Without bulk counting, the perft basically measueres the speed of MakeMove/UnMake, which is not what your engine will be spending its time on. Then alterations that speed up the perft can easily slow down the engine.

So in your opinion, bulk counting should be the default with regard to perft speed? Then I'll have to add it asap

And yes, speeding up the engine, using techniques such as incremental PST values for evaluation, slows down perft.

hgm · Post by **hgm** » Thu Feb 11, 2021 8:33 pm

Well, bulk counting makes it more representative of engine performance, and it is faster. So I never saw a reason not to use it. Where possible.

I must admit that Qperft doesn't do pure bulk counting, because it doesn't use a fully-legal move generator. The King moves are only pseudo-legal. So it does make those, in order to throw an IsSquareAttacked() on the resulting positions, to see if they should be counted or not. The number of King moves is not very large, though. Especially in the initial position of orthodox Chess.

mvanthoor · Post by **mvanthoor** » Thu Feb 11, 2021 8:55 pm

hgm wrote: ↑Thu Feb 11, 2021 8:33 pm I must admit that Qperft doesn't do pure bulk counting, because it doesn't use a fully-legal move generator.

That is just the ting; my move generator is also pseudo-legal. To be able to bulk-count, I would therefore have to move the is-in-check function from make_move to the move generator; then it will become fully legal, but this is bad for chess performance.

I could add it into the move generator, and then make both of them mutually exclusive; use the is-in-check in the move generator when running perft, and use the one in make_move when playing games. That feels like a hack though.

Creating a fully legal move generator by not generating illegal moves (by detecting pinned pieces and such) is an enhancement I wish to keep for a (much) later date.

Ras · Post by **Ras** » Thu Feb 11, 2021 9:15 pm

mvanthoor wrote: ↑Thu Feb 11, 2021 7:21 pmDoubling the speed your perft runs at, should gain about 50 Elo;

No way unless you also double everything else in speed because move generation isn't where engines spend the majority of their time.

hgm · Post by **hgm** » Thu Feb 11, 2021 10:34 pm

mvanthoor wrote: ↑Thu Feb 11, 2021 8:55 pmCreating a fully legal move generator by not generating illegal moves (by detecting pinned pieces and such) is an enhancement I wish to keep for a (much) later date.

That is indeed a problem. It is easy to recognize when moves are King move (especially as the moves are generated piece by piece). It is much harder to test whether a move with another piece exposes the Kings, and every move is a candidate for that. That pretty much excludes bulk counting.

This is also the reason why I never support perft in any of my engines. I would have to add too much code to ensure the last ply is legal that is not used in the engine itself. Many of my engines do not do pin detection before move generation, and to be able to do null move before move generation they then have to test each move individual for exposing the King in the reply node. That is expensive compared to the MakeMove that brought you there. But still cheap compared to what you are going to do in that child node (and its children). So for the engine as a whole it is rather insignificant.

Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)

Re: Thought bitboards was faster :-)