A comparison of some Perft programs

phhnguyen · Post by **phhnguyen** » Sat Jan 10, 2026 11:25 pm

I have been working with Perft functions for a while, and I often have to run Perft in other programs to verify my results. Typically, I use chess engines from the Stockfish family (but for Xiangqi). That is convenient since those engines are always available on my computers, and their results are trustworthy. However, they can take a significant amount of time, especially at high depths. While waiting for them, I asked myself whether there is a better choice: a much faster dedicated Perft engine. So I spent some time finding programs and comparing their Perft speeds. I am posting the results here in the hope they help others in a similar situation. You may use them as a guideline, but they still need further verification: the results hold for my hardware, my compilers, and my conditions, and may differ in other environments.

Computer: I use my old PC with an AMD Ryzen 7 1700 Eight-Core Processor, 3.00 GHz, 16.0 GB RAM, Windows 10 Pro. I guess it can run AVX2 code fine, but not with high performance. I also have a newer laptop (MacBook Pro M1 Max), but some programs, such as chessbit and Gigantua, are optimised specifically for Intel PCs and cannot be compiled for my laptop.

Perft depth 8 for start position: I don't want to wait too long, and that depth is reasonable for all programs. It looks like a sweet spot where elapsed times are just right (neither too short nor too long). I focused on computing Perft for the start position only and ignored all other positions. Later, I may make another comparison using multiple positions. The good news is that all programs produce the correct node count for that Perft.

Perft speed-up techniques: So far, I know of these techniques to speed up Perft: 1) bulk counting, 2) hashing, 3) threading, 4) C++ templates (used to expand the code at compile time to reduce conditions/branches), 5) optimisations using bit-manipulation instructions (BMI) on modern hardware to speed up move generators, especially magic-bitboard ones. Typically, the first three methods can improve Perft speed significantly. The last two may give some gains. All programs in this comparison are open source. I looked through all their code quickly to get an idea of how they implement Perft.
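To make techniques 1) and 2) concrete, here is a minimal sketch that is not taken from any of the programs below. As an assumption for brevity, a lone king on an empty 8x8 board stands in for a real move generator, and `king_moves`, `perft_plain`, and `perft_fast` are hypothetical names. A real engine would use a fixed-size table keyed by a Zobrist hash rather than Python's `lru_cache`.

```python
from functools import lru_cache

SIZE = 8  # toy board; a lone king replaces a real chess move generator


def king_moves(square):
    """Return the squares a king on `square` (0..63) can step to."""
    row, col = divmod(square, SIZE)
    return [r * SIZE + c
            for r in range(max(0, row - 1), min(SIZE, row + 2))
            for c in range(max(0, col - 1), min(SIZE, col + 2))
            if (r, c) != (row, col)]


def perft_plain(square, depth):
    """Reference version: visits every leaf one by one."""
    if depth == 0:
        return 1
    return sum(perft_plain(nxt, depth - 1) for nxt in king_moves(square))


@lru_cache(maxsize=None)  # hashing: reuse the count for a repeated (position, depth)
def perft_fast(square, depth):
    if depth == 0:
        return 1
    moves = king_moves(square)
    if depth == 1:  # bulk counting: count the last ply without making the moves
        return len(moves)
    return sum(perft_fast(nxt, depth - 1) for nxt in moves)


# Both versions must agree; only the amount of work differs.
print(perft_plain(0, 5) == perft_fast(0, 5))
```

Bulk counting removes the make/undo work on the last ply, while the cache removes whole sub-trees that are reached again by a different move order.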

1) Stockfish 17.1 by Stockfish team
https://github.com/official-stockfish/Stockfish

Stockfish is the only chess engine in this comparison; all the other programs are dedicated Perft programs. I couldn't compile the latest development version on my PC, so I downloaded and ran the exe file of version 17.1 (the file stockfish-windows-x86-64-avx2.exe).

Stockfish uses straightforward Perft code with bulk counting, but no hashing or multithreading. It is also well optimised for modern hardware and uses a lot of C++ templates to gain speed. I use Stockfish as the baseline. Surprisingly, Stockfish is only the second slowest, not at the bottom of the comparison list.

2) qperft (Quick Perft) by H.G. Muller
https://home.hccnet.nl/h.g.muller/dwnldpage.html

The program was released perhaps 20 years ago, and the latest update was about 12 years ago (based on some forum discussions).
I downloaded and ran the exe file (qperft.exe). Surprisingly, the oldest program still runs fine and takes a good place in the list, above two newer programs, including Stockfish. It seems to rely mainly on bulk counting and hashing for speed. However, it is not much faster than Stockfish, probably because it uses a mailbox board representation. From my experience, it is not easy to optimise mailbox move generators for speed. It is the only program in the list that uses a mailbox.

3) BBPerft by Manik Charan
https://github.com/Mk-Chan/BBPerft

The project was updated 6 years ago. I downloaded and compiled using the make command.

At a glance at the code, it uses templates but none of the high-gain techniques (bulk counting, hashing, or multithreading). It isn't optimised for modern hardware either. So it is amazing that it is (a bit) faster than Stockfish.

4) Juddperft by Judd Niemann
https://github.com/jniemann66/juddperft

The project was updated recently. I downloaded and ran the exe file for 64-bit.

It is significantly faster than Stockfish, about 9 times faster. At a glance at the code, it uses all the high-gain techniques (bulk counting, hashing, and multithreading) to speed up Perft. However, it doesn't use templates, and it is not optimised for modern hardware.

5) chessbit by Thomas Albert
https://github.com/thuijbregts/chessbit

The project was updated recently. I downloaded and ran the file chessbit_fastest.exe.

The program is faster than Stockfish. It is also faster than Gigantua, as the author claimed, although the gap is not large. His other claim, "The fastest Perft engine", did not hold, at least in this test. At a glance at the code, the program is too complicated for me, with heavy use of templates. It is optimised for modern hardware. However, it looks like it doesn't use hashing or multithreading, which is why it cannot match the faster ones.

6) Gigantua by Daniel Infuehr
https://github.com/Gigantua/Gigantua

The project was updated 6 years ago. I downloaded and ran its exe file.

I knew there were some long, hard forum discussions about this program and the author's strong claims, such as "Worlds-fastest-Bitboard-Chess-Movegenerator" and "Worlds Fastest CPU Movegenerator". However, it is slower than Stockfish on my PC, and it is at the bottom of the list. All those claims failed, at least in this test. It looks like the author tried to gain speed mostly through heavy templates and low-level optimisations. Perhaps my computer is too old for those optimisations. According to one of his forum posts, the program uses bulk counting but not hashing or multithreading, which may be the main reason it lags behind all the others. In terms of techniques it is somewhat similar to BBPerft, but surprisingly, it is slower.

7) MPerft by Richard Delorme
https://github.com/abulmo/MPerft

The project was updated 6 years ago. I downloaded and compiled it via the make command.

It is the fastest and significantly faster than other programs in this comparison, 17 times faster than Stockfish. On the screen, it prints clearly that it uses hashing and multithreading. A quick code study reveals it doesn’t use templates, and it is not optimised for modern hardware.

The author has another and newer dedicated Perft program named hqperft, but it runs a bit slower on my computer.

The table below lists the names of programs and their elapsed times to complete Perft 8 of the start position, tested on a PC AMD Ryzen 7 1700 Eight-Core Processor, 3.00 GHz, 16 GB RAM, Windows 10 Pro:

Code: Select all

program            elapsed (ms)
1 MPerft                  32032
2 Juddperft               62094
3 BBPerft                431687
4 chessbit               441761
5 qperft                 516060
6 Stockfish 17.1         543070
7 Gigantua               636927
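For anyone who wants to check the "times faster than Stockfish" figures quoted above, here is a small sketch that derives them from the table. The numbers are copied from the table; the variable names are only illustrative.

```python
# Elapsed times in milliseconds, copied from the table above.
elapsed_ms = {
    "MPerft": 32032,
    "Juddperft": 62094,
    "BBPerft": 431687,
    "chessbit": 441761,
    "qperft": 516060,
    "Stockfish 17.1": 543070,
    "Gigantua": 636927,
}

baseline = elapsed_ms["Stockfish 17.1"]

# Speed-up relative to Stockfish: above 1 means faster, below 1 means slower.
speedup = {name: baseline / ms for name, ms in elapsed_ms.items()}

for name, ms in sorted(elapsed_ms.items(), key=lambda item: item[1]):
    print(f"{name:<16}{ms:>8} ms  {speedup[name]:6.2f}x")
```

The ratios are only meaningful for this one machine and this one position.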

hgm · Post by **hgm** » Sun Jan 11, 2026 11:46 am

The key to doing fast perfts seems to be hashing. I remember Steven Edwards had made a fast perft program, and wanted to calculate perft(13) of the initial Chess position with it. We had a pool for guessing what the result would be. After the calculation had been running a few months on Steven's machine, someone stepped in and calculated the result in a day. His method:

Do a perft(7) (or 8, I forgot) to generate and store all different leaf positions, counting for each how many times it occurs. (So the depth of this initial perft depends a bit on how much you can store.) Then run a perft(6) (or 5) on each such position, multiply the result by how many times it occurred in the initial perft, add... Done!

But this is just implementing by hand and disk usage what a hash table is trying to do in memory. If the table was large enough to hold all the visited positions at depth 7, it would only do the remaining perft(6) for each sub-tree once, and then finish the perft(13) like it was a perft(8) by getting hash cutoffs at each revisit.
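The two-phase method described above can be sketched roughly as follows. This is only an illustration under stated assumptions: a lone king on an empty 8x8 board replaces a real chess position, all function names are hypothetical, and phase 1 merges duplicates at every ply rather than only at the leaves. In real chess the stored position must also include side to move, castling rights, and en-passant state.

```python
from collections import Counter

SIZE = 8  # toy board standing in for a real chess position


def king_moves(square):
    """Return the squares a king on `square` (0..63) can step to."""
    row, col = divmod(square, SIZE)
    return [r * SIZE + c
            for r in range(max(0, row - 1), min(SIZE, row + 2))
            for c in range(max(0, col - 1), min(SIZE, col + 2))
            if (r, c) != (row, col)]


def perft(square, depth):
    """Plain perft without any hashing."""
    if depth == 0:
        return 1
    return sum(perft(nxt, depth - 1) for nxt in king_moves(square))


def split_perft(root, total_depth, split_depth):
    # Phase 1: collect every distinct position at split_depth together
    # with the number of move sequences that reach it.
    frontier = Counter({root: 1})
    for _ in range(split_depth):
        next_frontier = Counter()
        for position, count in frontier.items():
            for child in king_moves(position):
                next_frontier[child] += count
        frontier = next_frontier

    # Phase 2: one shallower perft per distinct position, weighted by
    # how often that position occurred in phase 1.
    remaining = total_depth - split_depth
    return sum(count * perft(position, remaining)
               for position, count in frontier.items())


print(split_perft(0, 6, 3) == perft(0, 6))
```

The `frontier` counter plays the role of the stored leaf list; for a deep chess perft it would have to live on disk or in a very large table.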

Another lesson is that it would be OK to use the disk for a hash table at the initial levels of a deep perft. There, the time it takes to visit a new position (let alone the same position) is excessively long anyway, so the time of a disk access is still an extremely small fraction of the time you would need to recalculate a result.

petero2 · Post by **petero2** » Sun Jan 11, 2026 8:51 pm

phhnguyen wrote: ↑Sat Jan 10, 2026 11:25 pm 7) MPerft by Richard Delorme
https://github.com/abulmo/MPerft

The project was updated 6 years ago. I downloaded and compiled it via the make command.

It is the fastest and significantly faster than other programs in this comparison, 17 times faster than Stockfish. On the screen, it prints clearly that it uses hashing and multithreading.

Are you sure it is using multithreading? On my computer it says it is using hashing and bulk counting, but it doesn't say anything about multiple threads, and only uses one thread. Also looking at the source code I see nothing about threads in it.

On my 16 core AMD 7950X3D I get:

Code: Select all

my perft 32 threads   1393 ms
MPerft               13627 ms
my perft 1 thread    17651 ms

Sopel · Post by **Sopel** » Sun Jan 11, 2026 9:42 pm

FWIW on 7800x3d on windows I get ~300Mnps for SF 17.1 and 1650Mnps for gigantua on perft 7

phhnguyen · Post by **phhnguyen** » Mon Jan 12, 2026 10:59 am

hgm wrote: ↑Sun Jan 11, 2026 11:46 am The key to doing fast perfts seems to be hashing.

Agreed. Hashing is one of the largest gain methods for Perft.

However, from my experience as well as reading somewhere, hashing may give a factor of about 2 times speed. My computer has 8 cores, and multithreading could bring me a factor of 8 times faster

phhnguyen · Post by **phhnguyen** » Mon Jan 12, 2026 11:03 am

petero2 wrote: ↑Sun Jan 11, 2026 8:51 pm
phhnguyen wrote: ↑Sat Jan 10, 2026 11:25 pm 7) MPerft by Richard Delorme
https://github.com/abulmo/MPerft

The project was updated 6 years ago. I downloaded and compiled it via the make command.

It is the fastest and significantly faster than other programs in this comparison, 17 times faster than Stockfish. On the screen, it prints clearly that it uses hashing and multithreading.
Are you sure it is using multithreading? On my computer it says it is using hashing and bulk counting, but it doesn't say anything about multiple threads, and only uses one thread. Also looking at the source code I see nothing about threads in it.

You are right! It is my mistake! MPerft uses bulk counting and hashing only!

petero2 wrote: ↑Sun Jan 11, 2026 8:51 pm On my 16 core AMD 7950X3D I get:
Code: Select all
my perft 32 threads   1393 ms
MPerft               13627 ms
my perft 1 thread    17651 ms

Your Perft is so fast. Did you publish it?

phhnguyen · Post by **phhnguyen** » Mon Jan 12, 2026 11:30 am

Sopel wrote: ↑Sun Jan 11, 2026 9:42 pm FWIW on 7800x3d on windows I get ~300Mnps for SF 17.1 and 1650Mnps for gigantua on perft 7

Your information is ambiguous to me. SF prints neither elapsed time nor speed when computing Perft. How did you get the speed of 300 Mnps? If it is the speed of normal search, we cannot use it to compare with Gigantua since (AFAIK) Gigantua calculates its Perft speed from the leaf nodes of bulk counting, not from nodes visited by making/undoing moves.

Ideally, please measure the time for calculating Perft 8 with both programs. SF doesn't print the elapsed time, but you can easily add a few lines to the SF code (e.g., TimePoint startPoint = now();... elapsed = now() - startPoint;), recompile and run. You may also use a third-party program to time it. For Gigantua, you can download its 64-bit exe file from its GitHub and run it as below. It will calculate the elapsed time itself:

Code: Select all

Gigantua.exe "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 8
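For the third-party timing option mentioned above, a minimal sketch could look like this. The helper name `time_command` is hypothetical, and the measured time includes process start-up, which should be negligible for runs lasting many seconds.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run `args` to completion and return the wall-clock time in milliseconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True)
    return (time.perf_counter() - start) * 1000.0


# Self-check with a trivial child process. To time a Perft program, pass its
# command line instead, for example the Gigantua call shown above:
#   time_command(["Gigantua.exe",
#                 "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
#                 "8"])
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.0f} ms")
```

This measures every program the same way from the outside, so it does not depend on how each program reports its own time.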

FYI: I have just re-run some Perft programs from my PC after resetting. There is no chance for Gigantua on my old PC: it is still lagging behind SF. Hope it performs better on your PC.

hgm · Post by **hgm** » Mon Jan 12, 2026 1:29 pm

phhnguyen wrote: ↑Mon Jan 12, 2026 10:59 am
hgm wrote: ↑Sun Jan 11, 2026 11:46 am The key to doing fast perfts seems to be hashing.
Agreed. Hashing is one of the largest gain methods for Perft.

However, from my experience as well as reading somewhere, hashing may give a factor of about 2 times speed. My computer has 8 cores, and multithreading could bring me a factor of 8 times faster

Well, going down from a few months to roughly a day seems more than a factor 2 to me...

In reality there is no fixed factor; it depends on depth. What hashing does is reduce the effective branching factor. It could be that it reduces that only by a factor 2-3, but taken to the power of the depth that can grow huge. (The first 2 depths cannot benefit, as there are no transpositions there yet.)

E.g. for qperft with various hash-table sizes:

Code: Select all

depth       7            8            9
no hash  10.7 sec  295.1 sec
16MB      2.4 sec   41.7 sec
32MB      2.2 sec   36.3 sec
64MB      2.0 sec   30.9 sec
128MB     1.9 sec   27.3 sec
256MB     1.9 sec   25.3 sec   435 sec

So we gain more than a factor 10 on perft(8), and this doesn't seem saturated yet w.r.t. table size. On perft(7) we gained a factor 5.
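The factors quoted above, and a rough effective branching factor, can be read off the table with a few lines. This is only a sketch using the no-hash and 256MB rows; the times are rounded to two or three digits in the table, so the results are approximate.

```python
# Times in seconds, copied from the qperft table above.
no_hash = {7: 10.7, 8: 295.1}
hash_256mb = {7: 1.9, 8: 25.3, 9: 435.0}

# Speed-up from hashing at each depth measured both ways.
speedup = {depth: no_hash[depth] / hash_256mb[depth] for depth in no_hash}
for depth, factor in speedup.items():
    print(f"perft({depth}): hashing speed-up {factor:.1f}x")

# Time growth per extra ply, used as a rough effective branching factor.
ebf_no_hash = no_hash[8] / no_hash[7]
ebf_hashed = hash_256mb[8] / hash_256mb[7]
print(f"growth from depth 7 to 8 without hash: {ebf_no_hash:.1f}")
print(f"growth from depth 7 to 8 with 256MB:   {ebf_hashed:.1f}")
```

Because the growth per ply is smaller with the table, the gap between the two rows widens with every extra ply, which is why the gain matters most on deep perfts.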

In general you would not care very much whether execution time goes down from 10 sec to 1 sec or 100msec. You would care a great deal though, whether the calculation would take a month or just a week. So it is especially the savings on the deep perfts that are relevant.

Ajedrecista · Post by **Ajedrecista** » Mon Jan 12, 2026 7:50 pm

Hello:

There are some threads from time to time regarding this topic. I remember this one:

What is a good perft speed?

Where I benchmarked some perft counters. YMMV depending on hardware, hash used and so on.

Regards from Spain.

Ajedrecista.

chessbit · Post by **chessbit** » Mon Jan 12, 2026 8:49 pm

Fyi, with hashing and parallel processing, chessbit is about 10 times faster. I'm curious how you compiled the code but I'm going to assume it's not using the highest optimization, because it was implemented with the intel compiler (best results by far).

It takes ~1s to do perft 8 on my machine (9800x3d).

Code: Select all

perft -d 8
Depth:          8
Nodes:          84998978956
Time:           1059 ms
Average:        80263 Mn/s
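As a cross-check, the "Average" figure follows directly from the node count and the time printed above. The helper name `mnps` is hypothetical; the same function can convert any elapsed time in this thread for start-position perft(8) into a speed.

```python
PERFT_8_NODES = 84_998_978_956  # start-position perft(8), as printed above


def mnps(nodes, elapsed_ms):
    """Million leaf nodes per second for a run that took `elapsed_ms`."""
    return nodes / (elapsed_ms / 1000.0) / 1e6


# The run above: 1059 ms.
print(f"{mnps(PERFT_8_NODES, 1059):.0f} Mn/s")

# The Stockfish 17.1 time from the first post (543070 ms), for comparison.
print(f"{mnps(PERFT_8_NODES, 543070):.1f} Mn/s")
```

Note that these are leaf nodes per second, so figures from programs that count made/unmade moves instead are not directly comparable.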

I was curious to try the other engines but I couldn't get even close to this result on them. Maybe I didn't compile correctly to get the best of them though... I just used exactly the same config as for my engine.

If you're interested, I have pushed the code including the TT and multithread here
