My Perft Results

RedBedHed · Post by **RedBedHed** » Wed Dec 07, 2022 7:52 pm

JoAnnP38 wrote: ↑Thu Nov 03, 2022 7:44 pm In writing a chess engine for mostly the first time, I finally have my move generation somewhat complete and so I threw together a Perft test to compare my results with others. (Thanks, by the way for those who were nice enough to post node counts for others to compare.) Here are some of my timings for tests up to 6 ply and I was hoping that someone could weigh in on whether I am on the right track or whether I still have a lot of optimization or redesign to do. Currently, I am writing my engine in C#. It is single threaded running on an AMD Ryzen 5 4500U processor. My design currently uses a simple 64 element array as the board, and I am using piece lists to iterate through the pieces during move generation. For simplicity's sake, I am currently dynamically allocating a list to hold all of the generated moves, and this happens on every call to GenerateMoves. I suspect this needs to be changed to use a single array that is shared by all moves to eliminate the heap allocations. But I don't have concrete evidence of that yet.

Results:
1: Elapsed = 00:00:00.0000166
2: Elapsed = 00:00:00.0002136
3: Elapsed = 00:00:00.0286717
4: Elapsed = 00:00:00.1139569
5: Elapsed = 00:00:02.8700864
6: Elapsed = 00:01:01.1239183

While the node counts are all "correct" for these tests, I have to admit I am surprised by the exponential explosion in timings, especially when going from 4 to 5 or 6 ply. All constructive comments welcome.

Hey! Sorry I'm a little late, but congrats on your move generator!

Here are the bulk-counted, single-threaded, no-hashing perft results for my move generator, Charon (running in WSL on an 11th-gen Intel i7):

perft(1) - 0.000 seconds - 20 nodes visited.
perft(2) - 0.000 seconds - 400 nodes visited.
perft(3) - 0.000 seconds - 8902 nodes visited.
perft(4) - 0.001 seconds - 197281 nodes visited.
perft(5) - 0.010 seconds - 4865609 nodes visited.
perft(6) - 0.244 seconds - 119060324 nodes visited.

Charon uses a few patterns from Stockfish and many techniques from Hacker's Delight. It is a bitboard generator that produces strictly legal moves. For sliding attacks it uses PEXT on newer Intel processors and magic bitboards otherwise.
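For readers unfamiliar with the term: "bulk counting" means that at depth 1 perft returns the number of legal moves without making each one. It relies on strictly legal generation, since a pseudo-legal generator typically has to make a move before it knows whether the move is legal. A minimal sketch on a toy tree, where `ToyBoard`, `count_legal_moves`, and the `makes` counter are hypothetical stand-ins and not Charon's code:

```cpp
#include <cstdint>

// Toy stand-in for a position: every node has the same number of legal moves.
struct ToyBoard {
    int branching;
    std::uint64_t makes;  // how many make/unmake pairs a real engine would do
};

// Hypothetical helper: a real engine would generate and count legal moves.
int count_legal_moves(const ToyBoard& b) { return b.branching; }

// Plain perft: walks all the way down to depth 0.
std::uint64_t perft_plain(ToyBoard& b, int depth) {
    if (depth == 0) return 1;
    std::uint64_t nodes = 0;
    const int n = count_legal_moves(b);
    for (int i = 0; i < n; ++i) {
        ++b.makes;  // stands in for make(move) ... unmake(move)
        nodes += perft_plain(b, depth - 1);
    }
    return nodes;
}

// Bulk-counted perft: at depth 1 the move count is the answer.
std::uint64_t perft_bulk(ToyBoard& b, int depth) {
    if (depth == 0) return 1;
    if (depth == 1) return static_cast<std::uint64_t>(count_legal_moves(b));
    std::uint64_t nodes = 0;
    const int n = count_legal_moves(b);
    for (int i = 0; i < n; ++i) {
        ++b.makes;
        nodes += perft_bulk(b, depth - 1);
    }
    return nodes;
}
```

On this toy tree with 20 moves per node, both functions return 8000 at depth 3, but the bulk version does 420 make/unmake pairs instead of 8420. That gap is one reason bulk-counted and non-bulk-counted timings are hard to compare directly.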

Although you may have already fixed this, I do have to say that dynamic allocation, when done repeatedly in a hot loop, is slow. Depending on the allocator, it can involve a call into the OS and a search for free space on the heap, and in a garbage-collected language like C# it also adds collection work. You want to ensure that you only dynamically allocate at the beginning of perft, if possible.
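A minimal sketch of that advice: allocate one move buffer when perft starts and give each ply its own slice of it. `ToyPosition` and `generate_moves` are hypothetical stand-ins for a real engine's types, and `kMaxMoves = 256` is an assumed per-position bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Move = std::uint16_t;             // hypothetical move encoding
constexpr std::size_t kMaxMoves = 256;  // assumed upper bound on moves per position

// Toy "position": every node has the same number of moves, so
// perft(depth) == branching^depth and the result can be checked by hand.
struct ToyPosition {
    int branching;
};

// Writes moves into caller-provided storage and returns how many were written.
std::size_t generate_moves(const ToyPosition& pos, Move* out) {
    assert(pos.branching >= 0 &&
           static_cast<std::size_t>(pos.branching) <= kMaxMoves);
    for (int i = 0; i < pos.branching; ++i) out[i] = static_cast<Move>(i);
    return static_cast<std::size_t>(pos.branching);
}

// Each ply uses its own kMaxMoves-sized slice of the shared buffer.
std::uint64_t perft_impl(const ToyPosition& pos, int depth, Move* buffer) {
    if (depth == 0) return 1;
    const std::size_t n = generate_moves(pos, buffer);
    std::uint64_t nodes = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // A real engine would make buffer[i] here and unmake it afterwards.
        nodes += perft_impl(pos, depth - 1, buffer + kMaxMoves);
    }
    return nodes;
}

std::uint64_t perft(const ToyPosition& pos, int depth) {
    assert(depth >= 0);
    // The only heap allocation: one buffer covering every ply, made up front.
    std::vector<Move> buffer(static_cast<std::size_t>(depth + 1) * kMaxMoves);
    return perft_impl(pos, depth, buffer.data());
}
```

A fixed-size array on the stack at each ply would avoid heap traffic just as well. The shared buffer is shown because it matches the "single array" idea from the original post.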

Best of luck in your endeavors!

RedBedHed · Post by **RedBedHed** » Wed Dec 07, 2022 9:59 pm

https://github.com/RedBedHed/Charon/blob/main/test.sh

Here is the test suite that I used with Charon. It is a hodge-podge of positions suggested by programmers on this forum. It has 162 positions!

gaard · Post by **gaard** » Thu Dec 08, 2022 4:43 am

RedBedHed wrote: ↑Wed Dec 07, 2022 9:59 pm https://github.com/RedBedHed/Charon/blob/main/test.sh

Here is the test suite that I used with Charon. It is a hodge-podge of positions suggested by programmers on this forum. It has 162 positions!

I will give that a go. From my past experience, it has been impossible to beat https://github.com/elcabesa/vajolet/blo ... /perft.txt

I believe there are some duplicates in there, but they should be easy to filter. The (stale)mated positions have caught me more than once.
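A minimal sketch of that filtering step, assuming the suite has already been read into one string per position. The file's exact layout is not described here, so lines are treated as opaque text, and `drop_duplicate_lines` is a hypothetical helper name.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Keeps the first occurrence of each line and drops later exact repeats,
// preserving the original order of the suite.
std::vector<std::string> drop_duplicate_lines(const std::vector<std::string>& lines) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> unique;
    for (const std::string& line : lines) {
        if (seen.insert(line).second) unique.push_back(line);
    }
    return unique;
}
```

This only catches exact repeats. The same position written with different spacing or different expected depths would need the position field extracted and compared on its own.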

JoAnnP38 · Post by **JoAnnP38** » Wed Feb 22, 2023 10:45 am

It has been a while since I have posted to this thread, but while I was waiting for convergence on my evaluation tuning (more on this in another thread), I decided to try my hand at porting over enough of the pedantic move logic to C++ so that I could get a firsthand look at how much performance C++ offers over C# (at least with the way I code). Keep in mind that it has been a while since I have programmed in C++. I believe for any serious work I was using Borland or Zortech C++. Yes, that was a long time ago. The second edition of Bjarne Stroustrup's brown book hadn't been out long and compilers were just starting to replace AT&T cfront. So yeah, fuddy-duddy here. But WOW!!! -- C++ has really grown up! It may take a couple of years for me to really explore all the ins and outs of it, but I still knew enough to port my C# code. The last timings I posted for C# were:

1: Elapsed = 00:00:00.0011450, Mnps: 0.02, nodes = 20
2: Elapsed = 00:00:00.0003211, Mnps: 1.25, nodes = 400
3: Elapsed = 00:00:00.0027505, Mnps: 3.24, nodes = 8902
4: Elapsed = 00:00:00.0625392, Mnps: 3.15, nodes = 197281
5: Elapsed = 00:00:00.3371379, Mnps: 14.43, nodes = 4865609
6: Elapsed = 00:00:06.9856145, Mnps: 17.04, nodes = 119060324
7: Elapsed = 00:03:11.5586089, Mnps: 16.68, nodes = 3195901860

That was the most juice I could squeeze out of my C# code. After porting to C++ here are my new timings:

1: Elapsed = 00:00:00.000, Mnps:    inf, nodes: 20
2: Elapsed = 00:00:00.000, Mnps:    inf, nodes: 400
3: Elapsed = 00:00:00.000, Mnps:    inf, nodes: 8902
4: Elapsed = 00:00:00.006, Mnps:  32.88, nodes: 197281
5: Elapsed = 00:00:00.149, Mnps:  32.66, nodes: 4865609
6: Elapsed = 00:00:03.547, Mnps:  33.57, nodes: 119060324
7: Elapsed = 00:01:35.160, Mnps:  33.58, nodes: 3195901860

So that looks to be about a 100% improvement in Mnps. Not bad, especially since I haven't explored too much of how to use templates and compile-time evaluation to create faster versions of the code. I did a little of that. BTW, the new timings are on my new desktop, which uses an AMD 6900HX CPU, while the older (C#) timings were on an AMD 4500U. But everything else is pretty much identical -- single-threaded, magic bitboards, pseudo-legal move generation (so no bulk counting), and no tricks (i.e., no transposition tables or multithreading). While the speedup is nice, I would probably need a lot more performance to affect my Elo much. I'm still trying to figure out how some people are getting such incredible performance. I may have to break down and start digging through the Stockfish code or the code for some other high-performance engine to better understand where I am creating bottlenecks.

The C++ code is posted here: https://github.com/JoAnnP38/captare

Iketh · Post by **Iketh** » Wed Mar 08, 2023 2:19 am

A 6900HX is a substantial performance jump over a 4500U. Off the top of my head, I’d estimate a 50% increase for a single thread, so only part of that 2x is down to C++.
