hgm wrote:I guess qperft is slowed down by the fact that I only put effort in optimizing the regular move generator, and no effort in calculating the perft fast. It is true that it does bulk counting in the final ply, but it does that by measuring the size of the move stack (top minus start), but it still fills the move stack with encoded moves, just like it would do on any ply, while this data is never used. So it wastes time on encoding and storing the moves,while all it really has to do is increment the stack pointer (or another counter).
When I cut away that dead wood, by making a special move generator for the last ply that directly counts, rather than pushing moves on the stack, I gain 25% speed (the perft(7) time drops from 102.01 sec to 75.42 sec).
By being clever, rather than just fast, I am pretty sure I could reduce that by another factor 10.
Line 662 flag=0; partially disables bulk counting. If I remove that line and recompile, qpertf is faster than jetchess. As you say, with a special move counting routine when depth=1, like frc-perft, the program would be even faster.
qperft (recompiled with your idea) could be faster than JetChess? How much faster? Please upload an executable with your change (not everybody knows how to compile... ), I want to try it. Gracias por adelantado.
qperft (recompiled with your idea) could be faster than JetChess? How much faster? Please upload an executable with your change (not everybody knows how to compile... ), I want to try it. Gracias por adelantado.
Pablo Vazquez wrote:Line 662 flag=0; partially disables bulk counting. If I remove that line and recompile, qpertf is faster than jetchess. As you say, with a special move counting routine when depth=1, like frc-perft, the program would be even faster.
Indeed, I discovered that too. (It disables the bulk counting completely, btw.) It was something I had put in for debugging at one time. I had not realized it was in the published source. I am not sure if the Windows binary I have on the web has this bug too, or if that was compiled before I put in that 'flag=0;'.
I have posted my latest version, which does not have the flag=0;, and uses separate routines (leaf_perft and move_count in stead of perft and move_gen) for the final ply (making it 25% faster). It might not optimally use the hash table for low-depth perfts, as it uses a sectioning of it optimized to calculate perft(13).
I am currently on Linux; when I get back to Windows I will make a (32-bit) Windows binary. I always use optimizationlevel -O2, btw (gcc -O2 -s q2perft.c -o qperft.exe)
I am sorry to say that frc-perft is MUCH slower than JetChess in my computer: while Perft(7) of starting position takes less than 9 seconds with JetChess, it took 80.89 seconds (!) with frc-perft. I do not know if I did something wrong. All the counters I have seen are good, but I prefer JetChess right now.
Thanks for explaining how to compile. Surely, I will be unable of doing it correctly (I will try it...), so probably I will have to wait Muller compiles (thank you Muller!). Pablo, you were very accurate discovering the issue of flag=0, congrats!
Pablo Vazquez wrote:Line 662 flag=0; partially disables bulk counting. If I remove that line and recompile, qpertf is faster than jetchess. As you say, with a special move counting routine when depth=1, like frc-perft, the program would be even faster.
Indeed, I discovered that too. (It disables the bulk counting completely, btw.) It was something I had put in for debugging at one time. I had not realized it was in the published source. I am not sure if the Windows binary I have on the web has this bug too, or if that was compiled before I put in that 'flag=0;'.
I have posted my latest version, which does not have the flag=0;, and uses separate routines (leaf_perft and move_count in stead of perft and move_gen) for the final ply (making it 25% faster). It might not optimally use the hash table for low-depth perfts, as it uses a sectioning of it optimized to calculate perft(13).
I am currently on Linux; when I get back to Windows I will make a (32-bit) Windows binary. I always use optimizationlevel -O2, btw (gcc -O2 -s q2perft.c -o qperft.exe)
I think that even with the flag=0; thing, it was doing some bulk counting with the following code:
I am sorry to say that frc-perft is MUCH slower than JetChess in my computer: while Perft(7) of starting position takes less than 9 seconds with JetChess, it took 80.89 seconds (!) with frc-perft. I do not know if I did something wrong. All the counters I have seen are good, but I prefer JetChess right now.
Thanks for explaining how to compile. Surely, I will be unable of doing it correctly (I will try it...), so probably I will have to wait Muller compiles (thank you Muller!). Pablo, you were very accurate discovering the issue of flag=0, congrats!
Regards from Spain.
Ajedrecista.
Hi Jesús, that's strange. Maybe you are using a 32-bit computer, that would explain some slowdown, but not that much. Also, remember that frcperft doesn't use a hash table, so you would have to disable it in other programs for a fair comparison.
It is true that I use 32-bit system, but the slowdown is more than noticeable. JetChess has a minimum of 64 MB of hash that can not be disabled. Running qperft without hash brought me similar results than frc-perft (maybe a little better or a little worse, I do not remember). I recommend you JetChess, just for seeing how it works (but it has not got source code). ¡Gracias por todo igualmente!
That is not what I considered 'bulk counting'; it is just normal one-by-one counting. The 'goto quick' skips making / unmaking the last ply for most moves, which is an optimization independent from bulk counting. (Only King moves and special moves get made. King moves to test if they wander into check, castlings to also see if the king crosses an attacked square, e.p.captures to test for the famous double-pin, and promotions to make sure they are made and counted four times. Some of that could still be optimized further, e.g. the promotions could simply be counted for four without making them, but they are too rare to bother. In perft(7) there are no promotions at all.)
OK, I made a Windows compile of the new qperft. It is now on my regular download page, replacing the qperft binary that was there before. Strange thing is that the Windows version is slower without hash (perft(7) takes 94 sec in stead of 75 sec), but it is faster with hash (233 sec vs 245 sec)!!!