I guess q2perft currently makes poor use of hash memory: It divides up the given memory into 8 sections, section 1-7 used for storing perft(2)-perft(8) results, while the 8th section is sub-sectioned to contain all higher depths (half of it for perft(9), one quarter of it for perft(10), 1/8 for perft(11) etc.). This means that for perft(7) only three of the eight sections are fully used, for storing perft(2)-perft(3) results. (Perft(7) needs some 70,000 perft(3) results, while even with 16MB there are 128K entries per section. So the other sections will remain virtually empty.)
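To make the partitioning described above concrete, here is a minimal C sketch that maps a stored depth to an offset and size inside one table. It assumes a power-of-two entry count and 16-byte entries (which is what "16MB with 128K entries per section" implies). `Section` and `section_for_depth` are hypothetical names chosen for illustration, not qperft's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the hash layout described above; not qperft's code. */
typedef struct { uint64_t offset, size; } Section;

/* Return the slice of the table (in entries) reserved for perft(depth) results:
   sections 1-7 hold perft(2)..perft(8), the 8th section is split into halves,
   quarters, eighths, ... for perft(9), perft(10), perft(11), ... */
static Section section_for_depth(uint64_t total_entries, int depth)
{
    Section r = {0, 0};
    assert(total_entries % 8 == 0);
    uint64_t s = total_entries / 8;        /* size of one of the eight sections */
    if (depth < 2) return r;               /* depth-1 results are not stored */
    if (depth <= 8) {
        r.offset = (uint64_t)(depth - 2) * s;
        r.size = s;
    } else {
        int k = depth - 8;                 /* 1 for perft(9), 2 for perft(10), ... */
        if (k >= 64) return r;             /* avoid an undefined shift */
        r.size = s >> k;                   /* 1/2, 1/4, 1/8, ... of the 8th section */
        r.offset = 7 * s + (s - (s >> (k - 1)));
    }
    return r;
}
```

With 1M entries (16 MB at 16 bytes each), perft(2) through perft(8) each get 128K entries, perft(9) gets 64K and perft(10) 32K, which shows how much of the table is tied to depths a short run never reaches.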
One general remark on "perft performance": I think it would be better to measure NPS as "total visited nodes per second", not as "counted leaf nodes per second". A "visited node" would be a node where move generation is performed. That would certainly allow a fair comparison of different perft implementations on the same system platform (hardware+OS+...), even for different positions (which is impossible with "counted leaf nodes" due to the different branching factors at the level of frontier nodes where bulk-counting is applied). Also "hashing vs. non-hashing" would not influence the NPS number.
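A minimal sketch of the two counters, assuming bulk counting at the frontier: `visited` is incremented wherever move generation runs, while `leaves` is the bulk-counted total that perft reports. `gen_moves` is a hypothetical toy generator standing in for a real chess move generator, so the numbers carry no chess meaning.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t leaves, visited; } PerftStats;

/* Hypothetical toy move generator: position p has 2 + p%3 moves, and move i
   leads to position p*4 + i + 1. Stands in for a real chess generator. */
static int gen_moves(uint64_t p, uint64_t children[4])
{
    int n = 2 + (int)(p % 3);
    for (int i = 0; i < n; i++) children[i] = p * 4 + (uint64_t)i + 1;
    return n;
}

/* Perft with bulk counting: frontier nodes generate moves but do not make them. */
static void perft(uint64_t p, int depth, PerftStats *st)
{
    uint64_t children[4];
    assert(depth >= 1);
    int n = gen_moves(p, children);
    st->visited++;                  /* a "visited node": move generation happened */
    if (depth == 1) {               /* frontier: count the moves, don't make them */
        st->leaves += (uint64_t)n;
        return;
    }
    for (int i = 0; i < n; i++) perft(children[i], depth - 1, st);
}
```

Dividing `visited` by the elapsed time gives the proposed NPS; dividing `leaves` by it gives the conventional figure, which grows with the branching factor at the frontier even when the generator does no extra work.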
To go one step further, some kind of platform abstraction would be nice. frcperft does this based on the number of elapsed CPU cycles and prints "ticks/move" (but still based on "counted leaf nodes", see my remark above). Another approach would use some "perft speed coefficients" per system platform based on benchmarking, and would divide the absolute NPS by that coefficient.
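Both normalizations reduce to simple divisions. The C sketch below expresses them per visited node rather than per counted leaf. The helper names are illustrative, and the platform coefficient is an assumed input that would have to come from a separate benchmark; frcperft's actual computation is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles spent per visited node (like "ticks/move", but per visited node). */
static double ticks_per_node(uint64_t cycles, uint64_t visited)
{
    assert(visited > 0);
    return (double)cycles / (double)visited;
}

/* Hypothetical "perft speed coefficient": a per-platform factor assumed to be
   obtained from a separate benchmark run. Dividing the absolute NPS by it makes
   figures from different platforms roughly comparable. */
static double normalized_nps(uint64_t visited, double seconds, double platform_coeff)
{
    assert(seconds > 0.0 && platform_coeff > 0.0);
    return (double)visited / seconds / platform_coeff;
}
```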
Of course, one simple way to compare speed of two perft implementations is to compare their total time spent on perft(N) for the same position on the same system platform, if the limitations of this approach are accepted.
hgm wrote: OK, I made a Windows compile of the new qperft. It is now on my regular download page, replacing the qperft binary that was there before. Strange thing is that the Windows version is slower without hash (perft(7) takes 94 sec instead of 75 sec), but it is faster with hash (233 sec vs. 245 sec)!!!
Former qperft executable worked fine under my system; now, with your new recompile, it does not run (missing cygwin1.dll). What can I do? Thanks in advance.
Agreed, most NPS figures printed by perfts are pure nonsense, as they represent not nodes per second but moves per second. And sometimes even moves that were never made, only counted.
Qperft does not make the final ply, because its original purpose was to time the move generator for an engine under realistic circumstances. (Just putting a loop around it to execute it a billion times was not very satisfactory, as the branch prediction very quickly learns to be perfect.) And in an engine the leaf nodes typically do a move generation, but do not make any of the generated moves (or they would not be leaf nodes).
Thank you very much for providing this link! Yesterday I was unable to compile it myself (the system did not recognize the gcc command and also could not find the source file perft.c, which I had copied into Notepad; I changed its path trying to get MinGW to find it, but without success). It now runs without problems, and MUCH faster than the older version of qperft. On my system (32-bit) JetChess is still a little faster (perft(7) of the starting position in ~8.7 seconds, while qperft counts it in ~10 seconds), but it is a huge improvement. I am curious to see multi-core perft counters (hopefully meleechess will support this in the near future).
Ajedrecista wrote:Former qperft executable worked fine under my system; now, with your new recompile, it does not run (missing cygwin1.dll). What can I do? Thanks in advance.
Oops! Forgot to include the -mno-cygwin flag in the compile command. I fixed that now, so that the version on my web page is now Cygwin-independent again.
The first perft(10) I tried on my Core-2-Duo (for the position after 1. h3 h6) took 4:20. Still too slow to compete with Steven Edwards' monster; I really need that extra factor of 10.
If perft is to be a tool for testing several qualities of a move generator, it should not only count leaves but also provide relevant local information, because speed is not the only goal when writing a move generator.
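One common form of such local information is a per-category breakdown of the leaf moves. Below is a minimal C sketch of the bookkeeping. The flag names are hypothetical, and a real generator would have to set them itself; following the usual convention in published perft tables, an en-passant move would carry both the capture and the e.p. flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical move flags; a real generator would set these itself. */
enum { MF_CAPTURE = 1, MF_EP = 2, MF_CASTLE = 4, MF_PROMO = 8, MF_CHECK = 16 };

typedef struct {
    uint64_t nodes, captures, ep, castles, promotions, checks;
} PerftDetail;

/* Tally one leaf move into the detailed counters. */
static void tally_leaf(PerftDetail *d, unsigned flags)
{
    assert(d != 0);
    d->nodes++;
    if (flags & MF_CAPTURE) d->captures++;   /* includes e.p. captures */
    if (flags & MF_EP)      d->ep++;
    if (flags & MF_CASTLE)  d->castles++;
    if (flags & MF_PROMO)   d->promotions++;
    if (flags & MF_CHECK)   d->checks++;
}
```

Some categories, checks in particular, usually require making or at least testing each leaf move, so a breakdown like this tends to give up part of the speed gained from bulk counting.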
Just to see the difference between processors, I ran Perft(9) of the initial position with JetChess, using 1024 MB of hash, on an Intel Core i5-760 (32-bit OS) at the university; remember that JetChess uses only one core. On my Pentium D930, the same Perft(9) run with the same hash size took 2720.81 seconds; now it took 761.432 seconds (0:12:41.432)! If I divide the number of nodes by the time spent (I know that, according to Sven Schüle and others, this is not the best way to measure speed), I get around 3203.87 MN/s... the difference is more than obvious.
2,439,530,234,167 (move paths after 9 half-moves).
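The arithmetic behind the quoted figures, as a small C sketch with illustrative helper names: 2,439,530,234,167 leaves in 761.432 s gives roughly 3203.87 million leaves per second, and 2720.81 s / 761.432 s is a speed-up of about 3.57 over the Pentium D930. This is a leaf-count rate, not a visited-node rate.

```c
#include <assert.h>
#include <stdint.h>

/* Counted leaf nodes per second, in millions (the "MN/s" figure). */
static double leaf_mnps(uint64_t leaves, double seconds)
{
    assert(seconds > 0.0);
    return (double)leaves / seconds / 1e6;
}

/* How many times faster the new run was than the old one. */
static double speedup(double old_seconds, double new_seconds)
{
    assert(new_seconds > 0.0);
    return old_seconds / new_seconds;
}
```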