Page 1 of 2
A perft faster than qperft?!
Posted: Fri Apr 25, 2008 1:07 am
by Allard Siemelink
I finally got around to adding 'bulk counting' to my old (non hashing) perft code.
Amazingly it turns out to be ~2x faster than qperft on my 32-bit opteron@2500:
Code: Select all
>qperft 7
Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes
perft(1)=20 ( 0.000 sec)
perft(2)=400 ( 0.000 sec)
perft(3)=8902 ( 0.000 sec)
perft(4)=197281 ( 0.015 sec)
perft(5)=4865609 ( 0.156 sec)
perft(6)=119060324 ( 3.922 sec)
perft(7)=3195901860 (102.016 sec)
>i-perft 7
perft 1 20 0.00s 1.$ mnps 851.4 ticks/op
perft 2 400 0.00s 1.$ mnps 48.8 ticks/op
perft 3 8902 0.00s 1.$ mnps 33.5 ticks/op
perft 4 197281 0.02s 12.3 mnps 33.3 ticks/op
perft 5 4865609 0.06s 78.5 mnps 32.5 ticks/op
perft 6 119060324 1.56s 76.2 mnps 33.3 ticks/op
perft 7 3195901860 41.36s 77.3 mnps 32.7 ticks/op
>qperft 6 "r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -"
perft(1)=56 ( 0.000 sec)
perft(2)=2594 ( 0.000 sec)
perft(3)=137198 ( 0.000 sec)
perft(4)=6391735 ( 0.171 sec)
perft(5)=323787902 ( 7.797 sec)
perft(6)=15097513050 (389.578 sec)
>i-perft 6 "r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -"
perft 1 56 0.00s 1.$ mnps 296.5 ticks/op
perft 2 2594 0.00s 1.$ mnps 30.9 ticks/op
perft 3 137198 0.00s 1.$ mnps 18.7 ticks/op
perft 4 6391735 0.06s 103.1 mnps 20.4 ticks/op
perft 5 323787902 2.41s 134.5 mnps 18.8 ticks/op
perft 6 15097513050 124.05s 121.7 mnps 20.8 ticks/op
Source and exe are available from
http://brightchess.googlepages.com (it is the last item in the download section)
Re: A perft faster than qperft?!
Posted: Fri Apr 25, 2008 1:37 am
by Dann Corbit
Allard Siemelink wrote:I finally got around to adding 'bulk counting' to my old (non hashing) perft code.
Amazingly it turns out to be ~2x faster than qperft on my 32-bit opteron@2500:
Code: Select all
>qperft 7
Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes
perft(1)=20 ( 0.000 sec)
perft(2)=400 ( 0.000 sec)
perft(3)=8902 ( 0.000 sec)
perft(4)=197281 ( 0.015 sec)
perft(5)=4865609 ( 0.156 sec)
perft(6)=119060324 ( 3.922 sec)
perft(7)=3195901860 (102.016 sec)
>i-perft 7
perft 1 20 0.00s 1.$ mnps 851.4 ticks/op
perft 2 400 0.00s 1.$ mnps 48.8 ticks/op
perft 3 8902 0.00s 1.$ mnps 33.5 ticks/op
perft 4 197281 0.02s 12.3 mnps 33.3 ticks/op
perft 5 4865609 0.06s 78.5 mnps 32.5 ticks/op
perft 6 119060324 1.56s 76.2 mnps 33.3 ticks/op
perft 7 3195901860 41.36s 77.3 mnps 32.7 ticks/op
>qperft 6 "r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -"
perft(1)=56 ( 0.000 sec)
perft(2)=2594 ( 0.000 sec)
perft(3)=137198 ( 0.000 sec)
perft(4)=6391735 ( 0.171 sec)
perft(5)=323787902 ( 7.797 sec)
perft(6)=15097513050 (389.578 sec)
>i-perft 6 "r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -"
perft 1 56 0.00s 1.$ mnps 296.5 ticks/op
perft 2 2594 0.00s 1.$ mnps 30.9 ticks/op
perft 3 137198 0.00s 1.$ mnps 18.7 ticks/op
perft 4 6391735 0.06s 103.1 mnps 20.4 ticks/op
perft 5 323787902 2.41s 134.5 mnps 18.8 ticks/op
perft 6 15097513050 124.05s 121.7 mnps 20.8 ticks/op
Source and exe are available from
http://brightchess.googlepages.com (it is the last item in the download section)
I wonder if it would get faster with hashing.
It is not necessarily the case, if the calculation is so fast that a hash lookup is not faster.
Symbolic movepath enumeration
Posted: Fri Apr 25, 2008 3:51 am
by sje
Symbolic on a 3 GHz P4, 32 bit, no transposition table:
Code: Select all
[] emp 1
Pathway count to depth one: 20
[CT] [C: 00.00 U: 00.00]
[] emp 2
Pathway count to depth two: 400
[CT] [C: 00.00 U: 00.00]
[] emp 3
Pathway count to depth three: 8,902
[CT] [C: 00.03 U: 00.03]
[] emp 4
Pathway count to depth four: 197,281
[CT] [C: 00.03 U: 00.03]
[] emp 5
Pathway count to depth five: 4,865,609
[CT] [C: 00.33 U: 00.33]
[] emp 6
Pathway count to depth six: 119,060,324
[CT] [C: 07.92 U: 07.92]
[] emp 7
Pathway count to depth seven: 3,195,901,860
[CT] [C: 03:22.79 U: 03:22.79]
Re: Symbolic movepath enumeration
Posted: Fri Apr 25, 2008 3:56 am
by sje
Symbolic on a 3 GHz P4, 32 bit, small transposition table:
Code: Select all
[] emp 1
Pathway count to depth one: 20
[CT] [C: 00.00 U: 00.00]
[] emp 2
Pathway count to depth two: 400
[CT] [C: 00.00 U: 00.00]
[] emp 3
Pathway count to depth three: 8,902
[CT] [C: 00.03 U: 00.03]
[] emp 4
Pathway count to depth four: 197,281
[CT] [C: 00.04 U: 00.04]
[] emp 5
Pathway count to depth five: 4,865,609
[CT] [C: 00.21 U: 00.21]
[] emp 6
Pathway count to depth six: 119,060,324
[CT] [C: 03.16 U: 03.16]
[] emp 7
Pathway count to depth seven: 3,195,901,860
[CT] [C: 01:03.76 U: 01:03.76]
Re: A perft faster than qperft?!
Posted: Fri Apr 25, 2008 11:02 am
by Gerd Isenberg
Dann Corbit wrote:
I wonder if it would get faster with hashing.
It is not necessarily the case, if the calculation is so fast that a hash lookup is not faster.
Kind of exponential smaller tree size due to hashing transpositions versus linear slowdown due to latency has likely some break even point, where both approaches take about equal solution time. The more transpositions are possible with depth > 2 (e.g. e4 c6 d4 versus d4 c6 e4), the more the hash with reasonable replacement scheme pays off.
Re: A perft faster than qperft?!
Posted: Fri Apr 25, 2008 1:11 pm
by hgm
Did you use the qperft.exe on my website, or did you compile the source in the same way as your own? The reason I ask is that I noticed that gcc / cygwin, (I used to build qperft.exe) apparently does a quite poor job on my engines. Denis Mendoze made a compile for me of uMax that was 80% faster than my own gcc compile. Joker, which is more like qperft, did gain less, but still some 20%.
As to the hashing:
The issue is really if you should hash the frontier nodes (which only do the counting). Hashing interior nodes is always competative. Hashing the frontier nodes in the main hash table actually made qperft slower. The final solution I took was hashing the frontier nodes in a very small dedicated hash table, comparable to an eval cache, fitting entirely in L2. (2048 cache lines of 64 bytes, i.e. 128KB or 14K entries.)
Although the hit rate dropped from 60% to 30%, this gave a significant speedup, as the hash probes now go at the L2 access time, and thus are nearly free.
Re: A perft faster than qperft?!
Posted: Fri Apr 25, 2008 6:45 pm
by Allard Siemelink
hgm wrote:Did you use the qperft.exe on my website, or did you compile the source in the same way as your own? The reason I ask is that I noticed that gcc / cygwin, (I used to build qperft.exe) apparently does a quite poor job on my engines. Denis Mendoze made a compile for me of uMax that was 80% faster than my own gcc compile. Joker, which is more like qperft, did gain less, but still some 20%.
Hi HG, I used both; the exe from your website (quoted above) being slightly faster then my own qperft compile (gcc/mingw).
Admittedly I used gcc's pgo for i-perft, but I gained only 1%.
80% gain sounds incredible, perhaps Dennis should compile both perfts and compare?
Also, according to an old thread, qperft seems to do much better on core 2 duo processors.
Since I do not own one, I wonder if i-perft would still be faster on a core 2 duo.
Perhaps someone (you?) could post some c2d results?
Re: A perft faster than qperft?!
Posted: Fri Apr 25, 2008 8:48 pm
by hgm
Code: Select all
>qperft 7
perft(1)= 20 ( 0.000 sec)
perft(2)= 400 ( 0.000 sec)
perft(3)= 8902 ( 0.000 sec)
perft(4)= 197281 ( 0.000 sec)
perft(5)= 4865609 ( 0.078 sec)
perft(6)= 119060324 ( 1.435 sec)
perft(7)= 3195901860 (37.721 sec)
>i-perft 7
perft 1 20 0.00s 1.$ mnps 2145.6 ticks/op
perft 2 400 0.00s 1.$ mnps 51.3 ticks/op
perft 3 8902 0.00s 1.$ mnps 35.2 ticks/op
perft 4 197281 0.00s 1.$ mnps 35.4 ticks/op
perft 5 4865609 0.08s 62.4 mnps 34.5 ticks/op
perft 6 119060324 1.17s 101.8 mnps 23.6 ticks/op
perft 7 3195901860 30.39s 105.2 mnps 22.8 ticks/op
>qperft 6 "r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -"
perft(1)= 56 ( 0.000 sec)
perft(2)= 2594 ( 0.000 sec)
perft(3)= 137198 ( 0.000 sec)
perft(4)= 6391735 ( 0.078 sec)
perft(5)= 323787902 ( 2.371 sec)
perft(6)= 15097513050 (123.833 sec)
>i-perft 6 "r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -"
perft 1 56 0.00s 1.$ mnps 867.4 ticks/op
perft 2 2594 0.00s 1.$ mnps 33.0 ticks/op
perft 3 137198 0.00s 1.$ mnps 19.7 ticks/op
perft 4 6391735 0.05s 136.0 mnps 22.4 ticks/op
perft 5 323787902 1.89s 171.5 mnps 13.9 ticks/op
perft 6 15097513050 96.20s 156.9 mnps 15.3 ticks/op
This is for Core 2 Duo 2.4GHz. Yours is still faster, but the gap is not nearly as big.
Re: A perft faster than qperft?!
Posted: Fri Apr 25, 2008 11:55 pm
by tvrzsky
This is on Pentium Dual Core E2140 overclocked to 2.4 GHz, qperft compiled in Microsoft Visual C++ 2005 Express Edition:
Code: Select all
Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes
perft(1)= 20 ( 0.000 sec)
perft(2)= 400 ( 0.000 sec)
perft(3)= 8902 ( 0.000 sec)
perft(4)= 197281 ( 0.000 sec)
perft(5)= 4865609 ( 0.062 sec)
perft(6)= 119060324 ( 1.172 sec)
perft(7)= 3195901860 (31.344 sec)
i-perft 1.0 (c) 2006-2008 AJ Siemelink
perft 1 20 0.00s 1.$ mnps 6414.4 ticks/op
perft 2 400 0.00s 1.$ mnps 91.9 ticks/op
perft 3 8902 0.00s 1.$ mnps 29.2 ticks/op
perft 4 197281 0.01s 13.2 mnps 24.3 ticks/op
perft 5 4865609 0.05s 103.5 mnps 22.9 ticks/op
perft 6 119060324 1.14s 104.3 mnps 23.1 ticks/op
perft 7 3195901860 30.16s 106.0 mnps 22.7 ticks/op
qperft.exe 6 "r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -"
Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes
perft(1)= 56 ( 0.000 sec)
perft(2)= 2594 ( 0.000 sec)
perft(3)= 137198 ( 0.000 sec)
perft(4)= 6391735 ( 0.047 sec)
perft(5)= 323787902 ( 2.000 sec)
perft(6)= 15097513050 (102.515 sec)
i-perft.exe 6 "r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -"
i-perft 1.0 (c) 2006-2008 AJ Siemelink
perft 1 56 0.00s 1.$ mnps 2786.0 ticks/op
perft 2 2594 0.00s 1.$ mnps 121.3 ticks/op
perft 3 137198 0.00s 1.$ mnps 13.4 ticks/op
perft 4 6391735 0.03s 206.2 mnps 14.9 ticks/op
perft 5 323787902 1.86s 174.1 mnps 13.7 ticks/op
perft 6 15097513050 95.16s 158.7 mnps 15.2 ticks/op
Re: A perft faster than qperft?!
Posted: Sat Apr 26, 2008 12:33 pm
by hgm
So they get very close indeed. I seem to remember Allard told me that his move generator was very close to the mailbox (16x12) I use in qperft. Is that tstill the same move generator as is used in i-perft?
I have no idea why it performs so relatively poorly on Opteron. Might have to do with an unfortunate layout of the global data, and the small number of cache ways on AMD machines.