yet another attempt on Perft(14)

ankan · Post by **ankan** » Sat Aug 13, 2016 8:08 pm

Here is my latest attempt in computing perft(14) using my GPU perft program:

Perft&#40;03&#41;&#58;                8902, time&#58;    0.201 s
Perft&#40;04&#41;&#58;              197281, time&#58;    0.001 s
Perft&#40;05&#41;&#58;             4865609, time&#58;    0.001 s
Perft&#40;06&#41;&#58;           119060324, time&#58;    0.002 s
Perft&#40;07&#41;&#58;          3195901860, time&#58;    0.016 s
Perft&#40;08&#41;&#58;         84998978956, time&#58;    0.206 s
Perft&#40;09&#41;&#58;       2439530234167, time&#58;    1.414 s
Perft&#40;10&#41;&#58;      69352859712417, time&#58;   15.278 s
Perft&#40;11&#41;&#58;    2097651003696806, time&#58;  179.091 s
Perft&#40;12&#41;&#58;   62854969236701747, time&#58;  2932.64 s
Perft&#40;13&#41;&#58; 1981066775000396239, time&#58;  42792.7 s

..b1-a3     1988096314966752669
..g1-f3     2585178594020547799
..b1-c3     2638654908094167513
..a2-a3     1707945110805813310
..g1-h3     2022544576756325295
..b2-b3     2286406377370528139
..c2-c3     2666334464745721380
..f2-f3     1359948551007941806
..d2-d3     4459055881158216540
..g2-g3     2388698250037891063
..h2-h3     1691016253163166371
..a2-a4     2493115507450124588
..b2-b4     2297640117530334747
..e2-e3     7478732807317823667
..c2-c4     3067868595132318179
..f2-f4     1905689484049474095
..g2-g4     2050748957027119617
..h2-h4     2546461593049574382
..d2-d4     6636612996377425812
..e2-e4     7614272181524252025
Perft&#40;14&#41;&#58;61885021521585518997, time&#58;   795802 s

Complete output of the program (divided perfts till first 4 levels) is here:
https://raw.githubusercontent.com/ankan ... erft14.txt

After Steven Edwards reported mismatches due to false transposition table positives caused by 64 bit hash signatures, I updated my program to use 128 bit hashes. I also made some performance improvements and changes to utilize multiple GPUs in the machine if available. This run was performed on a machine with 3 Nvidia Titan X GPUs.

Instead of Steven's approach of computing perft(7) of all 96400068 unique(7) positions, I decided to compute perft using the regular method which even though is slightly slower it has the advantage of obtaining divided counts.

Unfortunately my result doesn't match the value reported by Peter Österlund back in 2013. With 128 bit hashes, it's unlikely to be caused by a false positive/hash collision but I can't rule out bugs in my program or random bit flips due to hardware failure/instability.

I am going to try one more time using unique(8) or unique(9) positions - or use the same regular method but using different set of random zobrist keys to rule out possibility of hardware instability.

Dann Corbit · Post by **Dann Corbit** » Sat Aug 13, 2016 8:47 pm

ankan wrote:Here is my latest attempt in computing perft(14) using my GPU perft program:
Code: Select all
Perft&#40;03&#41;&#58;                8902, time&#58;    0.201 s
Perft&#40;04&#41;&#58;              197281, time&#58;    0.001 s
Perft&#40;05&#41;&#58;             4865609, time&#58;    0.001 s
Perft&#40;06&#41;&#58;           119060324, time&#58;    0.002 s
Perft&#40;07&#41;&#58;          3195901860, time&#58;    0.016 s
Perft&#40;08&#41;&#58;         84998978956, time&#58;    0.206 s
Perft&#40;09&#41;&#58;       2439530234167, time&#58;    1.414 s
Perft&#40;10&#41;&#58;      69352859712417, time&#58;   15.278 s
Perft&#40;11&#41;&#58;    2097651003696806, time&#58;  179.091 s
Perft&#40;12&#41;&#58;   62854969236701747, time&#58;  2932.64 s
Perft&#40;13&#41;&#58; 1981066775000396239, time&#58;  42792.7 s

..b1-a3     1988096314966752669
..g1-f3     2585178594020547799
..b1-c3     2638654908094167513
..a2-a3     1707945110805813310
..g1-h3     2022544576756325295
..b2-b3     2286406377370528139
..c2-c3     2666334464745721380
..f2-f3     1359948551007941806
..d2-d3     4459055881158216540
..g2-g3     2388698250037891063
..h2-h3     1691016253163166371
..a2-a4     2493115507450124588
..b2-b4     2297640117530334747
..e2-e3     7478732807317823667
..c2-c4     3067868595132318179
..f2-f4     1905689484049474095
..g2-g4     2050748957027119617
..h2-h4     2546461593049574382
..d2-d4     6636612996377425812
..e2-e4     7614272181524252025
Perft&#40;14&#41;&#58;61885021521585518997, time&#58;   795802 s
Complete output of the program (divided perfts till first 4 levels) is here:
https://raw.githubusercontent.com/ankan ... erft14.txt

After Steven Edwards reported mismatches due to false transposition table positives caused by 64 bit hash signatures, I updated my program to use 128 bit hashes. I also made some performance improvements and changes to utilize multiple GPUs in the machine if available. This run was performed on a machine with 3 Nvidia Titan X GPUs.

Instead of Steven's approach of computing perft(7) of all 96400068 unique(7) positions, I decided to compute perft using the regular method which even though is slightly slower it has the advantage of obtaining divided counts.

Unfortunately my result doesn't match the value reported by Peter Österlund back in 2013. With 128 bit hashes, it's unlikely to be caused by a false positive/hash collision but I can't rule out bugs in my program or random bit flips due to hardware failure/instability.

I am going to try one more time using unique(8) or unique(9) positions - or use the same regular method but using different set of random zobrist keys to rule out possibility of hardware instability.

Re: "Unfortunately my result doesn't match the value reported by Peter Österlund back in 2013."

It is also possible the original 2013 result is wrong.
If the SJE result matches perfectly when completed, then we will have a better reference.

Dann Corbit · Post by **Dann Corbit** » Sat Aug 13, 2016 8:52 pm

Sorting the divide by mobility, there is an interesting result:

7614272181524252025 : e2-e4
7478732807317823667 : e2-e3
6636612996377425812 : d2-d4
4459055881158216540 : d2-d3
3067868595132318179 : c2-c4
2666334464745721380 : c2-c3
2638654908094167513 : b1-c3
2585178594020547799 : g1-f3
2546461593049574382 : h2-h4
2493115507450124588 : a2-a4
2388698250037891063 : g2-g3
2297640117530334747 : b2-b4
2286406377370528139 : b2-b3
2050748957027119617 : g2-g4
2022544576756325295 : g1-h3
1988096314966752669 : b1-a3
1905689484049474095 : f2-f4
1707945110805813310 : a2-a3
1691016253163166371 : h2-h3
1359948551007941806 : f2-f3

e4 and d4, the most important (at least most frequently played moves) are one and three in the ranking.

I wonder if e3 is also excellent due to the mobility it offers, and is being overlooked.
Similarly for 1.d3

It appears that 1.c4 also offers excellent mobility

And as far as mobility goes, 1.f3 really stinks

Edmund · Post by **Edmund** » Sat Aug 13, 2016 9:10 pm

Impressive speed. Assuming a branching factor of 32.5 it would take you "only" roughly 10 months for the perft 15 numbers. Do you consider there are still efficiency gains possible with your software? If so, by what factor?

Dann Corbit · Post by **Dann Corbit** » Sat Aug 13, 2016 9:29 pm

Edmund wrote:Impressive speed. Assuming a branching factor of 32.5 it would take you "only" roughly 10 months for the perft 15 numbers. Do you consider there are still efficiency gains possible with your software? If so, by what factor?

If he had 20 boxes with 10 cards each, he could snap it off in a jiffy.

I would like to see statistics about the move types generated.
What is the percentage of captures at each phase?
What is the percentage of e.p. captures at each phase?
What is the number of checkmates at each phase?
How many single reply moves are there at each phase?

ankan · Post by **ankan** » Sat Aug 13, 2016 9:43 pm

Edmund wrote:Impressive speed. Assuming a branching factor of 32.5 it would take you "only" roughly 10 months for the perft 15 numbers. Do you consider there are still efficiency gains possible with your software? If so, by what factor?

The other approach of first listing down unique positions (and occurrence counts) till certain depth and then computing perfts of those unique positions for remaining depth should be much faster depending on for what depth the unique positions are computed.
This is the method Peter used in his original Perft(14) computation. He managed to compute unique(11) - which is more than 726 billion positions! followed by those many perft(3) computations. I don't have access to hard disks big enough to even store so many positions.

Steven's Perft(14) project is also using the same method but with just ~96 million unique(7) positions - followed by perft(7) of all those positions.

For my program, I don't expect much gains from unique(7) positions (as my regular transposition tables are already big enough to hold most of them) but with unique(8) or higher, the gains could be much more.

In my program, I never clear the transposition tables (I have separate transposition tables for most levels and they get overwritten whenever a newer position at same slot is found). Which means for my program it also depends on how the unique positions are sorted. If nearby positions in the list of unique positions are 'close' to each other (i.e, expected to result in duplicate positions in a few plies), transposition table hit rate would be better when computing perfts of these positions. I am trying to figure out what is the best way to sort a given set of positions for best transposition table utilization.

With all unique(9) or unique(10) positions available and sorted in a reasonable order, I expect about 10X gain in performance - but maybe I am being too optimistic given that I still haven't even computed unique(8) myself

.

ankan · Post by **ankan** » Sat Aug 13, 2016 9:47 pm

Dann Corbit wrote: I would like to see statistics about the move types generated.
What is the percentage of captures at each phase?
What is the percentage of e.p. captures at each phase?
What is the number of checkmates at each phase?
How many single reply moves are there at each phase?

Currently my program doesn't save these statistics, but it should be quite easy to add them for the regular perft approach (without too much overhead). I will try to get them in my next run.

thomasahle · Post by **thomasahle** » Sun Aug 14, 2016 1:32 am

Do you use zobrist transposition tables, or something like a regular chaining hash table with correctness guarantees?

sje · Post by **sje** » Sun Aug 14, 2016 5:43 am

From Peter Österlund 2013.04.02: perft(14) = 61,885,021,521,585,529,237

sje · Post by **sje** » Sun Aug 14, 2016 5:52 am

sje wrote:From Peter Österlund 2013.04.02: perft(14) = 61,885,021,521,585,529,237

Which is 10,240 more than Ankan's result.

Which is THREE bits different (XOR) out of 66 bits.

Which could be a triplet of single bit random flips (cosmic rays).

Which shows that independent verification is so important.

Code: Select all

Peter&#58; 110101101011010011110000110011000101100111100000100110000110010101
Ankan&#58; 110101101011010011110000110011000101100111100000100011100110010101

yet another attempt on Perft(14)

yet another attempt on Perft(14)

Re: yet another attempt on Perft(14)

Re: yet another attempt on Perft(14)

Re: yet another attempt on Perft(14)

Re: yet another attempt on Perft(14)

Re: yet another attempt on Perft(14)

Re: yet another attempt on Perft(14)

Re: yet another attempt on Perft(14)

Peter Österlund's Perft(14) result

Re: Peter Österlund's Perft(14) result