Perft(14) revisited

jsgroby · Post by **jsgroby** » Fri Aug 08, 2014 11:00 pm

sje wrote:I calculated perft(7) = 3,195,901,860 back about 1989 on a Sun Microsystems 3/180 workstation which had a 16 MHz Motorola 68020 CPU. The run took about 36 hours. My bitboard program, developed on a 1986 Mac Plus and written in C, did not use transposition table assistance nor did it have any kind of 64 bit support. I used signed integer four byte variables, so I had to manually add the 20 perft(6) subtotals to get the final answer.

Today I have a quad core 64 bit machine with 32 GB RAM, and it can calculate perft(7) in about 1.6 seconds, a speed-up factor of about 80,000. Where will we be in another quarter century?

I have been kicking around the idea of building a mini-supercomputer from Raspberry Pi's or BeagleBone Black single board computers to do some higher order perft calculations.

sje · Post by **sje** » Sat Aug 09, 2014 7:21 am

I've added three more machines, all dual core, into the fray. But they, like me, are old and slow so I don't expect too much from them.

Current completed record counts (100,000 records per work unit):

wu7.000.sum: 9,054
wu7.001.sum: 58,503
wu7.002.sum: 7,341
wu7.003.sum: 9,033
wu7.004.sum: 4,053
wu7.005.sum: 656
wu7.006.sum: 1,165
wu7.007.sum: 192

This will take a long time if it's all done only by me, and that's why I've made the work units public so that others might assist with both original calculation and verification calculation.

If you have a super fast perft() program, some super fast machinery, or both, then you might consider lending a hand. The reward is eternal glory with your name included as a contributor on the canonical perft() result page: http://oeis.org/A048987

sje · Post by **sje** » Sat Aug 09, 2014 7:36 am

jsgroby wrote:I have been kicking around the idea of building a mini-supercomputer from Raspberry Pi's or BeagleBone Black single board computers to do some higher order perft calculations.

I've ported Symbolic to both a Raspberry Pi and a BeagleBone Black. As expected, there is a big performance hit when running a bitboard program on any 32 bit machine.

I have statistics! Consider the position [d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]

Running perft(4) on the above with no transposition table and no bulk counting is my somewhat standard benchmark.

For a 700 MHz RPi:
wendy Throughput: 146.966 KHz 6.80430 us

For a 1 GHz overclocked RPi:
smokey Throughput: 196.498 KHz 5.08911 us

For a 1 GHz BB Black:
karen Throughput: 274.433 KHz 3.64387 us

For a 3.4 GHz Core i7-2600:
kristen Throughput: 21.2806 MHz 46.9912 ns

You would need 78 BB Black boards to get the same total throughput as a single 3.4 GHz Core i7-2600 machine. The current Model B BB Black costs US$55, so 78 of them would cost US$4,290 -- much more than the Core i7, and you'd pay still more for the BB Black power supplies, Ethernet cabling, and Ethernet switches/hubs.

sje · Post by **sje** » Sat Aug 09, 2014 8:52 am

[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]
I use the above for throughput benchmarking by running perft() to visit every node in the tree and perform a full update of all position related data for each node.

It can also be used for validation purposes, so you might add it to your collection of test positions along with the following data:

Code: Select all

perft&#40;0&#41;&#58; 1
perft&#40;1&#41;&#58; 46
perft&#40;2&#41;&#58; 2,079
perft&#40;3&#41;&#58; 89,890
perft&#40;4&#41;&#58; 3,894,594
perft&#40;5&#41;&#58; 164,075,551
perft&#40;6&#41;&#58; 6,923,051,137
perft&#40;7&#41;&#58; 287,188,994,746
perft&#40;8&#41;&#58; 11,923,589,843,526

stegemma · Post by **stegemma** » Sat Aug 09, 2014 11:07 am

sje wrote:[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]
I use the above for throughput benchmarking by running perft() to visit every node in the tree and perform a full update of all position related data for each node.[...]

I think that it's not a good position to test perft, because Kings are both already castled. In the opening, castling has a big impact in move generation, because of it's "complicated" rules. This lead to in-accurancy on performance estimation of a perft routine, without castling.

Ajedrecista · Post by **Ajedrecista** » Sat Aug 09, 2014 1:42 pm

Hello Steven:

Great to see you again!

The position you propose was tested a year ago and perft(9) was also computed:

An altenative perft() initial FEN

[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10

Code: Select all

perft&#40;1&#41; =                  46
perft&#40;2&#41; =               2,079
perft&#40;3&#41; =              89,890
perft&#40;4&#41; =           3,894,594
perft&#40;5&#41; =         164,075,551
perft&#40;6&#41; =       6,923,051,137
perft&#40;7&#41; =     287,188,994,746
perft&#40;8&#41; =  11,923,589,843,526
perft&#40;9&#41; = 490,154,852,788,714

I can not participate in this huge task because I only own an 8-year Dual Core (32-bit OS) and I do not have any perft counter that automatically reads FEN strings. gperft is very fast but I do not know how performs against your programme... I do not even know if it can read FEN strings from a file. Other option is release your programme, but I repeat: I can not help because my hardware is very slow.

Regards from Spain.

Ajedrecista.

vittyvirus · Post by **vittyvirus** » Sat Aug 09, 2014 1:59 pm

sje wrote:Back on 2013-04-02, Peter Österlund reported that perft(14) = 61,885,021,521,585,529,237.

I have not yet seen a confirmation of this, nor any intermediate results which would constitute a proof.

My thought is that any calculation of this magnitude needs some sort of independent validation. To this end, I've started a perft(14) effort.

Notes:

There are 96,400,068 unique ply 7 positions from the initial array. Many of these occur only once among the 3,195,901,860 perft(7) tree terminal nodes, but some appear much more often. The maximal occurrence count is 2,600 and there are two positions which occur this often:

rnbqkbnr/pppp1ppp/4p3/8/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4
rnbqkbnr/pppp1ppp/8/4p3/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4

To calculate perft(14), one way is to list all of the 96,400,068 unique(7) positions and their occurrence counts, then calculate perft(7) for each of these, then sum the products of the occurrence count and perft(7) for each position.

For the above two positions (FEN, occurrence count, perft(7), product):

rnbqkbnr/pppp1ppp/4p3/8/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4 2600 43928908948 114215163264800
rnbqkbnr/pppp1ppp/8/4p3/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4 2600 44948569301 116866280182600

The next six most frequent positions and their [occurrence count, perft(7), product] triad:

rnbqkbnr/pppp1ppp/4p3/8/4P3/3P4/PPP2PPP/RNBQKBNR b KQkq - 2 4 2166 30198759357 65410512767262
rnbqkbnr/pppp1ppp/8/4p3/4P3/3P4/PPP2PPP/RNBQKBNR b KQkq - 2 4 2166 25540143800 55319951470800
rnbqkbnr/ppp1pppp/3p4/8/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4 2248 31150577721 70026498716808
rnbqkbnr/pppp1ppp/4p3/8/3P4/4P3/PPP2PPP/RNBQKBNR b KQkq - 2 4 2258 33424915802 75473459880916
rnbqkbnr/pppp1ppp/8/4p3/3P4/4P3/PPP2PPP/RNBQKBNR b KQkq - 2 4 2258 39795827246 89858977921468
rnbqkbnr/ppp1pppp/8/3p4/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4 2368 39377074544 93244912520192

Adding all of the 96,400,068 products will give perft(14), which I estimate to be in the neighborhood of 6x10^19 and will overflow a 64 bit counter, so a special summation will be needed.

I have divided the 96,400,068 unique ply 7 positions into 964 work unit files of 100,000 positions each and a 965th file with 68 positions. Each work unit record has a position FEN followed by its occurrence count. The gzip (".gz" suffix) versions of these can be found at:

https://dl.dropboxusercontent.com/u/316 ... wu7.000.gz
https://dl.dropboxusercontent.com/u/316 ... wu7.001.gz
...
https://dl.dropboxusercontent.com/u/316 ... wu7.964.gz

Result files will be posted as well. For every record in a work unit file, there is a corresponding record in the result file. Each result record is an echo of the input record followed by the perft(7) of the position and the product of it and the occurrence count. Currently, only the result file for the very last workunit is available. It's at:

https://dl.dropboxusercontent.com/u/316 ... 964.sum.gz

The 965 work unit files have passed tests which used them to calculate perft(n), n = 7 through 10.

Calculating perft(14) with my current setup will take years. My idea for posting the workunit links is to stimulate interest for assistance from those with fast machines and a program which can read FEN and can do 100,000 perft(7) calculations in a reasonable time. Another idea is for me to try and squeeze some old perft() code into something OpenCL can digest and run it with my NVIDIA GeForce GTX 775M (2 GB GDDR5 memory, 1,344 CUDA cores). The code I have now uses recursion and that's not allowed with OpenCL kernels.

Thanks Steven!
Another thanks for accepting my request to post this here.

zullil · Post by **zullil** » Sat Aug 09, 2014 4:47 pm

sje wrote:[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]
I use the above for throughput benchmarking by running perft() to visit every node in the tree and perform a full update of all position related data for each node.

It can also be used for validation purposes, so you might add it to your collection of test positions along with the following data:
Code: Select all
perft&#40;0&#41;&#58; 1
perft&#40;1&#41;&#58; 46
perft&#40;2&#41;&#58; 2,079
perft&#40;3&#41;&#58; 89,890
perft&#40;4&#41;&#58; 3,894,594
perft&#40;5&#41;&#58; 164,075,551
perft&#40;6&#41;&#58; 6,923,051,137
perft&#40;7&#41;&#58; 287,188,994,746
perft&#40;8&#41;&#58; 11,923,589,843,526

Code: Select all

louis@LZsT5610&#58;~/Documents/Chess/Kirby$ ./perft 
FEN string = r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - -
Depth = 7
Leaf nodes = 287188994746
Time taken = 2530108 ms

I got perft(7) right, but perft(8) would take several hours.

zullil · Post by **zullil** » Sat Aug 09, 2014 11:57 pm

zullil wrote:
sje wrote:[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]
I use the above for throughput benchmarking by running perft() to visit every node in the tree and perform a full update of all position related data for each node.

It can also be used for validation purposes, so you might add it to your collection of test positions along with the following data:
Code: Select all
perft&#40;0&#41;&#58; 1
perft&#40;1&#41;&#58; 46
perft&#40;2&#41;&#58; 2,079
perft&#40;3&#41;&#58; 89,890
perft&#40;4&#41;&#58; 3,894,594
perft&#40;5&#41;&#58; 164,075,551
perft&#40;6&#41;&#58; 6,923,051,137
perft&#40;7&#41;&#58; 287,188,994,746
perft&#40;8&#41;&#58; 11,923,589,843,526
Code: Select all
louis@LZsT5610&#58;~/Documents/Chess/Kirby$ ./perft 
FEN string = r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - -
Depth = 7
Leaf nodes = 287188994746
Time taken = 2530108 ms
I got perft(7) right, but perft(8) would take several hours.

Make that days. Off by an order of magnitude.

sje · Post by **sje** » Sun Aug 10, 2014 12:01 am

stegemma wrote:I think that it's not a good position to test perft, because Kings are both already castled. In the opening, castling has a big impact in move generation, because of it's "complicated" rules. This lead to in-accurancy on performance estimation of a perft routine, without castling.

It doesn't test promotion, either. But that's okay because no one position can test everything in a reasonable amount of time. What the above position does do is to provide a known set of complex middlegame positions which can be used to generate timing benchmark data.

Perft(14) revisited

Re: Trivia: The very first perft(7)

Party time

Re: Trivia: The very first perft(7)

About that test positon

Re: About that test positon

Re: About that test position.

Re: Perft(14) revisited

Re: About that test positon

Re: About that test positon

Re: About that test positon