I have been kicking around the idea of building a mini-supercomputer from Raspberry Pi's or BeagleBone Black single board computers to do some higher order perft calculations.sje wrote:I calculated perft(7) = 3,195,901,860 back about 1989 on a Sun Microsystems 3/180 workstation which had a 16 MHz Motorola 68020 CPU. The run took about 36 hours. My bitboard program, developed on a 1986 Mac Plus and written in C, did not use transposition table assistance nor did it have any kind of 64 bit support. I used signed integer four byte variables, so I had to manually add the 20 perft(6) subtotals to get the final answer.
Today I have a quad core 64 bit machine with 32 GB RAM, and it can calculate perft(7) in about 1.6 seconds, a speed-up factor of about 80,000. Where will we be in another quarter century?
Perft(14) revisited
Moderators: hgm, Rebel, chrisw
-
- Posts: 83
- Joined: Mon Mar 24, 2014 12:26 am
- Location: Glen Carbon, IL USA
Re: Trivia: The very first perft(7)
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Party time
I've added three more machines, all dual core, into the fray. But they, like me, are old and slow so I don't expect too much from them.
Current completed record counts (100,000 records per work unit):
wu7.000.sum: 9,054
wu7.001.sum: 58,503
wu7.002.sum: 7,341
wu7.003.sum: 9,033
wu7.004.sum: 4,053
wu7.005.sum: 656
wu7.006.sum: 1,165
wu7.007.sum: 192
This will take a long time if it's all done only by me, and that's why I've made the work units public so that others might assist with both original calculation and verification calculation.
If you have a super fast perft() program, some super fast machinery, or both, then you might consider lending a hand. The reward is eternal glory with your name included as a contributor on the canonical perft() result page: http://oeis.org/A048987
Current completed record counts (100,000 records per work unit):
wu7.000.sum: 9,054
wu7.001.sum: 58,503
wu7.002.sum: 7,341
wu7.003.sum: 9,033
wu7.004.sum: 4,053
wu7.005.sum: 656
wu7.006.sum: 1,165
wu7.007.sum: 192
This will take a long time if it's all done only by me, and that's why I've made the work units public so that others might assist with both original calculation and verification calculation.
If you have a super fast perft() program, some super fast machinery, or both, then you might consider lending a hand. The reward is eternal glory with your name included as a contributor on the canonical perft() result page: http://oeis.org/A048987
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Trivia: The very first perft(7)
I've ported Symbolic to both a Raspberry Pi and a BeagleBone Black. As expected, there is a big performance hit when running a bitboard program on any 32 bit machine.jsgroby wrote:I have been kicking around the idea of building a mini-supercomputer from Raspberry Pi's or BeagleBone Black single board computers to do some higher order perft calculations.
I have statistics! Consider the position [d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]
Running perft(4) on the above with no transposition table and no bulk counting is my somewhat standard benchmark.
For a 700 MHz RPi:
wendy Throughput: 146.966 KHz 6.80430 us
For a 1 GHz overclocked RPi:
smokey Throughput: 196.498 KHz 5.08911 us
For a 1 GHz BB Black:
karen Throughput: 274.433 KHz 3.64387 us
For a 3.4 GHz Core i7-2600:
kristen Throughput: 21.2806 MHz 46.9912 ns
You would need 78 BB Black boards to get the same total throughput as a single 3.4 GHz Core i7-2600 machine. The current Model B BB Black costs US$55, so 78 of them would cost US$4,290 -- much more than the Core i7, and you'd pay still more for the BB Black power supplies, Ethernet cabling, and Ethernet switches/hubs.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
About that test positon
[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]
I use the above for throughput benchmarking by running perft() to visit every node in the tree and perform a full update of all position related data for each node.
It can also be used for validation purposes, so you might add it to your collection of test positions along with the following data:
I use the above for throughput benchmarking by running perft() to visit every node in the tree and perform a full update of all position related data for each node.
It can also be used for validation purposes, so you might add it to your collection of test positions along with the following data:
Code: Select all
perft(0): 1
perft(1): 46
perft(2): 2,079
perft(3): 89,890
perft(4): 3,894,594
perft(5): 164,075,551
perft(6): 6,923,051,137
perft(7): 287,188,994,746
perft(8): 11,923,589,843,526
-
- Posts: 859
- Joined: Mon Aug 10, 2009 10:05 pm
- Location: Italy
- Full name: Stefano Gemma
Re: About that test positon
I think that it's not a good position to test perft, because Kings are both already castled. In the opening, castling has a big impact in move generation, because of it's "complicated" rules. This lead to in-accurancy on performance estimation of a perft routine, without castling.sje wrote:[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]
I use the above for throughput benchmarking by running perft() to visit every node in the tree and perform a full update of all position related data for each node.[...]
-
- Posts: 1968
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: About that test position.
Hello Steven:
Great to see you again!
The position you propose was tested a year ago and perft(9) was also computed:
An altenative perft() initial FEN
[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10
I can not participate in this huge task because I only own an 8-year Dual Core (32-bit OS) and I do not have any perft counter that automatically reads FEN strings. gperft is very fast but I do not know how performs against your programme... I do not even know if it can read FEN strings from a file. Other option is release your programme, but I repeat: I can not help because my hardware is very slow.
Regards from Spain.
Ajedrecista.
Great to see you again!
The position you propose was tested a year ago and perft(9) was also computed:
An altenative perft() initial FEN
[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10
Code: Select all
perft(1) = 46
perft(2) = 2,079
perft(3) = 89,890
perft(4) = 3,894,594
perft(5) = 164,075,551
perft(6) = 6,923,051,137
perft(7) = 287,188,994,746
perft(8) = 11,923,589,843,526
perft(9) = 490,154,852,788,714
Regards from Spain.
Ajedrecista.
-
- Posts: 646
- Joined: Wed Jun 18, 2014 2:30 pm
- Full name: Fahad Syed
Re: Perft(14) revisited
Thanks Steven!sje wrote:Back on 2013-04-02, Peter Österlund reported that perft(14) = 61,885,021,521,585,529,237.
I have not yet seen a confirmation of this, nor any intermediate results which would constitute a proof.
My thought is that any calculation of this magnitude needs some sort of independent validation. To this end, I've started a perft(14) effort.
Notes:
There are 96,400,068 unique ply 7 positions from the initial array. Many of these occur only once among the 3,195,901,860 perft(7) tree terminal nodes, but some appear much more often. The maximal occurrence count is 2,600 and there are two positions which occur this often:
rnbqkbnr/pppp1ppp/4p3/8/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4
rnbqkbnr/pppp1ppp/8/4p3/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4
To calculate perft(14), one way is to list all of the 96,400,068 unique(7) positions and their occurrence counts, then calculate perft(7) for each of these, then sum the products of the occurrence count and perft(7) for each position.
For the above two positions (FEN, occurrence count, perft(7), product):
rnbqkbnr/pppp1ppp/4p3/8/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4 2600 43928908948 114215163264800
rnbqkbnr/pppp1ppp/8/4p3/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4 2600 44948569301 116866280182600
The next six most frequent positions and their [occurrence count, perft(7), product] triad:
rnbqkbnr/pppp1ppp/4p3/8/4P3/3P4/PPP2PPP/RNBQKBNR b KQkq - 2 4 2166 30198759357 65410512767262
rnbqkbnr/pppp1ppp/8/4p3/4P3/3P4/PPP2PPP/RNBQKBNR b KQkq - 2 4 2166 25540143800 55319951470800
rnbqkbnr/ppp1pppp/3p4/8/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4 2248 31150577721 70026498716808
rnbqkbnr/pppp1ppp/4p3/8/3P4/4P3/PPP2PPP/RNBQKBNR b KQkq - 2 4 2258 33424915802 75473459880916
rnbqkbnr/pppp1ppp/8/4p3/3P4/4P3/PPP2PPP/RNBQKBNR b KQkq - 2 4 2258 39795827246 89858977921468
rnbqkbnr/ppp1pppp/8/3p4/3PP3/8/PPP2PPP/RNBQKBNR b KQkq - 2 4 2368 39377074544 93244912520192
Adding all of the 96,400,068 products will give perft(14), which I estimate to be in the neighborhood of 6x10^19 and will overflow a 64 bit counter, so a special summation will be needed.
I have divided the 96,400,068 unique ply 7 positions into 964 work unit files of 100,000 positions each and a 965th file with 68 positions. Each work unit record has a position FEN followed by its occurrence count. The gzip (".gz" suffix) versions of these can be found at:
https://dl.dropboxusercontent.com/u/316 ... wu7.000.gz
https://dl.dropboxusercontent.com/u/316 ... wu7.001.gz
...
https://dl.dropboxusercontent.com/u/316 ... wu7.964.gz
Result files will be posted as well. For every record in a work unit file, there is a corresponding record in the result file. Each result record is an echo of the input record followed by the perft(7) of the position and the product of it and the occurrence count. Currently, only the result file for the very last workunit is available. It's at:
https://dl.dropboxusercontent.com/u/316 ... 964.sum.gz
The 965 work unit files have passed tests which used them to calculate perft(n), n = 7 through 10.
Calculating perft(14) with my current setup will take years. My idea for posting the workunit links is to stimulate interest for assistance from those with fast machines and a program which can read FEN and can do 100,000 perft(7) calculations in a reasonable time. Another idea is for me to try and squeeze some old perft() code into something OpenCL can digest and run it with my NVIDIA GeForce GTX 775M (2 GB GDDR5 memory, 1,344 CUDA cores). The code I have now uses recursion and that's not allowed with OpenCL kernels.
Another thanks for accepting my request to post this here.
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: About that test positon
sje wrote:[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]
I use the above for throughput benchmarking by running perft() to visit every node in the tree and perform a full update of all position related data for each node.
It can also be used for validation purposes, so you might add it to your collection of test positions along with the following data:Code: Select all
perft(0): 1 perft(1): 46 perft(2): 2,079 perft(3): 89,890 perft(4): 3,894,594 perft(5): 164,075,551 perft(6): 6,923,051,137 perft(7): 287,188,994,746 perft(8): 11,923,589,843,526
Code: Select all
louis@LZsT5610:~/Documents/Chess/Kirby$ ./perft
FEN string = r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - -
Depth = 7
Leaf nodes = 287188994746
Time taken = 2530108 ms
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: About that test positon
Make that days. Off by an order of magnitude.zullil wrote:sje wrote:[d]r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10[/d]
I use the above for throughput benchmarking by running perft() to visit every node in the tree and perform a full update of all position related data for each node.
It can also be used for validation purposes, so you might add it to your collection of test positions along with the following data:Code: Select all
perft(0): 1 perft(1): 46 perft(2): 2,079 perft(3): 89,890 perft(4): 3,894,594 perft(5): 164,075,551 perft(6): 6,923,051,137 perft(7): 287,188,994,746 perft(8): 11,923,589,843,526
I got perft(7) right, but perft(8) would take several hours.Code: Select all
louis@LZsT5610:~/Documents/Chess/Kirby$ ./perft FEN string = r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - Depth = 7 Leaf nodes = 287188994746 Time taken = 2530108 ms
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: About that test positon
It doesn't test promotion, either. But that's okay because no one position can test everything in a reasonable amount of time. What the above position does do is to provide a known set of complex middlegame positions which can be used to generate timing benchmark data.stegemma wrote:I think that it's not a good position to test perft, because Kings are both already castled. In the opening, castling has a big impact in move generation, because of it's "complicated" rules. This lead to in-accurancy on performance estimation of a perft routine, without castling.