perft(15)

ankan · Post by **ankan** » Fri Aug 25, 2017 10:01 am

I have computed perft(15) of starting position. Any guesses before I post my result?

zullil · Post by **zullil** » Fri Aug 25, 2017 12:03 pm

ankan wrote:I have computed perft(15) of starting position. Any guesses before I post my result?

22 decimal digits. But I still don't see the point of doing this.

sovaz1997 · Post by **sovaz1997** » Fri Aug 25, 2017 1:03 pm

About 1856950645647565877110

elpapa · Post by **elpapa** » Fri Aug 25, 2017 1:24 pm

2,037,604,589,683,153,922,713

ankan · Post by **ankan** » Fri Aug 25, 2017 6:30 pm

Code: Select all

..b1-a3    62655830257579440909
..b1-c3    85088059730297336608
..g1-f3    81899150426761231912
..g1-h3    63907594566611189505
..a2-a3    53041891174879583392
..b2-b3    71654598438321484175
..c2-c3    85259585688411182584
..d2-d3    141281970521456325645
..e2-e3    255901924404429450892
..f2-f3    39666500997442921562
..g2-g3    75323915154922880025
..h2-h3    52348873157709292191
..a2-a4    79757154364517747244
..b2-b4    72727106025678883781
..c2-c4    98785655398496438907
..d2-d4    228792091600885596893
..e2-e4    263205233941976533298
..f2-f4    59045412652411222755
..g2-g4    63100127446410763940
..h2-h4    81657274104164965742

Perft(15):2015099950053364471960

I would like to dedicate this computation to Steven Edwards - whose tireless efforts for verifying perft(14) encouraged me to verify perft(14) myself and take up the challenge to compute perft(15).

Ajedrecista · Post by **Ajedrecista** » Fri Aug 25, 2017 8:56 pm

Hello Ankan:

ankan wrote:I have computed perft(15) of starting position. Any guesses before I post my result?

Thank you very much! There is a Perft(15) estimates thread started by Steven:

Perft(15) estimates thread

There were really good estimates. Here is a summary of the estimates of that thread given in chronological order:

2,015,977,411,404,673,313,379 (by me).

2.015e+21 (by François Labelle).

2.01533e21 (by Reinhard Scharnagl).

2.0150985437710440e+21 (by Peter Österlund).

2.014998e+021 (by me).

2.015258e+021 (by me).

2015107631301193211900 (by me).

I have not included info about standard deviations for simplicity. They can be seen in each post.

If I order the estimates, I get:

Code: Select all

2.014998e+021 (by me).
2.015e+21 (by François Labelle).
2.0150985437710440e+21 (by Peter Österlund).
2015107631301193211900 (by me).
2.015258e+021 (by me).
2.01533e21 (by Reinhard Scharnagl).
2,015,977,411,404,673,313,379 (by me).

ankan wrote:

Code: Select all

..b1-a3    62655830257579440909 
..b1-c3    85088059730297336608 
..g1-f3    81899150426761231912 
..g1-h3    63907594566611189505 
..a2-a3    53041891174879583392 
..b2-b3    71654598438321484175 
..c2-c3    85259585688411182584 
..d2-d3    141281970521456325645 
..e2-e3    255901924404429450892 
..f2-f3    39666500997442921562 
..g2-g3    75323915154922880025 
..h2-h3    52348873157709292191 
..a2-a4    79757154364517747244 
..b2-b4    72727106025678883781 
..c2-c4    98785655398496438907 
..d2-d4    228792091600885596893 
..e2-e4    263205233941976533298 
..f2-f4    59045412652411222755 
..g2-g4    63100127446410763940 
..h2-h4    81657274104164965742 

Perft(15):2015099950053364471960

Fantastic! I did many estimates but the best one was Peter's, like in Perft(14). Summarizing relative errors:

Code: Select all

Relative errors in percentage: 100*{Estimate/[Ankan's Perft(15)] - 1}

-0.005059 % (by me).
-0.004960 % (by François Labelle).
-0.000070 % (by Peter Österlund).
+0.000381 % (by me).
+0.007843 % (by me).
+0.011416 % (by Reinhard Scharnagl).
+0.043544 % (by me).

---------------------------------------------------------------------

In absolute value:

1) 0.000070 % (by Peter Österlund).
2) 0.000381 % (by me).
3) 0.004960 % (by François Labelle).
4) 0.005059 % (by me).
5) 0.007843 % (by me).
6) 0.011416 % (by Reinhard Scharnagl).
7) 0.043544 % (by me).

Can you give more details about this milestone, please? Elapsed time, used hardware...

OTOH, your count should be verified by other person in order to get more reliability.

Regards from Spain.

Ajedrecista.

petero2 · Post by **petero2** » Fri Aug 25, 2017 9:03 pm

ankan wrote:
Code: Select all
Perft(15):2015099950053364471960

Congratulations, very impressive.

Unfortunately I don't have the computing power needed to verify it, but I found some old data from my perft estimator. Computing the average and standard deviation (of the average) of that data I get:

Code: Select all

mean 2015098543771040000000
sDev       1742200209330515

So I can at least say your result looks very reasonable.

Dann Corbit · Post by **Dann Corbit** » Sat Aug 26, 2017 3:00 am

ankan wrote:I have computed perft(15) of starting position. Any guesses before I post my result?

Do you have stored statistics, such as how many mates are in the first 15 ply, how many stalemates, etc.?

ankan · Post by **ankan** » Sat Aug 26, 2017 6:33 am

Dann Corbit wrote: Do you have stored statistics, such as how many mates are in the first 15 ply, how many stalemates, etc.?

Unfortunately no. I can modify my program to count these statistics - but it will probably become quite a bit slow. I would guess counting these will for first 14 ply would take about the same time as the perft 15 computation took.

ankan · Post by **ankan** » Sat Aug 26, 2017 6:39 am

A cluster of DGX1 server systems was used to compute this result. The computation took about 8 days running on average of 4 systems (32 GPUs total) at a time. A second verification run took about 5 days running on average of 8 systems at a time. The verification run was performed with a different set or random numbers for zobrist hashes. Obviously this doesn't guarantee correctness as there could be bugs in my program - but at least the verification run avoids the possibility wrong result due to hardware instability or hash collisions.

My program makes use of GPUs to explore the tree and count no. of leaf nodes. It can make use of multiple GPUs in a system and now also supports a network mode where one can run the program on multiple systems on a network and these instances will communicate with each other to avoid duplicate work.

perft is basically a brute force problem - just exploring a tree and counting leaf nodes. When I first started with GPU version of perft - back in 2013, I was doing it just to see if GPUs are suitable for integer calculations and irregular problems like move generation for chess.
Most of the initial optimizations were focussed in improving speed of move generation. After many experiments I found that a breadth first search kind of approach works best for the GPU. My program starts exploring the tree in depth first manner on CPU. When a certain depth is reached a GPU function (kernel) is launched to compute perft of the subtree. The launch depth (currently 6) is chosen to make sure GPU kernel will be able to fit the entire subtree in memory. If it fails to do so, the kernel returns failure and multiple launches are made at depth-1. The GPU kernel explores the tree in breadth first manner. At every level, GPU kernels are launched to count and then generate child boards - which are then used for next level. The no of threads launched is equal to the no of frontier nodes (i.e, every thread processes one board position). At depth = 1 (just above leaves), a special kernel is launched to perform bulk counting (without actually generating child boards). For details of simple GPU perft routine see the function perft_bb_gpu_simple().

Soon I realized that if I want my perft program to be competitive and to be able to compute bigger perfts, I need transposition tables to avoid duplicate work. I still launch GPU work in breadth first manner. Before launching new kernel with all frontier nodes, a small separate hash table is used to find (and discard) duplicates in the current set of frontier nodes. The bigger transpositions tables are used in similar way as a CPU perft routine would use - i.e, in case of a hit, return value from hash table without exploring the node any further (for breadth first search, the node is removed from the list of frontier nodes). There are more than a single main hash table - different hash tables for different 'depths'. For depths that are explored on CPU, a lossless hash table is used (hash table with chain to hold multiple entries at same slot). For depths that are explored on the GPU, there are some hash tables in video/device memory (shallower levels) and some in system memory (deeper levels). These hash-tables can't hold all entries and act more like cache. See perft_bb_gpu_simple_hash() for more details.

Once I had my perft programming running and working well on single GPU, I wanted to experiment with scaling/distributing the work across multiple GPUs. Last year, for perft(14) verification I added some code to distribute sub-trees to different GPUs on a single system. The distribution happens at the root - every GPU gets a different set of sub-trees to explore. CPU worker threads are launched corresponding to each GPU on the system. These CPU threads are responsible for managing one GPU each. Each thread starts exploring a sub-tree and launches GPU work when GPU launch depth is reached. The CPU side hash tables are shared among these threads, and even the system memory hash tables used by GPU perft are also shared. However the device memory hash tables are not shared.

A few months back I decided to try distributing the problem to multiple systems (nodes on a cluster) - in the hope of computing perft(15) in reasonable amount of time. A shared file on network is used to hold a list (ip addresses) of all currently active nodes. Whenever a new instance comes online, it requests the first node in the list to send the complete CPU side hash table (the lossless one). It then adds its own IP address to the list. Whenever a node adds a new entry to CPU side hash table, it broadcasts this entry to all other nodes in the list. In case a node is no longer reachable, it's removed from the list (by the first peer that detects). This way the CPU side hash table is kept in sync across all instances of the program.

Disclosure: I work for Nvidia but chess programming is a personal hobby. I would like to thank my employer for letting me to use cycles on the cluster for this computation.

perft(15)

perft(15)

Re: perft(15)

Re: perft(15)

Re: perft(15)

Re: perft(15)

Re: Perft(15): comparison of estimates with Ankan's result.

Re: perft(15)

Re: perft(15)

Re: perft(15)

Re: Perft(15): comparison of estimates with Ankan's result.