yet another attempt on Perft(14)


stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: yet another attempt on Perft(14)

Post by stegemma »

ankan wrote:Here is my latest attempt in computing perft(14) using my GPU perft program:

Code: Select all

Perft(03):                8902, time:    0.201 s
Perft(04):              197281, time:    0.001 s
Perft(05):             4865609, time:    0.001 s
Perft(06):           119060324, time:    0.002 s
Perft(07):          3195901860, time:    0.016 s
Perft(08):         84998978956, time:    0.206 s
Perft(09):       2439530234167, time:    1.414 s
Perft(10):      69352859712417, time:   15.278 s
Perft(11):    2097651003696806, time:  179.091 s
Perft(12):   62854969236701747, time:  2932.64 s
Perft(13): 1981066775000396239, time:  42792.7 s

..b1-a3     1988096314966752669
..g1-f3     2585178594020547799
..b1-c3     2638654908094167513
..a2-a3     1707945110805813310
..g1-h3     2022544576756325295
..b2-b3     2286406377370528139
..c2-c3     2666334464745721380
..f2-f3     1359948551007941806
..d2-d3     4459055881158216540
..g2-g3     2388698250037891063
..h2-h3     1691016253163166371
..a2-a4     2493115507450124588
..b2-b4     2297640117530334747
..e2-e3     7478732807317823667
..c2-c4     3067868595132318179
..f2-f4     1905689484049474095
..g2-g4     2050748957027119617
..h2-h4     2546461593049574382
..d2-d4     6636612996377425812
..e2-e4     7614272181524252025
Perft(14):61885021521585518997, time:   795802 s
Complete output of the program (divided perfts down to the first 4 levels) is here:
https://raw.githubusercontent.com/ankan ... erft14.txt

After Steven Edwards reported mismatches due to false transposition table positives caused by 64 bit hash signatures, I updated my program to use 128 bit hashes. I also made some performance improvements and changes to utilize multiple GPUs in the machine if available. This run was performed on a machine with 3 Nvidia Titan X GPUs.

Instead of Steven's approach of computing perft(7) of all 96400068 unique(7) positions, I decided to compute perft using the regular method which, even though it is slightly slower, has the advantage of obtaining divided counts.

Unfortunately my result doesn't match the value reported by Peter Österlund back in 2013. With 128 bit hashes, it's unlikely to be caused by a false positive/hash collision but I can't rule out bugs in my program or random bit flips due to hardware failure/instability.

I am going to try one more time using unique(8) or unique(9) positions - or use the same regular method but with a different set of random Zobrist keys, to rule out the possibility of hardware instability.
If you already use a 128 bit hash, wouldn't it be possible to use a compressed FEN or something similar, storing the exact position? Maybe it would be a little slower, but it would be exact.
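
As a point of reference, a hashed perft along the lines described above might look roughly like this on the CPU side. This is only a minimal sketch of the idea, not ankan's GPU code; Position, Move, generate_moves(), make_move() and zobrist128() are hypothetical placeholders for whatever the host engine already provides.

Code: Select all

// Sketch only: a perft cache keyed by a 128 bit Zobrist signature, so that a
// hit requires both 64 bit words to match.  Position, Move, generate_moves(),
// make_move() and zobrist128() are hypothetical placeholders supplied by the
// host engine; they are not defined here.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Key128 {
    uint64_t lo, hi;                                   // two independent Zobrist words
    bool operator==(const Key128& o) const { return lo == o.lo && hi == o.hi; }
};
struct Key128Hash {
    size_t operator()(const Key128& k) const { return size_t(k.lo); }  // index by low word
};

// One table per remaining depth, because the same position reached with a
// different remaining depth has a different subtree count.
std::unordered_map<Key128, uint64_t, Key128Hash> table[15];

uint64_t perft(const Position& pos, int depth)
{
    if (depth == 0) return 1;

    const Key128 key = zobrist128(pos);
    auto it = table[depth].find(key);
    if (it != table[depth].end())                      // stored entry is reused only when
        return it->second;                             // the full 128 bit signature matches

    uint64_t nodes = 0;
    for (const Move& m : generate_moves(pos))          // legal moves from pos
        nodes += perft(make_move(pos, m), depth - 1);

    table[depth][key] = nodes;
    return nodes;
}

The divided counts at the root then come for free: call perft(make_move(start, m), 13) once per root move and print each subtotal before summing.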
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: yet another attempt on Perft(14)

Post by Dann Corbit »

stegemma wrote:
ankan wrote:Here is my latest attempt in computing perft(14) using my GPU perft program:

Code: Select all

Perft(03):                8902, time:    0.201 s
Perft(04):              197281, time:    0.001 s
Perft(05):             4865609, time:    0.001 s
Perft(06):           119060324, time:    0.002 s
Perft(07):          3195901860, time:    0.016 s
Perft(08):         84998978956, time:    0.206 s
Perft(09):       2439530234167, time:    1.414 s
Perft(10):      69352859712417, time:   15.278 s
Perft(11):    2097651003696806, time:  179.091 s
Perft(12):   62854969236701747, time:  2932.64 s
Perft(13): 1981066775000396239, time:  42792.7 s

..b1-a3     1988096314966752669
..g1-f3     2585178594020547799
..b1-c3     2638654908094167513
..a2-a3     1707945110805813310
..g1-h3     2022544576756325295
..b2-b3     2286406377370528139
..c2-c3     2666334464745721380
..f2-f3     1359948551007941806
..d2-d3     4459055881158216540
..g2-g3     2388698250037891063
..h2-h3     1691016253163166371
..a2-a4     2493115507450124588
..b2-b4     2297640117530334747
..e2-e3     7478732807317823667
..c2-c4     3067868595132318179
..f2-f4     1905689484049474095
..g2-g4     2050748957027119617
..h2-h4     2546461593049574382
..d2-d4     6636612996377425812
..e2-e4     7614272181524252025
Perft(14):61885021521585518997, time:   795802 s
Complete output of the program (divided perfts down to the first 4 levels) is here:
https://raw.githubusercontent.com/ankan ... erft14.txt

After Steven Edwards reported mismatches due to false transposition table positives caused by 64 bit hash signatures, I updated my program to use 128 bit hashes. I also made some performance improvements and changes to utilize multiple GPUs in the machine if available. This run was performed on a machine with 3 Nvidia Titan X GPUs.

Instead of Steven's approach of computing perft(7) of all 96400068 unique(7) positions, I decided to compute perft using the regular method which, even though it is slightly slower, has the advantage of obtaining divided counts.

Unfortunately my result doesn't match the value reported by Peter Österlund back in 2013. With 128 bit hashes, it's unlikely to be caused by a false positive/hash collision but I can't rule out bugs in my program or random bit flips due to hardware failure/instability.

I am going to try one more time using unique(8) or unique(9) positions - or use the same regular method but with a different set of random Zobrist keys, to rule out the possibility of hardware instability.
If you already use a 128 bit hash, wouldn't it be possible to use a compressed FEN or something similar, storing the exact position? Maybe it would be a little slower, but it would be exact.
I remember that a while back there were EPD compression schemes of around 160 bits.
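
For what it's worth, here is one generic way to land in that ballpark, as a sketch only and not necessarily one of those schemes: a 64 bit occupancy bitboard, then 4 bits of piece type and colour per occupied square, plus a few bits for side to move, castling rights and the en-passant file. The worst case is 64 + 32*4 + 9 = 201 bits; positions with fewer pieces come out correspondingly smaller.

Code: Select all

// Sketch of a compact, exact (losslessly decodable) position encoding.  The
// layout is illustrative, not a reconstruction of any particular EPD
// compression scheme.
#include <cstdint>
#include <cstdio>
#include <vector>

// board[sq]: 0 = empty, 1..6 = white P,N,B,R,Q,K, 7..12 = black P,N,B,R,Q,K
struct SimplePos {
    uint8_t board[64];
    bool    white_to_move;
    uint8_t castling;   // 4 bits: KQkq
    int8_t  ep_file;    // -1 if none, else 0..7
};

struct BitWriter {
    std::vector<uint8_t> bytes;
    int bit = 0;
    void put(uint32_t v, int n) {                  // append the n low bits of v
        for (int i = 0; i < n; ++i, ++bit) {
            if (bit % 8 == 0) bytes.push_back(0);
            bytes.back() |= ((v >> i) & 1u) << (bit % 8);
        }
    }
};

std::vector<uint8_t> encode(const SimplePos& p)
{
    BitWriter w;
    uint64_t occ = 0;
    for (int sq = 0; sq < 64; ++sq)
        if (p.board[sq]) occ |= 1ULL << sq;

    w.put(uint32_t(occ), 32);                      // occupancy bitboard, low half
    w.put(uint32_t(occ >> 32), 32);                // occupancy bitboard, high half
    for (int sq = 0; sq < 64; ++sq)                // 4 bits per occupied square
        if (p.board[sq]) w.put(p.board[sq] - 1, 4);
    w.put(p.white_to_move, 1);                     // side to move
    w.put(p.castling, 4);                          // castling rights
    w.put(p.ep_file >= 0, 1);                      // en-passant square present?
    if (p.ep_file >= 0) w.put(uint8_t(p.ep_file), 3);
    return w.bytes;
}

int main()
{
    SimplePos start{};                             // the standard starting position
    const uint8_t back[8] = {4, 2, 3, 5, 6, 3, 2, 4};   // R N B Q K B N R
    for (int f = 0; f < 8; ++f) {
        start.board[f]      = back[f];             // white back rank
        start.board[8 + f]  = 1;                   // white pawns
        start.board[48 + f] = 7;                   // black pawns
        start.board[56 + f] = back[f] + 6;         // black back rank
    }
    start.white_to_move = true;
    start.castling = 0xF;
    start.ep_file = -1;
    std::printf("startpos: %zu bits\n", encode(start).size() * 8);  // 200 (198 bits, padded)
    return 0;
}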
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: yet another attempt on Perft(14)

Post by stegemma »

Dann Corbit wrote:
stegemma wrote:[...]

If you already use a 128 bit hash, wouldn't it be possible to use a compressed FEN or something similar, storing the exact position? Maybe it would be a little slower, but it would be exact.
I remember that a while back there were EPD compression schemes of around 160 bits.
In effect you could even use two 128 bit values (a 256 bit exact position) to speed up encoding the position. I think that letting your computer run for weeks or months only to get back an uncertain result is a little brain-damaged ;)
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

The lesser evil

Post by sje »

As was mentioned long ago, it is relatively unproductive to increase the signature bit length well past the point where the probability of a false match is far less than the probability of an error caused by cosmic ray events.

And a cosmic ray event is what has almost certainly happened in one of the two perft(14) answers seen so far. My guess is that Peter's answer is correct as regular RAM is more resistant to disruption than GPU RAM. But that's only a guess.

Of the 54,100,000+ perft(7) results generated so far by Symbolic, more than 14 million have been verified. This is another piece of evidence that cosmic ray events are relatively infrequent. What about the 40+ million not yet verified? What about the 40+ million not yet calculated? Only time will tell. And Symbolic's hardware is also susceptible to cosmic ray events.

----

Note: "cosmic ray event" here also includes the case of a random, nearby case of a fissioning atom as the source of an interfering particle.
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: The lesser evil

Post by stegemma »

sje wrote:As was mentioned long ago, it is relatively unproductive to increase the signature bit length well past the point where the probability of a false match is far less than the probability of an error caused by cosmic ray events.

And a cosmic ray event is what has almost certainly happened in one of the two perft(14) answers seen so far. My guess is that Peter's answer is correct as regular RAM is more resistant to disruption than GPU RAM. But that's only a guess.

Of the 54,100,000+ perft(7) results generated so far by Symbolic, more than 14 million have been verified. This is another piece of evidence that cosmic ray events are relatively infrequent. What about the 40+ million not yet verified? What about the 40+ million not yet calculated? Only time will tell. And Symbolic's hardware is also susceptible to cosmic ray events.

----

Note: "cosmic ray event" here also includes the case of a random, nearby case of a fissioning atom as the source of an interfering particle.
That's true, but with current hardware (or hardware available in the near future) doubling the RAM to store the exact position instead of a hash could be the optimal solution. Remember that with imperfect hashing you simply add an extra error probability on top of every possible external event: cosmic rays, hardware faults, nuclear explosions, alien invasions... and so on. ;)
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Cosmic rays - ECC

Post by smatovic »

The Tesla-brand Pascal server GPUs offer native ECC-protected HBM2 memory...

https://devblogs.nvidia.com/parallelfor ... de-pascal/

Code: Select all

ECC Memory

Another HBM2 benefit is native support for error correcting code (ECC) functionality, which provides higher reliability for technical computing applications that are sensitive to data corruption, such as in large-scale clusters and supercomputers, where GPUs process large datasets with long application run times.

ECC technology detects and corrects single-bit soft errors before they affect the system. In comparison, GDDR5 does not provide internal ECC protection of the contents of memory and is limited to error detection of the GDDR5 bus only: Errors in the memory controller or the DRAM itself are not detected.

GK110 Kepler GPUs offered ECC protection for GDDR5 by allocating some of the available memory for explicit ECC storage. 6.25% of the overall GDDR5 is reserved for ECC bits. In the case of a 12 GB Tesla K40 (for example), 750 MB of its total memory is reserved for ECC operation, resulting in 11.25 GB (out of 12 GB) of available memory with ECC turned on for Tesla K40. Also, accessing ECC bits causes a small decrease in memory bandwidth compared to the non-ECC case. Since HBM2 supports ECC natively, Tesla P100 does not suffer from the capacity overhead, and ECC can be active at all times without a bandwidth penalty. Like the GK110 GPU, the GP100 GPU’s register files, shared memories, L1 cache, L2 cache, and the Tesla P100 accelerator’s HBM2 DRAM are protected by a Single‐Error Correct Double‐Error Detect (SECDED) ECC code.
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Cosmic rays - ECC

Post by smatovic »

...with ECC it is still possible for two or more bits to be flipped without correction,
but the frequency of such events is much lower.

Numbers from Jaguar Cluster with 360 TB RAM:

ECC errors: 350 per minute
Double-bit errors: 1 per 24 hours

http://spectrum.ieee.org/computing/hard ... bad-solder
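
Scaled very crudely to a single machine, assuming error counts are simply proportional to installed capacity (a strong assumption, and 32 GB is just an example size), those rates work out to roughly the following.

Code: Select all

// Rough, purely illustrative scaling of the quoted Jaguar rates down to one
// machine, assuming errors scale linearly with installed RAM.
#include <cstdio>

int main()
{
    const double cluster_gb    = 360.0 * 1024.0;       // 360 TB of RAM
    const double correctable   = 350.0 * 60.0 * 24.0;  // 350 per minute -> per day
    const double uncorrectable = 1.0;                   // ~1 double-bit error per day

    const double target_gb = 32.0;                      // example desktop/workstation
    std::printf("correctable  : ~%.1f per day on %.0f GB\n",
                correctable * target_gb / cluster_gb, target_gb);
    std::printf("uncorrectable: ~one every %.0f years on %.0f GB\n",
                cluster_gb / (uncorrectable * target_gb * 365.0), target_gb);
    return 0;
}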

--
Srdja
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Cosmic rays and ECC memory

Post by sje »

Cosmic rays and ECC memory

One, and only one, machine in use for the Perft(14) project has ECC RAM. It's a 2006 quad core 2.66 GHz Xeon Mac Pro, now with 32 GiB memory -- quite fast ten years ago, not so much today. I've checked its logs and no ECC errors have ever been detected; this for a machine which runs nearly 24/7.

This may be due in part to its location being only about 70 meters above sea level with lots of atmosphere above.
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Cosmic rays - ECC -hardware dependent

Post by smatovic »

Numbers from Jaguar Cluster with 360 TB RAM:

ECC errors: 350 per minute
Double-bit errors: 1 per 24 hours
Just for the record: the above numbers are, in my opinion, not suitable for making general assumptions about cosmic-ray bit-flip events in memory...

Another study with data on correctable and uncorrectable errors, published by Google in 2010, shows that error rates and ratios are clearly platform dependent...

http://static.googleusercontent.com/med ... /35162.pdf

--
Srdja
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Cosmic rays and ECC memory

Post by Dann Corbit »

sje wrote:Cosmic rays and ECC memory

One, and only one, machine in use for the Perft(14) project has ECC RAM. It's a 2006 quad core 2.66 GHz Xeon Mac Pro, now with 32 GiB memory -- quite fast ten years ago, not so much today. I've checked its logs and no ECC errors have ever been detected; this for a machine which runs nearly 24/7.

This may be due in part to its location being only about 70 meters above sea level with lots of atmosphere above.
Interesting article on memory reliability:
https://www.pugetsystems.com/labs/artic ... emory-520/
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.