yet another attempt on Perft(14)


stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: yet another attempt on Perft(14)

Post by stegemma »

ankan wrote:Here is my latest attempt in computing perft(14) using my GPU perft program:

Code: Select all

Perft(03):                8902, time:    0.201 s
Perft(04):              197281, time:    0.001 s
Perft(05):             4865609, time:    0.001 s
Perft(06):           119060324, time:    0.002 s
Perft(07):          3195901860, time:    0.016 s
Perft(08):         84998978956, time:    0.206 s
Perft(09):       2439530234167, time:    1.414 s
Perft(10):      69352859712417, time:   15.278 s
Perft(11):    2097651003696806, time:  179.091 s
Perft(12):   62854969236701747, time:  2932.64 s
Perft(13): 1981066775000396239, time:  42792.7 s

..b1-a3     1988096314966752669
..g1-f3     2585178594020547799
..b1-c3     2638654908094167513
..a2-a3     1707945110805813310
..g1-h3     2022544576756325295
..b2-b3     2286406377370528139
..c2-c3     2666334464745721380
..f2-f3     1359948551007941806
..d2-d3     4459055881158216540
..g2-g3     2388698250037891063
..h2-h3     1691016253163166371
..a2-a4     2493115507450124588
..b2-b4     2297640117530334747
..e2-e3     7478732807317823667
..c2-c4     3067868595132318179
..f2-f4     1905689484049474095
..g2-g4     2050748957027119617
..h2-h4     2546461593049574382
..d2-d4     6636612996377425812
..e2-e4     7614272181524252025
Perft(14):61885021521585518997, time:   795802 s
Complete output of the program (divided perfts down to the first 4 levels) is here:
https://raw.githubusercontent.com/ankan ... erft14.txt

After Steven Edwards reported mismatches due to false transposition table positives caused by 64 bit hash signatures, I updated my program to use 128 bit hashes. I also made some performance improvements and changes to utilize multiple GPUs in the machine if available. This run was performed on a machine with 3 Nvidia Titan X GPUs.

Instead of Steven's approach of computing perft(7) of all 96400068 unique(7) positions, I decided to compute perft using the regular method which, even though it is slightly slower, has the advantage of obtaining divided counts.

Unfortunately my result doesn't match the value reported by Peter Österlund back in 2013. With 128 bit hashes, it's unlikely to be caused by a false positive/hash collision but I can't rule out bugs in my program or random bit flips due to hardware failure/instability.

I am going to try one more time using unique(8) or unique(9) positions - or use the same regular method but with a different set of random Zobrist keys, to rule out the possibility of hardware instability.
If you already use a 128 bit hash, wouldn't it be possible to use a compressed FEN or something similar, storing the exact position? Maybe it would be a little slower, but it would be exact.
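
As a point of reference, a hashed perft along the lines described above might look roughly like this on the CPU side. This is only a minimal sketch of the idea, not ankan's GPU code; Position, Move, generate_moves(), make_move() and zobrist128() are hypothetical placeholders for whatever the host engine already provides.

Code: Select all

// Sketch only: a perft cache keyed by a 128 bit Zobrist signature, so that a
// hit requires both 64 bit words to match.  Position, Move, generate_moves(),
// make_move() and zobrist128() are hypothetical placeholders supplied by the
// host engine; they are not defined here.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Key128 {
    uint64_t lo, hi;                                   // two independent Zobrist words
    bool operator==(const Key128& o) const { return lo == o.lo && hi == o.hi; }
};
struct Key128Hash {
    size_t operator()(const Key128& k) const { return size_t(k.lo); }  // index by low word
};

// One table per remaining depth, because the same position reached with a
// different remaining depth has a different subtree count.
std::unordered_map<Key128, uint64_t, Key128Hash> table[15];

uint64_t perft(const Position& pos, int depth)
{
    if (depth == 0) return 1;

    const Key128 key = zobrist128(pos);
    auto it = table[depth].find(key);
    if (it != table[depth].end())                      // stored entry is reused only when
        return it->second;                             // the full 128 bit signature matches

    uint64_t nodes = 0;
    for (const Move& m : generate_moves(pos))          // legal moves from pos
        nodes += perft(make_move(pos, m), depth - 1);

    table[depth][key] = nodes;
    return nodes;
}

The divided counts at the root then come for free: call perft(make_move(start, m), 13) once per root move and print each subtotal before summing.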
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: yet another attempt on Perft(14)

Post by Dann Corbit »

stegemma wrote:
ankan wrote:Here is my latest attempt in computing perft(14) using my GPU perft program:

Code: Select all

Perft(03):                8902, time:    0.201 s
Perft(04):              197281, time:    0.001 s
Perft(05):             4865609, time:    0.001 s
Perft(06):           119060324, time:    0.002 s
Perft(07):          3195901860, time:    0.016 s
Perft(08):         84998978956, time:    0.206 s
Perft(09):       2439530234167, time:    1.414 s
Perft(10):      69352859712417, time:   15.278 s
Perft(11):    2097651003696806, time:  179.091 s
Perft(12):   62854969236701747, time:  2932.64 s
Perft(13): 1981066775000396239, time:  42792.7 s

..b1-a3     1988096314966752669
..g1-f3     2585178594020547799
..b1-c3     2638654908094167513
..a2-a3     1707945110805813310
..g1-h3     2022544576756325295
..b2-b3     2286406377370528139
..c2-c3     2666334464745721380
..f2-f3     1359948551007941806
..d2-d3     4459055881158216540
..g2-g3     2388698250037891063
..h2-h3     1691016253163166371
..a2-a4     2493115507450124588
..b2-b4     2297640117530334747
..e2-e3     7478732807317823667
..c2-c4     3067868595132318179
..f2-f4     1905689484049474095
..g2-g4     2050748957027119617
..h2-h4     2546461593049574382
..d2-d4     6636612996377425812
..e2-e4     7614272181524252025
Perft(14):61885021521585518997, time:   795802 s
Complete output of the program (divided perfts down to the first 4 levels) is here:
https://raw.githubusercontent.com/ankan ... erft14.txt

After Steven Edwards reported mismatches due to false transposition table positives caused by 64 bit hash signatures, I updated my program to use 128 bit hashes. I also made some performance improvements and changes to utilize multiple GPUs in the machine if available. This run was performed on a machine with 3 Nvidia Titan X GPUs.

Instead of Steven's approach of computing perft(7) of all 96400068 unique(7) positions, I decided to compute perft using the regular method which, even though it is slightly slower, has the advantage of obtaining divided counts.

Unfortunately my result doesn't match the value reported by Peter Österlund back in 2013. With 128 bit hashes, it's unlikely to be caused by a false positive/hash collision but I can't rule out bugs in my program or random bit flips due to hardware failure/instability.

I am going to try one more time using unique(8) or unique(9) positions - or use the same regular method but with a different set of random Zobrist keys, to rule out the possibility of hardware instability.
If you already use a 128 bit hash, wouldn't it be possible to use a compressed FEN or something similar, storing the exact position? Maybe it would be a little slower, but it would be exact.
I remember that a while back there were EPD compression schemes of around 160 bits.
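
For what it's worth, here is one generic way to land in that ballpark, as a sketch only and not necessarily one of those schemes: a 64 bit occupancy bitboard, then 4 bits of piece type and colour per occupied square, plus a few bits for side to move, castling rights and the en-passant file. The worst case is 64 + 32*4 + 9 = 201 bits; positions with fewer pieces come out correspondingly smaller.

Code: Select all

// Sketch of a compact, exact (losslessly decodable) position encoding.  The
// layout is illustrative, not a reconstruction of any particular EPD
// compression scheme.
#include <cstdint>
#include <cstdio>
#include <vector>

// board[sq]: 0 = empty, 1..6 = white P,N,B,R,Q,K, 7..12 = black P,N,B,R,Q,K
struct SimplePos {
    uint8_t board[64];
    bool    white_to_move;
    uint8_t castling;   // 4 bits: KQkq
    int8_t  ep_file;    // -1 if none, else 0..7
};

struct BitWriter {
    std::vector<uint8_t> bytes;
    int bit = 0;
    void put(uint32_t v, int n) {                  // append the n low bits of v
        for (int i = 0; i < n; ++i, ++bit) {
            if (bit % 8 == 0) bytes.push_back(0);
            bytes.back() |= ((v >> i) & 1u) << (bit % 8);
        }
    }
};

std::vector<uint8_t> encode(const SimplePos& p)
{
    BitWriter w;
    uint64_t occ = 0;
    for (int sq = 0; sq < 64; ++sq)
        if (p.board[sq]) occ |= 1ULL << sq;

    w.put(uint32_t(occ), 32);                      // occupancy bitboard, low half
    w.put(uint32_t(occ >> 32), 32);                // occupancy bitboard, high half
    for (int sq = 0; sq < 64; ++sq)                // 4 bits per occupied square
        if (p.board[sq]) w.put(p.board[sq] - 1, 4);
    w.put(p.white_to_move, 1);                     // side to move
    w.put(p.castling, 4);                          // castling rights
    w.put(p.ep_file >= 0, 1);                      // en-passant square present?
    if (p.ep_file >= 0) w.put(uint8_t(p.ep_file), 3);
    return w.bytes;
}

int main()
{
    SimplePos start{};                             // the standard starting position
    const uint8_t back[8] = {4, 2, 3, 5, 6, 3, 2, 4};   // R N B Q K B N R
    for (int f = 0; f < 8; ++f) {
        start.board[f]      = back[f];             // white back rank
        start.board[8 + f]  = 1;                   // white pawns
        start.board[48 + f] = 7;                   // black pawns
        start.board[56 + f] = back[f] + 6;         // black back rank
    }
    start.white_to_move = true;
    start.castling = 0xF;
    start.ep_file = -1;
    std::printf("startpos: %zu bits\n", encode(start).size() * 8);  // 200 (198 bits, padded)
    return 0;
}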
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: yet another attempt on Perft(14)

Post by stegemma »

Dann Corbit wrote:
stegemma wrote:[...]

If you already use a 128 bit hash, wouldn't it be possible to use a compressed FEN or something similar, storing the exact position? Maybe it would be a little slower, but it would be exact.
I remember that a while back there were EPD compression schemes of around 160 bits.
In effect you could even use two 128 bit values (a 256 bit exact position) to speed up encoding the position. I think that letting your computer run for weeks or months only to get back an uncertain result is a little brain-damaged ;)
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

The lesser evil

Post by sje »

As was mentioned long ago, it is relatively unproductive to increase the signature bit length well past the point where the probability of a false match is far less than the probability of an error caused by cosmic ray events.

And a cosmic ray event is what has almost certainly happened in one of the two perft(14) answers seen so far. My guess is that Peter's answer is correct as regular RAM is more resistant to disruption than GPU RAM. But that's only a guess.

Of the 54,100,000+ perft(7) results generated so far by Symbolic, more than 14 million have been verified. This is another piece of evidence that cosmic ray events are relatively infrequent. What about the 40+ million not yet verified? What about the 40+ million not yet calculated? Only time will tell. And Symbolic's hardware is also susceptible to cosmic ray events.

----

Note: "cosmic ray event" here also includes the case of a random, nearby case of a fissioning atom as the source of an interfering particle.
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: The lesser evil

Post by stegemma »

sje wrote:As was mentioned long ago, it is relatively unproductive to increase the signature bit length well past the point where the probability of a false match is far less than the probability of an error caused by cosmic ray events.

And a cosmic ray event is what has almost certainly happened in one of the two perft(14) answers seen so far. My guess is that Peter's answer is correct as regular RAM is more resistant to disruption than GPU RAM. But that's only a guess.

Of the 54,100,000+ perft(7) results generated so far by Symbolic, more than 14 million have been verified. This is another piece of evidence that cosmic ray events are relatively infrequent. What about the 40+ million not yet verified? What about the 40+ million not yet calculated? Only time will tell. And Symbolic's hardware is also susceptible to cosmic ray events.

----

Note: "cosmic ray event" here also includes the case of a random, nearby case of a fissioning atom as the source of an interfering particle.
That's true, but with current hardware (or hardware available in the near future) doubling the RAM to store the exact position instead of a hash could be the optimal solution. Remember that with imperfect hashing you simply add an extra error probability on top of every possible external event: cosmic rays, hardware faults, nuclear explosions, alien invasions... and so on. ;)
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Cosmic rays - ECC

Post by smatovic »

The Tesla-brand Pascal server GPUs offer native ECC-protected HBM2 memory...

https://devblogs.nvidia.com/parallelfor ... de-pascal/

Code: Select all

ECC Memory

Another HBM2 benefit is native support for error correcting code (ECC) functionality, which provides higher reliability for technical computing applications that are sensitive to data corruption, such as in large-scale clusters and supercomputers, where GPUs process large datasets with long application run times.

ECC technology detects and corrects single-bit soft errors before they affect the system. In comparison, GDDR5 does not provide internal ECC protection of the contents of memory and is limited to error detection of the GDDR5 bus only: Errors in the memory controller or the DRAM itself are not detected.

GK110 Kepler GPUs offered ECC protection for GDDR5 by allocating some of the available memory for explicit ECC storage. 6.25% of the overall GDDR5 is reserved for ECC bits. In the case of a 12 GB Tesla K40 (for example), 750 MB of its total memory is reserved for ECC operation, resulting in 11.25 GB (out of 12 GB) of available memory with ECC turned on for Tesla K40. Also, accessing ECC bits causes a small decrease in memory bandwidth compared to the non-ECC case. Since HBM2 supports ECC natively, Tesla P100 does not suffer from the capacity overhead, and ECC can be active at all times without a bandwidth penalty. Like the GK110 GPU, the GP100 GPU’s register files, shared memories, L1 cache, L2 cache, and the Tesla P100 accelerator’s HBM2 DRAM are protected by a Single‐Error Correct Double‐Error Detect (SECDED) ECC code.
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Cosmic rays - ECC

Post by smatovic »

...with ECC it is still possible for two or more bits to be flipped without correction,
but the frequency of such events is much lower.

Numbers from Jaguar Cluster with 360 TB RAM:

ECC errors: 350 per minute
Double-bit errors: 1 per 24 hours

http://spectrum.ieee.org/computing/hard ... bad-solder
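
Scaled very crudely to a single machine, assuming error counts are simply proportional to installed capacity (a strong assumption, and 32 GB is just an example size), those rates work out to roughly the following.

Code: Select all

// Rough, purely illustrative scaling of the quoted Jaguar rates down to one
// machine, assuming errors scale linearly with installed RAM.
#include <cstdio>

int main()
{
    const double cluster_gb    = 360.0 * 1024.0;       // 360 TB of RAM
    const double correctable   = 350.0 * 60.0 * 24.0;  // 350 per minute -> per day
    const double uncorrectable = 1.0;                   // ~1 double-bit error per day

    const double target_gb = 32.0;                      // example desktop/workstation
    std::printf("correctable  : ~%.1f per day on %.0f GB\n",
                correctable * target_gb / cluster_gb, target_gb);
    std::printf("uncorrectable: ~one every %.0f years on %.0f GB\n",
                cluster_gb / (uncorrectable * target_gb * 365.0), target_gb);
    return 0;
}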

--
Srdja
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Cosmic rays and ECC memory

Post by sje »

Cosmic rays and ECC memory

One, and only one, machine in use for the Perft(14) project has ECC RAM. It's a 2006 quad core 2.66 GHz Xeon Mac Pro, now with 32 GiB memory -- quite fast ten years ago, not so much today. I've checked its logs and no ECC errors have ever been detected; this for a machine which runs nearly 24/7.

This may be due in part to its location being only about 70 meters above sea level with lots of atmosphere above.
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Cosmic rays - ECC -hardware dependent

Post by smatovic »

Numbers from Jaguar Cluster with 360 TB RAM:

ECC errors: 350 per minute
Double-bit errors: 1 per 24 hours
Just for the record: the above numbers are, in my opinion, not suitable for making general assumptions about cosmic-ray bit-flip events in memory...

Another study with data on correctable and uncorrectable errors, published by Google in 2010, shows that error rates and ratios are clearly platform dependent...

http://static.googleusercontent.com/med ... /35162.pdf

--
Srdja
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Cosmic rays and ECC memory

Post by Dann Corbit »

sje wrote:Cosmic rays and ECC memory

One, and only one, machine in use for the Perft(14) project has ECC RAM. It's a 2006 quad core 2.66 GHz Xeon Mac Pro, now with 32 GiB memory -- quite fast ten years ago, not so much today. I've checked its logs and no ECC errors have ever been detected; this for a machine which runs nearly 24/7.

This may be due in part to its location being only about 70 meters above sea level with lots of atmosphere above.
Interesting article on memory reliability:
https://www.pugetsystems.com/labs/artic ... emory-520/
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.