If you already use a 128 bit hash, would it be possible to use a compressed FEN or something similar, with the exact position? Maybe it would be a little slower, but it would be exact.

ankan wrote: Here is my latest attempt at computing perft(14) using my GPU perft program. Complete output of the program (divided perfts down to the first 4 levels) is here:

Code: Select all
Perft(03): 8902, time: 0.201 s
Perft(04): 197281, time: 0.001 s
Perft(05): 4865609, time: 0.001 s
Perft(06): 119060324, time: 0.002 s
Perft(07): 3195901860, time: 0.016 s
Perft(08): 84998978956, time: 0.206 s
Perft(09): 2439530234167, time: 1.414 s
Perft(10): 69352859712417, time: 15.278 s
Perft(11): 2097651003696806, time: 179.091 s
Perft(12): 62854969236701747, time: 2932.64 s
Perft(13): 1981066775000396239, time: 42792.7 s
..b1-a3 1988096314966752669
..g1-f3 2585178594020547799
..b1-c3 2638654908094167513
..a2-a3 1707945110805813310
..g1-h3 2022544576756325295
..b2-b3 2286406377370528139
..c2-c3 2666334464745721380
..f2-f3 1359948551007941806
..d2-d3 4459055881158216540
..g2-g3 2388698250037891063
..h2-h3 1691016253163166371
..a2-a4 2493115507450124588
..b2-b4 2297640117530334747
..e2-e3 7478732807317823667
..c2-c4 3067868595132318179
..f2-f4 1905689484049474095
..g2-g4 2050748957027119617
..h2-h4 2546461593049574382
..d2-d4 6636612996377425812
..e2-e4 7614272181524252025
Perft(14): 61885021521585518997, time: 795802 s
https://raw.githubusercontent.com/ankan ... erft14.txt
After Steven Edwards reported mismatches due to false transposition table positives caused by 64 bit hash signatures, I updated my program to use 128 bit hashes. I also made some performance improvements and changes to utilize multiple GPUs in the machine if available. This run was performed on a machine with 3 Nvidia Titan X GPUs.
Instead of Steven's approach of computing perft(7) of all 96400068 unique(7) positions, I decided to compute perft using the regular method, which, even though slightly slower, has the advantage of producing divided counts.
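For readers new to the terminology: a "divided" perft reports, for each move at the root, the perft count of the position reached after that move, and the per-move counts sum to the total. A minimal sketch over an abstract move interface (the `moves`/`make`/`unmake` callbacks are placeholders for illustration, not ankan's actual code):

```python
def perft(pos, depth, moves, make, unmake):
    """Plain node count: number of leaf positions at the given depth."""
    if depth == 0:
        return 1
    total = 0
    for m in moves(pos):
        make(pos, m)
        total += perft(pos, depth - 1, moves, make, unmake)
        unmake(pos, m)
    return total

def divide(pos, depth, moves, make, unmake):
    """Divided perft: per-root-move subtotals; their sum equals perft(depth)."""
    result = {}
    for m in moves(pos):
        make(pos, m)
        result[m] = perft(pos, depth - 1, moves, make, unmake)
        unmake(pos, m)
    return result

# Sanity check on a toy "game" where every position has exactly 3 moves:
state = []
toy = (lambda p: (0, 1, 2), lambda p, m: p.append(m), lambda p, m: p.pop())
assert perft(state, 4, *toy) == 3 ** 4
assert sum(divide(state, 4, *toy).values()) == 3 ** 4
```

The divided counts are what make two independent runs comparable: a mismatch can be localized to one root move instead of re-running the whole tree.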
Unfortunately my result doesn't match the value reported by Peter Österlund back in 2013. With 128 bit hashes, it's unlikely to be caused by a false positive/hash collision but I can't rule out bugs in my program or random bit flips due to hardware failure/instability.
I am going to try one more time using unique(8) or unique(9) positions, or use the same regular method with a different set of random Zobrist keys, to rule out the possibility of hardware instability.
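Ankan doesn't show his hashing code, but the usual way to get a 128-bit Zobrist signature is to keep two independent 64-bit tables and update both halves incrementally; re-running with "a different set of random Zobrist keys" just means reseeding those tables. A minimal sketch (the table layout and seed are my own illustration):

```python
import random

rng = random.Random(2016)  # fixed seed for this run; a verification run would reseed

# Two independent 64-bit tables -> one 128-bit signature.
# Illustrative layout: 12 piece types x 64 squares (castling/en-passant/side
# keys omitted for brevity).
Z_LO = [[rng.getrandbits(64) for _ in range(64)] for _ in range(12)]
Z_HI = [[rng.getrandbits(64) for _ in range(64)] for _ in range(12)]

def hash128(pieces):
    """pieces: iterable of (piece_type, square) pairs describing a position."""
    lo = hi = 0
    for p, sq in pieces:
        lo ^= Z_LO[p][sq]
        hi ^= Z_HI[p][sq]
    return (hi << 64) | lo  # the two halves occupy disjoint bit ranges

# Because XOR is self-inverse, the signature is updated incrementally when a
# move adds or removes a piece, instead of rehashing the whole board.
```

The value of the second run is that a false positive is a property of the specific key set; with an independently seeded set, hitting the same collision again is astronomically unlikely, while a genuine software bug would reproduce.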
yet another attempt on Perft(14)
Moderators: hgm, Rebel, chrisw
-
- Posts: 859
- Joined: Mon Aug 10, 2009 10:05 pm
- Location: Italy
- Full name: Stefano Gemma
Re: yet another attempt on Perft(14)
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
-
- Posts: 12540
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: yet another attempt on Perft(14)
stegemma wrote: If you already use a 128 bit hash, would it be possible to use a compressed FEN or something similar, with the exact position? Maybe it would be a little slower, but it would be exact. [...]

I remember a while back there were EPD compression schemes around 160 bits.
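As a back-of-the-envelope check on that ~160-bit figure (my own arithmetic, not a reconstruction of any specific published scheme):

```python
from math import ceil, log2

# Naive encoding: each of the 64 squares holds one of 13 states
# (6 white piece types, 6 black piece types, or empty).
naive_bits = ceil(64 * log2(13))  # 237 bits

# Smarter: a 64-bit occupancy bitboard plus 4 bits per occupied square
# (12 piece codes fit in 4 bits, at most 32 men), plus side to move,
# castling rights, and en-passant file (8 files or "none").
compact_bits = 64 + 32 * 4 + 1 + 4 + ceil(log2(9))  # 201 bits

print(naive_bits, compact_bits)
```

Huffman-coding the per-square field (empty squares and pawns dominate) is what pushes published EPD/FEN compressors down toward the ~160-bit range Dann remembers; either way it is only modestly larger than a 128-bit signature, which is stegemma's point.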
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 859
- Joined: Mon Aug 10, 2009 10:05 pm
- Location: Italy
- Full name: Stefano Gemma
Re: yet another attempt on Perft(14)
Dann Corbit wrote: I remember a while back there were EPD compression schemes around 160 bits. [...]

In effect you could even use two 128-bit values (a 256-bit exact position), to speed up position definition. I think that letting your computer run for weeks or months to get back an uncertain result is a little brain-damaged.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
The lesser evil
As has been mentioned long ago, it is relatively unproductive to increase the signature bit length well past the point where the probability of a false match is far less than the probability of error caused by cosmic ray events.
And a cosmic ray event is what has almost certainly happened in one of the two perft(14) answers seen so far. My guess is that Peter's answer is correct as regular RAM is more resistant to disruption than GPU RAM. But that's only a guess.
Of the 54,100,000+ perft(7) results generated so far by Symbolic, more than 14 million have been verified. This is another piece of evidence that cosmic ray events are relatively infrequent. What about the 40+ million not yet verified? What about the 40+ million not yet calculated? Only time will tell. And Symbolic's hardware is also susceptible to cosmic ray events.
----
Note: "cosmic ray event" here also includes the case of a random, nearby fissioning atom as the source of an interfering particle.
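sje's point can be made quantitative with the birthday bound: hashing n distinct positions into 2^b signature values gives a collision probability of roughly n²/2^(b+1). A quick sketch (n = 2^40 is just an illustrative transposition-table workload, not a figure from this thread):

```python
from math import log2  # work in log space; the raw probabilities underflow floats

def log2_collision_prob(n_positions, bits):
    """log2 of the approximate birthday-bound collision probability
    n^2 / 2^(bits + 1); the approximation is only meaningful well below 0."""
    return 2 * log2(n_positions) - (bits + 1)

n = 2 ** 40  # ~10^12 distinct positions hashed
print(log2_collision_prob(n, 64))   # +15: bound exceeds 1, collisions all but certain
print(log2_collision_prob(n, 128))  # -49: odds around 1 in 2^49
```

At 2^-49 per run, a false match is already far rarer than any plausible uncorrected memory error, which is exactly why widening the signature further buys nothing.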
-
- Posts: 859
- Joined: Mon Aug 10, 2009 10:05 pm
- Location: Italy
- Full name: Stefano Gemma
Re: The lesser evil
sje wrote: As has been mentioned long ago, it is relatively unproductive to increase the signature bit length well past the point where the probability of a false match is far less than the probability of error caused by cosmic ray events. [...]

That's true, but with current hardware (or maybe in the near future) doubling the RAM to obtain an exact hash could be the optimal solution. Remember that with "not perfect hashing" you just add an error probability on top of every possible external event: cosmic rays, hardware faults, nuclear explosions, alien invasion... and so on.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
-
- Posts: 2645
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Cosmic rays - ECC
Nvidia's Tesla-branded Pascal server GPUs offer native ECC-protected HBM2 memory...
https://devblogs.nvidia.com/parallelfor ... de-pascal/
Code: Select all
ECC Memory
Another HBM2 benefit is native support for error correcting code (ECC) functionality, which provides higher reliability for technical computing applications that are sensitive to data corruption, such as large-scale clusters and supercomputers, where GPUs process large datasets with long application run times.
ECC technology detects and corrects single-bit soft errors before they affect the system. In comparison, GDDR5 does not provide internal ECC protection of the contents of memory and is limited to error detection of the GDDR5 bus only: Errors in the memory controller or the DRAM itself are not detected.
GK110 Kepler GPUs offered ECC protection for GDDR5 by allocating some of the available memory for explicit ECC storage. 6.25% of the overall GDDR5 is reserved for ECC bits. In the case of a 12 GB Tesla K40 (for example), 750 MB of its total memory is reserved for ECC operation, resulting in 11.25 GB (out of 12 GB) of available memory with ECC turned on for Tesla K40. Also, accessing ECC bits causes a small decrease in memory bandwidth compared to the non-ECC case. Since HBM2 supports ECC natively, Tesla P100 does not suffer from the capacity overhead, and ECC can be active at all times without a bandwidth penalty. Like the GK110 GPU, the GP100 GPU’s register files, shared memories, L1 cache, L2 cache, and the Tesla P100 accelerator’s HBM2 DRAM are protected by a Single‐Error Correct Double‐Error Detect (SECDED) ECC code.
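The K40 figures quoted above are easy to sanity-check:

```python
# Kepler-style GDDR5 ECC reserves 6.25% (= 1/16) of memory for check bits.
total_gb = 12.0                  # Tesla K40
ecc_gb = total_gb * 0.0625       # 0.75 GB -> the "750 MB" quoted above
usable_gb = total_gb - ecc_gb    # 11.25 GB available with ECC enabled
print(ecc_gb, usable_gb)
```

HBM2's native ECC avoids both this capacity loss and the bandwidth penalty, which is the point of the quoted passage.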
-
- Posts: 2645
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Cosmic rays - ECC
...with ECC it is still possible that two or more bits are flipped without correction,
but the frequency of this event is much lower.
Numbers from Jaguar Cluster with 360 TB RAM:
ECC errors: 350 per minute
Double-bit errors: 1 per 24 hours
http://spectrum.ieee.org/computing/hard ... bad-solder
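Scaled down naively, those Jaguar figures give a feel for per-machine rates (my arithmetic; as noted later in the thread, such rates are strongly platform-dependent, and Jaguar had a known solder problem, so this is order-of-magnitude at best):

```python
# Naive linear scaling of the Jaguar correctable-error rate.
jaguar_gb = 360 * 1024      # 360 TB of RAM, in GB
corr_per_min = 350          # correctable ECC errors per minute, cluster-wide

per_gb_per_month = corr_per_min * 60 * 24 * 30 / jaguar_gb
per_box_per_day = corr_per_min * 60 * 24 * 16 / jaguar_gb  # a 16 GB workstation

print(round(per_gb_per_month, 1))  # ~41 correctable errors per GB per month
print(round(per_box_per_day, 1))   # ~21.9 per day on a 16 GB machine
```

That is vastly higher than typical desktop experience, which supports Srdja's caution against treating these numbers as general cosmic-ray rates.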
--
Srdja
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Cosmic rays and ECC memory
One, and only one, machine in use for the Perft(14) project has ECC RAM. It's a 2006 quad core 2.66 GHz Xeon Mac Pro, now with 32 GiB memory -- quite fast ten years ago, not so much today. I've checked its logs and no ECC errors have ever been detected; this for a machine which runs nearly 24/7.
This may be due in part to its location being only about 70 meters above sea level with lots of atmosphere above.
-
- Posts: 2645
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Cosmic rays - ECC -hardware dependent
Just for the record:

Numbers from Jaguar Cluster with 360 TB RAM:
ECC errors: 350 per minute
Double-bit errors: 1 per 24 hours
the above numbers are IMO not suitable for making general assumptions about cosmic-ray bit-flip events in memory...
another study by Google from 2010, with data on correctable and uncorrectable errors, shows that error rate and ratio are clearly platform-dependent...
http://static.googleusercontent.com/med ... /35162.pdf
--
Srdja
-
- Posts: 12540
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Cosmic rays and ECC memory
sje wrote: One, and only one, machine in use for the Perft(14) project has ECC RAM. [...]

Interesting article on memory reliability:
https://www.pugetsystems.com/labs/artic ... emory-520/
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.