Perft(14) Weekly Status Reports for 2015

petero2
Posts: 689
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Projection estimations

Post by petero2 »

sje wrote: I have serious doubts about the reliability of 64 bit signatures over extended calculation times. I've given examples of 64 bit false positives and have shown how the number of required signature bits is roughly proportional to log2(N) where N is the probe count. Others have posted on how their chess engines have had 64 bit false positives and how such are handled in a search.

A second area of doubt comes from the possibility of undetected I/O disk errors where an application could beat on disk more in a single month than what is their total lifetime projected use. While undetected hard drive I/O errors are uncommon with modern hardware, they are common enough that I've seen a couple of them.

I addressed both of those concerns in my algorithm description. To summarize, in case you don't want to read the whole post describing the algorithm:

I use 64+64 bit signatures, although some of the bits in the second signature are ignored. The number of ignored bits depends on the hash table size. I computed the probability of a hash collision ruining the computation to be 1/28000.
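
To illustrate the log2(N) relation quoted above, here is a rough sketch of how the expected number of false positives scales with probe count and signature width. It assumes a deliberately simplified model (uniformly random signatures, each probe compared against exactly one stored entry) and is not the calculation behind the 1/28000 figure; the function names and the example numbers are made up for illustration.

```python
import math


def expected_false_positives(probes: float, signature_bits: int) -> float:
    """Expected false matches if every probe compares against one stored
    signature and signatures behave like uniform random bit strings."""
    return probes * 2.0 ** -signature_bits


def bits_for_target(probes: float, target: float) -> int:
    """Smallest signature width that keeps the expected number of false
    positives at or below `target` under the same simplified model."""
    return math.ceil(math.log2(probes / target))


if __name__ == "__main__":
    probes = 2.0 ** 60  # assumed probe count, purely illustrative
    for bits in (64, 96, 128):
        print(bits, expected_false_positives(probes, bits))
    print(bits_for_target(probes, 1e-6))
```

Under this model the required width grows by one bit each time the probe count doubles, which is the log2(N) behaviour.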

I/O errors are handled because all data is compressed using the xz program, which uses CRC32 checksums to detect data corruption.

I also addressed the cosmic ray concern by using a lockless hash table algorithm that is almost immune to random bit errors.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Projection estimations

Post by sje »

Does your compression usage cover every disk transaction? Regular disk I/O has a checksum guard on every sector or block, yet still some errors go undetected.

From Ankan's parallel calculation of work units 400-799 using 64 bit signatures, I've detected two errors so far. One was due to a stop/restart in a calculation and the second was a random bit flip (0x32 -> 0x72) due to unknown causes. Nearly all of his results have not yet been checked for errors from false positives, although I will start on that in about five months. His results will be very helpful, but by themselves cannot have the same confidence as calculations using 128 bit signatures.

As for cosmic ray damage, it is real and it is not limited to memory chips. A typical CPU has significant area including a lot of cache and so presents a target for random bit flips. This will be true regardless of the hash algorithm in use.

Re: Projection estimations

Post by petero2 »

The techniques I used significantly reduce the risk of errors caused by hard disk failure and cosmic rays, but the remaining error probability is not zero. That fact, combined with other types of possible hardware errors and the possibility of software bugs, either in code I wrote, or in code I depend on, is the reason why independent verification is needed.

Regarding hard disk transactions, all data used in the computation of the unique positions after 11 ply is compressed using xz and therefore protected by its CRC32 checksums. At the same time, this makes the data vulnerable to potential data-corrupting bugs in xz. Log files, however, are not compressed, and while they are mostly not essential for the perft computation, there is one exception. The final perft computation was split into 73 parts, the result for each part was written to a log file, and the sum was computed by extracting the 73 sub-totals. If a disk error corrupted the roughly 1.5KB of data that stored the sub-totals, it could go undetected.
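
As a sketch of how the uncompressed sub-totals could be given the same kind of protection, each log record could carry its own CRC32 that is verified before summing. This is not what was done in the computation described here; the record format and the function names are hypothetical, and only `zlib.crc32` is a real library call.

```python
import zlib


def protect(line: str) -> str:
    """Append a CRC32 of the line so later corruption can be detected."""
    crc = zlib.crc32(line.encode("utf-8"))
    return f"{line} crc32={crc:08x}"


def verify(record: str) -> str:
    """Return the original line, or raise ValueError if the CRC mismatches."""
    line, sep, tag = record.rpartition(" crc32=")
    if not sep or zlib.crc32(line.encode("utf-8")) != int(tag, 16):
        raise ValueError(f"corrupt record: {record!r}")
    return line


def sum_subtotals(records: list[str]) -> int:
    """Sum the last field of every verified record (hypothetical format)."""
    return sum(int(verify(r).split()[-1]) for r in records)


if __name__ == "__main__":
    log = [protect("part 1 subtotal 100"), protect("part 2 subtotal 232")]
    print(sum_subtotals(log))
    # Simulate a single bit flip of the kind mentioned earlier (0x32 -> 0x72).
    damaged = log[1].replace("2", chr(ord("2") ^ 0x40), 1)
    try:
        verify(damaged)
    except ValueError as err:
        print("detected:", err)
```

A CRC32 detects every single-bit error in a record, so a flip like the one above cannot slip through, although it offers no protection against bugs in the code that writes the record.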

Regarding the hash table algorithm, it provides roughly the same protection as if the hash table had been stored in ECC memory. As you mention, cosmic rays can also affect other parts of the computer system, which the hash table algorithm does not protect.

This Wikipedia article claims that many processors use error correcting codes for on-chip cache, but it does not contain information about modern CPUs, so I don't know if that is still true.

It is hard to find data on failure probabilities for computers caused by cosmic rays, but here are two articles I found:

How Cosmic Rays Cause Computer Downtime

Cosmic Rays Don't Strike Twice: Understanding the Nature of DRAM Errors and the Implications for System Design

Perft(14) Weekly Status 2015-11-22

Post by sje »

Perft(14) Weekly Status 2015-11-22

Symbolic has produced more than 25,800,000 perft(7) results so far, about 26.76% of the 96,400,068 needed.

Day count: 473
Estimated remaining day count: 1,294
Estimated total day count: 1,767

Average throughput: 54,545 results/day
Effective frequency: 82.00 GHz

Work units not yet started (695): 269-963

Sum of perft()s: 591,477,040,716,903,326
Sum of products: 3,351,373,550,780,408,051
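
The derived figures in these weekly reports can be reproduced from the result count and the day count. The sketch below assumes the remaining time is a straight linear extrapolation from the average throughput so far; that formula is not stated in the reports, but it matches the posted numbers. The function and field names are made up for the example.

```python
TOTAL_NEEDED = 96_400_068  # perft(7) results needed, as stated in the reports


def weekly_projection(results_done: int, days_elapsed: int) -> dict:
    """Reproduce the derived status fields from the two raw inputs."""
    throughput = results_done / days_elapsed          # results per day
    remaining_days = round((TOTAL_NEEDED - results_done) / throughput)
    return {
        "percent_done": round(100 * results_done / TOTAL_NEEDED, 2),
        "throughput": round(throughput),
        "remaining_days": remaining_days,
        "total_days": days_elapsed + remaining_days,
    }


def units_not_started(first: int, last: int) -> int:
    """Count of work units in the inclusive range first..last."""
    return last - first + 1


if __name__ == "__main__":
    print(weekly_projection(25_800_000, 473))
    print(units_not_started(269, 963))
```

Because the average throughput has been rising week over week, this kind of projection drifts downward as the run continues.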

Perft(14) Weekly Status 2015-11-29

Post by sje »

Perft(14) Weekly Status 2015-11-29

Symbolic has produced more than 26,700,000 perft(7) results so far, about 27.70% of the 96,400,068 needed.

Day count: 480
Estimated remaining day count: 1,253
Estimated total day count: 1,733

Average throughput: 55,625 results/day
Effective frequency: 86.22 GHz

Work units not yet started (686): 278-963

Sum of perft()s: 610,181,587,017,265,262
Sum of products: 3,575,828,106,384,751,283

Perft(14) Weekly Status 2015-12-06

Post by sje »

Perft(14) Weekly Status 2015-12-06

Symbolic has produced more than 27,200,000 perft(7) results so far, about 28.22% of the 96,400,068 needed.

Day count: 487
Estimated remaining day count: 1,239
Estimated total day count: 1,726

Average throughput: 55,852 results/day
Effective frequency: 88.23 GHz

Work units not yet started (681): 283-963

Sum of perft()s: 621,967,351,718,060,139
Sum of products: 3,713,382,165,227,293,895

Perft(14) Weekly Status 2015-12-13

Post by sje »

Perft(14) Weekly Status 2015-12-13

Symbolic has produced more than 27,800,000 perft(7) results so far, about 28.84% of the 96,400,068 needed.

Day count: 494
Estimated remaining day count: 1,219
Estimated total day count: 1,713

Average throughput: 56,275 results/day
Effective frequency: 92.94 GHz

Work units not yet started (675): 289-963

Sum of perft()s: 643,162,901,959,059,398
Sum of products: 3,967,728,768,119,285,003

Perft(14) Weekly Status 2015-12-20

Post by sje »

Perft(14) Weekly Status 2015-12-20

Symbolic has produced more than 28,300,000 perft(7) results so far, about 29.36% of the 96,400,068 needed.

Day count: 501
Estimated remaining day count: 1,206
Estimated total day count: 1,707

Average throughput: 56,487 results/day
Effective frequency: 96.68 GHz

Work units not yet started (670): 294-963

Sum of perft()s: 661,322,412,507,939,838
Sum of products: 4,185,642,894,705,850,283

Perft(14) Weekly Status 2015-12-27

Post by sje »

Perft(14) Weekly Status 2015-12-27

Symbolic has produced more than 28,900,000 perft(7) results so far, about 29.98% of the 96,400,068 needed.

Day count: 508
Estimated remaining day count: 1,187
Estimated total day count: 1,695

Average throughput: 56,890 results/day
Effective frequency: 100.17 GHz

Work units not yet started (664): 300-963

Sum of perft()s: 681,433,406,220,554,455
Sum of products: 4,426,974,819,257,225,687

Re: Perft(14) Weekly Status Reports for 2015

Post by sje »