Perft(13) [3.4 GHz Core i7-2600, 16 GB RAM]

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Perft(13) [3.4 GHz Core i7-2600, 16 GB RAM]

Post by sje »

After many tests, I have re-started the perft(13) run on the 3.4 GHz Core i7-2600, 16 GB RAM machine. Instead of building upon the calculations from the earlier partial run on the 2 GHz Core 2 Duo box, I will use those results only as check data.

Compared with the distinct position counts, the checkpoint/restart record counts determined from the earlier run differ at ply > 2. An example:

Code: Select all

r1bqkbnr/pppppppp/n7/8/1P6/2N5/P1PPPPPP/R1BQKBNR b KQkq - 0 2 9 3581733681490
r1bqkbnr/pppppppp/n7/8/1P6/2N5/P1PPPPPP/R1BQKBNR b KQkq - 2 2 9 3581733681490
Two records appear where only one is really needed; this happens because the two positions differ only in their half move counter values. Note that the two subtotals agree, as they should.
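The deduplication can be illustrated with a short sketch. This is hypothetical code, not the actual checkpoint reader; the record layout is assumed from the example above (four position fields, the two move counters, then draft and subtotal):

```python
# Hypothetical sketch: collapse checkpoint records that differ only in the
# half move counter / full move number fields. Assumed record layout:
# 4 position fields, 2 move counter fields, draft, subtotal.
def merge_checkpoint_records(lines):
    merged = {}
    for line in lines:
        fields = line.split()
        key = (tuple(fields[0:4]), fields[6])  # position fields + draft only
        subtotal = int(fields[7])
        if key in merged:
            assert merged[key] == subtotal  # duplicate subtotals must agree
        else:
            merged[key] = subtotal
    return merged

records = [
    "r1bqkbnr/pppppppp/n7/8/1P6/2N5/P1PPPPPP/R1BQKBNR b KQkq - 0 2 9 3581733681490",
    "r1bqkbnr/pppppppp/n7/8/1P6/2N5/P1PPPPPP/R1BQKBNR b KQkq - 2 2 9 3581733681490",
]
print(len(merge_checkpoint_records(records)))  # 1
```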

For ply = 1, there are 20 distinct positions, and there will be 20 checkpoint records.
For ply = 2, there are 400 distinct positions, and there will be 400 checkpoint records.
For ply = 3, there are 5,362 distinct positions, but there will be 5,429 checkpoint records.
For ply = 4, there are 72,078 distinct positions, but there will be 72,467 checkpoint records.

The estimated total running time for the perft(13) will be:

Code: Select all

72,467 / (count of completed unique draft 9 records) * (elapsed wall time)
Record counts after 37 minutes:

0 of 20 draft 12
0 of 400 draft 11
0 of 5,362 draft 10
3 of 72,467 draft 9

Total running time estimate: 621 days (ridiculous due to low sample count)
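The estimate formula written out as code, using the figures from this post (72,467 draft 9 records total, 3 completed, 37 minutes elapsed):

```python
# Extrapolate total running time from the fraction of draft 9 records done.
def estimated_total_days(total_records, completed_records, elapsed_minutes):
    return total_records / completed_records * elapsed_minutes / (60 * 24)

print(round(estimated_total_days(72467, 3, 37)))  # 621
```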

Re: Perft(13) [3.4 GHz Core i7-2600, 16 GB RAM]

Post by sje »

It is possible that the count of checkpoint records for ply > 2 may vary slightly due to transposition table dynamics. This happens because the checkpoint records carry the half move counter and full move number fields, while these fields are not used in the transposition table.

In any case, the effect is slight and these numbers are used only for time estimates.

The first draft 8 record:

Code: Select all

rnbqkbnr/1ppppppp/8/p7/8/N6N/PPPPPPPP/1RBQKB1R b Kkq - 1 3 8 96823013418
The first draft 9 record:

Code: Select all

rnbqkbnr/1ppppppp/8/p7/8/N7/PPPPPPPP/1RBQKBNR w Kkq - 0 3 9 2736703434862
Record counts after 53 minutes:

0 of 20 draft 12
0 of 400 draft 11
0 of 5,362 draft 10
8 of 72,467 draft 9

Total running time estimate: 333 days (caution: low sample count)

After 10 hours

Post by sje »

Record counts after 10 hours:

0 of 20 draft 12
0 of 400 draft 11
9 of 5,362 draft 10
171 of 72,467 draft 9

Total running time estimate: 177 days

That's close to my original estimate of 180 days; the estimates will surely bobble for some time.

Here is the first draft 10 record produced:

Code: Select all

rnbqkbnr/1ppppppp/p7/8/8/N7/PPPPPPPP/1RBQKBNR b Kkq - 1 2 10 55513774849955

Re: Perft(13) [3.4 GHz Core i7-2600, 16 GB RAM]

Post by sje »

The run uses a 12 GB transposition table with 2^29 entries split evenly between WTM and BTM. Each half of the table is partitioned into 256 regions, each guarded by a spinlock. There are eight counting threads running, one for each of the eight hyperthreads supported by the four-core Intel i7-2600. Each thread is assigned one of the twenty ply zero moves; the moves are dealt in SAN order from a supervisory thread. When a counting thread completes its draft twelve calculation, it's given another ply zero move to handle (if any remain).
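A minimal sketch of the region/spinlock partitioning idea, not the actual implementation: Python's `threading.Lock` stands in for the spinlocks, dicts stand in for the table regions, and only the 256-region count comes from the post.

```python
# Sketch: a table split into 256 regions, each guarded by its own lock,
# so concurrent counting threads rarely contend on the same region.
import threading

REGIONS = 256

class ShardedTable:
    def __init__(self):
        self.locks = [threading.Lock() for _ in range(REGIONS)]
        self.shards = [dict() for _ in range(REGIONS)]

    def store(self, key, value):
        region = key % REGIONS            # region selected from the hash key
        with self.locks[region]:
            self.shards[region][key] = value

    def probe(self, key):
        region = key % REGIONS
        with self.locks[region]:
            return self.shards[region].get(key)

table = ShardedTable()
table.store(0x123456789, 42)
print(table.probe(0x123456789))  # 42
```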

When a store is made into the transposition table, if the record is draft eight or higher then it's also written to the checkpoint file. Upon a restart, the contents of the checkpoint file are read into a fresh transposition table before counting resumes.
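The checkpoint discipline can be sketched like this; file handling is reduced to an in-memory list, and the record format here is illustrative rather than the actual one:

```python
# Sketch: stores of draft >= 8 are mirrored to a checkpoint log;
# on restart the log is replayed into a fresh table before counting resumes.
CHECKPOINT_DRAFT = 8

def store_with_checkpoint(table, fen, draft, subtotal, checkpoint):
    table[fen] = (draft, subtotal)
    if draft >= CHECKPOINT_DRAFT:
        checkpoint.append(f"{fen} {draft} {subtotal}")

def restart(checkpoint):
    table = {}
    for record in checkpoint:
        fen, draft, subtotal = record.rsplit(" ", 2)
        table[fen] = (int(draft), int(subtotal))
    return table

table, log = {}, []
store_with_checkpoint(table, "fen-a", 9, 2736703434862, log)
store_with_checkpoint(table, "fen-b", 5, 4865609, log)  # below threshold: not logged
print(len(log), restart(log) == {"fen-a": (9, 2736703434862)})  # 1 True
```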

The transposition table is four-way associative; up to four entries may be tried with overwrite preference given to the entry having the lowest subtotal count. Unused entries have a subtotal count of zero. Only subtotals of draft two or greater are stored in the table.
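The replacement policy might look like this sketch; the entry layout is assumed, while the four-way set, the lowest-subtotal victim rule, and the draft-two floor come from the post:

```python
# Sketch: four-way set-associative store with overwrite preference for the
# way holding the lowest subtotal (unused ways count as zero). Nothing
# below draft 2 is stored.
WAYS = 4
MIN_STORE_DRAFT = 2

def store(entry_set, key, draft, subtotal):
    """entry_set: list of WAYS [key, draft, subtotal] slots."""
    if draft < MIN_STORE_DRAFT:
        return
    victim = min(range(WAYS), key=lambda i: entry_set[i][2])
    entry_set[victim] = [key, draft, subtotal]

ways = [[None, 0, 0] for _ in range(WAYS)]
store(ways, "k1", 3, 500)
store(ways, "k2", 1, 999)  # draft 1: below the floor, not stored
store(ways, "k3", 4, 100)
print([w[0] for w in ways])  # ['k1', 'k3', None, None]
```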

After 24 hours

Post by sje »

Record counts after 24 hours:

0 draft 12
0 draft 11
16 draft 10
323 draft 9

Total running time estimate: 224 days (February 2012)

After 38 hours

Post by sje »

Record counts after 38 hours:

0 draft 12
0 draft 11
30 draft 10
534 draft 9
9,841 draft 8

Total running time estimate: 133 days (result due in November 2011)

Don't put too much trust in the time prediction. The data will be better after a week or two.

The 3.4 GHz i7-2600 quad core 16 GB RAM machine is producing results about ten times faster than did the 2 GHz Core 2 Duo dual core 4 GB RAM machine.

Profiling the code shows that about a third of the time is spent in the penultimate-depth bulk move counter routine. Only three percent of the time is used for move generation, and just about all the rest is eaten by the execute/retract routines and the associated bitboard database updates. The update routines are doing a big bunch of stuff needed for move selection that's unneeded for perft() calculations, so big speed gains could be made if I weren't too lazy to rewrite the code.

When I picked up the new machine at the computer shop, I was shown a rack mount dual Xeon hex core with space for 192 GB RAM. I think the price was about US$7,500. So if anyone thinks that my perft(13) run is taking too long, then they're welcome to send a check my way and I'll have the result in a few weeks.

After 76 hours

Post by sje »

Record counts after 76 hours:

0 draft 12
0 draft 11
61 draft 10
1,066 draft 9
18,048 draft 8

Total running time estimate: 144 days (result due in November 2011)

After 136 hours

Post by sje »

Record counts after 136 hours:

0 draft 12
4 draft 11
124 draft 10
2283 draft 9
32591 draft 8

Total running time estimate: 150 days (result due late November 2011)

The four draft 11 records produced so far:

Code: Select all

rnbqkbnr/1ppppppp/p7/8/8/N7/PPPPPPPP/R1BQKBNR w KQkq - 0 2 11 1865423942798492
rnbqkbnr/1ppppppp/p7/8/8/2N5/PPPPPPPP/R1BQKBNR w KQkq - 0 2 11 2458021022756805
rnbqkbnr/1ppppppp/p7/8/8/1P6/P1PPPPPP/RNBQKBNR w KQkq - 0 2 11 2110456650953864
rnbqkbnr/1ppppppp/p7/8/8/P7/1PPPPPPP/RNBQKBNR w KQkq - 0 2 11 1562327606120748

After 7 days

Post by sje »

Record counts after 7 days:

0 draft 12
8 draft 11
162 draft 10
3036 draft 9
40107 draft 8

Each of the above represents the number of transposition table stores, so some duplicate positions are counted for draft 10 or less.

There will be only 20 draft 12 records (all unique) and 400 draft 11 records (all unique). Based on the draft 11 record count alone, the completion fraction is 8/400 = two percent, which extrapolates to 400/8 * 1 week = 50 weeks; that's certainly an overestimate. I still think there's about a 50% chance of completing the run in under six months of total time.
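The back-of-envelope extrapolation above, written out:

```python
# Linear extrapolation from the draft 11 count: 8 of 400 done after 1 week.
done, total, weeks_elapsed = 8, 400, 1
fraction = done / total
estimate_weeks = total / done * weeks_elapsed
print(fraction, estimate_weeks)  # 0.02 50.0
```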
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: After 7 days

Post by hgm »

I guess there isn't much chance I could beat you to it with only a 2.4 GHz Core 2 Duo and 1 GB of memory... (and only a single-threaded perft). I tried a test run for perft(12), and it took about 12 hours to crank out the first perft(10) (after 1. h2h3 h7h6).

The turn-over of perft(7) values, though, is pretty slow. I guess there would hardly be any slow-down if I used an on-disk hash table there. Or I could use your strategy of preparing the list of 'unique' perft(7) leaves in a file (I estimate there must be about 100M of those?). Then I would have to do only a 6-ply perft from each of them. That way I would get more mileage out of the in-memory hash table, which would only have to service a 6-ply tree (where the leaves carry no associated info, and hashing the frontier nodes never gave me a speedup, so only plies 1-4 (= 8-11 in the total perft(13) tree) need to be hashed).

An advantage would also be that perft(6) counts still fit in 32 bits, so I could cram more hash entries into the same memory. (Like 5 entries per 64-byte cache line, consisting of an 8-byte key, a 4-byte count, and 4 bytes to spare for five packed 4-bit key extensions, a 7-bit LRU state counter, and perhaps five count-overflow flags.)
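The packing arithmetic above checks out exactly; here it is written out:

```python
# Five 12-byte entries (8-byte key + 4-byte count) leave 4 spare bytes in a
# 64-byte cache line; the spare 32 bits exactly hold five 4-bit key
# extensions, a 7-bit LRU counter, and five overflow flags.
entry_bytes = 8 + 4
entries_per_line = 5
spare_bits = (64 - entries_per_line * entry_bytes) * 8
used_bits = 5 * 4 + 7 + 5  # key extensions + LRU counter + overflow flags
print(spare_bits, used_bits)  # 32 32
```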