Perft(14) revisited

sje · Post by **sje** » Fri Aug 15, 2014 4:35 pm

Now at 341,564 perft(7) results. That's 's about 1/282 of the final count; there's much more to be done.

I made some minor changes in the transposition table code to better utilize available physical RAM; an assumption is made that the program will be the only big RAM user running at a given time. I also added some experimental code which uses bzero() to reset transposition table total content instead of the prior method of calling an element reset routine for each table element.

So each of the runs in progress was interrupted and then restarted with the revised program.

Symbolic has five different transposition tables, each with a different element class and a corresponding table class. The table classes each inherit from a general base class to make things more inform.

As a result of these and other design decisions, the element count for any transposition table must be 2^N where N is from 8 up to 32. You can see how this could leave a big chunk of RAM unused. I had thought of using a different scheme, but I'm not sure that the extra complexity of element address calculation using division instead of shifts and masks is the better idea.

ZirconiumX · Post by **ZirconiumX** » Fri Aug 15, 2014 4:39 pm

Use two tables. A big table for high depth entries, and a small one for smaller entries. This will use up much more memory, which will most likely pay off vs the added branch.

Matthew:out

sje · Post by **sje** » Fri Aug 15, 2014 4:50 pm

Symbolic uses a two-way table for storing intermediate perft() results; each element is paired with its dual at (address xor 1). At when storing a result, the element with the lesser draft at the two addresses is the one overwritten.

Not all tables use this scheme. For example, the tablebase transposition table does a simple overwrite with all elements considered equal with respect to draft and ply.

sje · Post by **sje** » Sat Aug 16, 2014 11:03 pm

Now at 396,779 perft(7) calculations, about 1/243 of the total.

The next work unit to finish will be wu7.003 (94% done) which is running an a 3.4 GHz quad core Core i5 with a 16 GB (2^29 entry) transposition table. To avoid false positive matches, 128 bit hash signatures are used.

The machine I used to calculate perft(13) over an 14 month run is also participating. It's doing wu7.002 (76% done) using a 3.4 GHz quad core Core i7 with an 8 GB (2^28 entry) transposition table.

sje · Post by **sje** » Sun Aug 17, 2014 9:50 am

Work unit wu7.003 has completed. The gzip result can be found at https://dl.dropboxusercontent.com/u/316 ... 003.sum.gz

Verification is welcome.

Work unit wu7.008 has started.

"A journey of a thousand miles begins with a single step."

sje · Post by **sje** » Mon Aug 18, 2014 7:21 am

Calculation for work unit wu7.001 has finished.
Input: https://dl.dropboxusercontent.com/u/316 ... wu7.001.gz
Result: https://dl.dropboxusercontent.com/u/316 ... 001.sum.gz
Verification is welcome.

Completed work units (3): 001, 003, 964
In progress work units (8): 000, 002, 004-009

Work unit wu7.010 starts next.

sje · Post by **sje** » Tue Aug 19, 2014 9:08 am

It looks like I'll have to write an OpenCL perft() assist kernel to see if a significantly better rate of progress can be had.

One difficulty here is that while the OpenCL specification says what can be done, it can't say how efficiently anything can be done because of the differing target hardware environments. So it's necessary to do a full design, build, and run to get the information needed to see if an effort was justified. What may be as fast as lightning on one GPU could be as slow as a government tax refund on another.

Another concern is that at present it appears that only C and not C++ kernels are supported. Also, while I've gotten some OpenCL test code running on a couple of different Mac OS/X machines, I haven't yet done any investigation of the Linux side of things. I have no capability to do any OpenCL work on other operating systems; I'll leave that to others.

While I'm not eager to write any C code after twenty years of abstention, I'm also not too happy with the idea of having to wait until 2020 to get the perft(14) result using only my own CPU-only code and hardware.

My idea at present is to write a very simple, non bitboard, OpenCL perft() kernel with a small memory footprint and using mostly only 16 bit integer variables. With a proper test harness, the kernel code can be tested on a CPU before trying to run it under OpenCL. The final result could then be distributed freely so that others may see if it might work on their machines. It is also possible that the perft() kernel source could be expanded and adapted for use in a general chess search.

--------

Work units in progress:

000 99%
002 96%
004 38%
005 23%
006 23%
007 5%
008 15%
009 4%

sje · Post by **sje** » Tue Aug 19, 2014 11:18 am

Calculation for work unit wu7.000 has finished.
Input: https://dl.dropboxusercontent.com/u/316 ... wu7.000.gz
Result: https://dl.dropboxusercontent.com/u/316 ... 000.sum.gz
Verification is welcome.

Completed (4): 000-001, 003, 964
In progress (8): 002, 004-010
Not yet started (953): 011-963

002: 97%
004: 39%
005: 23%
006: 23%
007: 5%
008: 16%
009: 4%
010: 0%

Instead of posting a report for each work unit completion, I'll just post a weekly summary as the current production rate is rather low.

Perft(14) revisited

Status and a revision

Re: Status and a revision

Re: Status and a revision

Status

wu7.003 completed

Calculation for work unit wu7.001 has finished

The plan

Status