Sixteen gigabytes

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Sixteen gigabytes

Post by sje »

Sixteen gigabytes is 17,179,869,184 bytes, and it's the upper limit on RAM for my MacPro when loaded with eight 2 GByte FB-DIMMs. That number is about eight thousand times the memory available to the Chess 4.x program of some three decades ago, and some two million times what the first microcomputer programs had to work with.

(Note: the actual MacPro limit may be higher at 64 GByte, but the current upper limit on a single FB-DIMM is only 2 GByte.)

What to do with this embarrassment of riches?

First, that 16 GByte costs well over US$2,000, which is why my box has but a single GByte for now.

Second, the current OS doesn't provide good support for application address spaces larger than two GByte per process. This is expected to change soon, but it's not there yet.

Given that these and any other issues will be solved, what then to do with respect to chess program design for a 16 GByte computer?

Segmentation is one design option: the application can be split into multiple processes, each with a 2 GByte limit. This would allow continued use of 32 bit pointers. On the downside, it would likely add overhead and require some kludge to give each process random access to the others' memory. The alternative is to bite the bullet and transition entirely to 64 bit pointers.
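
One way to implement that kludge would be a single shared memory region that every process maps; a rough sketch assuming POSIX shared memory, where the segment name and size are only illustrative:

Code:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHARED_BYTES (1536UL * 1024 * 1024)   /* 1.5 GByte, still addressable with 32 bits */

/* Map (creating if necessary) a hash region shared between processes. */
void *map_shared_hash(void)
{
    int fd = shm_open("/engine_hash", O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)SHARED_BYTES) != 0) {
        close(fd);
        return NULL;
    }
    void *base = mmap(NULL, SHARED_BYTES, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping survives the close */
    return (base == MAP_FAILED) ? NULL : base;
}

Each process would still address the region with 32 bit offsets from the returned base rather than raw pointers, which is where most of the kludge lives.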

Some benefits will accrue automatically; one of these is the file buffering supplied by the OS. This could be helped along by manually buffering the opening book and the various tablebase files.
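
For the opening book, the simplest form of manual buffering might be to map the whole file and let the OS fault the pages in as needed; a sketch, where the file name and format are just placeholders:

Code:

#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a book file read-only; returns NULL on failure. */
const unsigned char *map_book(const char *path, size_t *length)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return NULL;
    }
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return NULL;
    *length = (size_t)st.st_size;
    return (const unsigned char *)base;
}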

Transposition table size increases are an obvious approach. What would be the effect on a program when its main transposition table is a hundred times larger than in an earlier incarnation? Would it be sufficient to just store more positions, or would it be better to consider adding more information to each entry?
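
The trade-off is easiest to see in terms of entry layout: in a fixed 16 GByte table, a slim 16 byte entry gives roughly a billion slots while a fat 32 byte entry with extra fields halves that count. A sketch of the two layouts, with purely illustrative field choices:

Code:

#include <stdint.h>

typedef struct {                 /* 16 bytes: favor more positions */
    uint64_t key;
    int16_t  score;
    uint16_t move;
    uint8_t  depth, bound, age, pad;
} SlimEntry;

typedef struct {                 /* 32 bytes: favor more information */
    uint64_t key;
    int16_t  lower, upper;       /* both bounds */
    uint16_t best_move, refutation;
    uint8_t  depth, bound, age;
    uint8_t  extra[13];          /* e.g. static eval, threat flags, ... */
} FatEntry;

#define TABLE_BYTES (16ULL << 30)
/* TABLE_BYTES / sizeof(SlimEntry) is about 1.07 billion entries,
   TABLE_BYTES / sizeof(FatEntry)  is about 0.54 billion entries. */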

Current A/B searchers look at only one leg of the search tree at any one time. With a 16 GByte system, would it become feasible and useful to store a large portion of the upper tree? This would help save much recalculation in an iteratively deepening program, and portions of the tree could be retained from move to move.
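
A retained upper tree would essentially be an explicit node store rather than a hash table. A minimal sketch of such a node record, with illustrative fields; the real thing would need careful memory management:

Code:

#include <stdint.h>

typedef struct TreeNode TreeNode;

struct TreeNode {
    uint64_t   key;          /* position signature */
    int16_t    score;
    uint8_t    depth;
    uint8_t    child_count;
    uint16_t  *moves;        /* one move per expanded child */
    TreeNode **children;     /* NULL below the retained depth */
};

After a move is played, the subtree under that move becomes the new root and the rest of the store is recycled.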

Anything in a static evaluation function that's currently done by calculation would be a candidate for conversion to table look-up. These tables could be computed beforehand, or be implemented as transposition tables.
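
The smallest version of this is a table precomputed at start-up in place of a per-node calculation; with gigabytes to spare, the same idea scales to much larger indices such as whole pawn structures or king zones. A tiny sketch, where the scoring formula is only illustrative:

Code:

#include <stdlib.h>

static int tropism[64][64];             /* [king square][attacker square] */

static int file_of(int sq) { return sq & 7; }
static int rank_of(int sq) { return sq >> 3; }

void init_tropism(void)
{
    for (int k = 0; k < 64; k++)
        for (int a = 0; a < 64; a++) {
            int df = abs(file_of(k) - file_of(a));
            int dr = abs(rank_of(k) - rank_of(a));
            int dist = (df > dr) ? df : dr;   /* Chebyshev distance */
            tropism[k][a] = 7 - dist;         /* closer attacker, bigger score */
        }
}

/* In the evaluator:  score += tropism[king_sq][attacker_sq];  */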
User avatar
Bill Rogers
Posts: 3562
Joined: Thu Mar 09, 2006 3:54 am
Location: San Jose, California

Re: Sixteen gigabytes

Post by Bill Rogers »

You should be ashamed of yourself. Remember that the first computer ever built, which occupied an entire city block and had over 50,000 vacuum tubes, only had 1,024 bytes of memory. Not only that, they could only run it for a few minutes at a time before one or more tubes burned out.
Bill
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Sixteen gigabytes

Post by hgm »

With a good replacement algorithm the search speed is only a very weak function of hash-table size. E.g. a 4096 times larger table might give you a speedup of a factor 2 if the tree is large enough to even overload the larger table. (The speedup of course saturates as soon as the tree can be completely accommodated.)

I am not sure that hashing results from non-recursive routines (such as evaluation) can give very much benefit: no matter how large your memory, hash probes are quite expensive, as they typically miss the cache, and a cache-line fill takes many hundreds of CPU cycles to perform, in which perhaps a thousand instructions could have been executed. So the thing you hash must really be very expensive to calculate, or the hashing would only slow you down compared to calculating it from scratch. Unless the tables are small enough to fit the cache, but that doesn't help you find a use for the 16GB, of course.

One thing for which such a huge amount of memory would be helpful, is for building EGTBs. With 10GB you could build 6-men TBs in RAM, in just a few hours.
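
A rough size check, assuming Nalimov-style indexing for a pawnless 6-men table (462 symmetry-reduced placements of the two kings, 62*61*60*59 for the other four men):

Code:

#include <stdio.h>

int main(void)
{
    unsigned long long kings     = 462ULL;               /* two kings, 8-fold symmetry */
    unsigned long long others    = 62ULL * 61 * 60 * 59; /* remaining four men */
    unsigned long long positions = kings * others;       /* 6,185,385,360 */

    printf("%llu positions, about %.1f GB at one byte each\n",
           positions, positions / 1e9);
    return 0;
}

So a single pawnless slice is on the order of 6 GB, consistent with the 10GB figure once working space is included.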
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Sixteen gigabytes

Post by sje »

Bill Rogers wrote:You should be ashamed of yourself. Remember that the first computer ever built, which occupied an entire city block and had over 50,000 vacuum tubes, only had 1,024 bytes of memory. Not only that, they could only run it for a few minutes at a time before one or more tubes burned out.
Bill
One of my instructors in my undergrad days had worked on the Harvard Mark I, arguably the first general purpose stored program machine. A guy I once worked with started with computers that used CRT electrostatic storage along with mercury delay lines for memory. In high school, I used a machine (HP 2000) that relied on magnetic drum storage.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Sixteen gigabytes

Post by sje »

hgm wrote:I am not sure that hashing results from non-recursive routines (such as evaluation) can give very much benefit: no matter how large your memory, hash probes are quite expensive, as they typically miss the cache, and a cache-line fill takes many hundreds of CPU cycles to perform, in which perhaps a thousand instructions could have been executed. So the thing you hash must really be very expensive to calculate, or the hashing would only slow you down compared to calculating it from scratch.
Symbolic's A/B searcher hashes static evaluations and it pays off with an average 20% hit rate. It also hashes variation sequences used in move ordering, and that helps as well. And there's the main trans table, the pawn trans table, the tablebase trans table, and the hint trans table. All of these would benefit from an increase in size if resources were available.
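
For reference, an evaluation cache like that can be as simple as a keyed array probed before the full evaluator runs; a minimal sketch, where the sizes and names are illustrative rather than Symbolic's actual layout:

Code:

#include <stdint.h>

typedef struct {
    uint64_t key;
    int32_t  score;
} EvalEntry;                                /* 16 bytes with padding */

#define EVAL_BITS 22                        /* 4M entries, about 64 MByte */
#define EVAL_SIZE (1u << EVAL_BITS)
#define EVAL_MASK (EVAL_SIZE - 1)

static EvalEntry eval_cache[EVAL_SIZE];

int full_evaluation(const void *position);  /* the expensive part, defined elsewhere */

int cached_evaluation(const void *position, uint64_t key)
{
    EvalEntry *e = &eval_cache[key & EVAL_MASK];
    if (e->key == key)
        return e->score;                    /* hit: roughly the 20% case */
    int score = full_evaluation(position);  /* miss: compute and store */
    e->key   = key;
    e->score = score;
    return score;
}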
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Sixteen gigabytes

Post by Dann Corbit »

The cost per unit of memory will continue to drop exponentially.

Tragically, RAM speed increases only linearly, as opposed to CPU speed, which increases exponentially. That means we are screaming towards a fundamental bottleneck in computation.

At any rate, you can already get systems that address the full 64 bit address space (e.g. 64 bit Unix flavors -- several of which are Linux variants).

64 bit Windows is going mainstream now also. I guess that we will laugh at 2GB in ten years the way that we laugh at 640K right now.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Sixteen gigabytes

Post by sje »

There are roadblocks coming up quickly for both CPUs and memory. Integrated circuit feature sizes are closing in on one hundred atoms per element. Certain quantum effects, such as tunneling through the insulation between adjacent elements, require a minimum number of atoms, and there's no known way to beat that limit.

Then there are the problems of heat generation where it becomes impossible to conduct thermal energy away faster than it's generated.

Quantum computing and vacuum engineering are still mostly science fiction, so there's no immediate hope there.

Maybe the upcoming hardware barrier is a good thing. It just might encourage a much needed sense of discipline in software engineering.
User avatar
Bill Rogers
Posts: 3562
Joined: Thu Mar 09, 2006 3:54 am
Location: San Jose, California

Re: Sixteen gigabytes

Post by Bill Rogers »

Hello Stephen
One possible thing that you might try is to load a chess program into that memory along with its book and try running it from there. In this way at least the book is all contained in RAM and should speed up the program, at least in theory, well, maybe just a little.
Bill :D
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Sixteen gigabytes

Post by sje »

Bill Rogers wrote:One possible thing that you might try is to load a chess program into that memory along with its book and try running it from there. In this way at least the book is all contained in RAM and should speed up the program, at least in theory, well, maybe just a little.
Symbolic has this option, and it is usually employed when playing on a server. For much larger books, a Unix-like OS will hopefully do intelligent file buffering.

Symbolic's cognitive search probes the book and the tablebases at every node. It can get away with such time consuming work because the node count per search is targeted at a thousand or less.

I vaguely remember reading about a chess program from some decades ago that would load its book into memory at the start of the game and then evict it once a probe missed and a search was needed. Not such a good idea, as transposing back into the book after a miss or two is common.

A book made from all the recorded master-level games ever played would fit into a gigabyte file with suitable compression. A well-edited book might take ten to a hundred times less storage.

--------

One idea for using more memory is to defactor some of the evaluation routines. For example, instead of having a single routine to evaluate a white king on any square, have 64 routines, one for each square. Might take a hit on code cache performance, though. A related suggestion would be to have separate move generators for white and black, and maybe to have 48 pawn move generator routines per color.
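
A sketch of the per-square defactoring, dispatching through a table of routines instead of branching inside one general routine; the names and the Position type are placeholders, and the 64 bodies would presumably be machine-generated:

Code:

typedef struct Position Position;
typedef int (*KingEvalFn)(const Position *);

/* One specialized routine per white king square. */
static int eval_wk_a1(const Position *p) { (void)p; return 0; /* a1-specific code */ }
static int eval_wk_b1(const Position *p) { (void)p; return 0; /* b1-specific code */ }
/* ... and 62 more, one for each remaining square ... */

static KingEvalFn white_king_eval[64] = {
    eval_wk_a1, eval_wk_b1, /* ..., the rest filled in the same order */
};

/* In the evaluator:  score += white_king_eval[white_king_sq](pos);  */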
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

An update

Post by sje »

The price on 667 MHz DDR2 ECC fully buffered DIMMs has come down to about US$1200 per 16 GB (8 x 2 GB). Recently, 4 GB DIMMs have become available although at about twice the price per GB. These 4 GB DIMMs are compatible with the Mac Pro towers, so an owner could fit 32 GB in one of these boxes -- at a price close to US$5000.

--------

Apple has released a beta version of OS X 10.5 Leopard and claims that it's a full 64 bit OS that runs both 32 bit and 64 bit applications simultaneously. If this is strictly true, it means that all the older PowerPC Macs (except for the G5 models) and even the current Mac Mini (which uses the 32-bit-only Core Duo chip) will not be able to run the new OS version.