Zobrist key random numbers

bob · Post by **bob** » Mon Feb 02, 2009 5:24 am

hgm wrote: More registers per se cannot make it worse, as the compiler could always refrain from using them. (But will it?)

Look, I have no experience with 64-bit mode, as I have only one machine that has it, and the OS with which it was delivered prevents me from running in 64-bit mode. So I have no first-hand data on it. I just wanted to point out that most of these out-of-order CPUs really seem to benefit from frequent loading and storing. Which was just as unexpected to me as it will be to you. Naively 'optimizing' the code by eliminating memory variables and better use of registers often produced very counter-productive results.

So I can imagine that an optimizing compiler that uses the same algorithm as before, with just the parameter for the number of registers set to a larger value, would in fact make all the wrong decisions.

Here is my point. x86 already has some 70 rename registers, so there are a _ton_ of registers in use at any one instant in time. As far as the CPU is concerned, all the extra registers add is a greater namespace for the user, so that we can avoid unnecessary store/load operations when _we_ run out of registers. So I do not believe that using extra registers can slow down the pipeline itself. The longer instruction opcodes needed to access the extra registers could be a problem in some cases, no doubt. But every program I have re-compiled to run in 64-bit mode, so far, has run faster than the 32-bit version of the same identical source program... The increase might be very small in some cases, to be certain, but I have yet to find one that actually runs slower, even in spite of the 64-bit pointer issue and other program changes the 64-bit environment requires. A few have reported issues with tons of pointers, but I don't use "tons"... and have not seen any issues of any kind...

So "no better" might be possible, but "worse" is hard to imagine, when we are just talking about using extra registers and ignoring the 64-bit vs 32-bit pointer / addressing issues. 64-bit virtual memory is also less efficient because of the extra levels of page tables...
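As a rough illustration of the "tons of pointers" issue mentioned above, here is a minimal C sketch. The struct and function names are hypothetical and not taken from any engine discussed in this thread. It compares the footprint of a node linked by native pointers, which grows when pointers go from 4 to 8 bytes, with a node linked by 32-bit indices into a pool, which stays the same size in either mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node linked by native pointers.
   Typically 8 bytes in a 32-bit build and 16 bytes in a 64-bit build. */
struct node_ptr {
    struct node_ptr *next;
    struct node_ptr *prev;
};

/* The same links stored as 32-bit indices into a preallocated pool.
   8 bytes regardless of the pointer width of the build. */
struct node_idx {
    uint32_t next;
    uint32_t prev;
};

/* Bytes needed for the link fields of `count` pointer-linked nodes. */
size_t links_bytes_ptr(size_t count)
{
    return count * sizeof(struct node_ptr);
}

/* Bytes needed for the link fields of `count` index-linked nodes. */
size_t links_bytes_idx(size_t count)
{
    return count * sizeof(struct node_idx);
}
```

Whether the larger footprint matters depends on how pointer-heavy the program is; a program with few pointers would see little difference.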

bob · Post by **bob** » Mon Feb 02, 2009 5:26 am

diep wrote:
Processors can fake having 16 GPRs and get away with just a handful; it is a matter of temporarily renaming some registers.

Vincent

Not always. If _I_ can't see the registers, then if I need one and eax-edx are already in use, I have to store one before I can re-use it. And that store can't be hidden by the hardware. If I had additional registers I could avoid lots of temp variable usage, which is the reason machines like the SPARC, etc. have 32 registers... and their hardware _also_ does register renaming, which was done prior to the Pentium Pro, which was Intel's first attempt...

hgm · Post by **hgm** » Mon Feb 02, 2009 10:57 am

Well, like I said, to my big surprise these 'unnecessary' loads and stores are in fact accelerating the code on my 32-bit machines, and register-only code eliminating these loads and stores often has very disappointing performance. The thing pointed out by Wylie might very well be the bottleneck I was running into:

wgarvin wrote: okay, I think when you read a register that you haven't touched in a while, you can get an "ROB read port stall". Only 2 regular plus one index registers can be read from the ROB per cycle. Usually on x86-32 it's not an issue because there aren't very many registers, so values seldom sit in them until they get cold. But it does suggest that for an x86-64 compiler, using a classical RISC register allocator (e.g. a graph coloring allocator) might not be a good idea.

Of course these loads and stores might in fact be what ruins the efficiency of hyperthreading, due to each hyperthread attempting to use the store unit full time. So optimization for hyperthreading might have to use a completely different strategy than optimizing for a single thread. And the extra registers might come in very handy then to relieve pressure on the load/store units. This remains to be investigated, but it will have to wait until I can run 64-bit code.
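For anyone who wants to try the comparison on their own hardware, here is a minimal C sketch of the two styles. It is an assumption-laden illustration, not the code measured above: the function names are hypothetical, and `volatile` is used only as a simple way to force a load and a store of the accumulator on every iteration. Both functions compute the same XOR fold, so any timing difference comes from where the accumulator lives.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulator is an ordinary local variable; an optimizing compiler
   will normally keep it in a register for the whole loop. */
uint64_t xor_fold_register(const uint64_t *keys, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= keys[i];
    return acc;
}

/* Accumulator is volatile, so every iteration must load it from
   memory and store it back (an assumption-based stand-in for
   "frequent loading and storing"). */
uint64_t xor_fold_memory(const uint64_t *keys, size_t n)
{
    volatile uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= keys[i];
    return acc;
}
```

Timing each function over a large array (for example with `clock()` from `<time.h>`) would show which variant a given CPU and compiler favors; the outcome is likely to differ between microarchitectures, and between 32-bit and 64-bit builds.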

diep · Post by **diep** » Sat Feb 07, 2009 2:00 am

bob wrote:
diep wrote:
Processors can fake having 16 GPRs and get away with just a handful; it is a matter of temporarily renaming some registers.

Vincent
Not always. If _I_ can't see the registers, then if I need one and eax-edx are already in use, I have to store one before I can re-use it. And that store can't be hidden by the hardware. If I had additional registers I could avoid lots of temp variable usage, which is the reason machines like the SPARC, etc. have 32 registers... and their hardware _also_ does register renaming, which was done prior to the Pentium Pro, which was Intel's first attempt...

Indeed, AMD can't reorder stores; Intel Core 2 and later can, it seems.
(Arguably Intel MUST do it, as they have just 1 read port where AMD has 2.)

The difference between the 2 architectures is HUGE.
