Here is my point. x86 already has some 70 rename registers, so there are a _ton_ of registers in use at any one instant. As far as the CPU is concerned, all the extra registers add is a greater namespace for the user, so that we can avoid unnecessary store/load operations when _we_ run out of registers. So I do not believe that using extra registers can slow down the pipeline itself. The longer instruction opcodes needed to access the extra registers could be a problem in some cases, no doubt. But every program I have re-compiled to run in 64 bit mode, so far, has run faster than the 32 bit version of the identical source program... The increase might be very small in some cases, to be certain, but I have yet to find one that actually runs slower, even in spite of the 64 bit pointer issue and other program changes the 64 bit environment requires. A few have reported issues with tons of pointers, but I don't use "tons"... and have not seen any issues of any kind...
hgm wrote:
More registers per se cannot make it worse, as the compiler could always refrain from using them. (But will it?)
Look, I have no experience with 64-bit mode, as I have only one machine that has it, and the OS with which it was delivered prevents me from running in 64-bit mode. So I have no first-hand data on it. I just wanted to point out that most of these out-of-order CPUs really seem to benefit from frequent loading and storing, which was just as unexpected to me as it will be to you. Naively 'optimizing' the code by eliminating memory variables and making better use of registers often produced very counter-productive results.
So I can imagine that an optimizing compiler that uses the same algorithm as before, with just the parameter for the number of registers set to a larger value, would in fact make all the wrong decisions.
So "no better" might be possible, but "worse" is hard to imagine, when we are just talking about using extra registers and ignoring the 64 bit vs 32 bit pointer / addressing issues. 64 bit virtual memory is also less efficient because of the extra levels of page tables...
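To put a number on the "extra levels of page tables" remark, here is a rough sketch of how a virtual address is split into page-table indices. It assumes plain 4 KiB pages, classic two-level 32-bit paging (10+10+12 bits, no PAE) and four-level x86-64 paging (9+9+9+9+12 bits, 48-bit virtual addresses). The function name and the sample addresses are made up for illustration.

```python
def split_vaddr(vaddr, bits_per_level, levels, offset_bits=12):
    """Split a virtual address into (indices, offset).

    indices[0] is the top-level table index, indices[-1] the last-level one.
    Assumes every level uses the same number of index bits.
    """
    offset = vaddr & ((1 << offset_bits) - 1)
    mask = (1 << bits_per_level) - 1
    indices = []
    for level in range(levels - 1, -1, -1):
        shift = offset_bits + bits_per_level * level
        indices.append((vaddr >> shift) & mask)
    return indices, offset


# Hypothetical sample addresses, chosen only for illustration.
idx32, off32 = split_vaddr(0xC0ABCDEF, bits_per_level=10, levels=2)
idx64, off64 = split_vaddr(0x00007F1234567ABC, bits_per_level=9, levels=4)

print("32-bit:", len(idx32), "table lookups per walk", idx32, hex(off32))
print("64-bit:", len(idx64), "table lookups per walk", idx64, hex(off64))
```

So a TLB miss costs a walk through two tables in the 32-bit case and four in the 64-bit case, which is the inefficiency meant above. It only matters when the TLB actually misses.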