This makes Crafty Faster

GothicChessInventor · Mon Oct 15, 2007 12:13 am

By the way, just "playing the optimization game", I noticed this makes crafty a tiny bit faster on the 2 systems on which I tested it:

Code: Select all

#define LOWER_ITERATION_BOUND ((iteration_depth) << 1)
#define UPPER_ITERATION_BOUND ((iteration_depth) << 2)
#define NEW_EXTENDED(x,y) ((((x) << 2) - (y)) / ((y) << 1))

#define LimitExtensions(extended, ply) \
            extended = Min(extended, INCPLY);  \
            if(ply > LOWER_ITERATION_BOUND) { \
            if(ply <= UPPER_ITERATION_BOUND) \
            extended = NEW_EXTENDED(iteration_depth, ply); \
            else \
            extended = 0; \
            }

bob · Post by **bob** » Mon Oct 15, 2007 5:10 am

GothicChessInventor wrote:By the way, just "playing the optimization game", I noticed this makes crafty a tiny bit faster on the 2 systems on which I tested it:

Code: Select all

#define LOWER_ITERATION_BOUND ((iteration_depth) << 1)
#define UPPER_ITERATION_BOUND ((iteration_depth) << 2)
#define NEW_EXTENDED(x,y) ((((x) << 2) - (y)) / ((y) << 1))

#define LimitExtensions(extended, ply) \
            extended = Min(extended, INCPLY);  \
            if(ply > LOWER_ITERATION_BOUND) { \
            if(ply <= UPPER_ITERATION_BOUND) \
            extended = NEW_EXTENDED(iteration_depth, ply); \
            else \
            extended = 0; \
            }

I don't see why it would be faster, assuming your compiler optimizer is any good. 2* should get changed into a shl instruction instantly. Ditto for 4*...

I plugged it in but saw no difference on my machines here, using intel's C++ compiler.

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Oct 15, 2007 10:45 am

GothicChessInventor wrote:By the way, just "playing the optimization game", I noticed this makes crafty a tiny bit faster on the 2 systems on which I tested it:

Code: Select all

#define LOWER_ITERATION_BOUND ((iteration_depth) << 1)
#define UPPER_ITERATION_BOUND ((iteration_depth) << 2)
#define NEW_EXTENDED(x,y) ((((x) << 2) - (y)) / ((y) << 1))

#define LimitExtensions(extended, ply) \
            extended = Min(extended, INCPLY);  \
            if(ply > LOWER_ITERATION_BOUND) { \
            if(ply <= UPPER_ITERATION_BOUND) \
            extended = NEW_EXTENDED(iteration_depth, ply); \
            else \
            extended = 0; \
            }

Despite your macros obfuscate and make it harder to follow, you changed the semantic

Code: Select all

#define LimitExtensions(extended,ply)\
   extended = Min(extended,ply); \
   if(ply > 2 * iteration_depth) { \
      if(ply < 4 * iteration_depth) \
         extended = extended*(4*iteration_depth - ply)/(4*iteration_depth); \
      else \
         extended = 0; \
   }

in a significant way which is likely to have more effects in search than replacing lea or add by shift. If it makes Crafty stronger - why not. Not the first time to found improvements by accident or even bugs

c = a*{1,2,4,8} + b + const may translate to one x86 lea instruction.
add reg, reg is also likely the better replacement (or at least not worse) for shift left one on some architectures, specially on net burst. Nowadays I prefere multiplication (or even division) by a constant in the C-source and leave it to the compiler to generate optimal code for the specific target platform.

At least vc2005 I am aware of, does quite optimally and implements, what is suggested in the Software Optimization Guide for AMD Family 10h Processors (same for K8 btw.)
http://www.amd.com/us-en/assets/content ... /40546.pdf
Chapter 8 Integer Optimizations
8.2 Alternative Code for Multiplying by a Constant.

bob · Post by **bob** » Mon Oct 15, 2007 10:06 pm

Gerd Isenberg wrote:
GothicChessInventor wrote:By the way, just "playing the optimization game", I noticed this makes crafty a tiny bit faster on the 2 systems on which I tested it:
Code: Select all
#define LOWER_ITERATION_BOUND ((iteration_depth) << 1)
#define UPPER_ITERATION_BOUND ((iteration_depth) << 2)
#define NEW_EXTENDED(x,y) ((((x) << 2) - (y)) / ((y) << 1))

#define LimitExtensions(extended, ply) \
            extended = Min(extended, INCPLY);  \
            if(ply > LOWER_ITERATION_BOUND) { \
            if(ply <= UPPER_ITERATION_BOUND) \
            extended = NEW_EXTENDED(iteration_depth, ply); \
            else \
            extended = 0; \
            }
Despite your macros obfuscate and make it harder to follow, you changed the semantic
Code: Select all
#define LimitExtensions(extended,ply)\
   extended = Min(extended,ply); \
   if(ply > 2 * iteration_depth) { \
      if(ply < 4 * iteration_depth) \
         extended = extended*(4*iteration_depth - ply)/(4*iteration_depth); \
      else \
         extended = 0; \
   }
in a significant way which is likely to have more effects in search than replacing lea or add by shift. If it makes Crafty stronger - why not. Not the first time to found improvements by accident or even bugs

c = a*{1,2,4,8} + b + const may translate to one x86 lea instruction.
add reg, reg is also likely the better replacement (or at least not worse) for shift left one on some architectures, specially on net burst. Nowadays I prefere multiplication (or even division) by a constant in the C-source and leave it to the compiler to generate optimal code for the specific target platform.

At least vc2005 I am aware of, does quite optimally and implements, what is suggested in the Software Optimization Guide for AMD Family 10h Processors (same for K8 btw.)
http://www.amd.com/us-en/assets/content ... /40546.pdf
Chapter 8 Integer Optimizations
8.2 Alternative Code for Multiplying by a Constant.

Good catch. The denominator is different, which makes the extension amount different. Not sure what "faster" means however... I originally assumed the shifting vs multiplying was his point. It was so hard to read the nested macros that I didn't notice the thing had changed the amount of the extension. It looks like it ought to extend too much that way...

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Oct 15, 2007 11:37 pm

bob wrote: Good catch. The denominator is different, which makes the extension amount different. Not sure what "faster" means however... I originally assumed the shifting vs multiplying was his point. It was so hard to read the nested macros that I didn't notice the thing had changed the amount of the extension. It looks like it ought to extend too much that way...

I guess Ed extends far less, since he did not multiply the nominator with "extended". Since 2*ply > 4*iteration_depth (initial precondition * 2), his quotient becomes always zero.

Code: Select all

   if(ply > 2 * iteration_depth) { \
      if(ply <= 4 * iteration_depth) \
         extended = (4*iteration_depth - ply)/(2*ply); \
      else \
         extended = 0; \
   }

So infact Ed has something like this ...

Code: Select all

   if(ply > 2 * iteration_depth) { \
      extended = 0; \
   }

slightly more expensive due to the wasted cycles to compute zero. Using <= instead of < even for the equal case with difference zero.

GothicChessInventor · Post by **GothicChessInventor** » Tue Oct 16, 2007 7:33 am

Actually this is part of the problem when I type code without copy/paste.

Yeah I see why this would make a huge difference, sorry about the goofup!

This makes Crafty Faster

This makes Crafty Faster

Re: This makes Crafty Faster

Re: This makes Crafty Faster

Re: This makes Crafty Faster

Re: This makes Crafty Faster

Re: This makes Crafty Faster