This makes Crafty Faster

Discussion of chess software programming and technical issues.

Moderator: Ras

GothicChessInventor

This makes Crafty Faster

Post by GothicChessInventor »

By the way, just "playing the optimization game", I noticed this makes crafty a tiny bit faster on the 2 systems on which I tested it:

Code: Select all

#define LOWER_ITERATION_BOUND ((iteration_depth) << 1)
#define UPPER_ITERATION_BOUND ((iteration_depth) << 2)
#define NEW_EXTENDED(x,y) ((((x) << 2) - (y)) / ((y) << 1))

#define LimitExtensions(extended, ply) \
            extended = Min(extended, INCPLY);  \
            if(ply > LOWER_ITERATION_BOUND) { \
            if(ply <= UPPER_ITERATION_BOUND) \
            extended = NEW_EXTENDED(iteration_depth, ply); \
            else \
            extended = 0; \
            }
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: This makes Crafty Faster

Post by bob »

GothicChessInventor wrote:By the way, just "playing the optimization game", I noticed this makes crafty a tiny bit faster on the 2 systems on which I tested it:

Code: Select all

#define LOWER_ITERATION_BOUND ((iteration_depth) << 1)
#define UPPER_ITERATION_BOUND ((iteration_depth) << 2)
#define NEW_EXTENDED(x,y) ((((x) << 2) - (y)) / ((y) << 1))

#define LimitExtensions(extended, ply) \
            extended = Min(extended, INCPLY);  \
            if(ply > LOWER_ITERATION_BOUND) { \
            if(ply <= UPPER_ITERATION_BOUND) \
            extended = NEW_EXTENDED(iteration_depth, ply); \
            else \
            extended = 0; \
            }
I don't see why it would be faster, assuming your compiler optimizer is any good. 2* should get changed into a shl instruction instantly. Ditto for 4*...

I plugged it in but saw no difference on my machines here, using intel's C++ compiler.
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: This makes Crafty Faster

Post by Gerd Isenberg »

GothicChessInventor wrote:By the way, just "playing the optimization game", I noticed this makes crafty a tiny bit faster on the 2 systems on which I tested it:

Code: Select all

#define LOWER_ITERATION_BOUND ((iteration_depth) << 1)
#define UPPER_ITERATION_BOUND ((iteration_depth) << 2)
#define NEW_EXTENDED(x,y) ((((x) << 2) - (y)) / ((y) << 1))

#define LimitExtensions(extended, ply) \
            extended = Min(extended, INCPLY);  \
            if(ply > LOWER_ITERATION_BOUND) { \
            if(ply <= UPPER_ITERATION_BOUND) \
            extended = NEW_EXTENDED(iteration_depth, ply); \
            else \
            extended = 0; \
            }
Despite your macros obfuscate and make it harder to follow, you changed the semantic

Code: Select all

#define LimitExtensions(extended,ply)\
   extended = Min(extended,ply); \
   if(ply > 2 * iteration_depth) { \
      if(ply < 4 * iteration_depth) \
         extended = extended*(4*iteration_depth - ply)/(4*iteration_depth); \
      else \
         extended = 0; \
   }
in a significant way which is likely to have more effects in search than replacing lea or add by shift. If it makes Crafty stronger - why not. Not the first time to found improvements by accident or even bugs ;-)

c = a*{1,2,4,8} + b + const may translate to one x86 lea instruction.
add reg, reg is also likely the better replacement (or at least not worse) for shift left one on some architectures, specially on net burst. Nowadays I prefere multiplication (or even division) by a constant in the C-source and leave it to the compiler to generate optimal code for the specific target platform.

At least vc2005 I am aware of, does quite optimally and implements, what is suggested in the Software Optimization Guide for AMD Family 10h Processors (same for K8 btw.)
http://www.amd.com/us-en/assets/content ... /40546.pdf
Chapter 8 Integer Optimizations
8.2 Alternative Code for Multiplying by a Constant.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: This makes Crafty Faster

Post by bob »

Gerd Isenberg wrote:
GothicChessInventor wrote:By the way, just "playing the optimization game", I noticed this makes crafty a tiny bit faster on the 2 systems on which I tested it:

Code: Select all

#define LOWER_ITERATION_BOUND ((iteration_depth) << 1)
#define UPPER_ITERATION_BOUND ((iteration_depth) << 2)
#define NEW_EXTENDED(x,y) ((((x) << 2) - (y)) / ((y) << 1))

#define LimitExtensions(extended, ply) \
            extended = Min(extended, INCPLY);  \
            if(ply > LOWER_ITERATION_BOUND) { \
            if(ply <= UPPER_ITERATION_BOUND) \
            extended = NEW_EXTENDED(iteration_depth, ply); \
            else \
            extended = 0; \
            }
Despite your macros obfuscate and make it harder to follow, you changed the semantic

Code: Select all

#define LimitExtensions(extended,ply)\
   extended = Min(extended,ply); \
   if(ply > 2 * iteration_depth) { \
      if(ply < 4 * iteration_depth) \
         extended = extended*(4*iteration_depth - ply)/(4*iteration_depth); \
      else \
         extended = 0; \
   }
in a significant way which is likely to have more effects in search than replacing lea or add by shift. If it makes Crafty stronger - why not. Not the first time to found improvements by accident or even bugs ;-)

c = a*{1,2,4,8} + b + const may translate to one x86 lea instruction.
add reg, reg is also likely the better replacement (or at least not worse) for shift left one on some architectures, specially on net burst. Nowadays I prefere multiplication (or even division) by a constant in the C-source and leave it to the compiler to generate optimal code for the specific target platform.

At least vc2005 I am aware of, does quite optimally and implements, what is suggested in the Software Optimization Guide for AMD Family 10h Processors (same for K8 btw.)
http://www.amd.com/us-en/assets/content ... /40546.pdf
Chapter 8 Integer Optimizations
8.2 Alternative Code for Multiplying by a Constant.
Good catch. The denominator is different, which makes the extension amount different. Not sure what "faster" means however... I originally assumed the shifting vs multiplying was his point. It was so hard to read the nested macros that I didn't notice the thing had changed the amount of the extension. It looks like it ought to extend too much that way...
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: This makes Crafty Faster

Post by Gerd Isenberg »

bob wrote: Good catch. The denominator is different, which makes the extension amount different. Not sure what "faster" means however... I originally assumed the shifting vs multiplying was his point. It was so hard to read the nested macros that I didn't notice the thing had changed the amount of the extension. It looks like it ought to extend too much that way...
I guess Ed extends far less, since he did not multiply the nominator with "extended". Since 2*ply > 4*iteration_depth (initial precondition * 2), his quotient becomes always zero.

Code: Select all

   if(ply > 2 * iteration_depth) { \
      if(ply <= 4 * iteration_depth) \
         extended = (4*iteration_depth - ply)/(2*ply); \
      else \
         extended = 0; \
   }
So infact Ed has something like this ...

Code: Select all

   if(ply > 2 * iteration_depth) { \
      extended = 0; \
   }
slightly more expensive due to the wasted cycles to compute zero. Using <= instead of < even for the equal case with difference zero.
GothicChessInventor

Re: This makes Crafty Faster

Post by GothicChessInventor »

Actually this is part of the problem when I type code without copy/paste.

Yeah I see why this would make a huge difference, sorry about the goofup!