fast mobility count through hashing

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ryan Benitez
Posts: 719
Joined: Thu Mar 09, 2006 1:21 am
Location: Portland Oregon

Re: fast mobility count through hashing

Post by Ryan Benitez »

Zach Wegner wrote: I thought the popcnt instruction was only available on Core i7s?
SSE 4a and SSE 4.2 so i7 and Phenom
If someone says their software is too slow I assume they are not using a 386. :P
But really, it makes sense to optimize for current technology.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: fast mobility count through hashing

Post by mcostalba »

Zach Wegner wrote: make it considerably more complicated, but adapting the existing magic finders to do this shouldn't be too hard.
Have you some link about magic finders sources ?

Thanks
Marco
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: fast mobility count through hashing

Post by Gerd Isenberg »

mcostalba wrote:
Zach Wegner wrote: make it considerably more complicated, but adapting the existing magic finders to do this shouldn't be too hard.
Have you some link about magic finders sources ?

Thanks
Marco
related cpw links:
http://chessprogramming.wikispaces.com/ ... for+Magics
http://chessprogramming.wikispaces.com/Hash+Table
http://chessprogramming.wikispaces.com/ ... +Bitboards
http://chessprogramming.wikispaces.com/ ... ialization

As a first exercise you may try to find a fast hash function in 32-bit mode simply for some 12 first rank attacks of a center rook.

Code: Select all

inline unsigned int hash(unsigned int f, unsigned int pattern) {
   return (f * pattern) >> 29; // does not work
}

unsigned int pattern[] = {0x28,0x2c,0x2e,0x2f,0x68,0x6c,0x6e,0x6f,0xe8,0xec,0xee,0xef};
unsigned int pattcnt[] = {   2,   3,   4,   5,   3,   4,   5,   6,   4,   5,   6,   7};

void testPattern() {
   unsigned int f, i;
   for (f=1; f; f++) {
      for &#40;i=0; i < sizeof&#40;pattern&#41;/sizeof&#40;pattern&#91;0&#93;); i++)
         if ( hash&#40;f, pattern&#91;i&#93;) != pattcnt&#91;i&#93; ) goto next;
      printf ("factor %f found\n", f&#41;;
      return;
    next&#58;	;
   &#125;
   printf ("no factor found\n");
&#125;
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fast mobility count through hashing

Post by bob »

Zach Wegner wrote:
Ryan Benitez wrote:Why not just use a faster popcnt?
in msvc its just:
#include <intrin.h>
#define count_1s(b) (__popcnt64(b))

The asm instruction is popcnt so that is simple too. Msvc x64 does not take inline asm so that is the only frustrating part.
I thought the popcnt instruction was only available on Core i7s?

Anyways, the idea is workable IMO. The magic would be something like shifting the attack bit for every attackable square up to bit 60, and then the sum would be in the top 4 bits. Overflows make it considerably more complicated, but adapting the existing magic finders to do this shouldn't be too hard.
Actually I believe it is supported on any processor that has the sse 4.2 flag set when you use the CPUID instruction.

At least the intel C++ compiler has a built-in intrinsic to access the popcnt instruction. I don't use it as I have the inline assembly code for MSB(), LSB() and PopCnt() already done, and just created a version of PopCnt() for SSE4.2 processors that uses the popcnt asm instruction.

BTW using popcnt was almost an immeasurable change when I was testing on the Nehalem box we had here a month ago. If you play your cards right, you can keep PopCnt() from being an important factor at all.
Ryan Benitez
Posts: 719
Joined: Thu Mar 09, 2006 1:21 am
Location: Portland Oregon

Re: fast mobility count through hashing

Post by Ryan Benitez »

I use PopCnt() a lot. It is the sacrifice I made for making the board as small as I can.
Maybe I played my cards wrong but it works well for me.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: fast mobility count through hashing

Post by Gerd Isenberg »

Ryan Benitez wrote:I use PopCnt() a lot. It is the sacrifice I made for making the board as small as I can.
Makes perfect sense. No need to store and maintain redundant piece-counters anymore. On AMD K10 popcnt has latency of 2 cycles restricted to scheduling in pipe 2. No exact intel data handy on popcnt, but I guess it is in a similar range. To use popcnt as often as desired without tears anymore, even sparse populated sets for center- or king-distance related mobility, where without popcnt instruction one would prefer Brian Kernighan's loop approach or probably a SSE2 dot-product.
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: fast mobility count through hashing

Post by Pradu »

bob wrote:Actually I believe it is supported on any processor that has the sse 4.2 flag set when you use the CPUID instruction.
There's even a seperate flag just for popcnt so that it can work with CPUs that have popcnt but not SSE4:

Code: Select all

#ifdef _MSC_VER
    #include <intrin.h>
#elif defined&#40; __GNUC__ )
    static __inline__ __attribute__(&#40;always_inline&#41;) void __cpuid&#40;int CPUInfo&#91;&#93;, const int InfoType&#41;
    &#123;
        __asm__ __volatile__
        (
            "cpuid"
            &#58; "=a" &#40;CPUInfo&#91;0&#93;), "=b" &#40;CPUInfo&#91;1&#93;), "=c" &#40;CPUInfo&#91;2&#93;), "=d" &#40;CPUInfo&#91;3&#93;)
            &#58; "a" &#40;InfoType&#41;
        );
    &#125;
#else
    #error Unsupported Compiler
#endif

...

int CPUInfo&#91;4&#93;;
__cpuid&#40;CPUInfo, 1&#41;;
int has_popcnt = &#40;CPUInfo&#91;2&#93;>>23&#41;&1;
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: fast mobility count through hashing

Post by mcostalba »

Pradu wrote:
bob wrote:Actually I believe it is supported on any processor that has the sse 4.2 flag set when you use the CPUID instruction.
There's even a seperate flag just for popcnt so that it can work with CPUs that have popcnt but not SSE4:

Code: Select all

#ifdef _MSC_VER
    #include <intrin.h>
#elif defined&#40; __GNUC__ )
    static __inline__ __attribute__(&#40;always_inline&#41;) void __cpuid&#40;int CPUInfo&#91;&#93;, const int InfoType&#41;
    &#123;
        __asm__ __volatile__
        (
            "cpuid"
            &#58; "=a" &#40;CPUInfo&#91;0&#93;), "=b" &#40;CPUInfo&#91;1&#93;), "=c" &#40;CPUInfo&#91;2&#93;), "=d" &#40;CPUInfo&#91;3&#93;)
            &#58; "a" &#40;InfoType&#41;
        );
    &#125;
#else
    #error Unsupported Compiler
#endif

...

int CPUInfo&#91;4&#93;;
__cpuid&#40;CPUInfo, 1&#41;;
int has_popcnt = &#40;CPUInfo&#91;2&#93;>>23&#41;&1;
I have one question regarding runtime check of CPU capabilities.

How this can be used for a chess engine that is supposed to be released as a binary and downloaded and used by anyone on any computer ?

I mean, I cannot use a function pointer to redirect at runtime on a fallback standard C code implementation if host CPU does not support POPCNT. That would be far too slow. This kind of functions must be inlined. So, when I check that host CPU has or not has the popcnt capability what can I do ?

Chess engine is not supposed to be compiled on _any_ pc that will run it, but is compiled once with the best optimization and distributed.

The only possibility I foreseen is two create two compiles, one for CPU with POPCNT and another for CPU without POPCNT, but considering that we need also another two versions for 32 and 64 bits we are ramping up fast on this combinatorial escalation.
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: fast mobility count through hashing

Post by Pradu »

mcostalba wrote:I have one question regarding runtime check of CPU capabilities.
Well, I don't do it at runtime. I have a Makefile that generates an executable that creates a Makefile with CPU info in it and subsequently I include the generated Makefile into the original Makefile so it will optimize for the platform it is running on. Subsequently, for cross-platform builds, I can explictly enable, disable features by invoking make differently:

Code: Select all

make CPU_POPCNT=0
Or have this done in a Makefile for loop if I want to build the same source code but with different flags.
I guess if you are releasing it on the internet you could make multiple executables. This may not be the best way of doing it but this is what I do.
The only possibility I foreseen is two create two compiles, one for CPU with POPCNT and another for CPU without POPCNT, but considering that we need also another two versions for 32 and 64 bits we are ramping up fast on this combinatorial escalation.
With a Makefile based build-system, this really isn't a problem (apart from the build-time).
Last edited by Pradu on Sun May 10, 2009 10:32 pm, edited 1 time in total.
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: fast mobility count through hashing

Post by Pradu »

we are ramping up fast on this combinatorial escalation.
I'm guessing this is the reason why many programs are distributed without binaries. But for a closed-source program this could indeed be a problem but atleast for computer-chess it isn't too bad. For example if you have a chess engine that targets x86/x64 platforms with or without popcnt, for AMD processors and for Intel processors, for Windows, Linux and Mac and that's only about 24 executables. I guess you could take a few more options (perhaps around a hundred executables) and after that point it would indeed be a serious problem but I don't know any workaround other than distributing the source code which may or may not be possible.