What's the fastest move generator?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: What's the fastest move generator?

Post by diep »

smatovic wrote:
My move generator i open sourced a year or 10 ago.
Where to get?

--
Srdja
To give you an idea how it looks like:

Code: Select all

#define white               0
#define black               1
#define neutral             2

#define no_piece            0  /*  0b0000 */
#define pawn                1  /*  0b0001 */
#define knight              2  /*  0b0010 */
#define bishop              3  /*  0b0011 */
#define rook                4  /*  0b0100 */
#define queen               5  /*  0b0101 */
#define king                6  /*  0b0110 */

#define bpawn               9  /*  0b1001 */
#define bknight             10 /*  0b1010 */
#define bbishop             11 /*  0b1011 */
#define brook               12 /*  0b1100 */
#define bqueen              13 /*  0b1101 */
#define bking               14 /*  0b1110 */

#define QuickLink(FFTT) {genindex->zet = FFTT;genindex++;}
// 'zet' is Dutch for 'move' in English

// the first 2 arrays are exactly like Gnuchess from John Stanback
// not to confuse with the later raped gnuchess when less talented guys
// hopped in to 'care' about gnuchess:

// 'color' has 3 values, 0,1,2 meaning white, black and neutral
// 'board' has piece values 1..6 and empty squares 0

// 'snelbord' also 'quickboard' in other engines i wrote which no longer
// have the 'board' at all nor color, yet realize at cheapskate mobile processors that having all the arrays is faster as every instruction counts
there. Even an AND is yet another instruction.
// it has piece values 1..6 and 9..14

PLEASE NOTE IF YOU WANT THE CODE TO BE FAST FOR 64 BITS COMPILES AS WELL, THEN MODIFY 'int' INTO 'unsigned int'. ASK GERD ISENBERG WHY IF YOU DON'T KNOW THE REASON.

Generating sliders: Queen Rook Bishop

Code: Select all

  t  = cancapside[side];
  psq  = &quickpiecelist[side][12]; /* Q R B */
  while( (sq=*psq++) != 128 ) {
    int SRsq  = &#40;sq<<6&#41;;
    int piece = board&#91;sq&#93;; // same as 'snelbord&#91;sq&#93; & 7'
    int *s,*v,*w,u;

    s = andscan&#91;0&#93;;
    v = ipiecepos&#91;piece&#93;&#91;sq&#93;;
    w = iskippos&#91;sq&#93;;

    u = *v++;
    do &#123;
      int p1=snelbord&#91;u&#93;,sh=w&#91;u&#93;;
      v += &#40;s&#91;p1&#93;&sh&#41;;
      if&#40; (&#40;p1-1&#41;>>3&#41; != side ) &#123;
        QuickLink&#40;SRsq|u|t&#91;p1&#93;);
      &#125;
    &#125; while&#40; &#40;u=*v++) != 128 );
  &#125;
Now generating Knights and King moves:

Code: Select all

  psq = quickpiecelist&#91;side&#93;; /* K N */
  while&#40; &#40;sq=*psq++) != 128 ) &#123;
    int SRsq  = &#40;sq<<6&#41;;
    int piece = board&#91;sq&#93;;

    int u,*v;
    v = ipiecepos&#91;piece&#93;&#91;sq&#93;;
    u = *v++;
    do &#123;
      int p1 = snelbord&#91;u&#93;;
      if&#40; (&#40;p1-1&#41;>>3&#41; != side ) &#123;
        QuickLink&#40;SRsq|u|t&#91;p1&#93;);
      &#125;
    &#125; while&#40; &#40;u=*v++) != 128 );
  &#125;
Pawns i'm doing less super optimized possibly as there is a lot of branches. I'll have a look one day how to do that faster... ...even then the next code isn't exactly slow as i by hand 'unrolled' the loop:

Note that things are fall through optimized. I wrote the main generator doing this around 1998, in 2005 there was a small revision how pieces are in different lists. If you don't know what fall through is, please google for it, it's a big step forwards (invented by intel).

Code: Select all

  psq  = &quickpiecelist&#91;side&#93;&#91;26&#93;; /* pawns */
  while&#40; &#40;sq=*psq++) != 128 ) &#123;
    int SRsq  = &#40;sq<<6&#41;;
    int u,*v,*w;

    v = ipiecepos&#91;side&#93;&#91;sq&#93;;
    w = ipawndir&#91;side&#93;&#91;sq&#93;;
    u = *v++;
    if&#40; row&#40;u&#41; != 0 && row&#40;u&#41; != 7 ) &#123;
      if&#40; color&#91;u&#93; == neutral&#41; &#123;
        QuickLink&#40;SRsq|u&#41;;
        if&#40; &#40;u=*v&#41; != 128 && color&#91;u&#93; == neutral ) &#123; /* indien u == sq dan false */
          QuickLink&#40;SRsq|u&#41;;
        &#125;
      &#125;

      u = *w++;
      if&#40; color&#91;u&#93; == xside ) &#123; 
        QuickLink&#40;SRsq|u|move_captures&#41;;
      &#125;
      if&#40; &#40;u=*w&#41; != 128 && color&#91;u&#93; == xside ) &#123; 
        QuickLink&#40;SRsq|u|move_captures&#41;;
      &#125;
    &#125;
    else &#123;
      if&#40; color&#91;u&#93; == neutral&#41; &#123;
        QuickLink&#40;SRsq|u|move_pqueen&#41;;
        QuickLink&#40;SRsq|u|move_pknight&#41;;
        QuickLink&#40;SRsq|u|move_prook&#41;;
        QuickLink&#40;SRsq|u|move_pbishop&#41;;
      &#125;
      u = *w++;
      if&#40; color&#91;u&#93; == xside&#41; &#123;/* captures */
        QuickLink&#40;SRsq|u|move_captures|move_pqueen&#41;;
        QuickLink&#40;SRsq|u|move_captures|move_pknight&#41;;
        QuickLink&#40;SRsq|u|move_captures|move_prook&#41;;
        QuickLink&#40;SRsq|u|move_captures|move_pbishop&#41;;
      &#125;

      if&#40; &#40;u=*w&#41; != 128 && color&#91;u&#93; == xside&#41; &#123;
        QuickLink&#40;SRsq|u|move_captures|move_pqueen&#41;;
        QuickLink&#40;SRsq|u|move_captures|move_pknight&#41;;
        QuickLink&#40;SRsq|u|move_captures|move_prook&#41;;
        QuickLink&#40;SRsq|u|move_captures|move_pbishop&#41;;
      &#125;
    &#125;
  &#125;
This is how i generate moves - still have to 'upgrade' similar code in the 'main diep' to this generator by the way, yet the focus is and was the sliders of course. If you analyze it well you'll see it doesn't have many branches.

The difference with the 'old diep' and this 2005 code is that i used to have all pieces in 1 list for each side. This code has QRB separated from KN and separated from the pawns. That single branch there was really EXPENSIVE. No expensive branches in the above code...

Also note the branches to link moves are short so real fast on the CPU's.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: What's the fastest move generator?

Post by Gerd Isenberg »

diep wrote:Generating sliders: Queen Rook Bishop

Code: Select all

  t  = cancapside&#91;side&#93;;
  psq  = &quickpiecelist&#91;side&#93;&#91;12&#93;; /* Q R B */
  while&#40; &#40;sq=*psq++) != 128 ) &#123;
    int SRsq  = &#40;sq<<6&#41;;
    int piece = board&#91;sq&#93;; // same as 'snelbord&#91;sq&#93; & 7'
    int *s,*v,*w,u;

    s = andscan&#91;0&#93;;
    v = ipiecepos&#91;piece&#93;&#91;sq&#93;;
    w = iskippos&#91;sq&#93;;

    u = *v++;
    do &#123;
      int p1=snelbord&#91;u&#93;,sh=w&#91;u&#93;;
      v += &#40;s&#91;p1&#93;&sh&#41;;
      if&#40; (&#40;p1-1&#41;>>3&#41; != side ) &#123;
        QuickLink&#40;SRsq|u|t&#91;p1&#93;);
      &#125;
    &#125; while&#40; &#40;u=*v++) != 128 );
  &#125;
Nice piece of code Vincent. I specially like the inner loop in sliding attacks, and the conditional branchless skip to end of the ray in case of none empty target square.

Code: Select all

      v += &#40;s&#91;p1&#93;&sh&#41;;
Was that your invention? Great!

I would like to write a small CPW article about your move gen, pseudo code, or the code you posted here. Is that OK?

Do you suggest a name? Vincent's SnelZetter or something?

Gerd
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: What's the fastest move generator?

Post by Gerd Isenberg »

Have you tried instead of

Code: Select all

      if&#40; (&#40;p1-1&#41;>>3&#41; != side ) &#123;
        QuickLink&#40;SRsq|u|t&#91;p1&#93;);
branchless conditional write?

Code: Select all

      genindex->zet = &#40;SRsq|u|t&#91;p1&#93;);
      genindex += (&#40;p1-1&#41;>>3&#41; != side;
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: What's the fastest move generator?

Post by diep »

Gerd Isenberg wrote:
diep wrote:Generating sliders: Queen Rook Bishop

Code: Select all

  t  = cancapside&#91;side&#93;;
  psq  = &quickpiecelist&#91;side&#93;&#91;12&#93;; /* Q R B */
  while&#40; &#40;sq=*psq++) != 128 ) &#123;
    int SRsq  = &#40;sq<<6&#41;;
    int piece = board&#91;sq&#93;; // same as 'snelbord&#91;sq&#93; & 7'
    int *s,*v,*w,u;

    s = andscan&#91;0&#93;;
    v = ipiecepos&#91;piece&#93;&#91;sq&#93;;
    w = iskippos&#91;sq&#93;;

    u = *v++;
    do &#123;
      int p1=snelbord&#91;u&#93;,sh=w&#91;u&#93;;
      v += &#40;s&#91;p1&#93;&sh&#41;;
      if&#40; (&#40;p1-1&#41;>>3&#41; != side ) &#123;
        QuickLink&#40;SRsq|u|t&#91;p1&#93;);
      &#125;
    &#125; while&#40; &#40;u=*v++) != 128 );
  &#125;
Nice piece of code Vincent. I specially like the inner loop in sliding attacks, and the conditional branchless skip to end of the ray in case of none empty target square.

Code: Select all

      v += &#40;s&#91;p1&#93;&sh&#41;;
Was that your invention? Great!

I would like to write a small CPW article about your move gen, pseudo code, or the code you posted here. Is that OK?

Do you suggest a name? Vincent's SnelZetter or something?

Gerd
I had posted this method before. You called it back then 'straightforward way of generating moves'. It's simply a table based move generator, just like gnuchess; You can use this in virtual every non-bitboard type of move generator. In fact you can also use it with bitboards, as it's just a table looking up the next move. Giving it a name would be overkill i fear, though it's clearly an invention. Note from a broader perspective it's just a more optimized table method than gnuchess had in 80s.

One can use this in mailbox or 0x88 as well.

I'll post some later today here also the way how to generate the tables. Had posted all this already a year or 10 ago except for the larger piecelist i have nowadays which used to be size [2][18] now it's way larger.

Besides a dozen of yanks seems you're only one from Europe having writing rights in CPW, would be nice if you write something about it.

At request of Richard Stallman i gpl-ed the code.

Most here tend to forget that there is something called patents. We didn't have those fights over here yet in computerchess too much, but in fact the only way Apple stops Samsung (without taking a viewpoint who is right or wrong) now is a method used in software on how to scroll. A software patent IMHO that is the center of dispute between Apple and Samsung.

So there is no issues with this move generator there.

Note that when using this move generator for a 0x88 based datastructure, a board that's 16x16 like in Fruit, that you can also really quickly generate check giving moves in an accurate manner.

For many complex scanning loops in Diep, 16x16 board would have been a cycle here and a few cycles there faster.

Too late to change that now obviously :)
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: What's the fastest move generator?

Post by diep »

Gerd Isenberg wrote:Have you tried instead of

Code: Select all

      if&#40; (&#40;p1-1&#41;>>3&#41; != side ) &#123;
        QuickLink&#40;SRsq|u|t&#91;p1&#93;);
branchless conditional write?

Code: Select all

      genindex->zet = &#40;SRsq|u|t&#91;p1&#93;);
      genindex += (&#40;p1-1&#41;>>3&#41; != side;
I stripped the code a tad. In fact i also assign a score in between to each move.

Somewhere end 90s i tried many variations. If you add 1 instruction you reduce the table size from 60-100 KB to say a few kilobyte only.

Back then this all tested slower. I'm sure that depending upon the compiler and the processor you sometimes can save a cycle here or there.

You can also replace the (p1-1) >> 3 with other equivalents, at some processors shifting right is not such a fast instruction.

At P4 shifting right was like 7 cycles. So the shift right here might be from around 2005 there.

So back then i had something else other than shifting right 3. You can also do a lookup to another table - that's what i did do in 90s.

Note there is another alternative of this method that's closely related to it which i used in 90s as that was faster at CPU's from back then.

I didn't accurately benchmark all this for TODAYS processors.
There is a serious lack of time for that.

The k7's can do 2 reads in parallel each cycle. I'm not sure about core2 Xeon and i7. I know their previous incarnation can do just 1 read a cycle, what's it with Core2 Xeons L5420 i have here at home?
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: What's the fastest move generator?

Post by diep »

Gerd Isenberg wrote:Have you tried instead of

Code: Select all

      if&#40; (&#40;p1-1&#41;>>3&#41; != side ) &#123;
        QuickLink&#40;SRsq|u|t&#91;p1&#93;);
branchless conditional write?

Code: Select all

      genindex->zet = &#40;SRsq|u|t&#91;p1&#93;);
      genindex += (&#40;p1-1&#41;>>3&#41; != side;
I checked code. The 'slow' Diep the code still looks like this:

Code: Select all

 else &#123;
    int *t,*lm;
    t  = cancapside&#91;side&#93;;
    lm = linkmask&#91;side&#93;;
    if&#40; sweep&#91;piece&#93; ) &#123;
      int *s,*v,*w,u;
      s = andscan&#91;0&#93;;
      v = ipiecepos&#91;piece&#93;&#91;sq&#93;;
      w = iskippos&#91;sq&#93;;
      u = *v++;
      do &#123;
        int p1=snelbord&#91;u&#93;,sh=w&#91;u&#93;;
        v += &#40;s&#91;p1&#93;&sh&#41;;
        genindex->zet = SRsq|u|t&#91;p1&#93;;
        genindex     += lm&#91;p1&#93;;
      &#125; while&#40; &#40;u=*v++) != 128 );
    &#125;
    else &#123;
      int u,*v;
      v = ipiecepos&#91;piece&#93;&#91;sq&#93;;
      u = *v++;
      do &#123;
        int p1 = snelbord&#91;u&#93;;
        genindex->zet = SRsq|u|t&#91;p1&#93;;
        genindex     += lm&#91;p1&#93;;
      &#125; while&#40; &#40;u=*v++) != 128 );
    &#125;
As you see above there is no != type signs there at all. It just links without branches at all. I'm not sure which of the above 3 is any faster at modern processors at all. Would need to look at the assembler it generates then and estimate if it matters at all. Of course in the 2005 incarnation i saved out the 'else' there which is a HUGE speedup as this is a very slow branch.

In nps at the k7 i noticed removing that branch BIGTIME.

But of course looking at the 'invariant' as in logics and grammars we call the innerloops, i'm not sure which is faster.

I did notice back then that GCC had more problems with some constructs than visual studio. For the cluster here GCC is rather important...
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: What's the fastest move generator?

Post by Gerd Isenberg »

diep wrote: I had posted this method before. You called it back then 'straightforward way of generating moves'.
Yes, I remember, excuse the redundancy to highlight your routine again ;-)
diep wrote: It's simply a table based move generator, just like gnuchess; You can use this in virtual every non-bitboard type of move generator. In fact you can also use it with bitboards, as it's just a table looking up the next move. Giving it a name would be overkill i fear, though it's clearly an invention. Note from a broader perspective it's just a more optimized table method than gnuchess had in 80s.
I think the first table driven move gen was described by Alex Bell on Atlas, probably already developed when he worked for Barricelli in the early 60s. Disjoint next square and next direction loops in Algol. Looking for Stanback's GNU Chess move gen. Bruce Moreland also mentioned table driven movegen, so did Reul in his thesis with the ray-wise blocking loops for captures.
diep wrote: One can use this in mailbox or 0x88 as well.

I'll post some later today here also the way how to generate the tables. Had posted all this already a year or 10 ago except for the larger piecelist i have nowadays which used to be size [2][18] now it's way larger.


Besides a dozen of yanks seems you're only one from Europe having writing rights in CPW, would be nice if you write something about it.
Recently, at 5th CPW anniversity, I posted in General Topics with explicit invitation to become member and to contribute. There are > 700 members already, but most are anonymious and keep silent. I am not aware they have no writing rights, and nobody complained about, iirr. There were recent edits by RichardPijl and Edmund_Moshammer for instance.

May be some have objections with wikispaces and propriatary wikitext and still buggy graphical editor (better use plain text editor and preview). If someone has problems to register (it still requires user names without spaces, so use underscore) or to login, please send me a pm. Editing requires "objective Wiki-style", to refer or quote sources. Linking to other pages or sub-pages takes some time.
diep wrote:At request of Richard Stallman i gpl-ed the code.

Most here tend to forget that there is something called patents. We didn't have those fights over here yet in computerchess too much, but in fact the only way Apple stops Samsung (without taking a viewpoint who is right or wrong) now is a method used in software on how to scroll. A software patent IMHO that is the center of dispute between Apple and Samsung.

So there is no issues with this move generator there.

Note that when using this move generator for a 0x88 based datastructure, a board that's 16x16 like in Fruit, that you can also really quickly generate check giving moves in an accurate manner.

For many complex scanning loops in Diep, 16x16 board would have been a cycle here and a few cycles there faster.

Too late to change that now obviously :)
On 0x88, I think Christophe Theron still has a valid point, and mailbox vector atacks are great to detect checks and to reduce table sizes.
http://www.stmintz.com/ccc/index.php?id=114438

On the planned cpw-article on table driven movegen, I prefer to explain it in words, some data structure diagrams, and pseudo code.

Gerd
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: What's the fastest move generator?

Post by diep »

I really appreciate if you write an article about all this at CPW.

As i noted a bunch of years ago, 16x16 board allows a near perfect prediction array type years 90 style.

From head something like if you have 'from and to' squares and want to see whether this move is a giving check:

checkforcheck[ constant + to - from ]

small byte array will do the job there.

of course if you use a pointer in middle of an array
also negative references possible yet most boundscheckers don't like that.

if you check it out then for a 16x16 array it's more efficient than 12x12 as in the 16x16 array less overlaps. So works way faster.

Of course the 0x88 idea is not so relevant with the tables i designed.

With todays processors doing table lookups has become a lot cheaper, much opposed to the past.

I noticed that bigtime for a prime number sieve mine i experimented with a few months ago.

Basically up to a kilobyte or roughly 800 you're ok if you're streaming, the level caches do magnificent there. Tad less at intel processors but it's not a big deal difference.

This was total different in 90s.

Please don't forget most programs nowadays use an array to store all the material, which is a huge array.

Around a 1.2MB in most programs it is.

Efficiency in code and reducing number of branches matters though

Only GPU programming is total different there, as everything must be in the register files there otherwise you slow down and you just have a few kilobytes a core. Somewhere around a 1-2 kilobyte for datastorage.

One thing i really never did do is optimize carefully for core2. Now that AMD is years behind on intel and eats way too much power for what it delivers (though seemingly in modern society power usage never is an issue for hardware for 99% of the consumers), so any generic optimization now is nuts. Intel rules simply.