What's Strelka's Secret Sauce?

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What's Strelka's Secret Sauce?

Post by bob »

Gerd Isenberg wrote:
bob wrote: I have all the bitboards indexed by wtm. IE pawns[wtm] and so forth. All other masks are indexed the same way in the evaluation. Random hash values are also done the same way... Makes things far cleaner since all the white/black stuff is replaced by one general case that will work with either side. I now have things like

score += EvaluateKnights(white) - EvaluateKnights(black)

same sort of thing for move generation, you just pass it the color and it uses the same code. Only tiny exception is for pawns since negative shifts won't work on the PC and I don't want to make it too ugly (yet). I am looking at ways to generate pawn moves without the special-case white/black code as well, although 99% of the pawn generation code is now not duplicated...

Bob
I see, the generalized shift by rotate-and can be combined with the wrap 'and' for set-wise pawn attacks. Thus the additional cost for a generalized << +-8 is one read plus 'and'.

Code: Select all

// positve left, negative right shifts
int shift[8] = {9, 1,-7,-8,-9,-1, 7, 8};
 
U64 avoidWrap[8] =
{
   0xfefefefefefefe00,
   0xfefefefefefefefe,
   0x00fefefefefefefe,
   0x00ffffffffffffff,
   0x007f7f7f7f7f7f7f,
   0x7f7f7f7f7f7f7f7f,
   0x7f7f7f7f7f7f7f00,
   0xffffffffffffff00,
};
 
U64 shiftOne (U64 b, int dir8) {
   return rotateLeft (b, shift[dir8]) & avoidWrap[dir8];
}

#if defined X86INTRINSICS
U64 rotateLeft (U64 x, int s) {return _rotl64(x, s);}
#else
// likely translated to rol by "smart" compilers
U64 rotateLeft (U64 x, int s) {return (x << s) | (x >> (64-s));}
#endif
As I said, I have not yet tackled that. I already have one kludge that I don't like, which is the need for LSB() for advanced white pieces, MSB() for advanced black pieces. I buried that in a macro to make the code read better. Bottom line (so far) is that 22.0 is faster than 21.7, yet the node counts match in every test position in my validation suite. The smaller size has helped, plus a little cleaning up here and there has also helped...

Never did understand why + shifts are ok, but - shifts are not. Most every other machine I have programmed would do it either way just fine...
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: What's Strelka's Secret Sauce?

Post by Edsel Apostol »

Ok. I think I've got your point. :oops:

If those things could be fixed on Strelka then it could catch up with the latest Rybka.

By the way, when will be the next Crafty release?

Good luck with CCT. My engine will be playing there also.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What's Strelka's Secret Sauce?

Post by bob »

Edsel Apostol wrote:Ok. I think I've got your point. :oops:

If those things could be fixed on Strelka then it could catch up with the latest Rybka.

By the way, when will be the next Crafty release?

Good luck with CCT. My engine will be playing there also.
21.7 will be released immediately following CCT.

22.0 will not be that far behind, but there is a lot of rewriting going on to eliminate all the black/white code. It is painful, and will be even more painful when I get to the biggies which are things like EvaluatePawns(), EvaluatePassedPawns(), and such...
rreagan
Posts: 102
Joined: Sun Sep 09, 2007 6:32 am

Re: What's Strelka's Secret Sauce?

Post by rreagan »

bob wrote:Only tiny exception is for pawns since negative shifts won't work on the PC and I don't want to make it too ugly (yet). I am looking at ways to generate pawn moves without the special-case white/black code as well, although 99% of the pawn generation code is now not duplicated...

Bob
I don't know how efficient this is, but here is something I have done to accomplish this.

Code: Select all

enum { white, black };
BitBoard shift_forward (BitBoard b, int color) {
    return (b << 8) >> (color << 4);
}
pawn_pushes = shift_forward(friendly_pawns, side_to_move);
Substituting a table lookup for (color << 4) would work also.
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: What's Strelka's Secret Sauce?

Post by Gerd Isenberg »

rreagan wrote: I don't know how efficient this is, but here is something I have done to accomplish this.

Code: Select all

enum { white, black };
BitBoard shift_forward (BitBoard b, int color) {
    return (b << 8) >> (color << 4);
}
pawn_pushes = shift_forward(friendly_pawns, side_to_move);
Substituting a table lookup for (color << 4) would work also.
This is fine for pawn pushes, with most significant rank mapping (rank bit 3-5, file bit 0-2) so that black pawns may not temporary disappear from the board - three shift instructions.

Code: Select all

shl  ecx, 4
shl  rax, 8
shr  rax, cl
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: What's Strelka's Secret Sauce?

Post by wgarvin »

I'm not sure if you are interested in platforms other than Intel, but... variable-length shifts are slow on certain platforms. The biggest surprise for me was the inorder PPC cores in the Xbox360--a variable-length shift instruction has a latency of 16 cycles on that thing.
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: What's Strelka's Secret Sauce?

Post by Gerd Isenberg »

wgarvin wrote:I'm not sure if you are interested in platforms other than Intel, but... variable-length shifts are slow on certain platforms. The biggest surprise for me was the inorder PPC cores in the Xbox360--a variable-length shift instruction has a latency of 16 cycles on that thing.
You are right - I implicitly assume variable shift by cl on core2 or K8/K10 x86-64 target cpus. core2 has 1 cycle latency, 0.5 cycles reciprocal throughput. K8 has 1 cycle latency, 1/3 cycles reciprocal throughput. Of course with 16 cycles latency, variable shift is a no no - and one better abjures general shift - or probably relies on something like this:

Code: Select all

U64 genShift(U64 x, int s) {
   return (s > 0) ? (x << s) : (x >> -s);
}
with some hope of speculative execution of both shifts with a conditional move instruction - or

Code: Select all

U64 genShift(U64 x, int s) {
   int isNegative = s >> 31; // arithmetical shift -> -1 if negative, else 0
   x >>= -s &  isNegative;
   x <<=  s & ~isNegative;
   return x;
}
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: What's Strelka's Secret Sauce?

Post by wgarvin »

Another idea is to try and use rotate instructions instead of shift instructions (if the rest of the bitboard is already masked off, or something). The rotate would work if you give it a "negative" 2's complement number. It has some downsides though...it might be more expensive than a shift, and its hard to convince C compilers to generate rotate instructions on their own.

Anyway for this case you have here it seems likely that those 3 shifts are pretty good. You guys surely know more about the performance characteristics of these modern x86-64 chips than I do. The last x86 chip I knew a lot about was the P4. Times have changed, and I haven't been paying enough attention. :)

[Edit: reading up in the thread, seems you've already been over the rotate ground. Sorry about that.]
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: What's Strelka's Secret Sauce?

Post by Gerd Isenberg »

wgarvin wrote:Another idea is to try and use rotate instructions instead of shift instructions (if the rest of the bitboard is already masked off, or something). The rotate would work if you give it a "negative" 2's complement number. It has some downsides though...it might be more expensive than a shift, and its hard to convince C compilers to generate rotate instructions on their own.

Anyway for this case you have here it seems likely that those 3 shifts are pretty good. You guys surely know more about the performance characteristics of these modern x86-64 chips than I do. The last x86 chip I knew a lot about was the P4. Times have changed, and I haven't been paying enough attention. :)

[Edit: reading up in the thread, seems you've already been over the rotate ground. Sorry about that.]
Hehe, no problem ;-)
Threads are hard to follow with this php-boards. I have the gut feeling this was discussed x-times before in the history of CCC (and elsewhere). One more reason to collect it in the wiki.
http://chessprogramming.wikispaces.com/ ... erations23
I will add Russel's proposal for a generalized pawn shift in the pawn push section.
ernest
Posts: 2046
Joined: Wed Mar 08, 2006 8:30 pm

Re: What's Strelka's Secret Sauce?

Post by ernest »

Guetti wrote: So lets try to summarize the most important Strelka secrets:
1. ...........
2. Implementation of material tables
3. ...........
I think it is these "material tables", which Strelka simply took and copied from Rybka 1.0 Beta, which bring about the great leap in strength (+100 Elo), compared to Fruit 2.1
Without them, we would only have a bitboard Fruit 2.1

Do we know something about these "material tables" ???