bob wrote:
I have all the bitboards indexed by wtm. IE pawns[wtm] and so forth. All other masks are indexed the same way in the evaluation. Random hash values are also done the same way... Makes things far cleaner since all the white/black stuff is replaced by one general case that will work with either side. I now have things like
same sort of thing for move generation, you just pass it the color and it uses the same code. Only tiny exception is for pawns since negative shifts won't work on the PC and I don't want to make it too ugly (yet). I am looking at ways to generate pawn moves without the special-case white/black code as well, although 99% of the pawn generation code is now not duplicated...
Bob
I see, the generalized shift by rotate-and can be combined with the wrap 'and' for set-wise pawn attacks. Thus the additional cost for a generalized << +-8 is one read plus 'and'.
// positve left, negative right shifts
int shift[8] = {9, 1,-7,-8,-9,-1, 7, 8};
U64 avoidWrap[8] =
{
0xfefefefefefefe00,
0xfefefefefefefefe,
0x00fefefefefefefe,
0x00ffffffffffffff,
0x007f7f7f7f7f7f7f,
0x7f7f7f7f7f7f7f7f,
0x7f7f7f7f7f7f7f00,
0xffffffffffffff00,
};
U64 shiftOne (U64 b, int dir8) {
return rotateLeft (b, shift[dir8]) & avoidWrap[dir8];
}
#if defined X86INTRINSICS
U64 rotateLeft (U64 x, int s) {return _rotl64(x, s);}
#else
// likely translated to rol by "smart" compilers
U64 rotateLeft (U64 x, int s) {return (x << s) | (x >> (64-s));}
#endif
As I said, I have not yet tackled that. I already have one kludge that I don't like, which is the need for LSB() for advanced white pieces, MSB() for advanced black pieces. I buried that in a macro to make the code read better. Bottom line (so far) is that 22.0 is faster than 21.7, yet the node counts match in every test position in my validation suite. The smaller size has helped, plus a little cleaning up here and there has also helped...
Never did understand why + shifts are ok, but - shifts are not. Most every other machine I have programmed would do it either way just fine...
Edsel Apostol wrote:Ok. I think I've got your point.
If those things could be fixed on Strelka then it could catch up with the latest Rybka.
By the way, when will be the next Crafty release?
Good luck with CCT. My engine will be playing there also.
21.7 will be released immediately following CCT.
22.0 will not be that far behind, but there is a lot of rewriting going on to eliminate all the black/white code. It is painful, and will be even more painful when I get to the biggies which are things like EvaluatePawns(), EvaluatePassedPawns(), and such...
bob wrote:Only tiny exception is for pawns since negative shifts won't work on the PC and I don't want to make it too ugly (yet). I am looking at ways to generate pawn moves without the special-case white/black code as well, although 99% of the pawn generation code is now not duplicated...
Bob
I don't know how efficient this is, but here is something I have done to accomplish this.
Substituting a table lookup for (color << 4) would work also.
This is fine for pawn pushes, with most significant rank mapping (rank bit 3-5, file bit 0-2) so that black pawns may not temporary disappear from the board - three shift instructions.
I'm not sure if you are interested in platforms other than Intel, but... variable-length shifts are slow on certain platforms. The biggest surprise for me was the inorder PPC cores in the Xbox360--a variable-length shift instruction has a latency of 16 cycles on that thing.
wgarvin wrote:I'm not sure if you are interested in platforms other than Intel, but... variable-length shifts are slow on certain platforms. The biggest surprise for me was the inorder PPC cores in the Xbox360--a variable-length shift instruction has a latency of 16 cycles on that thing.
You are right - I implicitly assume variable shift by cl on core2 or K8/K10 x86-64 target cpus. core2 has 1 cycle latency, 0.5 cycles reciprocal throughput. K8 has 1 cycle latency, 1/3 cycles reciprocal throughput. Of course with 16 cycles latency, variable shift is a no no - and one better abjures general shift - or probably relies on something like this:
U64 genShift(U64 x, int s) {
int isNegative = s >> 31; // arithmetical shift -> -1 if negative, else 0
x >>= -s & isNegative;
x <<= s & ~isNegative;
return x;
}
Another idea is to try and use rotate instructions instead of shift instructions (if the rest of the bitboard is already masked off, or something). The rotate would work if you give it a "negative" 2's complement number. It has some downsides though...it might be more expensive than a shift, and its hard to convince C compilers to generate rotate instructions on their own.
Anyway for this case you have here it seems likely that those 3 shifts are pretty good. You guys surely know more about the performance characteristics of these modern x86-64 chips than I do. The last x86 chip I knew a lot about was the P4. Times have changed, and I haven't been paying enough attention.
[Edit: reading up in the thread, seems you've already been over the rotate ground. Sorry about that.]
wgarvin wrote:Another idea is to try and use rotate instructions instead of shift instructions (if the rest of the bitboard is already masked off, or something). The rotate would work if you give it a "negative" 2's complement number. It has some downsides though...it might be more expensive than a shift, and its hard to convince C compilers to generate rotate instructions on their own.
Anyway for this case you have here it seems likely that those 3 shifts are pretty good. You guys surely know more about the performance characteristics of these modern x86-64 chips than I do. The last x86 chip I knew a lot about was the P4. Times have changed, and I haven't been paying enough attention.
[Edit: reading up in the thread, seems you've already been over the rotate ground. Sorry about that.]
Hehe, no problem
Threads are hard to follow with this php-boards. I have the gut feeling this was discussed x-times before in the history of CCC (and elsewhere). One more reason to collect it in the wiki. http://chessprogramming.wikispaces.com/ ... erations23
I will add Russel's proposal for a generalized pawn shift in the pawn push section.
Guetti wrote:
So lets try to summarize the most important Strelka secrets:
1. ...........
2. Implementation of material tables
3. ...........
I think it is these "material tables", which Strelka simply took and copied from Rybka 1.0 Beta, which bring about the great leap in strength (+100 Elo), compared to Fruit 2.1
Without them, we would only have a bitboard Fruit 2.1
Do we know something about these "material tables" ???