Need some help on speed optimization

bob · Post by **bob** » Thu Nov 13, 2008 7:17 am

Dann Corbit wrote:
Jan Brouwer wrote:
bob wrote:Bitboards are the obvious future of computer chess.
Is this a commonly held opinion?
I am considering rewriting my engine and I can't make up my mind which way to go (at present I use a mailbox representation).
Now that 64 bit operating systems and compilers are available, I don't think there is any doubt.

"now"? I've been running on 64 bit hardware and software since 1980.

sje · Post by **sje** » Thu Nov 13, 2008 9:40 am

bob wrote:Look at the programs that have switched, from Rybka on down. It is not easy to switch without a rewrite, because now you think "sets" (bitmaps) rather than array loops. This takes some getting used to, so the learning curve is pretty slow. But eventually, you "see the light" and it works very well.

I don't think that the learning curve is that steep. Any real adaptation slowness is caused by the desire to squeeze out all possible inefficiencies with the details of low level bitboard realization. Much is gained just by having a simplistic bitboard implementation at first and deferring rotated bitboards, magic bitboards, etc., until later.

A good bitboard implementation will rid a program of a full level of looping in many, many places and this is a big payoff. But an even bigger payoff is that the use of bitboards provides a higher level of abstraction that makes programming easier.

Having a 64 bit processor is nice, but it's not strictly necessary. The new CIL toolkit uses a bitboard representation of a vector of four 16 bit integers, and that's faster than an offset (mailbox) approach in my tests. Using 32 bit or 64 bit values would be faster if such were well supported on the existing free Common Lisp environments. The implementation could be changed at the lowest level with all changes confined to a single source file if needed.

bob · Post by **bob** » Thu Nov 13, 2008 6:34 pm

sje wrote:
bob wrote:Look at the programs that have switched, from Rybka on down. It is not easy to switch without a rewrite, because now you think "sets" (bitmaps) rather than array loops. This takes some getting used to, so the learning curve is pretty slow. But eventually, you "see the light" and it works very well.
I don't think that the learning curve is that steep. Any real adaptation slowness is caused by the desire to squeeze out all possible inefficiencies with the details of low level bitboard realization. Much is gained just by having a simplistic bitboard implementation at first and deferring rotated bitboards, magic bitboards, etc., until later.

A good bitboard implementation will rid a program of a full level of looping in many, many places and this is a big payoff. But an even bigger payoff is that the use of bitboards provides a higher level of abstraction that makes programming easier.

Having a 64 bit processor is nice, but it's not strictly necessary. The new CIL toolkit uses a bitboard representation of a vector of four 16 bit integers, and that's faster than an offset (mailbox) approach in my tests. Using 32 bit or 64 bit values would be faster if such were well supported on the existing free Common Lisp environments. The implementation could be changed at the lowest level with all changes confined to a single source file if needed.

OK, I'll rephrase my statement.

If all you want to do is produce a crappy implementation of a chess engine, using bitboards, then there might not be a lot of "learning curve" issues. IF that is all you want to do. If you want to produce a _good_ implementation, then you have to change from thinking loops to thinking sets. I have done both types of programs. _extensively_. Code to do cute things like generate checks or check evasions is _far_ different when written efficiently. The evaluation code is _far_ different. Some do mobility. There are several ways to do this, one way is "bitboardy" the other way(s) are "loopy" no matter which representation you use.

My comments are almost never directed to the person that will be happy with "a crappy implementation." To do it _right_ takes some learning, and the curve was definitely there, at least for me. And I'm hardly a lousy programmer myself.

Tord Romstad · Post by **Tord Romstad** » Thu Nov 13, 2008 7:47 pm

bob wrote:I disagree. The way you use bitboards is _completely_ different than the way you use mailbox representation. With bitboards, you think "sets of squares" whereas with mailbox, you think "looping over a group of squares". The way you think about things is totally different. From generating checks (generate all moves and then cull non-checks to finding sets of squares that can give check and see if you can find a correct piece with a set of attack squares that intersects with the set of targets.

Generating checks would be different, yes. It is a good example of the little low-level components of a chess program which is likely to depend heavily on the board representation. Regardless of the board representation, however, a generator for checking moves is easy to write, and easy to replace.

Different philosophies. I don't go for speed "up front". But I try to never design with blinders on so that I make decisions that will make it very difficult to go fast later. That is an easy mistake to make, and you end up rewriting things that could have been designed differently up front.

That's precisely why I prefer to always optimize for flexibility. I don't care if all my functions are as fast as technically possible; but I do care that if any part of my program turns out to be too slow, I can easily replace it without massive changes throughout the program.

For chess-related stuff, yes. But for things like evaluating mobility, or generating just checks, or generating just legal check evasions, it is a different thing.

I evaluate mobility in precisely the same way both with and without bitboards: I count the number of attacked squares not occupied by friendly pieces. Of course, exactly how the counting is done depends on the board representation, but it is hardly an interesting or important part of the program, and is easily hidden in a little low-level function. In the main evaluation function, I think

Code: Select all

score += RookMobilityWeight * board.rook_mobility(square);

is both more readable and more flexible than

Code: Select all

score += RookMobilityWeight * count_1s(board.rook_attacks(square) & ~friendlyPieces);

Generating checks and check evasions, as mentioned before, are just semi-trivial low-level which are easy to write and easy to replace, regardless of the board representation.

Ditto for evaluation.

I concede that there is currently a considerable amount of direct bitboard manipulation in my evaluation function, but it doesn't have to be that way.
I consider this to be one of the main weaknesses of my program.

You can ask multiple questions with a single AND if you know how to do it and plan on doing that from the beginning.

That's the kind of thing I never want to do, at least not if it ties me to one specific board representation. I have a general dislike for clever-looking code, unless it can be hidden behind some convenient abstraction.

Without doubt, this difference in philosophy is one of the main reasons why your program is faster than mine, but it is also the reason my program isn't that much weaker, despite being developed with a tiny fraction of the development and testing time, and by a vastly inferior programmer. Writing code in such a way that the board representation becomes unimportant and invisible makes it possible even for mediocre programmers to write a decent chess program.

Nobody can argue against your successes, so your approach surely works for you. However, I am sure it wouldn't work for the average programmer. Very few programmers could have written Crafty. Almost anyone could have written Glaurung.

Tord

Tord Romstad · Post by **Tord Romstad** » Thu Nov 13, 2008 7:54 pm

sje wrote:Using 32 bit or 64 bit values would be faster if such were well supported on the existing free Common Lisp environments. The implementation could be changed at the lowest level with all changes confined to a single source file if needed.

Have you tried wrapping the arithmetic operations inside a logand with #xFFFFFFFF (or #xFFFFFFFFFFFFFFFF for 64-bit operations)? Most good Common Lisp compilers should be able to optimize this into native 32-bit or 64-bit arithmetics. I don't know most of the free Lisps, but I know that at least SBCL does this for 32-bit arithmetics. I am not sure whether there is a 64-bit SBCL yet.

Of course, writing (logand ... #xFFFFFFFF) everywhere in your code would be annoying, but you should probably be able to hide it with some clever macrology. You could, for instance, try to create a with-64bit-arithmetics macro which walks through a block of code and rewrites all arithmetic expressions using the logand trick.

Tord

bob · Post by **bob** » Thu Nov 13, 2008 8:38 pm

Tord Romstad wrote:
bob wrote:I disagree. The way you use bitboards is _completely_ different than the way you use mailbox representation. With bitboards, you think "sets of squares" whereas with mailbox, you think "looping over a group of squares". The way you think about things is totally different. From generating checks (generate all moves and then cull non-checks to finding sets of squares that can give check and see if you can find a correct piece with a set of attack squares that intersects with the set of targets.
Generating checks would be different, yes. It is a good example of the little low-level components of a chess program which is likely to depend heavily on the board representation. Regardless of the board representation, however, a generator for checking moves is easy to write, and easy to replace.

Different philosophies. I don't go for speed "up front". But I try to never design with blinders on so that I make decisions that will make it very difficult to go fast later. That is an easy mistake to make, and you end up rewriting things that could have been designed differently up front.
That's precisely why I prefer to always optimize for flexibility. I don't care if all my functions are as fast as technically possible; but I do care that if any part of my program turns out to be too slow, I can easily replace it without massive changes throughout the program.

For chess-related stuff, yes. But for things like evaluating mobility, or generating just checks, or generating just legal check evasions, it is a different thing.
I evaluate mobility in precisely the same way both with and without bitboards: I count the number of attacked squares not occupied by friendly pieces. Of course, exactly how the counting is done depends on the board representation, but it is hardly an interesting or important part of the program, and is easily hidden in a little low-level function. In the main evaluation function, I think
Code: Select all
score += RookMobilityWeight * board.rook_mobility(square);
is both more readable and more flexible than
Code: Select all
score += RookMobilityWeight * count_1s(board.rook_attacks(square) & ~friendlyPieces);
Generating checks and check evasions, as mentioned before, are just semi-trivial low-level which are easy to write and easy to replace, regardless of the board representation.

Ditto for evaluation.
I concede that there is currently a considerable amount of direct bitboard manipulation in my evaluation function, but it doesn't have to be that way.
I consider this to be one of the main weaknesses of my program.

You can ask multiple questions with a single AND if you know how to do it and plan on doing that from the beginning.
That's the kind of thing I never want to do, at least not if it ties me to one specific board representation. I have a general dislike for clever-looking code, unless it can be hidden behind some convenient abstraction.

Without doubt, this difference in philosophy is one of the main reasons why your program is faster than mine, but it is also the reason my program isn't that much weaker, despite being developed with a tiny fraction of the development and testing time, and by a vastly inferior programmer. Writing code in such a way that the board representation becomes unimportant and invisible makes it possible even for mediocre programmers to write a decent chess program.

Nobody can argue against your successes, so your approach surely works for you. However, I am sure it wouldn't work for the average programmer. Very few programmers could have written Crafty. Almost anyone could have written Glaurung.

Tord

I would disagree _strongly_ with both of those last statements.

The only thing that really sets crafty apart, speed-wise, is that I am intimately familiar with computer architecture and assembly language programming, and compiler translation, so that I tend to look at things from a different perspective, knowing that if I write something in C one way, the final product will be more or less efficient than if I write it another way.

I doubt many could write code as strong as G2 either. I test against it all the time. Using books, Crafty does "OK" against glaurung. Using my test positions, it does less "OK" but primarily because that forces us to play positions neither of us would normally choose to play, and differences in the two programs can be exaggerated. I'm working on that as we speak.

But I have rewritten my programs too many times. And each time it was because of a poor decision made early-on that dictated major changes to get around some sort of performance bottleneck for a desired new feature. I have not rewritten Crafty nearly as many times, perhaps 2-3 including this most recent black/white elimination. That was the result of just not thinking about efficient ways to do things without the duplication, and it led to code that was extraordinarily hard to modify without introducing symmetry bugs. Had I thought more about the issue before starting to write code, I could have even avoided that rewrite.

sje · Post by **sje** » Thu Nov 13, 2008 9:40 pm

Tord Romstad wrote:
sje wrote:Using 32 bit or 64 bit values would be faster if such were well supported on the existing free Common Lisp environments. The implementation could be changed at the lowest level with all changes confined to a single source file if needed.
Have you tried wrapping the arithmetic operations inside a logand with #xFFFFFFFF (or #xFFFFFFFFFFFFFFFF for 64-bit operations)? Most good Common Lisp compilers should be able to optimize this into native 32-bit or 64-bit arithmetics.

The problem here is that any use of a literal integer value outside the fixnum bounds will contagiously promote all operations into bignum territory, at least on the platforms I'm using. Further, the fixnum limit is only 24 bits for some environments.

The better solution would be to use simple bit vectors throughout, except that my tests showed significant lossage due to poor implementations. I guess that having fast simple bit vector operations is not a high priority among Lisp compiler writers.

Zach Wegner · Post by **Zach Wegner** » Thu Nov 13, 2008 10:00 pm

Hello Tord,

Tord Romstad wrote:Surprisingly, I find myself to disagree with almost everything in Bob's last two posts, probably because I am a high-level guy while Bob is a low-level guy.

Well I'm going to have to disagree here too.

I wouldn't necessarily consider myself a low-level guy. Lower than you, but that's not saying much. Probably about a third of my program would have to be radically changed or rewritten. I don't know if you would consider that difficult, but it's certainly a pain. I have changed representations a few times (I used to use bitboards+0x88, and I've used several different bitboard representations through the years), and while it usually took less than a day IIRC, it wasn't fun. Converting to mailbox would be even harder. The entire move generation, in check, evaluation, make/unmake, and several other smaller functions need a complete rewrite. I guess my issue with it is not that it's hard, but that this code is for the most tedious to write by far. I like experimenting with move generation ideas, but actually writing the move generator is completely boring. Same with make/unmake.

I think if you can convert said functions from bitboard to mailbox without substantially rewriting them, then you have abstracted too much. OTOH, considering how slow and stupid you always say your program is, you need to keep in mind that your engine is still significantly faster than mine in NPS despite having a much larger (and smarter) eval. So that abstraction might not be so bad after all.

Zach

Need some help on speed optimization

Re: Need some help on speed optimization

Bitboard vs offset (mailbox)

Re: Bitboard vs offset (mailbox)

Re: Need some help on speed optimization

Re: Bitboard vs offset (mailbox)

Re: Need some help on speed optimization

Re: Bitboard vs offset (mailbox)

Re: Need some help on speed optimization