TalkChess.com

Posted: **Fri Feb 03, 2012 4:23 pm**

Hi,

Maybe i'm using too much the well known msb and lsb routines to generate moves for sliding pieces, even in eval.

Never guess, profile and look whether this is your hotspot. If it is then moving to magic bitboards won't help you because the need to localize the positions of set bits in a bitboard does not go away just by changing the bitboard arithmetic.

Thomas...

Posted: **Fri Feb 03, 2012 7:05 pm**

asanjuan wrote:Hi all.
My engine is bitboard-based, but not using magics, nor rotated BB. I think it searches with a reasonable branching factor, but to my surprise and after reading some posts, i found that is quite slow in nps (2.5 Mnps without eval in a single intel core2).
Currently, my search only reaches 270knps. And as i said, it seems to me that is quite slow.
Maybe i'm using too much the well known msb and lsb routines to generate moves for sliding pieces, even in eval.

After reading the wiki, finally understood how magic bitboard works, and i'm planning to implement it into Rhetoric (my engine).

any advice from the experts?
(and please, don't say patience )

2.5 Mnps seems fast to me, Komodo does much less. Of course Komodo has an evaluation function ...
What is your hardware and how many Mnps do the other programs do on this hardware?

Posted: **Sat Feb 04, 2012 10:39 pm**

My perft function alone, doing only move generation, reaches 2.4 Mnps
The simple alpha-beta doing eval on quiescence reaches 600 knps, but aplying some heuristics, like lazy eval, futility, razoring, hash lookup, nullmove... the node speed is reduced to 270 knps for search.
My machine is a 2,4Mhz intel core 2.

On my hardware Fruit makes about 900knps on pure search, houdini about 1.5 Mnps...

Reading the posts of this forum, people say that they reach 16Mnps on perft tests, which is very impressive in my opinion.
This is why i'm trying to improve move generation speed.

... or am i missing something?

Posted: **Sat Feb 04, 2012 10:55 pm**

Never guess, profile and look whether this is your hotspot

I've never used a profile tool. I have always optimized code by "intuition" with excellent results

But, ok, is a very good answer, so thank you very much. I'll try to profile.
Do you know any good profiler for windows?
(at this moment is when my ignorance is revealed

)

Posted: **Sat Feb 04, 2012 10:59 pm**

asanjuan wrote: On my hardware Fruit makes about 900knps on pure search, houdini about 1.5 Mnps...

Reading the posts of this forum, people say that they reach 16Mnps on perft tests, which is very impressive in my opinion.
This is why i'm trying to improve move generation speed.

What person A calls a node may not be the same as what person B calls a node, so be careful when comparing raw numbers.
As to perft, you can get a massive speedup by doing legal-only move generation and simply return the number of legal moves directly at the leafs (as opposed to trying them all on the board and testing whether they're legal). This doesn't really pay off in a regular search (at least in my experience it doesn't although the penalty is not as bad as I would have thought). Finally, you can get better results by using transposition tables to speed up the perft calculation.

In other words, just looking at the numbers produced by different programs is meaningless unless you know exactly what those programs are doing. To test the speed of your move generator (plus make/unmake) a better measure is time-to-depth. My recommendation would be to compare with Crafty, which does nothing special to speed up perft (I'm sure there are other programs you can use for exactly the same thing for the same reason, but I did not try any of them).

EDIT: forgot to say: make sure you compile the program you're comparing against (in my suggestion, Crafty) with the same compiler and compiler options as you're using for your own program. If they're not optimal, then at least they're the same in both cases and you're not comparing apples and oranges.

Posted: **Sun Feb 05, 2012 12:30 am**

A minor point of terminology. There are no 'magic bitboards'. There are many methods to generate slider attacks with bitboards, some of which use 'magic' multiplication.

As a precursor to magic slider attack generation, the classic 4-ray, loop-free, branch-free method has some advantages, including being easy to understand. It has the same interface.

Code: Select all

BitBoard BishopAttacks&#40; int sq, BitBoard occ ) 
&#123;
  int         blockingSq; 
  BitBoard    attacks, partAttacks, blockers;

  attacks = gRayNW&#91;sq&#93;; 
  blockers = &#40;attacks & occ&#41; | 0x8000000000000000ULL; 
  blockingSq = GetLowestBit&#40; blockers ); 
  attacks ^= gRayNW&#91;blockingSq&#93;; // mask off beyond blocking square
  
  partAttacks = gRayNE&#91;sq&#93;; 
  blockers = &#40;partAttacks & occ&#41; | 0x8000000000000000ULL; 
  blockingSq = GetLowestBit&#40; blockers ); 
  partAttacks ^= gRayNE&#91;blockingSq&#93;; 
  attacks |= partAttacks; 

  partAttacks = gRaySE&#91;sq&#93;; 
  blockers = &#40;partAttacks & occ&#41; | 1ULL; 
  blockingSq = GetHighestBit&#40; blockers ); // note Get*Highest*Bit&#40;) here
  partAttacks ^= gRaySE&#91;blockingSq&#93;; 
  attacks |= partAttacks; 

  partAttacks = gRaySW&#91;sq&#93;; 
  blockers = &#40;partAttacks & occ&#41; | 1ULL; 
  blockingSq = GetHighestBit&#40; blockers ); // note Get*Highest*Bit&#40;) here
  partAttacks ^= gRaySW&#91;blockingSq&#93;; 
  return attacks | partAttacks; 
&#125;

The auxiliary bitboard arrays gRayNW[64]... are easily precomputed.
RookAttacks() is identical, but with orthogonal rays gRayN[]...

These slider attack generators have respectable performance. Replacing the function bodies with magic gives an increase in NPS that is noticeable and worthwhile, but not huge.

Posted: **Sun Feb 05, 2012 11:12 am**

I've never used a profile tool. I have always optimized code by "intuition" with excellent results

Then your intuition is better than mine. Some things are really counter intuitive. For an easy start I would you gprof, the profiler from gcc. It is included in the mingw suite and very easy to use and free. It will tell you in what functions your program spends most of the time, not necessarily why your program spends your time there.

From a first look I would also say the Intel VTune Amplifier is a very powerful tool, it will tell you per statement in your function what it contributes to the run time of the program. Unfortunately it is not free but there is an evaluation version available.

And don't run your profiler in a VMware image, it won't work. I know it is obvious, but stupid me tried even that.

Thomas...

Posted: **Sun Feb 05, 2012 11:37 am**

asanjuan wrote:Hi all.
My engine is bitboard-based, but not using magics, nor rotated BB. I think it searches with a reasonable branching factor, but to my surprise and after reading some posts, i found that is quite slow in nps (2.5 Mnps without eval in a single intel core2).
Currently, my search only reaches 270knps. And as i said, it seems to me that is quite slow.
Maybe i'm using too much the well known msb and lsb routines to generate moves for sliding pieces, even in eval.

After reading the wiki, finally understood how magic bitboard works, and i'm planning to implement it into Rhetoric (my engine).

any advice from the experts?
(and please, don't say patience )

Compared to magics, I've found rotated BB to be almost as fast. If you're just looking at a board + move generation code (ie. a pure perft) magics won't make a big difference. The only advantage of magics is that the occupancy is a single bitboard (i/o 4 different ones to maintain dynamically with rotated BB). It is not only faster but more elegant, and allows for set operations in O(1): example you want to remove all white pawns from thr occupance, just a single &~ and that's it, while rotated BB would inevitably end up in an ugly loop... I found that useful for the eval, hence the switch from rotated BB to magics in my engine.

If your engine performs perft calculations at an average of 2.5 Mnps, then that's quite slow indeed (mine averages around 25-30 Mnps and it's surely far from optimal). I don't think the bottle neck is that you don't use magics. I can't give you any precise advice, as I haven't looked at your code, but the best way to improve it is to run the compiler profiler and see what you can optimize. Of course it's good to rely on your intuition when you code to make sure your code is efficient, but when you have such a complex combination of interacting functions to optimize, forget your intuition and let the profiler teach you

. I've spent a fair amount of time profilng my perft, and it was definitely worth it.

Basically what I'm saying is that magics or rotated BB or whatever else is just a set of primitive operations, and the inefficieny is probably not in these operations but in the conception of the board and/or move generation that calls these primitive operations. Perhaps you should parse through the Stockfish code to see how a real artist does this! It should give you some idea on how to organize the board/movegen code efficiently.

Another area of improvement is the move sorting machinery. Let's say that your move ordering is the following:
1/ hash move
2/ non losing captures(*) by MVV/LVA or SEE
3/ killer moves
4/ losing captures by MVV/LVA or SEE
5/ quiet moves by history score
(*) "captures" include en passant captures and promotions

You need to make use of early beta cutoffs to implement a lazy move generation scheme. Examples:
* if the hash move triggers a beta cutoff, you can save the whole move generation (just have a function that tests the legality of the hash move). when the hash move exist it should be legal (with infinitesimal chances that it's not) and it will very often be the best move.
* if a non losing capture triggers a beta cutoff, you'll only have generated captures, and you'll save all the quiet move generation
* if a killer move generates a beta cutoff (again test the killer legality before, as it will often be illegal), same story, you can save the quiet move generation
* same story if a losing capture generates the cutoff

My movesorting machinery is the real bottleneck of my engine, and I plan to do what I describe above at some point, when the program is mature enough that raw speed really matters (if I ever reach that stage).

Posted: **Sun Feb 05, 2012 3:16 pm**

basically what I'm saying is that magics or rotated BB or whatever else is just a set of primitive operations, and the inefficieny is probably not in these operations but in the conception of the board and/or move generation that calls these primitive operations.

I tend to agree, but if lsb or msb is implemented very slow it will make a big difference, as it called so so often. The profiler will tell you

I recently go a nice speedup by investing some time in my popcount method.

Thomas...

Posted: **Mon Feb 06, 2012 12:59 pm**

As a precursor to magic slider attack generation, the classic 4-ray, loop-free, branch-free method has some advantages, including being easy to understand. It has the same interface.

Ok, This is the method that i'm currently using.

After taking a wile, i modified my perft function, counting leaf nodes after generating them, without doing/undoing the move.

After that, my perft "speed" reaches 19 Mnps, but this is really false... this measure has no sense for me.
because I can't predict the speed of my search with this function. Is only a measure of how many moves i can generate per second... not how many positions i can reach per second (wich is what i was looking for)[/quote]

TalkChess.com

Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?

Re: Moving to Magic Bitboards... any advice?