SEE with alpha beta

Onno Garms · Post by **Onno Garms** » Sun Aug 14, 2011 10:06 pm

There are two different ways to implement SEE.

One is to have some early shortcuts (e.g. in see_sign) and if they don't lead to cutoffs, build the complete swap list first and evaluate it at the end. This approach was for example taken by Fruit and Stockfish.

At first glance this approach seems to be inferior, because once the swap list is started, it is always built completely even if its later evaluation shows that only the first terms would be needed.

Several people have written SEE with alpha beta independently. Me in Onno, Fritz in Loop and HGM here: http://www.talkchess.com/forum/viewtopi ... 82&t=30905
(Note that HGM seems not to prevent moving into check. This cannot be handled with infinite king value if the king is captured by the other king which in turn can be recaptured. Test with this position: 8/8/3p4/4r3/2RKP3/5k2/8/8 b - - ).

I thought that alpha beta SEE would be better. Marco disagreed because of the extra branches.

I gave an alpha beta SEE a try in Stockfish. My result was that the speed is practically the same. nps on "stockfish bench" has no significant differences. Cachegrind estimates time for SEE in at 6.8% in the original and 6.7% with alpha beta (both with my probcut patch, at depth 11). Your mileage may vary, but I would be surprised if either version is significantly faster than the other if a different computer is used.

Anyway, here is the SEE function. Supposed to replace both see and see_sign. Call with update=false if the search window has zero width.

Code: Select all

template<bool update> int Position&#58;&#58;see &#40;Square from, Square to, int alpha, int beta&#41; const
&#123;
    assert&#40;square_is_ok&#40;from&#41;);
    assert&#40;square_is_ok&#40;to&#41;);

    // handle castle moves
    if &#40;color_of_piece_on&#40;from&#41;==color_of_piece_on&#40;to&#41;)
        return 0; // TODO&#58; for FRC, consider to compute SEE of rook move instead

    
    // remove his piece from target square
    // -------------------------------------------------------------

    // get his piece to remove
    PieceType capturedType = type_of_piece_on&#40;to&#41;;
    
    // King cannot be recaptured; illegal moves are pruned at higher level
    if &#40;type_of_piece_on&#40;from&#41; == KING&#41; 
        return seeValues&#91;capturedType&#93;;

    Bitboard occupied = occupied_squares&#40;);

    // Handle en passant moves
    if &#40;st->epSquare == to && type_of_piece_on&#40;from&#41; == PAWN&#41;
    &#123;
        Square capQq = &#40;side_to_move&#40;) == WHITE ? to - DELTA_N &#58; to - DELTA_S&#41;;
        
        assert&#40;capturedType == PIECE_TYPE_NONE&#41;;
        assert&#40;type_of_piece_on&#40;capQq&#41; == PAWN&#41;;
        
        // Remove the captured pawn
        clear_bit&#40;&occupied, capQq&#41;;
        capturedType = PAWN;
    &#125;
    
    int eval = seeValues&#91;capturedType&#93;;

    // Assume that my piece cannot be &#40;re&#41;captured.
    // If the move is still not good enough, return.
    if &#40;eval <= alpha&#41;
        return alpha;

    // update
    if &#40;update && eval < beta&#41;
        beta = eval;

    
    // my initial move 
    // -------------------------------------------------------------

    // get my moving piece
    capturedType = type_of_piece_on &#40;from&#41;;

    // early check&#58; assume that my piece can be &#40;re&#41;captured.
    //              If the move is still good, return.
    eval -= seeValues&#91;capturedType&#93;;
    if &#40;eval >= beta&#41;
        return beta;

    // update
    if &#40;update && eval > alpha&#41;
        alpha = eval;

    // execute the move
    clear_bit&#40;&occupied, from&#41;;


    // some setup
    // -----------------------------------------------------------

    // Find all attackers to the destination square
    Bitboard attackers =  &#40;rook_attacks_bb&#40;to, occupied&#41;  & pieces&#40;ROOK, QUEEN&#41;)
                        | &#40;bishop_attacks_bb&#40;to, occupied&#41;& pieces&#40;BISHOP, QUEEN&#41;)
                        | &#40;attacks_from<KNIGHT>&#40;to&#41;       & pieces&#40;KNIGHT&#41;)
                        | &#40;attacks_from<KING>&#40;to&#41;         & pieces&#40;KING&#41;)
                        | &#40;attacks_from<PAWN>&#40;to, WHITE&#41;  & pieces&#40;PAWN, BLACK&#41;)
                        | &#40;attacks_from<PAWN>&#40;to, BLACK&#41;  & pieces&#40;PAWN, WHITE&#41;);

    // setup colors
    Color me = color_of_piece_on&#40;from&#41;;
    Color him = opposite_color &#40;me&#41;;

    
    while &#40;true&#41;
    &#123;
        // his &#40;re&#41;capture
        // ------------------------------------------------------------
        
        // get his attackers
        Bitboard stmAttackers = attackers & pieces_of_color&#40;him&#41;;
        
        // no more captures possible&#58; return
        if (!stmAttackers&#41;
            return beta;

        // last move or capture was illegal
        if &#40;capturedType==KING&#41;
            return alpha;
        
        // find his cheapest attacker
        for &#40;capturedType = PAWN; !&#40;stmAttackers & pieces&#40;capturedType&#41;); capturedType++);
        
        // early check&#58; assume that his piece can be recaptured
        //              if that would not be sufficient, return
        eval += seeValues&#91;capturedType&#93;;
        if &#40;eval <= alpha&#41;
            return alpha;

        // update
        if &#40;update && eval < beta&#41;
            beta = eval;
        
        // execute his &#40;re&#41; capture
        Bitboard b = stmAttackers & pieces&#40;capturedType&#41;;
        occupied ^= b & (~b + 1&#41;;
        attackers |=  &#40;rook_attacks_bb&#40;to, occupied&#41;   & pieces&#40;ROOK, QUEEN&#41;)
                    | &#40;bishop_attacks_bb&#40;to, occupied&#41; & pieces&#40;BISHOP, QUEEN&#41;);
        attackers &= occupied;


        // my recapture
        // -----------------------------------------------------------
        
        // get my attackers
        stmAttackers = attackers & pieces_of_color&#40;me&#41;;
        
        // no more captures possible&#58; return
        if (!stmAttackers&#41;
            return alpha;

        // last &#40;re&#41;capture was illegal
        if &#40;capturedType==KING&#41;
            return beta;
        
        // find my cheapest attacker
        for &#40;capturedType = PAWN; !&#40;stmAttackers & pieces&#40;capturedType&#41;); capturedType++);
        
        // early check&#58; assume that my piece can be recaptured.
        //              If the move is still good, return
        eval -= seeValues&#91;capturedType&#93;;
        if &#40;eval >= beta&#41;
            return beta;

        // update
        if &#40;update && eval > alpha&#41;
            alpha = eval;
        
        // execute my capture
        b = stmAttackers & pieces&#40;capturedType&#41;;
        occupied ^= b & (~b + 1&#41;;
        attackers |=  &#40;rook_attacks_bb&#40;to, occupied&#41;   & pieces&#40;ROOK, QUEEN&#41;)
            | &#40;bishop_attacks_bb&#40;to, occupied&#41; & pieces&#40;BISHOP, QUEEN&#41;);
        attackers &= occupied;
    &#125;
&#125;

hgm · Post by **hgm** » Sun Aug 14, 2011 11:11 pm

Onno Garms wrote:(Note that HGM seems not to prevent moving into check. This cannot be handled with infinite king value if the king is captured by the other king which in turn can be recaptured. Test with this position: 8/8/3p4/4r3/2RKP3/5k2/8/8 b - - ).

In the actual implementation for Joker I just posted ( http://www.talkchess.com/forum/viewtopi ... 6&start=12 ), moving in check is prevented by the statement

if(w>9) return b;

which is a fail-hard beta cutoff (b = beta) in case an attacker has been established, and the value of the piece on the exchange square (w) is larger than 9 (meaning it must be King; qval[QUEEN] = 9). This should prevent King exchange.

I do not bother to test for pins, however, except for the first recapture of the exchange. (I wanted to be sure I would detect cases where I could safely capture a piece that was only pseudo-legally 'protected' by a single pinned piece.

Desperado · Post by **Desperado** » Mon Aug 15, 2011 12:16 am

Onno Garms wrote:
One is to have some early shortcuts (e.g. in see_sign) and if they don't lead to cutoffs, build the complete swap list first and evaluate it at the end.

... because once the swap list is started, it is always built completely even if its later evaluation shows that only the first terms would be needed.

That is not true.

... prevent moving into check. This cannot be handled with infinite king value if the king is captured by the other king which in turn can be recaptured. Test with this position: 8/8/3p4/4r3/2RKP3/5k2/8/8 b - - ).
...

Done. value will be pawn value because the kingValue is caught by
minimax algorithm.(but maybe i miss your point...)

I thought that alpha beta SEE would be better. Marco disagreed because of the extra branches.

no extra branches! (well, nearly!)

conclusion:
=============

Today i posted a solution which uses _pruning_ for swaplist algorithm.
I was not able to detect a bug so far.

Sorry if i post it a second time my SEE code, but beside some missing features like pins,
or the restriction to use lva ordered captures (which is common behaviour) it simply shows that your arguments do not match.

Well, maybe i missed the point with the king running into check, but
that has no effect for the result, do i miss sth. ?

here it is...

Code: Select all

int seeMove&#40;pos_t *pos,mv_t mve&#41; 
&#123; 
 int gain&#91;32&#93;, d ,aPiece,sidePick,dst = unpackDst&#40;mve&#41;; 
 ui64_t mayXray,fromSet,occ,attadef; 
  
 if&#40;moveIsOO&#40;mve&#41;) return&#40;0&#41;; 
 if&#40;moveIsEP&#40;mve&#41;) return&#40;100&#41;; 

 aPiece    = pos->sq&#91;unpackSrc&#40;mve&#41;&#93;; 
 gain&#91;d=0&#93; = vSee&#91;pos->sq&#91;dst&#93;&#93;; 
 /*exit with see_sign as option possible here*/ 
 sidePick  = pos->ctm ^ side; 
 fromSet   = onebit&#40;unpackSrc&#40;mve&#41;); 
 occ       = OCCUPIED; 
 attadef   = attackers&#40;pos,dst,white&#41; | attackers&#40;pos,dst,black&#41; | fromSet; 

 mayXray   =   occ 
            ^ &#40;pos->bb&#91;wnEnum&#93; | pos->bb&#91;bnEnum&#93; 
            |  pos->bb&#91;wkEnum&#93; | pos->bb&#91;bkEnum&#93;); 

 do 
 &#123; 
  d++; gain&#91;d&#93; = vSee&#91;aPiece&#93; - gain&#91;d-1&#93;; 

  //********************************************************
  //* PRUNING ! 
  //* &#40;interruption of filling the swaplist,which does 
  //*  _not_ influence the result,and is not very "branched"
  //* with respect to code complexity&#41;
  //********************************************************

  if&#40;-maxN&#40;-gain&#91;d-1&#93;,gain&#91;d&#93;)>0&#41; break; 

  attadef^= fromSet; 
  occ    ^= fromSet; 

  if&#40;fromSet & mayXray&#41; 
  &#123; 
   attadef |= attackSlider&#40;pos,occ,dst,white&#41;; 
   attadef |= attackSlider&#40;pos,occ,dst,black&#41;; 
  &#125; 

  for&#40;aPiece=wpEnum+&#40;sidePick&#41;;aPiece<=wkEnum+&#40;sidePick&#41;;aPiece+=2&#41; 
    if&#40;fromSet = pos->bb&#91;aPiece&#93; & attadef&#41; 
      &#123;fromSet &=- fromSet;sidePick^=side;break;&#125; 

 &#125;while&#40;fromSet&#41;; 

 while&#40;--d&#41; gain&#91;d-1&#93; = -maxN&#40;-gain&#91;d-1&#93;,gain&#91;d&#93;); return&#40;gain&#91;0&#93;); 
&#125;

Michael

Don · Post by **Don** » Sat Aug 20, 2011 12:11 am

Onno Garms wrote:There are two different ways to implement SEE.

One is to have some early shortcuts (e.g. in see_sign) and if they don't lead to cutoffs, build the complete swap list first and evaluate it at the end. This approach was for example taken by Fruit and Stockfish.

At first glance this approach seems to be inferior, because once the swap list is started, it is always built completely even if its later evaluation shows that only the first terms would be needed.

Several people have written SEE with alpha beta independently. Me in Onno, Fritz in Loop and HGM here: http://www.talkchess.com/forum/viewtopi ... 82&t=30905
(Note that HGM seems not to prevent moving into check. This cannot be handled with infinite king value if the king is captured by the other king which in turn can be recaptured. Test with this position: 8/8/3p4/4r3/2RKP3/5k2/8/8 b - - ).

I thought that alpha beta SEE would be better. Marco disagreed because of the extra branches.

SEE takes very little time in the grand scheme of things, so the difference in a fast and slow implementation may be significant and yet still only make a very small difference in the execution speed of the programs. But of course you still want it to be as efficient as possible - you cannot give up half a percent here, another half there and so on without it adding up.

I gave an alpha beta SEE a try in Stockfish. My result was that the speed is practically the same. nps on "stockfish bench" has no significant differences. Cachegrind estimates time for SEE in at 6.8% in the original and 6.7% with alpha beta (both with my probcut patch, at depth 11). Your mileage may vary, but I would be surprised if either version is significantly faster than the other if a different computer is used.

Anyway, here is the SEE function. Supposed to replace both see and see_sign. Call with update=false if the search window has zero width.

Code: Select all

template<bool update> int Position&#58;&#58;see &#40;Square from, Square to, int alpha, int beta&#41; const
&#123;
    assert&#40;square_is_ok&#40;from&#41;);
    assert&#40;square_is_ok&#40;to&#41;);

    // handle castle moves
    if &#40;color_of_piece_on&#40;from&#41;==color_of_piece_on&#40;to&#41;)
        return 0; // TODO&#58; for FRC, consider to compute SEE of rook move instead

    
    // remove his piece from target square
    // -------------------------------------------------------------

    // get his piece to remove
    PieceType capturedType = type_of_piece_on&#40;to&#41;;
    
    // King cannot be recaptured; illegal moves are pruned at higher level
    if &#40;type_of_piece_on&#40;from&#41; == KING&#41; 
        return seeValues&#91;capturedType&#93;;

    Bitboard occupied = occupied_squares&#40;);

    // Handle en passant moves
    if &#40;st->epSquare == to && type_of_piece_on&#40;from&#41; == PAWN&#41;
    &#123;
        Square capQq = &#40;side_to_move&#40;) == WHITE ? to - DELTA_N &#58; to - DELTA_S&#41;;
        
        assert&#40;capturedType == PIECE_TYPE_NONE&#41;;
        assert&#40;type_of_piece_on&#40;capQq&#41; == PAWN&#41;;
        
        // Remove the captured pawn
        clear_bit&#40;&occupied, capQq&#41;;
        capturedType = PAWN;
    &#125;
    
    int eval = seeValues&#91;capturedType&#93;;

    // Assume that my piece cannot be &#40;re&#41;captured.
    // If the move is still not good enough, return.
    if &#40;eval <= alpha&#41;
        return alpha;

    // update
    if &#40;update && eval < beta&#41;
        beta = eval;

    
    // my initial move 
    // -------------------------------------------------------------

    // get my moving piece
    capturedType = type_of_piece_on &#40;from&#41;;

    // early check&#58; assume that my piece can be &#40;re&#41;captured.
    //              If the move is still good, return.
    eval -= seeValues&#91;capturedType&#93;;
    if &#40;eval >= beta&#41;
        return beta;

    // update
    if &#40;update && eval > alpha&#41;
        alpha = eval;

    // execute the move
    clear_bit&#40;&occupied, from&#41;;


    // some setup
    // -----------------------------------------------------------

    // Find all attackers to the destination square
    Bitboard attackers =  &#40;rook_attacks_bb&#40;to, occupied&#41;  & pieces&#40;ROOK, QUEEN&#41;)
                        | &#40;bishop_attacks_bb&#40;to, occupied&#41;& pieces&#40;BISHOP, QUEEN&#41;)
                        | &#40;attacks_from<KNIGHT>&#40;to&#41;       & pieces&#40;KNIGHT&#41;)
                        | &#40;attacks_from<KING>&#40;to&#41;         & pieces&#40;KING&#41;)
                        | &#40;attacks_from<PAWN>&#40;to, WHITE&#41;  & pieces&#40;PAWN, BLACK&#41;)
                        | &#40;attacks_from<PAWN>&#40;to, BLACK&#41;  & pieces&#40;PAWN, WHITE&#41;);

    // setup colors
    Color me = color_of_piece_on&#40;from&#41;;
    Color him = opposite_color &#40;me&#41;;

    
    while &#40;true&#41;
    &#123;
        // his &#40;re&#41;capture
        // ------------------------------------------------------------
        
        // get his attackers
        Bitboard stmAttackers = attackers & pieces_of_color&#40;him&#41;;
        
        // no more captures possible&#58; return
        if (!stmAttackers&#41;
            return beta;

        // last move or capture was illegal
        if &#40;capturedType==KING&#41;
            return alpha;
        
        // find his cheapest attacker
        for &#40;capturedType = PAWN; !&#40;stmAttackers & pieces&#40;capturedType&#41;); capturedType++);
        
        // early check&#58; assume that his piece can be recaptured
        //              if that would not be sufficient, return
        eval += seeValues&#91;capturedType&#93;;
        if &#40;eval <= alpha&#41;
            return alpha;

        // update
        if &#40;update && eval < beta&#41;
            beta = eval;
        
        // execute his &#40;re&#41; capture
        Bitboard b = stmAttackers & pieces&#40;capturedType&#41;;
        occupied ^= b & (~b + 1&#41;;
        attackers |=  &#40;rook_attacks_bb&#40;to, occupied&#41;   & pieces&#40;ROOK, QUEEN&#41;)
                    | &#40;bishop_attacks_bb&#40;to, occupied&#41; & pieces&#40;BISHOP, QUEEN&#41;);
        attackers &= occupied;


        // my recapture
        // -----------------------------------------------------------
        
        // get my attackers
        stmAttackers = attackers & pieces_of_color&#40;me&#41;;
        
        // no more captures possible&#58; return
        if (!stmAttackers&#41;
            return alpha;

        // last &#40;re&#41;capture was illegal
        if &#40;capturedType==KING&#41;
            return beta;
        
        // find my cheapest attacker
        for &#40;capturedType = PAWN; !&#40;stmAttackers & pieces&#40;capturedType&#41;); capturedType++);
        
        // early check&#58; assume that my piece can be recaptured.
        //              If the move is still good, return
        eval -= seeValues&#91;capturedType&#93;;
        if &#40;eval >= beta&#41;
            return beta;

        // update
        if &#40;update && eval > alpha&#41;
            alpha = eval;
        
        // execute my capture
        b = stmAttackers & pieces&#40;capturedType&#41;;
        occupied ^= b & (~b + 1&#41;;
        attackers |=  &#40;rook_attacks_bb&#40;to, occupied&#41;   & pieces&#40;ROOK, QUEEN&#41;)
            | &#40;bishop_attacks_bb&#40;to, occupied&#41; & pieces&#40;BISHOP, QUEEN&#41;);
        attackers &= occupied;
    &#125;
&#125;

hgm · Post by **hgm** » Sun Aug 21, 2011 12:09 pm

Don wrote:SEE takes very little time in the grand scheme of things, so the difference in a fast and slow implementation may be significant and yet still only make a very small difference in the execution speed of the programs. But of course you still want it to be as efficient as possible - you cannot give up half a percent here, another half there and so on without it adding up.

Well, I measured it in Joker, and even after this optimalization in a reasonably quiet position (after 1.e4 e5 2.Nf3 Nc6 3. Bc4 Bc5) SEE takes 12.5% of the time. So it is not that small.

Don · Post by **Don** » Sun Aug 21, 2011 5:03 pm

hgm wrote:
Don wrote:SEE takes very little time in the grand scheme of things, so the difference in a fast and slow implementation may be significant and yet still only make a very small difference in the execution speed of the programs. But of course you still want it to be as efficient as possible - you cannot give up half a percent here, another half there and so on without it adding up.
Well, I measured it in Joker, and even after this optimalization in a reasonably quiet position (after 1.e4 e5 2.Nf3 Nc6 3. Bc4 Bc5) SEE takes 12.5% of the time. So it is not that small.

How did you measure it? Did you profile it or do something else?

It's not even measurable in Komodo which is why I mention this. The way I measure the overall effect of a single routine on the speed of the program is to call it twice and see what the difference is.

I do some things to prevent the compiler from optimizing away the second call just in case it is smart enough to realize it is a pure function with no side effects.

one see: 10 .................................................. 0.646 1.4033
two see 10 .................................................. 0.645 1.4069

This basically says that after 10 passes of the komodo self test the average position takes 0.646 seconds and does 1.4033 million nodes.

The version that calls see() twice is slightly FASTER! In other words the slowdown cannot be measured and the noise in the tester overwhelms it.

The test is deterministic and fails if the program returns a different node code, score or PV and is based on 50 positions to depth 13.

I don't think my see() routine is particularly outstanding - I tried to make it so but I didn't spend a lot of time obsessing over it. I do the usual stuff like aborting as soon as I know the move loses or does not lose.

So the conclusion based on our combined results is that it may not matter, or it may matter a lot. Komodo is what I would call a "heavy" program, which to me means it spends a lot more time on "other things", as opposed to moving from node to node as fast as possible. (We move as fast as we can, but not as fast as possible.)

hgm · Post by **hgm** » Sun Aug 21, 2011 5:55 pm

Indeed, I profiled it. Now that I ported Joker to Linux recently, I could finally do that. Under Cygwin every function always took 0%...

I had 48% Eval, 23% Search, 12.5% SEE, 7% GenCaptures, 5% GenMoves and some minor stuff. Not very scientific, as I only used a single position only. Joker's eval is not very elaborate, but it is not very efficient either. A lot of time goes into Pawn evaluation, and it has no Pawn hash. So I could probably speed up Eval significantly, making the SEE contribution even heavier. And I don't subject LxH and equal captures to SEE.

Calling a function twice might bemisleading, because the second time all datait needs will be in cache, and perhaps the branch prediction logic learned to perfectly predict all branches from the previous time. (And newer CPUs might have a uOps cache where the function is still loaded.) It sounds like a very error-prone method to me.

Don · Post by **Don** » Sun Aug 21, 2011 7:27 pm

hgm wrote:Indeed, I profiled it. Now that I ported Joker to Linux recently, I could finally do that. Under Cygwin every function always took 0%...

I had 48% Eval, 23% Search, 12.5% SEE, 7% GenCaptures, 5% GenMoves and some minor stuff. Not very scientific, as I only used a single position only. Joker's eval is not very elaborate, but it is not very efficient either. A lot of time goes into Pawn evaluation, and it has no Pawn hash. So I could probably speed up Eval significantly, making the SEE contribution even heavier. And I don't subject LxH and equal captures to SEE.

Calling a function twice might bemisleading, because the second time all datait needs will be in cache, and perhaps the branch prediction logic learned to perfectly predict all branches from the previous time. (And newer CPUs might have a uOps cache where the function is still loaded.) It sounds like a very error-prone method to me.

Actually, using gcc profile is the most error prone method - it's only good for getting approximate numbers. Let me tell you why:

First of all, there is no caching issue because I'm not calling the same function twice. I made 2 copies of the same function and just changed the names. Then the original name I used as a wrapper - it is now the function that calls both functions and get's a result from each. And just to make absolutely sure the compiler doesn't notice that I ignore one of the results I compared the 2 resutls and return the "larger" of the two (they will always be the same of course, but the compiler doesn't know that.)

So I've actually increased the overhead as you will see - it looks something like this:

int see_cloneA( posiiton *p, mv );
int see_cloneB( positoin *p, mv );

see( position *p, int mv )
{
int x = see_cloneA(p, mv);
int y = see_cloneB(p, mv);

if (x > y) return x;
return y;
}

So now every time the program calls see(), instead of 1 function call/return you get three! And there is no special cache improvement from calling the same function twice in a row.

So I personally believe that profiling is very imprecise and this is far better. Here is the explanation for why that is:

Profiling is subject to a significant amount of noise because it's based on sampling and it actually changes the code - the code you are profiling is NOT the same code you run thanks to -pg flag to gcc which inserts extra code all over the place to make profiling possible. This can have really bad effects on caching and execution speed - far more than my method.

There is no perfect way (except perhaps with special purpose hardware) to profile code, but this method I think is far more accurate (but also far more tedious) than using gcc to profile code.

Of course this is only good to measure the overhead of a single function, it would not work well if I wanted a full program detailed profile of every function in the program, but in this case I did not need that.

However I do occasionally use the gcc profile and see() is never even on the radar. GCC profile is a very useful tool to get a rough idea of the execution speed of various functions.

hgm · Post by **hgm** » Sun Aug 21, 2011 8:25 pm

I thought that profiling was done by requesting periodic timer interrupts, and let the interrupt routine examine the return address, to decide in which subroutine it falls (with the aid of mapping info made for the purpose by the compiler). This would not require any code change, except in the startup code (to set up the timer interrupt, reserve memory space for the histogram, and later save it on file).

It could be that an instruction is added to increment a counter on every function call. But that does not sound very invasive.

Don · Post by **Don** » Sun Aug 21, 2011 8:37 pm

hgm wrote:I thought that profiling was done by requesting periodic timer interrupts, and let the interrupt routine examine the return address, to decide in which subroutine it falls (with the aid of mapping info made for the purpose by the compiler).

I don't really know the exact details, but this is from the gcc manpage:

Code: Select all

  -pg Generate extra code to write profile information suitable for the analysis program gprof.  You must use this option when compiling the source files you want data about, and you must also use it when linking.

Note that there is also a -g option for including debug information, but apparently even more is put in the code when using -pg

Even if that is not the case, try running the profile multiple times - you will get different numbers each time unless you let it run a really long time. And so I don't completely trust the numbers for precise measurements.

There is a another way to measure the execution time of functions too, you can use high resolution timers. But for short running routines there is a kind of uncertainty principle involved where the measurement gets in the way of the thing being measured.

I wonder how well that would work for something like this?

SEE with alpha beta

SEE with alpha beta

Re: SEE with alpha beta

Re: SEE with alpha beta

Re: SEE with alpha beta

Re: SEE with alpha beta

Re: SEE with alpha beta

Re: SEE with alpha beta

Re: SEE with alpha beta

Re: SEE with alpha beta

Re: SEE with alpha beta