C# Performance

RoadWarrior · Post by **RoadWarrior** » Fri Feb 03, 2012 2:34 am

Richard Allbert wrote:r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1

has 193690690 leafnodes to depth 5, which took the program 78s, or 2488kns.

I used the nprof profiler, here are the results:

Here is my call tree for perft 5 in that position, looking just at the method-level performance - notice the perft recursion:

Here's how the individual methods look:

IsAttacked is an obvious candidate for optimisation - here's its call graph:

After I've finished the evaluation function, I'm going to profile again. If IsAttacked is still expensive, I'll dive down to the line-level timings to see if I can find anything interesting.

micron · Post by **micron** » Fri Feb 03, 2012 4:26 am

RoadWarrior wrote:
Richard Allbert wrote:r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1

has 193690690 leafnodes to depth 5, which took the program 78s, or 2488kns.

I used the nprof profiler, here are the results:
IsAttacked is an obvious candidate for optimisation
[snip]
After I've finished the evaluation function, I'm going to profile again. If IsAttacked is still expensive, I'll dive down to the line-level timings to see if I can find anything interesting.

The percentage time (28% including children) in IsAttacked seems distinctly but not sensationally out of line. Here's a profile for my engine on the same perft task. IAmInCheck() at 10.5% takes the role of your IsAttacked().

Code: Select all

3917.0ms   25.3%	3917.0	 	MakeMoveSub
3191.0ms   20.6%	3191.0	 	MakeMove
2602.0ms   16.8%	2602.0	 	memmove$VARIANT$sse42
1647.0ms   10.6%	1647.0	 	NextMove
1629.0ms   10.5%	1629.0	 	IAmInCheck
 637.0ms    4.1%	 637.0	 	CaptureMoves
 564.0ms    3.6%	 564.0	 	BinarySEE_NoXrays
 545.0ms    3.5%	 545.0	 	Perft
 519.0ms    3.3%	 519.0	 	NonCaptureMoves
 149.0ms    0.9%	 149.0	 	DYLD-STUB$$memcpy
  42.0ms    0.2%	  42.0	 	OppHasNoAttacksTo
   3.0ms    0.0%	   3.0	 	GetOutOfCheckNonCaptures
   2.0ms    0.0%	   2.0	 	GetOutOfCheckCaptures

You may want to consider a suggestion made earlier in the thread: when deciding whether a castling move can be generated, be sure to get the order of tests right:

Code: Select all

if ( IntermediateSqsNotObstructed&#40;) /*fast*/ && IntermediateSqsNotAttacked&#40;) /*slow*/ ) 
       &#123; GenerateCastlingMove&#40;) &#125;

RoadWarrior · Post by **RoadWarrior** » Fri Feb 03, 2012 10:28 am

micron wrote:You may want to consider a suggestion made earlier in the thread: when deciding whether a castling move can be generated, be sure to get the order of tests right:
Code: Select all
if ( IntermediateSqsNotObstructed&#40;) /*fast*/ && IntermediateSqsNotAttacked&#40;) /*slow*/ ) 
       &#123; GenerateCastlingMove&#40;) &#125;

Here's a fragment from my move generation showing that I already do this:

Code: Select all

// If Black can still castle kingside...
if (&#40;this.PropertyStore.BlackCastlingStatus & Constants.EnumCastlingStatus.CAN_CASTLE_OO&#41; != 0&#41;
&#123;
  // If the Black kingside castling squares &#40;F8/G8&#41; aren't occupied...
  if (&#40;Constants.MASK_FG&#91;Constants.MOVE_BLACK&#93; & this.OccupiedSquares&#41; == 0&#41;
  &#123;
    // If the Black king and kingside castling squares &#40;E8/F8&#41; aren't under attack...
    if (!this.IsAttacked&#40;Constants.MASK_EF&#91;Constants.MOVE_BLACK&#93;, true&#41;)
    &#123;
      // Pseudo-legal&#58; we'll find out later if King has castled into check.
      moveList.Add&#40;Constants.CASTLING_BLACK_OO&#41;;
    &#125;
  &#125;
&#125;

But I agree with you, it does look like the IsAttacked method needs some optimisation. At the moment, it meets my performance budget, but that's likely to change in the near future.

Sven · Post by **Sven** » Fri Feb 03, 2012 10:45 am

RoadWarrior wrote:
micron wrote:You may want to consider a suggestion made earlier in the thread: when deciding whether a castling move can be generated, be sure to get the order of tests right:
Code: Select all
if ( IntermediateSqsNotObstructed&#40;) /*fast*/ && IntermediateSqsNotAttacked&#40;) /*slow*/ ) 
       &#123; GenerateCastlingMove&#40;) &#125;
Here's a fragment from my move generation showing that I already do this:
Code: Select all
// If Black can still castle kingside...
if (&#40;this.PropertyStore.BlackCastlingStatus & Constants.EnumCastlingStatus.CAN_CASTLE_OO&#41; != 0&#41;
&#123;
  // If the Black kingside castling squares &#40;F8/G8&#41; aren't occupied...
  if (&#40;Constants.MASK_FG&#91;Constants.MOVE_BLACK&#93; & this.OccupiedSquares&#41; == 0&#41;
  &#123;
    // If the Black king and kingside castling squares &#40;E8/F8&#41; aren't under attack...
    if (!this.IsAttacked&#40;Constants.MASK_EF&#91;Constants.MOVE_BLACK&#93;, true&#41;)
    &#123;
      // Pseudo-legal&#58; we'll find out later if King has castled into check.
      moveList.Add&#40;Constants.CASTLING_BLACK_OO&#41;;
    &#125;
  &#125;
&#125;
But I agree with you, it does look like the IsAttacked method needs some optimisation. At the moment, it meets my performance budget, but that's likely to change in the near future.

Testing attacks to E8 is the same as testing whether the king is in check before castling. If you already do this within your perft() or search() prior to move generation then keeping an inCheck flag and accessing it in the move generator will save double work, since no castling generation needs to be tried at all if the king is in check.

If you don't have this inCheck information available in the move generator then there is still some potential for improvement since you are doing the test for attacks to E8 twice, for black short castle and for black long castle (if both are still available and intermediate squares are empty), but need it only once. You might still want to perform the faster parts first (castling right available + no obstruction) so doing it this way will be somehow tricky. But I think you will need the "inCheck" prior to move generation anyway at some point, which would make these thoughts obsolete.

Testing attacks to F8 might be postponed, too, until actually making the move but depending on your implementation this might require more branching since it is a test in addition to "king was left in check" that must be done only for castling moves. For this reason I prefer to do it the same way as you do.

Sven

Sven · Post by **Sven** » Fri Feb 03, 2012 11:04 am

RoadWarrior wrote:it does look like the IsAttacked method needs some optimisation. At the moment, it meets my performance budget, but that's likely to change in the near future.

Looking only at the percentages, especially in a perft-only context, is not sufficient to decide whether the overall performance is on a "normal" level, you would have to look also on the absolute times. For instance, perft(5) from the initial position taking one minute or so would be quite slow, few seconds (without bulk counting) would be ok (e.g. Stockfish 2.2.2 needs 63 ms but has bulk counting).

But what is more important for me: don't optimize too early! In a mature engine, the performance of move generation, making/unmaking moves and attack/in-check testing will be much less important than the quality of search algorithms and features, move ordering, and evaluation. Move generation will usually take about 10% or even less in a complete engine, so optimizing it will not help much for the whole thing, provided its overall performance is not too bad. Doubling the move generation speed will have much less effect on playing strength than slightly reducing the branching factor.

Sven

RoadWarrior · Post by **RoadWarrior** » Fri Feb 03, 2012 12:40 pm

Sven Schüle wrote:Looking only at the percentages, especially in a perft-only context, is not sufficient to decide whether the overall performance is on a "normal" level, you would have to look also on the absolute times. For instance, perft(5) from the initial position taking one minute or so would be quite slow, few seconds (without bulk counting) would be ok (e.g. Stockfish 2.2.2 needs 63 ms but has bulk counting).

From the initial position Amaia takes 447 ms for perft 5, without bulk counting or a transposition table. So not really fast compared to Stockfish, but the C# JIT compiler looks like it's doing a good job with my unoptimised code.

I agree absolutely about only optimising the right things at the right time. That's why I'm focusing on the eval function (and then implementing the search) before doing any more optimisation.

But I think Richard might be in a somewhat different performance place, where he does need to analyse the performance of his C# code now.

Richard Allbert · Post by **Richard Allbert** » Sat Feb 04, 2012 8:10 pm

But I think Richard might be in a somewhat different performance place, where he does need to analyse the performance of his C# code now.

certainly am! I'm hoping to get some time on this tomorrow, I'll post the speed to d5 from the start position, and some more detail of the structure.[/quote]

RoadWarrior · Post by **RoadWarrior** » Sat Feb 04, 2012 8:57 pm

Sven Schüle wrote:Testing attacks to E8 is the same as testing whether the king is in check before castling. If you already do this within your perft() or search() prior to move generation then keeping an inCheck flag and accessing it in the move generator will save double work, since no castling generation needs to be tried at all if the king is in check.

I made this change in my move generator, and tested it across my perft suite (76 positions). The mean average slowdown was around 2%.

But I suspect I'll need this info for other reasons later, so I'm going to leave it in, but commented-out for the moment.

lucasart · Post by **lucasart** » Sat Feb 04, 2012 9:31 pm

Richard Allbert wrote:
But I think Richard might be in a somewhat different performance place, where he does need to analyse the performance of his C# code now.
certainly am! I'm hoping to get some time on this tomorrow, I'll post the speed to d5 from the start position, and some more detail of the structure.

[/quote]
I think your problem is not how to optimize the SquareIsAttacked function, but why do you need it in the first place ?

Here's what I do in my program DoubleCheck (written in C). It should give you some ideas:
* calculate a bitboard dcheckers, representing the potential discovery checkers (pieces that can give discovered check if they move out of the "ray")
* calculate a bitboard pinned, representing the pinned pieces (pieces that can give self check if they move out of the "ray")
rays are of course bitboards calculated once at initialization, so testing for self check or discovered check is very fast.

Whenever I calculate something like this, I only do it after I play a move, and store the result in a board stack. When I undo a move, I just read the value from the stack. This divides by 2 the time spent on these calculations, of course.

I also do the following things in my code, that are useful for the search but not for perft
* dynamically calculate piece on square tables
* discovery checker bitboard
* a check flag which is a property of my move object (rather struct). I have the following: check=0 (no check) check=1 (direct check) check=2 (discovered check). This is very useful in the search: checks are typically extended, and when a check is a discovered check, I never trust the SEE which has some implications in the search and especially the qsearch.

But even with those things (useless for pure perft), I still reach over 30 million nodes per seconds or so... on a laptop...
And there's no cheat (no perft hash table, or perft approximations).

You'll certainly be able to improve the performance of your code by a fair amount, but at some point you'll reach a stage where:
1/ you realize that the same code in C# is much slower than C
2/ you have to be careful never to use any of the nice object oriented crap of C#, make sure you never trust anything that the compiler does behind your back (like exceptions, memory allocation, garbage collection)
3/ your code will become C code really, but with C# syntax. and you'll spend more time and effort coding in C# than in plain C, because of 2/

I don't think C# is a good choice for a chess engine. If you were writing a chess GUI for example, then C# would be much easier than plain C, and performance wouldn't be important, so C# would be a better choice than C.

All I can say is: rewrite it in plain C while it's not too late... And the same applies to Java, which IMO is total and utter crap

RoadWarrior · Post by **RoadWarrior** » Sat Feb 04, 2012 9:37 pm

RoadWarrior wrote:
Sven Schüle wrote:Looking only at the percentages, especially in a perft-only context, is not sufficient to decide whether the overall performance is on a "normal" level, you would have to look also on the absolute times. For instance, perft(5) from the initial position taking one minute or so would be quite slow, few seconds (without bulk counting) would be ok (e.g. Stockfish 2.2.2 needs 63 ms but has bulk counting).
From the initial position Amaia takes 447 ms for perft 5, without bulk counting or a transposition table. So not really fast compared to Stockfish, but the C# JIT compiler looks like it's doing a good job with my unoptimised code.

Okay, I was missing a really simple optimisation. Now Amaia takes a mean average of 361 milliseconds (10 attempts) to do perft 5 from the initial position. The fastest time was 345 ms, the slowest was 390 ms.

Of course, perft is primarily a debugging aid rather than a relative performance test, given that there are several different ways of implementing it (bulk counting, etc).

C# Performance

Re: C# Performance

Re: C# Performance

Re: C# Performance

Re: C# Performance

Re: C# Performance

Re: C# Performance

Re: C# Performance

Re: C# Performance

Re: C# Performance

Re: C# Performance