C# Performance

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Richard Allbert
Posts: 792
Joined: Wed Jul 19, 2006 9:58 am

Re: C# Performance

Post by Richard Allbert »

Finding the boxing means looking at the heap allocations, correct?

If I run the 'struct Move' version for the depth 4 mirrored, approx 22MB of heap allocations occur, with 260kB relocated and Final heap 1.2MB. So a lot of creation / destruction

If I run the 'class Move' version, there are 11MB of Heap allocations, 174kB relocated and Final Heap Bytes 495kB. So half the allocations....

Looking at the allocations graph for class Move:

Image

And for struct Move:

Image

Thi suggests that the List<T> constructor itself is responsible?
Richard Allbert
Posts: 792
Joined: Wed Jul 19, 2006 9:58 am

Re: C# Performance

Post by Richard Allbert »

I Changed the perft to use an array of structs

Code: Select all

Move&#91;&#93; moveArray = new Move&#91;Moves.MaxMoveListMoves&#93;;
int MoveCount = 0;
MoveGenerator.GenerateMoves&#40;tree.board, moveArray, ref MoveCount&#41;;
And had pretty much the same result:

Image

But there is a speed up - 2300 knps from the 2000 knps using a List<struct>
RoadWarrior
Posts: 73
Joined: Fri Jan 13, 2012 12:39 am
Location: London, England

Re: C# Performance

Post by RoadWarrior »

Richard Allbert wrote:Finding the boxing means looking at the heap allocations, correct?
Best way is to use ildasm to look at the IL - look for the "box" operation.

There are lots of ways that a struct can be inadvertantly boxed. Perhaps the most common ways are using Console.WriteLine without a custom ToString(), and also where a struct is a member of a parent class. Here are some related links:

http://stackoverflow.com/questions/3032 ... and-boxing
http://stackoverflow.com/questions/1249 ... g-tostring
http://www.codeguru.com/csharp/csharp/c ... .php/c5883
http://stackoverflow.com/questions/1917 ... y-referenc
http://stackoverflow.com/questions/6163 ... out-boxing
Richard Allbert wrote:If I run the 'struct Move' version for the depth 4 mirrored, approx 22MB of heap allocations occur, with 260kB relocated and Final heap 1.2MB. So a lot of creation / destruction

If I run the 'class Move' version, there are 11MB of Heap allocations, 174kB relocated and Final Heap Bytes 495kB. So half the allocations....
That could definitely explain the speed-up when converting. It looks like the struct is being boxed and possibly copied as well. We should be able to remove this, at which point the related heap allocations will go to zero and you should see another speed-up of maybe 20-40%.

In addition, what happens to the speed when you define Move as a class, and then store in an array rather than a List? Does that also give you a small speed-up?
There are two types of people in the world: Avoid them both.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: C# Performance

Post by Sven »

Hi Richard,

your program deals a lot with allocating heap memory, and I think that is a major slowdown. I suggest not to use dynamic allocation at all, at least for everything you do within your search (resp. perft). You might consider using something like this:

- a fixed-sized "move stack" holding all moves being generated during the whole search (e.g. 20 moves for the root position followed by 20 moves for the position at ply 1 followed by 22 at ply 2 ...), logically partitioned into stack segments (see below), and

- a fixed-sized "position stack" (you can also call it "tree" since it contains something like "nodes") holding all data that is needed once per ply, including (maybe) board state, flags, data needed to undo a move, killer moves, other search-related data, and finally data describing the move list of that node which might simply be a pair of "first" and "top" indices referring to the move stack segment belonging to that node (where "top" is one beyond the last used entry, to simplify implementation).

When doing it like this (which is quite common in C/C++ chess programs and should definitely be possible also in C#) your move generator will no longer call "new Move(...)", you will simply overwrite existing Move entries on the move stack each time you generate moves, and your search will not create new nodes but simply move up and down the "position stack" (or tree). Making a move would imply to initialize the new move list like "first = prevNode.top; top = prevNode.top;".

Dimensioning these two stacks is not difficult. A move stack size of 20000 entries and a position stack size of 1000 entries should be sufficient for all future applications, although much smaller sizes are more realistic.

That should also avoid all those "boxing" issues ...

Sven
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: C# Performance

Post by lucasart »

Richard Allbert wrote: Thi suggests that the List<T> constructor itself is responsible?
C# sucks ;-)
in C allocating an array on the stack takes 1 CPU cycle...
RoadWarrior
Posts: 73
Joined: Fri Jan 13, 2012 12:39 am
Location: London, England

Re: C# Performance

Post by RoadWarrior »

lucasart wrote:C# sucks ;-)
in C allocating an array on the stack takes 1 CPU cycle...
If you want to make C# look bad, or any framework for that matter, then micro-benchmark it doing almost nothing. You can get all the badness you like that way, but it's vacuous badness. :D

It appears that Richard's move structs are being heap-allocated somewhere, and I think fixing that is going to give a much better benchmark.
There are two types of people in the world: Avoid them both.
Richard Allbert
Posts: 792
Joined: Wed Jul 19, 2006 9:58 am

Re: C# Performance

Post by Richard Allbert »

Ok,

A recap, because I ran the test with Class / Array, and the result was very slow. I then ran the other tests and they ewre also slow, so I've lost fiath in my Laptop, and tested using my normal Dev computer.

To reduce the test time, I ran the nomal position as Black and White twice, four times in all, to depth 4 approx, 17m moves.

A quick recap of the results shows that something was amiss on the laptop:

Code: Select all

Array + class Move
3620 knps, 10.7MB Heap Allocations

Array + struct Move
2800 knps, 22.2MB Heap Allocations

List + class Move
3470 knps, 10.7MB Allocations

List + struct Move
2410 knps, 22.3MB Allocations
where with the List

Code: Select all

Perft has&#58;
List<Move> moveArray = new List<Move>&#40;Moves.MaxMoveListMoves&#41;;
            MoveGenerator.GenerateMoves&#40;tree.board, moveArray&#41;;

MoveGen is&#58;
public static void GenerateMoves&#40;Board board, List<Move> moveList&#41;;

adding moves with&#58;
list.Add&#40;new Move&#40;Squares.E8, Squares.C8, Pieces.NoPiece, Pieces.NoPiece, Moves.FlagCastled&#41;);
and with the Array

Code: Select all

Perft has&#58;
Move&#91;&#93; moveArray = new Move&#91;Moves.MaxMoveListMoves&#93;;
            int MoveCount = 0;
            MoveGenerator.GenerateMoves&#40;tree.board, moveArray, ref MoveCount&#41;;

MoveGen is&#58;
public static void GenerateMoves&#40;Board board, Move&#91;&#93; moveList, ref int moveCount&#41;

adding moves with&#58;
 moveList&#91;moveCount++&#93; = new Move&#40;from, toSq, Pieces.NoPiece, board.GetPiece&#40;toSq&#41;, Moves.FlagNone&#41;;
I'm sorry for the confusion, but the Heap allocation still seems to be the problem.

I've also visited some of the links you posted today ;)

I'll look at the IL DASM now, and report back.

Thanks

Richard

To Sven,

This thought also popped into my head - I think it could be a solution. I also think your piecelist suggestion will give an improvement, as the profile data clearly shows a good chunk of time is used in MovePieceOnList due to the loop.

I'd like first to understand the Boxing issue above, and then I will implement the changes you described. If I do it all at the same time, I loose track of which branch is which :)

danke dir

Richard
Richard Allbert
Posts: 792
Joined: Wed Jul 19, 2006 9:58 am

Re: C# Performance

Post by Richard Allbert »

:wink:

It doesn't help that my understanding of C# is (or was) zero when it comes to the performance issues....

:)
RoadWarrior
Posts: 73
Joined: Fri Jan 13, 2012 12:39 am
Location: London, England

Re: C# Performance

Post by RoadWarrior »

Sven Schüle wrote:That should also avoid all those "boxing" issues ...
Sven
Well, I learned something new today. Like a lot of developers, I believed that C# values (i.e. instances of a value type like a struct) are stored on the stack unless they're boxed, in which case they're stored on the heap.

It appears my belief was wrong. Eric Lippert (C# compiler guru) says that what actually happens is that values are stored in storage locations, and storage locations are either short-lived or long-lived. If the analysis of a storage location shows that its lifetime is guaranteed to be short, that storage location goes on the stack or in registers. Otherwise it goes on the heap.

Of course, boxing will always create a copy of the value on the heap, regardless of where it was originally stored.

I'm not sure what effect this might have on the "long-lived" approach to storing moves and other information that you described. I'm going to run some tests with Amaia.
There are two types of people in the world: Avoid them both.
Richard Allbert
Posts: 792
Joined: Wed Jul 19, 2006 9:58 am

Re: C# Performance

Post by Richard Allbert »

I've been through all functions with the IL DASM and box is not used anywhere - I looked in Attacks, Perft, MoveGen, MoveMake

Rather than post random things here, I've put a .zip online with the.dll files inside

Libraries
Project

Maybe that will be of some use!