Finding the boxing means looking at the heap allocations, correct?
If I run the 'struct Move' version for the depth 4 mirrored, approx 22MB of heap allocations occur, with 260kB relocated and Final heap 1.2MB. So a lot of creation / destruction
If I run the 'class Move' version, there are 11MB of Heap allocations, 174kB relocated and Final Heap Bytes 495kB. So half the allocations....
Looking at the allocations graph for class Move:
And for struct Move:
Thi suggests that the List<T> constructor itself is responsible?
C# Performance
Moderators: hgm, Rebel, chrisw
-
- Posts: 792
- Joined: Wed Jul 19, 2006 9:58 am
Re: C# Performance
I Changed the perft to use an array of structs
And had pretty much the same result:
But there is a speed up - 2300 knps from the 2000 knps using a List<struct>
Code: Select all
Move[] moveArray = new Move[Moves.MaxMoveListMoves];
int MoveCount = 0;
MoveGenerator.GenerateMoves(tree.board, moveArray, ref MoveCount);
But there is a speed up - 2300 knps from the 2000 knps using a List<struct>
-
- Posts: 73
- Joined: Fri Jan 13, 2012 12:39 am
- Location: London, England
Re: C# Performance
Best way is to use ildasm to look at the IL - look for the "box" operation.Richard Allbert wrote:Finding the boxing means looking at the heap allocations, correct?
There are lots of ways that a struct can be inadvertantly boxed. Perhaps the most common ways are using Console.WriteLine without a custom ToString(), and also where a struct is a member of a parent class. Here are some related links:
http://stackoverflow.com/questions/3032 ... and-boxing
http://stackoverflow.com/questions/1249 ... g-tostring
http://www.codeguru.com/csharp/csharp/c ... .php/c5883
http://stackoverflow.com/questions/1917 ... y-referenc
http://stackoverflow.com/questions/6163 ... out-boxing
That could definitely explain the speed-up when converting. It looks like the struct is being boxed and possibly copied as well. We should be able to remove this, at which point the related heap allocations will go to zero and you should see another speed-up of maybe 20-40%.Richard Allbert wrote:If I run the 'struct Move' version for the depth 4 mirrored, approx 22MB of heap allocations occur, with 260kB relocated and Final heap 1.2MB. So a lot of creation / destruction
If I run the 'class Move' version, there are 11MB of Heap allocations, 174kB relocated and Final Heap Bytes 495kB. So half the allocations....
In addition, what happens to the speed when you define Move as a class, and then store in an array rather than a List? Does that also give you a small speed-up?
There are two types of people in the world: Avoid them both.
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: C# Performance
Hi Richard,
your program deals a lot with allocating heap memory, and I think that is a major slowdown. I suggest not to use dynamic allocation at all, at least for everything you do within your search (resp. perft). You might consider using something like this:
- a fixed-sized "move stack" holding all moves being generated during the whole search (e.g. 20 moves for the root position followed by 20 moves for the position at ply 1 followed by 22 at ply 2 ...), logically partitioned into stack segments (see below), and
- a fixed-sized "position stack" (you can also call it "tree" since it contains something like "nodes") holding all data that is needed once per ply, including (maybe) board state, flags, data needed to undo a move, killer moves, other search-related data, and finally data describing the move list of that node which might simply be a pair of "first" and "top" indices referring to the move stack segment belonging to that node (where "top" is one beyond the last used entry, to simplify implementation).
When doing it like this (which is quite common in C/C++ chess programs and should definitely be possible also in C#) your move generator will no longer call "new Move(...)", you will simply overwrite existing Move entries on the move stack each time you generate moves, and your search will not create new nodes but simply move up and down the "position stack" (or tree). Making a move would imply to initialize the new move list like "first = prevNode.top; top = prevNode.top;".
Dimensioning these two stacks is not difficult. A move stack size of 20000 entries and a position stack size of 1000 entries should be sufficient for all future applications, although much smaller sizes are more realistic.
That should also avoid all those "boxing" issues ...
Sven
your program deals a lot with allocating heap memory, and I think that is a major slowdown. I suggest not to use dynamic allocation at all, at least for everything you do within your search (resp. perft). You might consider using something like this:
- a fixed-sized "move stack" holding all moves being generated during the whole search (e.g. 20 moves for the root position followed by 20 moves for the position at ply 1 followed by 22 at ply 2 ...), logically partitioned into stack segments (see below), and
- a fixed-sized "position stack" (you can also call it "tree" since it contains something like "nodes") holding all data that is needed once per ply, including (maybe) board state, flags, data needed to undo a move, killer moves, other search-related data, and finally data describing the move list of that node which might simply be a pair of "first" and "top" indices referring to the move stack segment belonging to that node (where "top" is one beyond the last used entry, to simplify implementation).
When doing it like this (which is quite common in C/C++ chess programs and should definitely be possible also in C#) your move generator will no longer call "new Move(...)", you will simply overwrite existing Move entries on the move stack each time you generate moves, and your search will not create new nodes but simply move up and down the "position stack" (or tree). Making a move would imply to initialize the new move list like "first = prevNode.top; top = prevNode.top;".
Dimensioning these two stacks is not difficult. A move stack size of 20000 entries and a position stack size of 1000 entries should be sufficient for all future applications, although much smaller sizes are more realistic.
That should also avoid all those "boxing" issues ...
Sven
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: C# Performance
C# sucksRichard Allbert wrote: Thi suggests that the List<T> constructor itself is responsible?
in C allocating an array on the stack takes 1 CPU cycle...
-
- Posts: 73
- Joined: Fri Jan 13, 2012 12:39 am
- Location: London, England
Re: C# Performance
If you want to make C# look bad, or any framework for that matter, then micro-benchmark it doing almost nothing. You can get all the badness you like that way, but it's vacuous badness.lucasart wrote:C# sucks
in C allocating an array on the stack takes 1 CPU cycle...
It appears that Richard's move structs are being heap-allocated somewhere, and I think fixing that is going to give a much better benchmark.
There are two types of people in the world: Avoid them both.
-
- Posts: 792
- Joined: Wed Jul 19, 2006 9:58 am
Re: C# Performance
Ok,
A recap, because I ran the test with Class / Array, and the result was very slow. I then ran the other tests and they ewre also slow, so I've lost fiath in my Laptop, and tested using my normal Dev computer.
To reduce the test time, I ran the nomal position as Black and White twice, four times in all, to depth 4 approx, 17m moves.
A quick recap of the results shows that something was amiss on the laptop:
where with the List
and with the Array
I'm sorry for the confusion, but the Heap allocation still seems to be the problem.
I've also visited some of the links you posted today
I'll look at the IL DASM now, and report back.
Thanks
Richard
To Sven,
This thought also popped into my head - I think it could be a solution. I also think your piecelist suggestion will give an improvement, as the profile data clearly shows a good chunk of time is used in MovePieceOnList due to the loop.
I'd like first to understand the Boxing issue above, and then I will implement the changes you described. If I do it all at the same time, I loose track of which branch is which
danke dir
Richard
A recap, because I ran the test with Class / Array, and the result was very slow. I then ran the other tests and they ewre also slow, so I've lost fiath in my Laptop, and tested using my normal Dev computer.
To reduce the test time, I ran the nomal position as Black and White twice, four times in all, to depth 4 approx, 17m moves.
A quick recap of the results shows that something was amiss on the laptop:
Code: Select all
Array + class Move
3620 knps, 10.7MB Heap Allocations
Array + struct Move
2800 knps, 22.2MB Heap Allocations
List + class Move
3470 knps, 10.7MB Allocations
List + struct Move
2410 knps, 22.3MB Allocations
Code: Select all
Perft has:
List<Move> moveArray = new List<Move>(Moves.MaxMoveListMoves);
MoveGenerator.GenerateMoves(tree.board, moveArray);
MoveGen is:
public static void GenerateMoves(Board board, List<Move> moveList);
adding moves with:
list.Add(new Move(Squares.E8, Squares.C8, Pieces.NoPiece, Pieces.NoPiece, Moves.FlagCastled));
Code: Select all
Perft has:
Move[] moveArray = new Move[Moves.MaxMoveListMoves];
int MoveCount = 0;
MoveGenerator.GenerateMoves(tree.board, moveArray, ref MoveCount);
MoveGen is:
public static void GenerateMoves(Board board, Move[] moveList, ref int moveCount)
adding moves with:
moveList[moveCount++] = new Move(from, toSq, Pieces.NoPiece, board.GetPiece(toSq), Moves.FlagNone);
I've also visited some of the links you posted today
I'll look at the IL DASM now, and report back.
Thanks
Richard
To Sven,
This thought also popped into my head - I think it could be a solution. I also think your piecelist suggestion will give an improvement, as the profile data clearly shows a good chunk of time is used in MovePieceOnList due to the loop.
I'd like first to understand the Boxing issue above, and then I will implement the changes you described. If I do it all at the same time, I loose track of which branch is which
danke dir
Richard
-
- Posts: 792
- Joined: Wed Jul 19, 2006 9:58 am
Re: C# Performance
It doesn't help that my understanding of C# is (or was) zero when it comes to the performance issues....
-
- Posts: 73
- Joined: Fri Jan 13, 2012 12:39 am
- Location: London, England
Re: C# Performance
Well, I learned something new today. Like a lot of developers, I believed that C# values (i.e. instances of a value type like a struct) are stored on the stack unless they're boxed, in which case they're stored on the heap.Sven Schüle wrote:That should also avoid all those "boxing" issues ...
Sven
It appears my belief was wrong. Eric Lippert (C# compiler guru) says that what actually happens is that values are stored in storage locations, and storage locations are either short-lived or long-lived. If the analysis of a storage location shows that its lifetime is guaranteed to be short, that storage location goes on the stack or in registers. Otherwise it goes on the heap.
Of course, boxing will always create a copy of the value on the heap, regardless of where it was originally stored.
I'm not sure what effect this might have on the "long-lived" approach to storing moves and other information that you described. I'm going to run some tests with Amaia.
There are two types of people in the world: Avoid them both.
-
- Posts: 792
- Joined: Wed Jul 19, 2006 9:58 am