Question About CPP-C#, Performance, and Square Represenation

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Post Reply
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 12:24 am

Question About CPP-C#, Performance, and Square Represenation

Post by Cheney » Sat Sep 29, 2012 5:22 pm

Hi,

I decide to take a shot at a chess engine this past July. I used C# because it is what I am familiar with (I have not used C or CPP in a long time).

I am hooked :)

The engine works but after about 5 ply, it gets slow. All I use for searching is negamax with pruning - no Zobrist or iterative deepening.

My perft with a ply of 5 takes 90+ seconds. Compared to some other engines, I'm toast :)

My board's squares is an array of my piece object. Each piece object contains a list of moves and a few other variables. I generate moves for all pieces after a move is made. I feel this is where the latency appears and so I tried bitboards.

In the bitboards model, I did not get an increase in performance. I found that I still rely on my squares being an array of pieces and feel this might be why I did not get a performance increase. Working with OO makes it neat but I really feel all the new initializations of objects along with garbage collection might be causing the latency.

I read some posts here and and feel I found the answers I needed to read but I now need to understand :)

Here are my thoughts and maybe a question or two. I hope someone can review and provide me a nudge in the right direction.

1. C# should not be that much slower than CPP if I design the engines the same. If I am willing to relearn CPP and am patient, go for it.

2. I need to change my squares array to contain bytes representing a piece. With bitboards, I should only need to reference this array as a lookup to determine a capture occupancy.

3. I think I should generate moves into an array on the board and not a list per piece. The moves should just be for the moving side. I use the Mayothi move generation style but seem to be getting a grasp on the magics. Cutting over should help.

4. I have used DeBruijn to get square indexes from the bitboards. I need to use this only when I need to. This really should only be when generating moves as noted in #3.

I feel that following these I can move away from an object based design into a more static design. I'll get away from all the "new" objects, lists, and garbage collection and I believe I should see a drastic increase in performance.

Thank you in advance for any input and advice :)

-Cheney

kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 3:48 am

Re: Question About CPP-C#, Performance, and Square Represena

Post by kbhearn » Sat Sep 29, 2012 6:59 pm

You don't need to obsess too much over point 3. While magics are the fastest option for bitboards, they can also be a pain to debug, and the speedup going from another reasonable move generator to magics is not going to be crucial, use a method you understand well and feel will be easy to implement (and if that's magics, great :)).

It's hard to guess what's your major stumbling block as to your slow perft, could be a really slow implementation for bitscan, could be your oo board format combined with your make move (if for instance you're making a new board that needs to make 64 piece objects it could be a bit of work), could be that you just have a bug somewhere that needs isolating. Try to find out where exactly your program is spending its time :)

User avatar
emadsen
Posts: 277
Joined: Wed Apr 25, 2012 11:51 pm
Location: Oak Park, IL, USA
Full name: Erik Madsen
Contact:

Re: Question About CPP-C#, Performance, and Square Represena

Post by emadsen » Sat Sep 29, 2012 9:06 pm

You can check out the source code to my engine for comparison. Link in my signature. Roughly equal in strength to TSCP. I have unpublished changes that add another 100 elo. Mostly due to an incremental move generator (hash move, captures of queens, then captures of rooks, etc... then non captures) which saves time if a capture causes a beta cutoff. I implemented the incremental move generator using C# enumerators and the yield statement. Very clean. Will post code in a few days.
My C# chess engine: https://www.madchess.net

Cheney
Posts: 104
Joined: Thu Sep 27, 2012 12:24 am

Re: Question About CPP-C#, Performance, and Square Represena

Post by Cheney » Sat Sep 29, 2012 10:16 pm

Thanks for the replies :)

I have VS 2010 Premium and am learning how to use the performance profiler (I think that is what it is called). It is not much help for me now until I understand how to use it.

Some tests I have performed just using stopwatch were to isolate sections of code like move generation and run it in a loop of 1M times and time it. The object based squares compared to bitboards all take the same time an just a few seconds if that.

I do copy the board object and make a move on it to test for legality. I know that recreates all square objects and can be expensive. It alone at 10M iterations in a loop is a few seconds so it did not seem like much.

I made a make/un-make move and ended up with more headaches :) - in the end, it seemed like just as much code as the deep board copy.

One other common item is how I track moves. I actually use the model that ChessBin published. I do not like it. It seem like it is in the way as well.

I found a post in this forum where a move is tracked in an integer. Since I am getting the understanding of bitboards with binary masking I figured I should contine to move in that direction with moves and possibly other items like the board/game state (checks, enp, etc).

Again, thank you for your comments :)

Sven
Posts: 3969
Joined: Thu May 15, 2008 7:57 pm
Location: Berlin, Germany
Full name: Sven Schüle
Contact:

Re: Question About CPP-C#, Performance, and Square Represena

Post by Sven » Sat Sep 29, 2012 11:58 pm

Cheney wrote:Thanks for the replies :)

I have VS 2010 Premium and am learning how to use the performance profiler (I think that is what it is called). It is not much help for me now until I understand how to use it.

Some tests I have performed just using stopwatch were to isolate sections of code like move generation and run it in a loop of 1M times and time it. The object based squares compared to bitboards all take the same time an just a few seconds if that.

I do copy the board object and make a move on it to test for legality. I know that recreates all square objects and can be expensive. It alone at 10M iterations in a loop is a few seconds so it did not seem like much.

I made a make/un-make move and ended up with more headaches :) - in the end, it seemed like just as much code as the deep board copy.

One other common item is how I track moves. I actually use the model that ChessBin published. I do not like it. It seem like it is in the way as well.

I found a post in this forum where a move is tracked in an integer. Since I am getting the understanding of bitboards with binary masking I figured I should contine to move in that direction with moves and possibly other items like the board/game state (checks, enp, etc).

Again, thank you for your comments :)
Hi Cheney,

welcome at CCC :-)

It is o.k. to use C# if that is the language you prefer. I think it is almost always better to use the language you are familiar with, instead of the language that many others are using. I have never written a C# chess program but there were a couple of threads about it recently, I even think it was in this summer. As far as I remember some people have successfully written a C# chess engine that plays reasonable chess in reasonable time, so it must be possible.


After reading your brief description of data structures I get the impression that maintaining a list of moves per piece is really quite expensive. What I understood is also that you update those lists immediately when making a move. This implies the following:

- For perft, you do useless move generation at each leaf node even though it is clear that you will never go beyond those leaves. This can already be a major part of the explanation for your very slow perft.

- For normal search, most of the move generation work is also useless at horizon nodes where you will usually start with a quiescence search that requires much smaller move lists to be generated (captures only).

Therefore most people decouple making moves and move generation to avoid doing something useless too often, especially at the leaves. Keep in mind that the effort taken at leaf nodes (in perft) resp. horizon nodes (in normal search) usually has a significant influence on the overall computation time.


Note, however, that having a fast "perft" function should not be your major goal. 90 seconds for 5 plies is slow, of course, but once you have reached an acceptable speed that allows testing in reasonable cycles I suggest that you use "perft" mainly for the purpose it was designed for: to validate the correctness of your move generator.


Also a move should be a very small data unit, it is like an "event". Frequently allocating and deallocating move objects will really slow down your program, and it is not necessary to do so. I suggest to use a segmented move stack where each segment belongs to one ply and has variable length according to the number of moves you generate. The move stack is allocated once, you maintain a "start" index per ply and a "top" index for the whole stack.

You can also use a fixed-size move list per ply, a size of 256 is always sufficient based on current chess knowledge. This is easier to implement but uses more memory.


The "board copy" approach is not really bad, different people reported different results of comparing "copy&make" vs. "make&unmake" so it was not obvious whether one can say that one or the other is always better. But of course this depends on the amount of data to be copied. I can imagine that in your case with move lists per piece (if I understood correctly) the "copy&make" approach is inferior. However, I would not change it into "make&unmake" but, as I wrote, rethink the move list approach instead.


One last hint: when hunting the reason for performance problems, don't change too much at once, change step by step and repeat your measurements. This helps to track which changes have an effect and which haven't.

Sven

rreagan
Posts: 102
Joined: Sun Sep 09, 2007 4:32 am

Re: Question About CPP-C#, Performance, and Square Represena

Post by rreagan » Sun Sep 30, 2012 3:12 am

When you are trying to optimize your code in any language, you have to use a profiler. Without a profiler all you can do is speculate, and with modern computers it is easy to speculate wrong.

Having said that, I think if you use objects extensively in C#, you may have a hard time improving the speed significantly.

With C++ you can use objects without any speed penalty, as long as you are careful. The compiler is able to inline constructors like any other trivial function. So you can use objects for pieces and moves without any speed penalty.

In a language like C# or Java that may not be the case.

If you are enjoying using C# and like it, stick with it. It's better to use C# and keep making progress than to switch to C++ and give up. It's a lot of work to start over from scratch in a new language. C++ can be an uneccesarily complicated language. I use it for my engine, but it's taken many years of tinkering to come up with something I'm happy with.

It's also important to remember that speed is not everything. If you took the #1 top engine in the world and rewrote it in C# or Java, it would still be one of the top engines.

bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 9:16 am

Re: Question About CPP-C#, Performance, and Square Represena

Post by bpfliegel » Sun Sep 30, 2012 5:58 am

"It's also important to remember that speed is not everything. If you took the #1 top engine in the world and rewrote it in C# or Java, it would still be one of the top engines."

See the proof here:
https://github.com/bpfliegel/Portfish

For me what seems to be the most common problem when implementing a chess engine is C# is memory management. Every second we could create 10+ million objects, which puts a horrible pressure on the GC, so always reuse your objects, never recreate them (see the ObjectBroker.cs for a possible solution).

C vs C# speed ratio was at the end 1:2,7 in case of Portfish.

Good luck!
Balint

rreagan
Posts: 102
Joined: Sun Sep 09, 2007 4:32 am

Re: Question About CPP-C#, Performance, and Square Represena

Post by rreagan » Sun Sep 30, 2012 4:17 pm

bpfliegel wrote:"It's also important to remember that speed is not everything. If you took the #1 top engine in the world and rewrote it in C# or Java, it would still be one of the top engines."

See the proof here:
https://github.com/bpfliegel/Portfish

For me what seems to be the most common problem when implementing a chess engine is C# is memory management. Every second we could create 10+ million objects, which puts a horrible pressure on the GC, so always reuse your objects, never recreate them (see the ObjectBroker.cs for a possible solution).

C vs C# speed ratio was at the end 1:2,7 in case of Portfish.

Good luck!
Balint
Hi Balint, that is very interesting! Do you know what the ELO difference is between Stockfish and Portfish? If Stockfish is about 2.7 times faster, I would guess the ELO difference is around 200 or so?

bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 9:16 am

Re: Question About CPP-C#, Performance, and Square Represena

Post by bpfliegel » Sun Sep 30, 2012 5:43 pm

rreagan wrote:
bpfliegel wrote:"It's also important to remember that speed is not everything. If you took the #1 top engine in the world and rewrote it in C# or Java, it would still be one of the top engines."

See the proof here:
https://github.com/bpfliegel/Portfish

For me what seems to be the most common problem when implementing a chess engine is C# is memory management. Every second we could create 10+ million objects, which puts a horrible pressure on the GC, so always reuse your objects, never recreate them (see the ObjectBroker.cs for a possible solution).

C vs C# speed ratio was at the end 1:2,7 in case of Portfish.

Good luck!
Balint
Hi Balint, that is very interesting! Do you know what the ELO difference is between Stockfish and Portfish? If Stockfish is about 2.7 times faster, I would guess the ELO difference is around 200 or so?
I think it is less (around 80 ELO), as it is stronger as Junior and Spike - well, according to my tests. So far, no real professional testing was carried out - I hope I start flaming and someone will test it for good :)

Cheers, Balint

bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 9:16 am

Re: Question About CPP-C#, Performance, and Square Represena

Post by bpfliegel » Sun Sep 30, 2012 5:44 pm

emadsen wrote:You can check out the source code to my engine for comparison. Link in my signature. Roughly equal in strength to TSCP. I have unpublished changes that add another 100 elo. Mostly due to an incremental move generator (hash move, captures of queens, then captures of rooks, etc... then non captures) which saves time if a capture causes a beta cutoff. I implemented the incremental move generator using C# enumerators and the yield statement. Very clean. Will post code in a few days.
Erik, doesn't yield kill your performance? I also experimented with it (sounds logical), but had disappointing results...

Post Reply