Stockfish port to C# Complete

diep · Post by **diep** » Tue Mar 20, 2012 2:12 am

RoadWarrior wrote:
diep wrote: p.s. Some years ago, when we compared some algorithmic code in C versus C#, C# was a factor of 4 slower, so anything faster than that here is a big achievement, on top of the huge work of porting SF to C#.
My fledgling and relatively unoptimised C# engine Amaia is searching 3.8M nps on a single core. That's using 64-bit Windows on an i7 2600K overclocked to 4.6 GHz. It may not be a speed demon, but that number meets my current performance budget quite comfortably. At some point I'll run SF on this hardware/software so that I can do a comparison.

The evaluation function includes material, mobility, PSTs, pawn structures, and king safety. There is no SMP or TT yet, and also no QS or extensions - all work in progress.

I haven't looked at the assembly code produced by the JIT compiler yet. When I do, I suspect that I'll find a few decent performance improvements.

Mark, I was already getting that 3 mln nps on an old K7, a totally inferior processor, and that was INCLUDING a qsearch. I had also cut and pasted a few eval routines from Diep; not many, of course, but they worked. And qsearch of course slows you down BIGTIME.

Also, inside the qsearch, which largely determines the speed of beancounters, it was using Diep's way of deciding when to capture or not, which obviously is a routine not much optimized for speed.

Realistically, an optimized beancounter doing what you wrote at 4.6 GHz is gonna get nearly 10 mln nps or so.

Now, one thing no one can speed up is branches, of course.

Yet at 500 cycles per node, realize that Schach 3.0 was already at 600 cycles per node on an old Pentium...

Sure, that was in assembler...

Vincent
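
For reference, the cycles-per-node figures in this exchange follow from dividing the clock rate by the node rate. Below is a minimal sketch of that arithmetic; the class and method names are hypothetical and come from neither engine.

```csharp
using System;

// Hypothetical helper for illustration only (not from Diep or Amaia):
// average CPU cycles spent per searched node.
public static class NodeCost
{
    public static double CyclesPerNode(double clockHz, double nodesPerSecond)
    {
        if (nodesPerSecond <= 0)
            throw new ArgumentOutOfRangeException(nameof(nodesPerSecond));
        return clockHz / nodesPerSecond;
    }
}
```

With the numbers quoted above, `NodeCost.CyclesPerNode(4.6e9, 3.8e6)` comes to roughly 1,210 cycles per node for Amaia, while the suggested 10 mln nps at 4.6 GHz would be 460 cycles per node, which appears to be where the "500 cycles per node" remark comes from.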

bpfliegel · Post by **bpfliegel** » Wed Mar 21, 2012 11:48 pm

Maybe a good initial measurement is to check what nps the standard benchmark test (hash 128, 1 CPU, search depth 12) reports. I have also been working on a 100% safe-code, portable (SL, WP7) C# version of Stockfish for 3 months now (the second half of that time spent optimizing it); the portable version will differ in its signaling constructs and a few minor areas. On my i5 laptop I get 1080 kps for the original non-SSE gcc version (by running stockfish-222-64-ja.exe bench) and 450 kps for the C# version, so the C# version runs at roughly 42% of the native speed. In 4-CPU mode the speed ratio is around 1:3 when playing. Nothing major has changed in principle, but for some of the problems I tried to find answers that are more natural and powerful in .NET. I think I started off at just around 160 kps when I finished the raw port.
The task is really tough, especially the optimization, with its 'analyze, optimize, throw it out of the window and start again' cycle. But some ideas actually work!

Good luck with the port, and looking forward to hearing your results!

Cheers, Balint

whittenizer · Post by **whittenizer** » Thu Mar 22, 2012 5:19 am

Hi there.

Thanks for the reply. The port is complete, but I'm dealing with performance issues. It may very well be that it's as good as it's going to get. I'm halting any further work for now so I can rethink my goals and objectives. If anything, this was a learning experience, but it certainly fell a bit short of my expectations.

By the way, how did you handle the pointer arithmetic and templates from SF when porting that stuff over to C#? Also, C++ inline methods are not always easy to reproduce in C#. Sure, you can put a lot of code inline, but for some of the heavier methods you simply have to take the extra hit of calling them multiple times, which I hate. Can't put everything inline, ya know.

Anyways, I'll keep you posted if I ever start the C# work back up again. I'm a bit disappointed but moving forward with other ideas.

Thanks,

bpfliegel · Post by **bpfliegel** » Thu Mar 22, 2012 7:51 am

Hi David,

I think there is no real news to tell:
Pointers - it really depends on what the nature of the pointer actually is. Where a pointer into the middle of an array was passed to a function, one can always declare an integer index next to the array and pass both around.
Templates - all instances of a template were merged into one function, with an if/switch applied, as C# generics do not support value-dependent template parameters.
Inlining - now that's a tricky question, as there is no way to request inlining explicitly. Well, there is in 4.5 desktop, but that is still in beta: [url]http://msdn.microsoft.com/en-us/library ... 10%29.aspx[/url]. In .NET the concept is just the opposite anyway: the smaller methods are inlined by the JIT rather than by the compiler, so it depends on which JIT we are actually talking about. Silverlight's JIT is somewhat simpler, and I have no knowledge of what the MS guys did for WP7. This area is not something one should worry about much, as the JIT generally does a good job. I also did not much like how it looked at first, but maintainability is also an aspect.
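
The three points above can be sketched roughly as follows. This is an illustration only: every name and signature is hypothetical and not taken from Stockfish or from either C# port, the pawn-push example assumes a bitboard layout with a1 as bit 0, and the inlining attribute shown is `MethodImplOptions.AggressiveInlining`, which was added in .NET Framework 4.5 (the link above is truncated, so it is an assumption that this is the feature it points to).

```csharp
using System.Runtime.CompilerServices;

public enum Color { White, Black }

// Illustrative sketch; none of these methods come from a real engine.
public static class PortingSketch
{
    // Pointers. C++ shape: int* append_range(int* p, int first, int last),
    // which writes through p and returns the advanced pointer.
    // C# shape: pass the array plus an index and return the new index.
    public static int AppendRange(int[] buffer, int index, int first, int last)
    {
        for (int value = first; value <= last; value++)
            buffer[index++] = value;
        return index; // plays the role of the returned "end" pointer
    }

    // Templates. C++ shape: template<Color Us> uint64_t pawn_push(uint64_t).
    // C# generics cannot take a value as a type argument, so the template
    // argument becomes an ordinary parameter plus a runtime branch.
    public static ulong PawnPush(ulong pawns, Color us)
    {
        return us == Color.White ? pawns << 8 : pawns >> 8;
    }

    // Inlining. The attribute is only a hint; the JIT may still decline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int FileOf(int square)
    {
        return square & 7;
    }
}
```

The cost of the template approach is that the branch on `us` is evaluated at run time on every call, where the C++ compiler would have resolved it at compile time for each instantiation.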

Cheers, Balint

whittenizer · Post by **whittenizer** » Thu Mar 22, 2012 4:19 pm

Hey,

Thanks for the reply. Yeah, the way we did the pointers was to have a generic class for the different types, and then use implicit operators to get the different functionality. For example, if variable "ss" is a pointer declared as Pointer<Stack>, then we can do things like:

ss++ or ~ss, what have ya. A really slick way to mimic pointers, but it does have some overhead. The implicit operators were used so we could loop on the types, but regular operators were used for the actual pointer functionality.
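
A wrapper of this kind might look roughly like the sketch below. It is a reconstruction for illustration, not the actual class from the port: the member names are assumptions, it is written as a struct to avoid heap allocations, and the `~` operator mentioned above is left out because its meaning in the port is not stated.

```csharp
// Hypothetical pointer-like wrapper: an array plus a current index.
public struct Pointer<T>
{
    private readonly T[] array;
    private readonly int index;

    public Pointer(T[] array, int index)
    {
        this.array = array;
        this.index = index;
    }

    // Dereference, like *p in C++.
    public T Value
    {
        get { return array[index]; }
        set { array[index] = value; }
    }

    // Relative access, like p[n] in C++.
    public T this[int offset]
    {
        get { return array[index + offset]; }
        set { array[index + offset] = value; }
    }

    // An array converts to a pointer to its first element.
    public static implicit operator Pointer<T>(T[] array)
    {
        return new Pointer<T>(array, 0);
    }

    // Overloading ++ and -- covers both prefix and postfix use.
    public static Pointer<T> operator ++(Pointer<T> p)
    {
        return new Pointer<T>(p.array, p.index + 1);
    }

    public static Pointer<T> operator --(Pointer<T> p)
    {
        return new Pointer<T>(p.array, p.index - 1);
    }

    public static Pointer<T> operator +(Pointer<T> p, int n)
    {
        return new Pointer<T>(p.array, p.index + n);
    }

    public static Pointer<T> operator -(Pointer<T> p, int n)
    {
        return new Pointer<T>(p.array, p.index - n);
    }
}
```

With this in place, `Pointer<int> ss = stack; ss++;` leaves `ss.Value` reading `stack[1]`. Every access still goes through a property call and an array bounds check, which is one plausible source of the overhead mentioned above.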

Anyways, thanks for sharing.

David

bpfliegel · Post by **bpfliegel** » Fri Mar 23, 2012 6:47 am

For sure that's a slick idea.

Balint

RoadWarrior · Post by **RoadWarrior** » Sat Mar 24, 2012 12:43 pm

diep wrote:Realistically an optimized beancounter doing what you wrote at 4.6Ghz is gonna get nearly 10 mln nps or so.

Apologies for the delay in my reply - too much work on at the moment.

To test your theory, I ran Houdini 2.0c on 3 positions with a 1 MB hash table, single core, and a fixed depth of 15. The use of a fixed depth setting prevents QS slowing the Houdini search, given that Amaia doesn't use QS yet. Then I used the same positions and settings for Amaia. I used Arena 3.0 as the GUI.

Houdini almost certainly has a considerably more sophisticated evaluation function, thus slowing down its search. In return, Amaia has no TT yet, which slows down its search. And also Amaia is relatively unoptimised - I've only spent a grand total of 80 hours working on it from scratch.

The result of this experiment is that Houdini searched between 3.8M and 5.6M nps. Amaia searched between 3.5M and 4.2M nps. Nowhere do I see your "factor of 4", so I don't know where that comes from.

gladius · Post by **gladius** » Sat Mar 24, 2012 2:15 pm

RoadWarrior wrote:To test your theory, I ran Houdini 2.0c on 3 positions with a 1 MB hash table, single core, and a fixed depth of 15. The use of a fixed depth setting prevents QS slowing the Houdini search, given that Amaia doesn't use QS yet. Then I used the same positions and settings for Amaia. I used Arena 3.0 as the GUI.

The use of fixed depth almost certainly does not stop Houdini from using its QS. It's not open source, so I can't be 100% sure, but everything else runs the main search up to depth 15, and QS is counted as "negative" depth. Also, Houdini is doing a lot of work during the search itself to prune positions, which has a big effect on NPS.
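
The conventional arrangement can be sketched with a toy negamax (no alpha-beta, no real chess rules). All names here are hypothetical, and whether Houdini works this way is unknown; the sketch only shows why a fixed depth bounds the main search but not the quiescence search that runs once the depth is used up.

```csharp
using System;
using System.Collections.Generic;

// Toy game-tree node; unrelated to any real engine.
public sealed class Node
{
    public int StaticEval;        // score from the side to move's view
    public bool ReachedByCapture; // was the move leading here a capture?
    public List<Node> Children = new List<Node>();
}

public sealed class ToySearch
{
    public int MainNodes; // nodes visited by the depth-limited search
    public int QsNodes;   // nodes visited by the quiescence search

    public int Search(Node node, int depth)
    {
        if (depth <= 0)
            return Quiesce(node); // the depth limit hands over to QS

        MainNodes++;
        if (node.Children.Count == 0)
            return node.StaticEval;

        int best = int.MinValue + 1;
        foreach (Node child in node.Children)
            best = Math.Max(best, -Search(child, depth - 1));
        return best;
    }

    public int Quiesce(Node node)
    {
        QsNodes++;
        int best = node.StaticEval; // "stand pat" score
        foreach (Node child in node.Children)
        {
            // Only captures are followed, with no depth limit at all.
            if (child.ReachedByCapture)
                best = Math.Max(best, -Quiesce(child));
        }
        return best;
    }
}
```

For a chain of one quiet move followed by two captures, `Search(root, 1)` visits 1 main-search node and 3 quiescence nodes, so most of the reported node count comes from below the nominal depth.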

RoadWarrior · Post by **RoadWarrior** » Sat Mar 24, 2012 4:28 pm

gladius wrote: The use of fixed depth almost certainly does not stop Houdini from using its QS. It's not open source, so I can't be 100% sure, but everything else runs the main search up to depth 15, and QS is counted as "negative" depth.

If fixed depth doesn't prevent the use of Houdini's QS, then that should be taken into account when doing the comparison. On the other hand Amaia is slowed dramatically by not having a main or pawn TT, and only having minimal optimisation. In fact Amaia's optimisation was only based on perft performance, which may even have had a negative effect on the real search.

gladius wrote:Also, Houdini is doing a lot of work during the search itself to prune positions, which has a big effect on NPS.

Amaia also does null-move and other pruning, although obviously I have no way to compare the 2 programs directly. It's likely that Houdini is considerably more sophisticated in this respect.

Clearly this is a very limited experiment. And NPS probably has only a limited correlation with engine strength. My current goal with Amaia is to have the strongest C#/Java engine, a title I think is held by CuckooChess at 2682 Elo. The Elo 3000+ engines are beyond reach given my development skill and available time (3 young children and a demanding job in banking technology).

RoadWarrior · Post by **RoadWarrior** » Sat Mar 24, 2012 6:13 pm

Interesting - Peter Österlund has just released a C++ implementation of his strong (Elo 2682) Java engine CuckooChess. He says that it's about twice as fast as the Java version: http://talkchess.com/forum/viewtopic.php?t=42999

From experience, C# runs faster than Java. So any claim of a C/C# slowdown in excess of a factor of 2 needs some pretty strong evidence before I'm going to believe it.
