m8 Dev log and C# experiments

eduherminio
Posts: 63
Joined: Mon Apr 05, 2021 12:00 am
Full name: Eduardo Caceres

Re: m8 Dev log and C# experiments

Post by eduherminio »

emadsen wrote: Sat Nov 18, 2023 6:36 pm
That doesn't provide type safety (contrasted with defining an enum or struct, which does). You can pass a Move alias to a method that requires an int parameter and the compiler won't complain.
I mentioned that in my post, didn't I? :lol:
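To make the point about aliases concrete, here is a minimal sketch. MoveAlias, MoveValue and AliasSketch are hypothetical names chosen for illustration, not code from m8 or Lynx. It shows that an alias is accepted wherever an int is expected, while a wrapper struct is a distinct type.

```csharp
using MoveAlias = System.Int32; // an alias is only a second name for int

// Hypothetical wrapper type, shown for contrast: a distinct type to the compiler.
public readonly struct MoveValue
{
    public int Raw { get; }
    public MoveValue(int raw) => Raw = raw;
}

public static class AliasSketch
{
    public static int TakesInt(int value) => value + 1;

    public static int TakesMove(MoveValue move) => move.Raw + 1;

    public static int PassAliasAsInt()
    {
        MoveAlias move = 41;
        return TakesInt(move); // compiles without complaint: MoveAlias is int
    }

    public static int PassWrapper()
    {
        var move = new MoveValue(41);
        // TakesInt(move); // would not compile: no conversion from MoveValue to int
        return TakesMove(move);
    }
}
```

The alias keeps the code readable at zero cost, but only the struct makes the compiler reject a mix-up.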
Author of Lynx chess engine (GitHub, Lichess)
eduherminio
Posts: 63
Joined: Mon Apr 05, 2021 12:00 am
Full name: Eduardo Caceres

Re: m8 Dev log and C# experiments

Post by eduherminio »

JoAnnP38 wrote: Sat Nov 18, 2023 5:33 pm
I've never seen aliases used like this in the project file. I am going to remember that for my future projects. Thanks for sharing!!
It's relatively recent: it was introduced in C# 10 (.NET 6), if I'm not mistaken, together with global usings.
https://learn.microsoft.com/en-us/dotne ... l-modifier
Author of Lynx chess engine (GitHub, Lichess)
mathmoi
Posts: 290
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: m8 Dev log and C# experiments

Post by mathmoi »

emadsen wrote: Sat Nov 18, 2023 6:36 pm
mathmoi wrote: Tue Nov 14, 2023 4:58 am I was inspired to try and write a chess engine using C# in another thread and decided to give it a go.
Welcome to the chess programming community. Good luck with your engine.
Thanks! But to be clear, I have developed multiple chess engines over the last 25 years. One of them, MatMoi VII, was released as free but closed-source software because I'm not particularly proud of the code quality and it's in French. I will open-source my new C# engine.
mathmoi
Posts: 290
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: m8 Dev log and C# experiments

Post by mathmoi »

I stopped development of my C# chess engine a couple of weeks ago.

I started by developing all the low-level structures, like Move, Square, Piece, etc., in a way that would allow me to write expressive and intuitive code. I was afraid this would incur some overhead, so, as I reported here before, I benchmarked this technique and found that if I inlined all the methods and properties, it should not result in a performance penalty. This made my code, IMO, really easy to read and even to write. Here is an example from my Make function:

Code: Select all

case MoveType.CastleKingSide:
   Debug.Assert((_castlingOptions & CastlingOptionsHelpers.Create(_sideToMove, CastlingSide.KingSide)) != CastlingOptions.None);
   Debug.Assert(move.To.File == File.g);

   var rookFrom = new Square(GetCastlingFile(CastlingSide.KingSide), move.From.Rank);
   var rookTo = new Square(File.f, move.From.Rank);
   MovePiece(move.From, move.To);
   MovePiece(rookFrom, rookTo);
   _enPassantFile = File.Invalid;
   _castlingOptions &= _castlingMasks[move.From.Value];
   ++_halfMoveClock;
   break;
I also learned a lot about how structs and classes are stored and copied in memory and how different containers actually work. I learned a lot by reading Leorik's and Pedantic's code. This was a really enjoyable experience and it made me like C# even more than before. It really is easier to write clean code in C# than in C++.
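For readers who have not seen the technique, a struct-as-integer-wrapper might look roughly like the sketch below. This is not the m8# source: SketchSquare and the rank * 8 + file layout are assumptions for illustration. The idea is that a readonly struct with a single field and inlined accessors should cost the same as a bare integer when the JIT does inline the accessors.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical wrapper around a square index 0..63 (assumed layout: rank * 8 + file).
public readonly struct SketchSquare
{
    public byte Value { get; }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public SketchSquare(int file, int rank) => Value = (byte)((rank << 3) | file);

    // File (column) is the low three bits of the index.
    public int File
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => Value & 7;
    }

    // Rank (row) is the high three bits of the index.
    public int Rank
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => Value >> 3;
    }
}
```

With this layout, new SketchSquare(6, 0) stands for g1, and its File and Rank come back out with a mask and a shift.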

However... the performance was not great. I implemented the move generation (using black magic bitboards), the make/unmake methods, and then a perft method. This was the first point at which I could compare this engine to my previous engine, written in C++. It was about 4 times slower, even though my C++ engine also updates the Zobrist hash and the piece-square evaluation. At first I thought maybe something was wrong and particularly slow, but I searched for multiple days, using the VS profiler and even looking at the generated IL code at times, and even though I found some things to improve, nothing gave me a significant improvement. I was still roughly 4 times slower (12 Mnps vs 40 Mnps on a single thread on the same computer).

My conclusion for now is that either C# is fundamentally slower than C++ (I know it is, but I did not think it would be 4x slower) or my use of structures as integer wrappers has a significant overhead that did not show up in my benchmarks. Unfortunately, I did not figure out how to see the assembly code generated by the JIT compiler to check for myself whether these structures are "optimized out" as I think they should be.

For now I have gone back to my C++ engine (which is also named m8; I need to rename one of them), but I hope to come back to m8# (renamed!) someday, once I figure out what went wrong or decide to live with the performance penalty, because I was really happy with how the code looked compared to a C++ engine.
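As a reminder of what a perft method does, here is a minimal sketch of the recursion. ISketchPosition and CountingPosition are hypothetical stand-ins (a toy counting game, not chess), used only so the example runs on its own. A real engine would plug in its own move generator and make/unmake.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal interface; a real engine would use its own position type.
public interface ISketchPosition
{
    IReadOnlyList<int> GenerateMoves();
    void Make(int move);
    void Unmake(int move);
}

public static class PerftSketch
{
    // Counts the leaf nodes reachable in exactly 'depth' plies.
    public static ulong Perft(ISketchPosition position, int depth)
    {
        if (depth == 0) return 1;

        ulong nodes = 0;
        foreach (int move in position.GenerateMoves())
        {
            position.Make(move);
            nodes += Perft(position, depth - 1);
            position.Unmake(move);
        }
        return nodes;
    }
}

// Toy stand-in: each move adds 1 or 2 to a running total; no moves once it reaches 4.
public sealed class CountingPosition : ISketchPosition
{
    private int _total;

    public IReadOnlyList<int> GenerateMoves() =>
        _total >= 4 ? Array.Empty<int>() : new[] { 1, 2 };

    public void Make(int move) => _total += move;

    public void Unmake(int move) => _total -= move;
}
```

Dividing the node count by the elapsed time gives the nodes-per-second figures quoted in this thread.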
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: m8 Dev log and C# experiments

Post by mvanthoor »

lithander wrote: Wed Nov 15, 2023 12:02 am I like how you tried to replace integers with custom types that are more typesafe.
It's called the newtype pattern, and it's idiomatic for Rust. The problem though is that you can't reach the inner type directly, so you'll have to implement passthrough functions to reach the functionality you need from the inner type. (In Rust you could also implement the Deref trait, which makes the outer type automatically expose all the methods of the inner type, but that is considered ugly.)

It's on my refactor list for somewhere in the future, because right now it's possible to accidentally swap things such as squares, pieces and sides.
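Since this thread is about C#, here is a rough C# equivalent of the newtype idea, with one passthrough operator. SquareId, PieceId and NewtypeSketch are hypothetical names for illustration, not code from Rustic or m8.

```csharp
// Hypothetical newtype-style wrappers: same underlying int, but distinct types.
public readonly struct SquareId
{
    public int Index { get; }
    public SquareId(int index) => Index = index;

    // Passthrough: the wrapper has to re-expose the arithmetic it needs from int.
    public static SquareId operator +(SquareId square, int delta) =>
        new SquareId(square.Index + delta);
}

public readonly struct PieceId
{
    public int Kind { get; }
    public PieceId(int kind) => Kind = kind;
}

public static class NewtypeSketch
{
    public static string Describe(PieceId piece, SquareId square) =>
        $"piece {piece.Kind} on square {square.Index}";

    // Describe(square, piece) with the arguments swapped would not compile,
    // which is the kind of mix-up that plain ints allow.
}
```

The cost is the boilerplate: every operation you want from the inner int has to be forwarded by hand, like the + operator here.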
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
krunch
Posts: 10
Joined: Sat Sep 18, 2021 9:36 pm
Full name: Tony Schwebs

Re: m8 Dev log and C# experiments

Post by krunch »

mathmoi wrote: Wed Feb 21, 2024 7:27 pm Unfortunately, I did not figure out how to see the assembly code generated by the JIT compiler to check for myself whether these structures are "optimized out" as I think they should be.
You can use the DOTNET_JitDisasm=<Method> environment variable to dump assembly to the console. You should get assembly for a Tier0 unoptimized version and a Tier1 optimized version.

Code: Select all

C:\Users\krunc\source\repos\chess\bin\Release\net8.0>SET DOTNET_JitDisasm=GenerateMoves

C:\Users\krunc\source\repos\chess\bin\Release\net8.0>chess.exe

; Assembly listing for method Chess.MoveGenerator.MoveGen:GenerateMoves(int,ulong,ulong,ulong):this (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 6 single block inlinees; 0 inlinees without PGO data

G_M000_IG01:                ;; offset=0x0000

G_M000_IG02:                ;; offset=0x0000
       and      r9, r8
       je       SHORT G_M000_IG04
       align    [11 bytes for IG03]

G_M000_IG03:                ;; offset=0x0010
       add      bword ptr [rcx+0x10], -4
       xor      eax, eax
       tzcnt    rax, r9
       xor      eax, edx
       mov      r10, bword ptr [rcx+0x10]
       mov      dword ptr [r10], eax
       blsr     r9, r9
       jne      SHORT G_M000_IG03

G_M000_IG04:                ;; offset=0x002C
       and      r8, qword ptr [rsp+0x28]
       je       SHORT G_M000_IG06
       align    [13 bytes for IG05]

G_M000_IG05:                ;; offset=0x0040
       xor      eax, eax
       tzcnt    rax, r8
       xor      eax, edx
       mov      r10, bword ptr [rcx+0x08]
       mov      dword ptr [r10], eax
       add      bword ptr [rcx+0x08], 4
       blsr     r8, r8
       jne      SHORT G_M000_IG05

G_M000_IG06:                ;; offset=0x005C
       ret

; Total bytes of code 93

OK! 153ms, 774380K NPS
OK! 226ms, 853847K NPS
OK! 330ms, 539988K NPS
OK! 892ms, 791358K NPS
OK! 0ms, 576587K NPS
OK! 193ms, 846268K NPS
Total: 1361558651 Nodes, 1797ms, 757443K NPS
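For anyone reading the listing: the tzcnt/blsr pairs are what a bitboard serialization loop typically compiles to on x64 with BMI1 available. The sketch below is an assumption about the general shape of such a loop, not krunch's actual GenerateMoves. BitLoopSketch, Serialize and the tag parameter are hypothetical names, and the real method writes through pointers held in the object rather than into a Span.

```csharp
using System;
using System.Numerics;

public static class BitLoopSketch
{
    // Writes (square ^ tag) for every set bit of 'targets' and returns how many were written.
    public static int Serialize(ulong targets, int tag, Span<int> output)
    {
        int count = 0;
        while (targets != 0)
        {
            // Index of the lowest set bit; the JIT can emit tzcnt for this.
            int square = BitOperations.TrailingZeroCount(targets);
            output[count++] = square ^ tag;

            // Clear the lowest set bit; the JIT can emit blsr for this.
            targets &= targets - 1;
        }
        return count;
    }
}
```

If the wrapper structs are fully optimized out, the disassembly of a loop like this should look about as tight as the listing above, with no call instructions inside the loop body.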

mathmoi
Posts: 290
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: m8 Dev log and C# experiments

Post by mathmoi »

krunch wrote: Fri Feb 23, 2024 7:10 am
mathmoi wrote: Wed Feb 21, 2024 7:27 pm Unfortunately, I did not figure out how to see the assembly code generated by the JIT compiler to check for myself whether these structures are "optimized out" as I think they should be.
You can use the DOTNET_JitDisasm=<Method> environment variable to dump assembly to the console. You should get assembly for a Tier0 unoptimized version and a Tier1 optimized version.

Hi,

This looks much easier than what I found, I'll give it a try this weekend.

Thanks.
Iketh
Posts: 4
Joined: Fri Oct 28, 2022 6:33 am
Full name: Keith Downes

Re: m8 Dev log and C# experiments

Post by Iketh »

mathmoi wrote: Wed Feb 21, 2024 7:27 pm I stopped development of my C# chess engine a couple of weeks ago.
However... the performance was not great. I implemented the move generation (using black magic bitboards), the make/unmake methods, and then a perft method. This was the first point at which I could compare this engine to my previous engine, written in C++. It was about 4 times slower, even though my C++ engine also updates the Zobrist hash and the piece-square evaluation. At first I thought maybe something was wrong and particularly slow, but I searched for multiple days, using the VS profiler and even looking at the generated IL code at times, and even though I found some things to improve, nothing gave me a significant improvement. I was still roughly 4 times slower (12 Mnps vs 40 Mnps on a single thread on the same computer).

My conclusion for now is that either C# is fundamentally slower than C++ (I know it is, but I did not think it would be 4x slower) or my use of structures as integer wrappers has a significant overhead that did not show up in my benchmarks. Unfortunately, I did not figure out how to see the assembly code generated by the JIT compiler to check for myself whether these structures are "optimized out" as I think they should be.

For now I have gone back to my C++ engine (which is also named m8; I need to rename one of them), but I hope to come back to m8# (renamed!) someday, once I figure out what went wrong or decide to live with the performance penalty, because I was really happy with how the code looked compared to a C++ engine.
This is user error. In C# it is easy to destroy the speed of software by accidentally copying objects instead of using references, for example, or by overusing object-oriented features such as interfaces. There are also several gotchas that only experience will teach you, such as writing expressions in a function call (the compiler won't optimize it or the function). I'm getting C# within 10% of the speed of C++ in identical ray tracing programs and within 15% with my chess engine (106 Mnps vs 120 Mnps). I could probably get it faster in chess if I spent more time on it, but most of my effort is in C++. The first time I ported chess from C# to C++, I was at 70 Mnps in C# and 73 Mnps in C++, with identical programs producing the exact same number of nodes in a position.
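A small sketch of the "accidental copy" point, under the assumption that the objects in question are structs. SketchBoardState and CopySketch are hypothetical names for illustration. Passing a struct by value copies every field on each call, while ref and in pass a reference to the caller's storage.

```csharp
// Hypothetical multi-field struct; large enough that copying it is not free.
public struct SketchBoardState
{
    public ulong White, Black, Pawns, Kings;
    public int HalfMoveClock;
}

public static class CopySketch
{
    // By value: the callee works on its own copy, so the caller never sees the change.
    public static void BumpByValue(SketchBoardState state)
    {
        state.HalfMoveClock++;
    }

    // By ref: no copy is made and the caller's struct is modified in place.
    public static void BumpByRef(ref SketchBoardState state)
    {
        state.HalfMoveClock++;
    }

    // By in: no copy for reading, and the callee is not allowed to modify it.
    public static int ReadByIn(in SketchBoardState state) => state.HalfMoveClock;
}
```

The same trap exists when reading a struct out of a List or a property: you get a copy, and changes to it are silently lost.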
mathmoi
Posts: 290
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: m8 Dev log and C# experiments

Post by mathmoi »

Iketh wrote: Mon May 06, 2024 4:52 am This is user error. In C# it is easy to destroy the speed of software by accidentally copying objects instead of using references, for example, or by overusing object-oriented features such as interfaces. There are also several gotchas that only experience will teach you, such as writing expressions in a function call (the compiler won't optimize it or the function). I'm getting C# within 10% of the speed of C++ in identical ray tracing programs and within 15% with my chess engine (106 Mnps vs 120 Mnps). I could probably get it faster in chess if I spent more time on it, but most of my effort is in C++. The first time I ported chess from C# to C++, I was at 70 Mnps in C# and 73 Mnps in C++, with identical programs producing the exact same number of nodes in a position.
Hi Iketh,

Thanks for your answer. You are probably right that part of the problem might be caused by my inexperience with optimizing C# code. That being said, I'm an experienced developer and C# is not a new language to me. I did a lot of research to understand how value types and reference types are created, copied and destroyed.

Since I started this thread I did more research and found the primary cause of the slowness I observed. The .NET JIT compiler has a concept called the "inliner budget". It's a limit on how much effort will be made to inline methods; once this budget is exhausted, no more methods are inlined, not even small one-liners decorated with [MethodImpl(MethodImplOptions.AggressiveInlining)]. In my case, since I rely heavily on hundreds of these small methods, the inliner budget is eventually exhausted and simple expressions like "var x = move.From.Column.Value" become three non-inlined method calls, killing the performance. I confirmed this by examining the code generated by the JIT compiler: methods that should obviously be inlined were not.
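To illustrate the shape of the problem, here is a sketch of such an accessor chain. ChainMove, ChainSquare and ChainColumn are hypothetical types with an assumed bit layout, not the m8# source. Each link is a one-liner that is free when inlined and a real call when it is not. FromColumnRaw shows one possible mitigation, flattening a hot chain by hand. That is my assumption and was not confirmed as a fix in this thread.

```csharp
using System.Runtime.CompilerServices;

public readonly struct ChainColumn
{
    private readonly byte _value;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public ChainColumn(byte value) => _value = value;

    public byte Value
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => _value;
    }
}

public readonly struct ChainSquare
{
    private readonly byte _value;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public ChainSquare(byte value) => _value = value;

    public ChainColumn Column
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => new ChainColumn((byte)(_value & 7));
    }
}

public readonly struct ChainMove
{
    private readonly ushort _bits; // assumed layout: low 6 bits hold the from-square

    public ChainMove(ushort bits) => _bits = bits;

    public ChainSquare From
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => new ChainSquare((byte)(_bits & 63));
    }

    // Hand-flattened equivalent of From.Column.Value: one accessor instead of three.
    public int FromColumnRaw
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => _bits & 7;
    }
}
```

Both forms return the same value. The difference only shows up in the generated code when the caller has already used up its inlining budget.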

I did not find a solution to this, but it might be improved in a future version of .NET. There is an open issue about this here: https://github.com/dotnet/runtime/issues/93069

In the meantime I am working on my C++ engine (m8), but I would really like to go back to m8#.

Thanks.