SMP, first shot at implementation

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: SMP, first shot at implementation

Post by mvanthoor »

Joost Buijs wrote: Mon Sep 14, 2020 7:21 pm All this fancy stuff will put you in a straitjacket, I don't know about Rust ('Rust roest' we say here in Holland), but I still think C/C++ is the only serious language to write a chess engine in. Most other languages output slower code (or the compilers are worse), for instance I keep a shadow copy of my engine in Pascal (Free Pascal to be exact) and it is at least 30 to 40% slower as my C++ version.
In the 70's and 80's, people where of the opinion that assembler was the only serious language to write software in.

Rust can be as fast as C and C++. Yes, it's a LOT harder to get started with (especially compared to C), just because you have to know quite a lot about the language if you want to do non-trivial things. However, if you get going, you can be close to 100% sure that if your code compiles, it will run exactly as intended if you didn't make any logic mistakes. (lLike: getting stuff[10] if there are only 5 elements, saying "if x >= 5" if you actually meant 6.)

While Rust isn't an easy language, and I'm sometimes frustrated by how hard some stuff is to actually implement, in the end I'm always glad about it. I'd rather have a hard time during implementation and then knowing all will (very probably) be correct, than having an easy time during implementation and then having to resolve crash bugs because somewhere, I did something that isn't 100% correct.

C is a very easy language to learn, but very hard to write correctly. (I know... I've written a lot of it in the past.)
Rust is a very hard language to learn, but very easy to write correctly. (I'm discovering over and over again right now.)
fabianVDW wrote: Mon Sep 14, 2020 8:16 pm I would love to disprove this quote with an actual fast chess engine, but sadly FabChess is not, since I am not the best programmer. Have a look at Asymptote or Rustic though, they are both very fast. Most compiled Rust code compiles to similar speed as C/C++. Convince yourself with a quick google search about Rust performance vs C++ performance if you want. Fact is that you can port C code to Rust with unsafe and achieve the same performance pretty much guaranteed - this would include a lot of unsafe that one doesn't want though.
Thanks for the props Fabian, but the engine isn't finished yet. Today I got a step closer though. (It now has a threaded communication protocol; on the console for now, so I can input moves manually while writing/testing the search.) I've also implementing the command line (using the CLAP library) so I can now do things such as --fen and --perft 6 from the command line, and it has a very simple text user interface (in its own thread, already) that accepts moves, so I can start writing the search. I'm glad I've been able to resume development.

But you are right: the move generator and make_move/take_move are quite fast. Maybe people remember the optimization thread: http://talkchess.com/forum3/viewtopic.php?f=7&t=73577 Rustic's perft capability (without hash or counting tricks) is just as fast, or faster (depending on the position) than Weiss (written in C), and 20% faster than Minic (written in C++). I assume Minic could be made faster. Maybe Weiss and Minic have become faster; I haven't checked it since that thread.

The only unsafe code I've used the one you yourself suggested: not having the move list zero-initialized when creating it, because I'll be writing moves into them anyway.

===

To get back on topic with regard to SMP and the shared hash table... I'm sure that's going to be possible as well, somehow. I'll see when I get there. I have my own array-based hash table (which works already), but I'll also look into HashMap. The engine already uses a thread for its communication, and there will be a thread for the search as well. As Rust doesn't even allow global variables without going unsafe, the engine has none. Because the search will start out having its own thread (and probably its own copy of the board, which also contains the complete game state and history) from the beginning, it should be a problem to give it 4 threads and 4 boards (or 16 threads and 16 boards), and share the hash table somehow.

===

Oh, and with regard to speed vs. strength:
Booot 6.4 64-bit 4CPU 3297 (Pascal)
Combusken 1.3.0 64-bit 3052 (Go... 3x as slow as Rust, and a language I'd probably never use voluntarily)
CuckooChess 1.12 32-bit 2590 (Java (!) So even though it's not a top engine, it's still possible to write a strong engine in a quite slow language)
Delphil 3.2 64-bit 4CPU 2554 (Delphi... which should be as fast as C++. Borland wrote its C++ Builder GUI and cmpiler in Delphi... and Dephi was written in Delphi itself, which was written in Borland Pascal for Windows.)

In short, I think if you're using any language that compiles to machine code, it's no use mulling over the speed of the language if your engine hasn't at least reached 2500+ ELO on the CCRL-list. If you can write an almost 2600 ELO engine in Java, and a 3000+ ELO engine in Go, you can certainly do it in languages such as Pascal, Rust, Ada, or whatever language that has a current compiler.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: SMP, first shot at implementation

Post by mvanthoor »

fabianVDW wrote: Mon Sep 14, 2020 8:16 pm This is exactly what I do: I take the 16-bit integer which represents the TTMove and check if the resulting move is actually legal with a different procedure than make, solely written for this purpose(first I check if the 16bits even describe a correct move, because I don't save the GameMove enum in the HashTable, but rather 16bytes, to save memory). You are also right in that this also results in a performance loss - however after implementing it in FabChess instead of locks I remember even a slight increase in NPS even on a single thread, much faster on multiple threads. Also: What are you going to do about it? If you don't use perfect hashing(which will cost storage) you have to validate if the move is legal or your engine has to be resilient to making illegal moves in some positions and not crashing.
You can get away with 16 bits, for storing the entire move information? Interesting. (Or did you mean 16 bytes? But that would be a lot.)

I already have these:

Code: Select all

PIECE       	:   3        0-7 (use only 0-6)
FROM        	:   6        0-63
TO          	:   6        0-63
CAPTURE     	:   3        0-7 (captured piece)
PROMOTION   	:   3        0-7 (piece promoted to)
ENPASSANT   	:   1        0-1
DOUBLESTEP  	:   1        0-1
CASTLING    	:   1        0-1
That's 24 bits already, and I still need to store the depth (8 bits) and evaluation (don't know yet, but I assume 12-16 bits).

It's even more interesting that decoding the move and checking for legality is actually faster than waiting for the lock to be lifted on the TT if you have a lot of threads. I'll keep that in mind when I get to implementing Lazy SMP.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: SMP, first shot at implementation

Post by Joost Buijs »

mvanthoor wrote: Tue Sep 15, 2020 2:55 am In the 70's and 80's, people where of the opinion that assembler was the only serious language to write software in.
Yes, I'm aware of this because I wrote my first chess engine in the late 70's and it was written entirely in assembler. At that time C compilers didn't have the same quality as they have nowadays.

Oh, and with regard to speed vs. strength:
Booot 6.4 64-bit 4CPU 3297 (Pascal)
Combusken 1.3.0 64-bit 3052 (Go... 3x as slow as Rust, and a language I'd probably never use voluntarily)
CuckooChess 1.12 32-bit 2590 (Java (!) So even though it's not a top engine, it's still possible to write a strong engine in a quite slow language)
Delphil 3.2 64-bit 4CPU 2554 (Delphi... which should be as fast as C++. Borland wrote its C++ Builder GUI and cmpiler in Delphi... and Dephi was written in Delphi itself, which was written in Borland Pascal for Windows.)
It is true that you can write a strong engine in a slow language, but why should you throw away Elo by not using the fastest language there is?

I've been busy with this stuff for 43 years and used all Borland/Embarcadero languages from Turbo Pascal/Turbo C to Delphi and C++ Builder, these compilers always generate slower code than a modern C/C++ compiler like GCC or CLang. Maybe the differences are small (20% or so), but still so large that I'm not going to use it.

Of course it is possible to develop a Pascal compiler with the same quality of code generation and optimization as current C/C++ compilers have, it doesn't have to to anything with the language by itself, it's just that the compilers are worse.
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: SMP, first shot at implementation

Post by abulmo2 »

Joost Buijs wrote: Tue Sep 15, 2020 6:48 am
mvanthoor wrote: Tue Sep 15, 2020 2:55 am In the 70's and 80's, people where of the opinion that assembler was the only serious language to write software in.
Yes, I'm aware of this because I wrote my first chess engine in the late 70's and it was written entirely in assembler. At that time C compilers didn't have the same quality as they have nowadays.

Oh, and with regard to speed vs. strength:
Booot 6.4 64-bit 4CPU 3297 (Pascal)
Combusken 1.3.0 64-bit 3052 (Go... 3x as slow as Rust, and a language I'd probably never use voluntarily)
CuckooChess 1.12 32-bit 2590 (Java (!) So even though it's not a top engine, it's still possible to write a strong engine in a quite slow language)
Delphil 3.2 64-bit 4CPU 2554 (Delphi... which should be as fast as C++. Borland wrote its C++ Builder GUI and cmpiler in Delphi... and Dephi was written in Delphi itself, which was written in Borland Pascal for Windows.)
It is true that you can write a strong engine in a slow language, but why should you throw away Elo by not using the fastest language there is?
Is not asmfish a proof that assembler is still significantly faster than modern C/C++ compiler ?
Many languages use the same backend as gcc or clang, and thus benefit of the same optimization capabilities. For instance, Rust uses the LLVM backend and should be as fast as a c/c++ program compiled with clang. The panel of fast languages is most wider than just C or C++.
Richard Delorme
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: SMP, first shot at implementation

Post by Joost Buijs »

abulmo2 wrote: Tue Sep 15, 2020 8:04 am
Joost Buijs wrote: Tue Sep 15, 2020 6:48 am
mvanthoor wrote: Tue Sep 15, 2020 2:55 am In the 70's and 80's, people where of the opinion that assembler was the only serious language to write software in.
Yes, I'm aware of this because I wrote my first chess engine in the late 70's and it was written entirely in assembler. At that time C compilers didn't have the same quality as they have nowadays.

Oh, and with regard to speed vs. strength:
Booot 6.4 64-bit 4CPU 3297 (Pascal)
Combusken 1.3.0 64-bit 3052 (Go... 3x as slow as Rust, and a language I'd probably never use voluntarily)
CuckooChess 1.12 32-bit 2590 (Java (!) So even though it's not a top engine, it's still possible to write a strong engine in a quite slow language)
Delphil 3.2 64-bit 4CPU 2554 (Delphi... which should be as fast as C++. Borland wrote its C++ Builder GUI and cmpiler in Delphi... and Dephi was written in Delphi itself, which was written in Borland Pascal for Windows.)
It is true that you can write a strong engine in a slow language, but why should you throw away Elo by not using the fastest language there is?
Is not asmfish a proof that assembler is still significantly faster than modern C/C++ compiler ?
Many languages use the same backend as gcc or clang, and thus benefit of the same optimization capabilities. For instance, Rust uses the LLVM backend and should be as fast as a c/c++ program compiled with clang. The panel of fast languages is most wider than just C or C++.
Imho comparing Stockfish with asmFish doesn't mean a thing because Stockfish doesn't seem to be written with speed in mind at all.

I agree that there are a few languages that come NEAR the speed of C/C++, and I doubt that you can write a faster program in assembler without spending a lifetime optimizing it by hand. C/C++ also gives you the option to use inline assembler, if you really want to try to improve upon the compilers optimizer it would be a lot easier to replace a time critical routine with inline assembler than to write the whole program in assembler.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: SMP, first shot at implementation

Post by mvanthoor »

Joost Buijs wrote: Tue Sep 15, 2020 6:48 am It is true that you can write a strong engine in a slow language, but why should you throw away Elo by not using the fastest language there is?

I've been busy with this stuff for 43 years and used all Borland/Embarcadero languages from Turbo Pascal/Turbo C to Delphi and C++ Builder, these compilers always generate slower code than a modern C/C++ compiler like GCC or CLang. Maybe the differences are small (20% or so), but still so large that I'm not going to use it.
Well, I partially agree. I wouldn't write a chess program in Java or Python (I wouldn't write anything in those languages, to be honest), but I can imagine that someone who isjust starting out uses a language they know well, so they are not distracted by learning a new language on top of learning how to write a chess engine. If I had written my engine in C, it would probably have been done already. I could have done it even faster in pseudo-C++ (a C-style program in C++, including the Boost library so I don't have to study C++11 and newer)... but I didn't want to, because I didn't want to write just another C/C++ chess engine.

Because I'm already preparing to implement Lazy SMP in the future, my engine is already threaded. It now has three options for communication (which are not yet complete):

- console (text user interface, for move input like "e2e4", so I can easily test the search)
- UCI (this will be the default)
- XBoard (implemented later)

This communication already runs in a thread and it's working. Now I'm looking into how to share a pointer to the board with this thread, so I can actually make moves on the engine's board. There will also be a thread for searching, which will share the hash table (and get a copy if the board for itself.)

Obviously "comm" and "search" need to communicate.

Rust's concurrency may be "fearless", but implementing it for anything that is less trivial than printing "Hello" 10 times is still no cakewalk if you don't know the language very well yet. But I'll get there.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: SMP, first shot at implementation

Post by Joost Buijs »

mvanthoor wrote: Tue Sep 15, 2020 11:35 am
Joost Buijs wrote: Tue Sep 15, 2020 6:48 am It is true that you can write a strong engine in a slow language, but why should you throw away Elo by not using the fastest language there is?

I've been busy with this stuff for 43 years and used all Borland/Embarcadero languages from Turbo Pascal/Turbo C to Delphi and C++ Builder, these compilers always generate slower code than a modern C/C++ compiler like GCC or CLang. Maybe the differences are small (20% or so), but still so large that I'm not going to use it.
Well, I partially agree. I wouldn't write a chess program in Java or Python (I wouldn't write anything in those languages, to be honest), but I can imagine that someone who isjust starting out uses a language they know well, so they are not distracted by learning a new language on top of learning how to write a chess engine. If I had written my engine in C, it would probably have been done already. I could have done it even faster in pseudo-C++ (a C-style program in C++, including the Boost library so I don't have to study C++11 and newer)... but I didn't want to, because I didn't want to write just another C/C++ chess engine.
Of course it's the best to start with the language you know and feel comfortable with. At the beginning of the 80s I started programming in C, later C++ and made a few sidesteps to Pascal basically because a customer demanded it.

I have to admit that I don't know much about recent languages like D or Rust, if these languages offer default memory safety this must have some impact on performance, this is inevitable.

Scripting langues like Java Script, Python or Ruby are way to slow to build a serious chess engine with, this doesn't mean that you can't make an engine with it that is still a powerful opponent for most humans.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: SMP, first shot at implementation

Post by Henk »

Saw a video recently where Robert Martin said Clojure was the best programming language.

Hmm Clojure is a dialect of LISP. So probably not fast.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: SMP, first shot at implementation

Post by mvanthoor »

Joost Buijs wrote: Tue Sep 15, 2020 12:49 pm I have to admit that I don't know much about recent languages like D or Rust, if these languages offer default memory safety this must have some impact on performance, this is inevitable.
With a garbage collected language such as C#, you're correct.
With a language like Rust, you're mistaken. The memory safety is all done at compile time, not runtime. The compiler checks *everything* (variable life times, thread life cycle, multiple references... etc...) to make sure that everything is safe. That's the reason why Rust is quite hard to get started with, if you're not used to thinking about memory management. It's also the reason why the compiler is slow.

If you have so much C and C++ experience and you're perfectly capable of writing safe programs in those languages, a language such as Rust would be right up your alley, because you're not going to suddenly write unsafe programs. You might, however, encounter the unnerving situation that something you THOUGHT was perfectly safe, and normally is in 99.9% of cases, can still be problematic somehow.

And then there's the sometimes bewildering syntax of some languages (including, sometimes, Rust itself). Sometimes I think languages use archaic notation because that's what programmers are used to.

fn? func? Why not just function?
pub? mod? impl? Why not public, module and implement?

Sometimes I think that language designers forget that we now have code completion in IDE's :P
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: SMP, first shot at implementation

Post by abulmo2 »

mvanthoor wrote: Tue Sep 15, 2020 1:42 pm
Joost Buijs wrote: Tue Sep 15, 2020 12:49 pm I have to admit that I don't know much about recent languages like D or Rust, if these languages offer default memory safety this must have some impact on performance, this is inevitable.
With a garbage collected language such as C#, you're correct.
There is a lot of myths about garbage collection. Here is the relative time spent in the garbage collector (GC) in Amoeba (programmed in D):

Code: Select all

0.00%  amoeba   amoeba              [.] nothrow void* gc.impl.conservative.gc.Gcx.smallAlloc(ulong, ref ulong, uint, const(TypeInfo))
0.00%  amoeba   amoeba              [.] nothrow uint gc.impl.conservative.gc.ConservativeGC.runLocked!(gc.impl.conservative.gc.ConservativeGC.getAttr(void*).go(gc.impl.conservative.gc.Gcx*, void*), gc.impl.conservative.gc.otherTime, gc.impl.conservative.gc.numOthers, gc.impl.conservative.gc.Gcx*, void*).runLocked(ref gc.impl.conservative.gc.Gcx*, ref void*)
0.00%  amoeba   amoeba              [.] nothrow void* gc.impl.conservative.gc.ConservativeGC.runLocked!(gc.impl.conservative.gc.ConservativeGC.mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)), gc.impl.conservative.gc.mallocTime, gc.impl.conservative.gc.numMallocs, ulong, uint, ulong, const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo))
0.00%  amoeba   amoeba              [.] nothrow uint gc.impl.conservative.gc.ConservativeGC.getAttr(void*).go(gc.impl.conservative.gc.Gcx*, void*)
0.00%  amoeba   amoeba              [.] nothrow void gc.impl.conservative.gc.Pool.setBits(ulong, uint)
0.00%  amoeba   amoeba              [.] nothrow int gc.impl.conservative.gc.Gcx.collectAllRoots(bool).__foreachbody3(ref core.gc.gcinterface.Range)
0.00%  amoeba   amoeba              [.] nothrow void gc.impl.conservative.gc.Gcx.scanBackground()
All these sum to 0.00%.
Garbage is collected only when memory allocation is requested. In Amoeba, memory is allocated only during program initialisation and when reading input from the user/GUI, not in critical code section, so costs nothing measurable during the program execution.
Richard Delorme