Testing strategies for my engines playing strength

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ras
Posts: 2488
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Testing strategies for my engines playing strength

Post by Ras »

lithander wrote: Sat Jan 30, 2021 2:51 amIs there anything you found particularily irritating or consider important for even the most barebone engine?
Don't rely on any particular order of parameters within the go command. You'll need to have a parser that can cope with wtime, btime, winc, binc, movestogo (if present) in any order. Not for now as barebone, but the design should allow for the depth, nodes, and mate parameters to appear anywhere once you implement them.

There's a pitfall with positions transmitted as FEN: the e.p. square may be given even if there is no enemy pawn to possibly do the capture. That can trip up your repetition recognition if you don't sanitise the e.p. square.
Rasmus Althoff
https://www.ct800.net
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Testing strategies for my engines playing strength

Post by mvanthoor »

lithander wrote: Sat Jan 30, 2021 2:51 am I have replaced my piece counting evaluation with the tuned PSTs from Sunfish because I was curious and it made the engine play much stronger. But it's a bit like using neural networks - sure they work but you won't have an intuition why. I'll stick with counting pieces for now, that I understand just fine! ;)
Very good idea. It's good to know what a PST actually does. When your program gets stronger, change the PST's here and there, and see what effect they have on the playing style. A PST basically isn't more than you telling the program: "This is a good square for that piece." If the PST's are tuned, you often use the evaluation of a stronger engine as a reference; at least in the beginning. The PST's then start to include compensation for the lack of knowledge in the evaluation function. This is the reason why you need to retune the PST's every time you change the evaluation.

My PST's are my own; I'm not going to copy/paste PST's from other engines. I'll probably go and tune them with my own engine as a reference to see if there are versions that can make the engine stronger than it is now. That is what I meant with maxing out each feature before adding a new one.
Many thanks for taking the time! Really appreciate it.

The performance doesn't surprise me. First of all I wrote my move generation code from scratch and so it doesn't use bitboards. My board representation is just an 8x8 array of pieces and then you loop over that and whenever you find a piece there's a switch statement that calls piece-type-specific routines to add the moves of this piece. It's very straight forward. Here's the Youtube video with timecode where I explain that part to my (immagined :lol:) audience. But of course conjuring black magic on 64bit integers would be much faster. I just didn't feel like I could do that without looking at existing implementations which is against my rules. My move generator's 'perft 6' takes 15 seconds on the start position. That's about 8 Million nodes generated per second. For a short while I was very proud about myself, then I discovered that 'qperft' finds the same result in 0.47 seconds and I cried a little! ;)
Well... don't use QPerft as a reference. it is one of the fastest perft _calculators_ in the world. It's a fine program, but it includes many optimizations that are specific to perft, and have no bearing on actually playing chess. There are some optimizations in there however, that _are_ suited to make a move generator faster in a 'real' chess program, without any side effects. Those optimizations are for a later date, at least for my engine.

Rustic's perft speed, on an i7-6700K:

Code: Select all

Perft 1: 20 (0 ms, inf leaves/sec)
Perft 2: 400 (0 ms, inf leaves/sec)
Perft 3: 8902 (0 ms, inf leaves/sec)
Perft 4: 197281 (5 ms, 39456200 leaves/sec)
Perft 5: 4865609 (140 ms, 34754350 leaves/sec)
Perft 6: 119060324 (3286 ms, 36232600 leaves/sec)
Total time spent: 3431 ms
Execution speed: 36179695 leaves/second
As said, it's about 5 times faster than your engine, but it can't (yet) hold a candle to something like QPerft. It will become much faster when a transposition table is implemented; Perft is good for testing this, because if your transposition table doesn't work correctly (or your Zobrist hashing is off, which also makes your TT not work as it should), your perft numbers won't match.

It is also good to keep in mind that, when your engine progresses, it will become _slower_ if you don't actively do something to counter that. Previously, my Perft speeds were higher (around 41 million lps), but then I made the calculation of the PST's incremental.

What this means is that the engine doesn't calculate PST's from scratch anymore. Previously, the evaluation would run through the pieces, and add all the PST values for one side. Now, I add them all together when I set up the board. Then, when I move a piece, I substract the PST value from that square, and then add the value of the destination square.

This makes the evaluation MUCH faster. However, because the engine now does more work when making a move, I lost about 10% perft speed.

The evaluation is one of, if not the, heaviest/most called function in the engine, so a faster evaluation is better than a fast perft. Perft has no bearing on how strong/fast your total engine is; it shows how fast it can generate and execute moves.

Also, keep in mind that the numbers will fluctuate. When you add/remove code to your program, the compiler will change the layout and how the program sits in memory. Sometimes even some stuff like adding a single function of 3 lines that calculates one value, can change the perft speed of your program... either negatively, or positively.

With regard to your audience: I think there is one. Seeing how busy this forum is since the beginning of the pandemic in 2020, chess programming seems to have taken a real flight. If you are looking for video's similar to yours, with lots of good information, look at these series:


Chess Programming channel by Maksim Korz (Code Monkey King)
Writing a chess engine in C, by Richard Allbert (Bluefever Software)

In time, as I fill this out, written information about chess programming will also be available on Rustic's website: https://rustic-chess.org/
The search in the version you tried was just minmax, so no pruning at all. It reaches a fixed depth of 4 plys and stops. But quiescence is on my list for version 0.2 because the kind of blunders the lack of it causes is really hard to endure. Even for a bad chess player like myself! :)
Oh, OK. That explains it. Even with basic alpha-beta, qsearch, and simple MVV-LVA sorting, you should be able to reach depth 7-8 in the middle game on something like an i7-6700K, in a few seconds.
I only implement a tiny subset of UCI at the moment. In version 0.2 I plan to at least support time controls and I also want to send an 'info' string back to the GUI. But I don't think I will add all UCI commands. Is there anything you found particularily irritating or consider important for even the most barebone engine?
"Irritating" maybe is the wrong word. But, as Ras says, you should not rely on a particular order of commands or parameters within those commands. Some things I did in my engine:

- If the engine receives a command it doesn't know, it just ignores it without any reply. This is what the protocol states it should do.
- If the engine receives "position startpos moves ..." and it encounters an illegal move, it stops parsing there, and sends an "info string <error>" to the GUI.
- If it receives an illegal FEN, it sends an "info string <error>", and does not change the board as it is.
- As soon as "go" is parsed, the engine assumes "infinite", and changes this if it encounters something else. If it encounters stuff it doesn't know, or it encounters nothing else, "go" will be "go infinite".
- The protocol states that "an engine should start up as fast as possible", and thus it should initialize only after "ucinewgame". Maybe this was a point 20 years ago, but nowadays, I disagree. When I start my engine, I expect it to be ready immediately. This means the following...
- It does not _need_ "ucinewgame" to start playing its first game; if it does receive it, it just resets the board. Then it will have done two initializations. Who cares...
- It doesn't _need_ "uci" to start playing or analyzing. The GUI could conceivably ignore everything, and just immediately send "go <...>", and the engine will work.
- I have implemented an -f (--fen) parameter, so I can set up a fen string on the command line, or in a script, without having to execute "position fen <fen>"
- I have implemented a -q (--quiet) parameter when the engine starts up. In the latest version, it is -q <integer>. If I start the engine with -q0 (or no parameter at all), it outputs full info and stats. -q1 surpresses the sending of stats, and only outputs the info information with regard to depth. I use this when I want to quickly analyze a position. -q2 surpresses everything. Some GUI's are very slow/bad at parsing the incoming information (Arena can't handle great blasts of information when running a tournament, for example). If you want to run a tournament and are not going to watch it, "-q 2" can be used to send no info/stats to the GUI or tournament manager.

All of this means that I can do this:

./rustic -q1 -f "fen_string"
(engine starts)
go

And it will analyze the position, only outputting depth/speed information, and no intermediate stats.

All UCI comamnds are implemented in Rustic, except for the following:

- register/copyprotection: no use for this in an open source engine.
- debug: Not needed it. Maybe, someday, I'll implement it.
- go mate: never used this. Maybe someday I'll implement it.
- go searchmoves: didn't even know it existed. Many GUI's can't even do it. I'll probably not implement this.
I noticed your engine before and liked the patience and care that oozes from it.
Thanks. I try. Sometimes I feel as if my development is too slow, because I take so much time for implementing and optimizing each function to the point where I feel that I have to never touch it again. On the other hand, it doesn't matter... the only person for which I'm writing this engine, is me, myself and I. In the end, I intend to start using this engine to analyze games. The target to reach is (someday) at least 2851 ELO on CCRL. That is the rating of Fritz 11, which has been my main engine for a long time. The next target will be 2895, which is the rating of Glaurung 2.2., the immediate predecessor of Stockfish. I have not set any goals with regard to the time it may take.
Rust is also an interesting choice. It's been on my radar for a while but I never gave it a try, yet. But it sounds perfect for chess engines.
Rust compiles down to LLVM intermediate language (same as used by the Clang and Clang++ compilers), so in the end, Rust is basically C/C++ with regard to speed. The one thing I love & hate at the same time is crates.io. (Rust's version of NuGet, pip, or NPM.) It makes it very easy and convenient to add a dependency, but one small dependency often is small by itself because it leans on 20 other dependencies. Rustic only has these:

rand: Random number generator
if_chain: a way to prevent massive if x && y && z && a && ... {} statements.
clap: command line parser library
crossbeam-channel: much faster multi-threading channel implementation than what is in the standard library right now.

Even so, a compile pulls in 34 crates in total.

However, there is also a huge amount of junk there. IMHO, crates.io should have been a curated repository, which only contains crates that are sanctioned by the Rust Core Team; i.e. crates that are so massive and extensively used, that they can be considered as essential. A crate could start out on "public-crates.io", and if it becomes big and important enough, it could transition to "crates.io". Many such idea's have been floated around, but the Rust Core team doesn't want to implement any of it. They are convinced that "the people" will make the important crates visible/at the top in crates.io, and the rest will become unused automatically.
When I feel like I have a good understanding of what components go into a strong engine (and why) I can totally see myself trying something more ambitious. I even have a name already for my potential long-term project! But one step at a time... it's been only a few months ago that I watched Queens Gambit and got interested in chess, after all, haha. :mrgreen:
Good luck. It's rare to see someone writing a chess engine after just getting into chess. I've been playing (but not studying) chess for 30 years. I'm a decent but not super strong player myself (1850-2000 ELO, if I put the effort in).
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Testing strategies for my engines playing strength

Post by jdart »

I have been using C++ for decades, but am a bit frustrated with it now. There are just a lot of things still that are there due to its "C" roots. I don't know Rust, but I have done some Ruby coding, and Ruby is much more powerful in some ways, especially the metaprogramming features. My engine has some tools that actually generate C++ code, for example the tuner does this, but Ruby can basically rewrite its own code at runtime and then execute it. The Ruby array map feature is also very powerful and concise. However, I don't think it is up to C++ in speed.
IanKennedy
Posts: 55
Joined: Sun Feb 04, 2018 12:38 pm
Location: UK

Re: Testing strategies for my engines playing strength

Post by IanKennedy »

lithander wrote: Sat Jan 30, 2021 2:51 am
I have replaced my piece counting evaluation with the tuned PSTs from Sunfish because I was curious and it made the engine play much stronger. But it's a bit like using neural networks - sure they work but you won't have an intuition why. I'll stick with counting pieces for now, that I understand just fine! ;)
It's much more fulfilling to do your own PSTs when you come to it. I'm just in the process of re-doing/expanding mine. It's interesting to see how much you can teach your engine without anything else.
Author of the actively developed PSYCHO chess engine
Ras
Posts: 2488
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Testing strategies for my engines playing strength

Post by Ras »

jdart wrote: Sat Jan 30, 2021 5:17 pmbut Ruby can basically rewrite its own code at runtime and then execute it.
Because it's an interpreted language.
However, I don't think it is up to C++ in speed.
Because it's an interpreted language. 8-)
Rasmus Althoff
https://www.ct800.net
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Testing strategies for my engines playing strength

Post by jdart »

Because it's an interpreted language
It is compiled into virtual instructions run by an interpreter, but now there's a JIT compiler.
User avatar
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Testing strategies for my engines playing strength

Post by lithander »

Ras wrote: Sat Jan 30, 2021 4:05 am There's a pitfall with positions transmitted as FEN: the e.p. square may be given even if there is no enemy pawn to possibly do the capture. That can trip up your repetition recognition if you don't sanitise the e.p. square.
Currently I ignore move repetition rules. This is contributing to about 25% of the games ending in draws against similar strength engines. So the feature is high on the list for v0.2 and I'll make sure to remember your advice. It sounds like something that would be really easy to miss.
mvanthoor wrote: Sat Jan 30, 2021 4:10 pm It is also good to keep in mind that, when your engine progresses, it will become _slower_ if you don't actively do something to counter that.
Just yesterday I became painfully aware of this. I had decided to only count material (very fast) while working on the search but then I noticed that by just counting material my engine missed stalemates and mates in the last ply. And when I added that "simple" detail to my eval it caused a severe performance hit. :(
And it also means all my test data generated needs to be regenerated... the path to version 0.2 is a stonier one than anticipated! ;)
mvanthoor wrote: Sat Jan 30, 2021 4:10 pm What this means is that the engine doesn't calculate PST's from scratch anymore.
So basically a board state knows it's evaluation and adapts it with every move? That's clever. I will remember that approach for when it's time to work on improving the eval!
mvanthoor wrote: Sat Jan 30, 2021 4:10 pm Also, keep in mind that the numbers will fluctuate. When you add/remove code to your program, the compiler will change the layout and how the program sits in memory.
Hmm... with C# performance is pretty deterministic between different compilations. You just have to give the JIT compiler a warmup or use AOT compiled builds. And I made sure to disable the boosting on my CPU! That's pretty important imo. Can't benchmark anything if your CPU doesn't provide a consistent performance. (So my CPU is set to a fixed clock of 4.2Ghz at all times)

Oh and thanks for providing details on how you handle UCI commands. Very interesting!
mvanthoor wrote: Sat Jan 30, 2021 4:10 pm Good luck. It's rare to see someone writing a chess engine after just getting into chess. I've been playing (but not studying) chess for 30 years. I'm a decent but not super strong player myself (1850-2000 ELO, if I put the effort in).
I'm also trying to learn to play chess. But right now I'm limited to playing on the computer and if I'm at the computer anyways I'm drawn towards programming... ;) I hope that learning chess and chess programming at the same time complements itself nicely. Since my perft numbers are correct I also feel more confident about myself knowing the rules that decide whether a move is legal. Before I implemented the move generator I had a few misconceptions about castling and didn't know about en-passent et all.
jdart wrote: Sat Jan 30, 2021 5:17 pm I have been using C++ for decades, but am a bit frustrated with it now.
Give C# a try! The language is expressive, Visual Studio is a very powerful IDE and the debugging features (edit code at runtime and continue) are just incredible. The .Net framework has grown to a (almost too large but) well rounded package of utilities so you don't have to worry about managing lots of 3rd party dependencies. And it since a few years it's cross platform, open-source and also very fast!
IanKennedy wrote: Sat Jan 30, 2021 5:36 pm It's much more fulfilling to do your own PSTs when you come to it. I'm just in the process of re-doing/expanding mine. It's interesting to see how much you can teach your engine without anything else.
Absolutely agree. I wouldn't commit changes to my codebase that I don't understand and didn't author. But before I can make an educated decision on whether to spent time on my own PSTs it's too tempting to take a (temporary) peek into the future and make a quick built with someone else's tuned tables. Just so you have a better idea of what to expect from that feature and what to aim for.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Testing strategies for my engines playing strength

Post by mvanthoor »

lithander wrote: Mon Feb 01, 2021 6:32 pm Just yesterday I became painfully aware of this. I had decided to only count material (very fast) while working on the search but then I noticed that by just counting material my engine missed stalemates and mates in the last ply. And when I added that "simple" detail to my eval it caused a severe performance hit. :(
And it also means all my test data generated needs to be regenerated... the path to version 0.2 is a stonier one than anticipated! ;)
The evaluation is one of the functions most often called in the program. If you add stuff there, it has a huge performance impact. So, make sure you don't do any unnecessary allocations or copying in there. Ideally, the evaluation only calculates. (This is made very easy when you're using bitboards; you don't have to loop over anything.) If you can keep stuff out of the evaluation, such as recalculating the PST's, then do it.
So basically a board state knows it's evaluation and adapts it with every move? That's clever. I will remember that approach for when it's time to work on improving the eval!
Only for the PST's, because moving one piece from one square to another changes a very specific set of values. You start out by adding all the square values for each piece in the starting position, and store this; one value for white, one for black. If you now move a piece, you subtract the square value of the "from" square (for that piece) from this total, and add the square value of the "to" square. That saves you having to recalculate the value by looping over up to 16 pieces.

You can't do this with most other terms, such as the value for mobility; moving one little pawn one step can have massive implications for the mobility of other pieces. That pawn move could restrict two knight squares for your opponent, but also open up lines and diagonals for you; and close others for your opponent. The entire board changes, so you'll have to evaluate everything again.
Oh and thanks for providing details on how you handle UCI commands. Very interesting!
I hope its useful for your engine's development :)

Give C# a try! The language is expressive, Visual Studio is a very powerful IDE and the debugging features (edit code at runtime and continue) are just incredible. The .Net framework has grown to a (almost too large but) well rounded package of utilities so you don't have to worry about managing lots of 3rd party dependencies. And it since a few years it's cross platform, open-source and also very fast!
Absolutely agree. I wouldn't commit changes to my codebase that I don't understand and didn't author.
That is a very good viewpoint. I adhere to the same. There is only one exception: endgame tablebase probing code. I have no interest in writing or understanding this. There is a very well known library out there called Fathom (or Andrew Grant's fork, Pyrric), that can do this. When I need it, I'll use this library.
But before I can make an educated decision on whether to spent time on my own PSTs it's too tempting to take a (temporary) peek into the future and make a quick built with someone else's tuned tables. Just so you have a better idea of what to expect from that feature and what to aim for.
Making your own PST's by hand is hard if you can't play chess relatively well yourself; you are in essence telling the program what good squares are for your pieces in every game phase. So YOU have to know where your pieces should (in general) be. You can't teach someone if you don't know the material yourself. So in your case, I'd completely understand if you start out with simple PSQT's from others so you can understand them. (NOT tuned PST's, because those will include compensation for the lack of knowledge in the rest of the evaluation, and you won't understand the numbers. You need 'pure' PST's.)
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Testing strategies for my engines playing strength

Post by lithander »

mvanthoor wrote: Mon Feb 01, 2021 8:47 pm Making your own PST's by hand is hard if you can't play chess relatively well yourself; you are in essence telling the program what good squares are for your pieces in every game phase. So YOU have to know where your pieces should (in general) be. You can't teach someone if you don't know the material yourself. So in your case, I'd completely understand if you start out with simple PSQT's from others so you can understand them. (NOT tuned PST's, because those will include compensation for the lack of knowledge in the rest of the evaluation, and you won't understand the numbers. You need 'pure' PST's.)
Lack of insight into what makes a good position is one of the reasons why a better eval is not on my list for my next version of Minimal Chess!

And PSTs... well, as you say, that's nothing I could just write by hand at this point. But maybe I'd try to tackle the problem by statistically evaluating a bunch of PNGs from quality games. If I can't write my own PSTs I might be content with writing code that writes PSTs for me. As long as I understand and authored the generating code I'm not going to feel too guilty about enjoying the fruit of my computer's labor! ;)

But that's dreaming about the far future... by then I might even have learned to play some better chess myself! ;)
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Testing strategies for my engines playing strength

Post by mvanthoor »

lithander wrote: Mon Feb 01, 2021 9:28 pm But that's dreaming about the far future... by then I might even have learned to play some better chess myself! ;)
Well... it's epic to author your own PST's completely by hand.

It just so happens that my engine, without a hash table on my computer, thinks about as far ahead as I do in the middle game (7-8 plies). Because the engine's depth is about the same as my own, and the PST's are written by me, it feels as if I'm playing against a dumbed-down version of myself because the engine doesn't yet know things such as the bishop pair, rooks on open files, etc... so it has lees knowledge than I do.
I can actually trick the engine into playing certain moves (or trick it into ignoring threads) because I can very often predict what it's going to do.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL