Mike Sherwin wrote: ↑Sun May 08, 2022 6:01 pm
2848 CCRL 40/15
OliThink 5.10.1
Complete rewrite of OliThink with a very fast move generator (based on OliPerft). Engine runs without any chess information. No pawnstructure, no tables etc.
ELO about 2950. 128 MB fix Hashsize.

As far as I can see at a glance, OliThink takes a very interesting and unique approach to its evaluation, but "without any chess information" is too bold a claim. The evaluation is mobility-based but has a lot of "special" rules and a good bunch of formulas with literal numbers (values defined in code instead of tables). It also considers the phase of the game and pawn properties such as passed pawns, and the squares around the kings get special treatment, too. What is all that if not chess knowledge?
Devlog of Leorik
Moderator: Ras
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Devlog of Leorik
lithander wrote: ↑Mon May 09, 2022 4:20 pm
What is all that if not chess knowledge?

You can let the predecessor of AlphaZero (Giraffe) generate a very good static evaluation network. The code is there with a howto:
https://github.com/ianfab/Giraffe
All you would need to do is implement a simple feedforward function in C#.
I like the above repository a lot because the person actually created something completely new, then went on to join the AlphaZero team, and from that work the groundwork of NNUE emerged.
So maybe even ignore the above recommendation and venture out into the unknown with new ideas of how to do evaluation. That is what brings chess programming forward, after all.
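For readers wondering what "a simple feedforward function" amounts to: here is a minimal sketch in Python (the function names and the toy weights are made up for illustration; Giraffe's actual network layout and activation functions differ):

```python
import math

def feedforward(inputs, layers):
    """Evaluate a small fully connected network.
    `layers` is a list of (weights, biases) pairs, one per layer;
    weights[j][i] connects input i to neuron j."""
    activation = inputs
    for weights, biases in layers:
        activation = [
            math.tanh(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

# Toy network: 2 inputs -> 2 hidden neurons -> 1 output score.
layers = [
    ([[0.5, -0.25], [0.1, 0.9]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
score = feedforward([1.0, 0.5], layers)[0]  # stays in (-1, 1) due to tanh
```

In an engine you would load trained weights from a file and feed in a feature vector extracted from the position; the point is only that the inference step itself is a few nested loops.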
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
lithander wrote: ↑Mon May 09, 2022 4:20 pm
What is all that if not chess knowledge?

However it is minimal, and is that not what you are looking for? Fruit's big jump was because of adding mobility, and for a short time Fruit 2.21 was on top.
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
dangi12012 wrote: ↑Mon May 09, 2022 9:18 pm
You can let the predecessor of AlphaZero (Giraffe) generate a very good static evaluation network. The code is there with a howto:
https://github.com/ianfab/Giraffe
All you would need to do is implement a simple feedforward function in C#.
I like the above repository a lot because the person actually created something completely new - went on into the alphazero team and from that the groundwork of NNUE emerged.
So maybe even ignore above recommendation and venture out into the unknown with new ideas of how to do evaluation. It is what brings chessprogramming forward after all.
Mike Sherwin wrote: ↑Tue May 10, 2022 12:51 am
However it is minimal and is that not what you are looking for? Fruit's big jump was because of adding mobility and for a short time Fruit 2.21 was on top.

Actually, MinimalChess had a mobility component in its evaluation; it was very slow but still added something like 60 Elo. And every strong engine these days uses NNUE, but I don't want to make such a big jump right away. For decades it was all about hand-crafted evaluations, and I want to learn about these techniques a little more. But I plan to use my tuner to figure out a set of coefficients that go well together instead of trying to parameterize a bunch of little formulas manually.
I thought I had a great idea of what features I should generate and auto-tune. The whole king-phase/king-safety thing. Turns out the idea wasn't that great and I have now decided to not merge the branch back into master.
But I still think the logical next step is to think about extracting other features of a position that will go well together with PSTs, tune coefficients for them, and extend the evaluation that way. I take the fact that even OliThink couldn't ignore some pawn-structure properties in its otherwise mobility-centric evaluation as a hint to focus on that area next. I was just too lazy/busy to really work on it recently!
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
Oh, and Leorik is currently playing quite well in Graham's 93rd Amateur Series Division 8. At the time of writing it has the top spot. I have to say that's quite an incentive to pick up the pace so that there's a new version ready when it gets promoted to the next Division.
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
Leorik won Graham's 93rd Amateur Series Division 8 and so in about 6 weeks it's going to get pitted against much stronger opponents (up to 2700 Elo CCRL) in Division 7.
So I have dedicated some development time to Leorik again and last night finally saw some improvement.
The version numbers might imply that there are a lot of small improvements between the two versions, but that's not really the case: when I shelved the "king_tables" branch and instead focused on pawn structure, I didn't "reuse" the spent version numbers, to avoid confusion in my notes and logs. And for the same reason I often increment the version even for extremely minor changes or throw-away experiments. So the only difference between 2.0.2 and 2.0.13 is that I have added eval terms for passed and isolated pawns.
I started with code that would identify doubled pawns, passed pawns, isolated pawns, connected and protected pawns etc. in a given BoardState. Quite similar to the normal bitboards representing pieces, these feature detectors would just return a 64-bit integer. And then, instead of tuning coefficients for just the material PSTs, I gave the tuner additional bitboards with these features and it gave me back more coefficients. So I ended up with PSTs that can be used to look up the value of e.g. passed pawns on a specific square at a specific phase of the game.
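Feature detectors like these can be built from classic bitboard fill operations. A rough sketch in Python (Leorik itself is C#; the function names are mine, and only passed and isolated pawns are shown, from White's point of view):

```python
FILE_A = 0x0101010101010101
FILE_H = 0x8080808080808080
MASK64 = 0xFFFFFFFFFFFFFFFF

def north_fill(bb):
    # Smear every set bit up the board (towards rank 8).
    bb |= (bb << 8) & MASK64
    bb |= (bb << 16) & MASK64
    bb |= (bb << 32) & MASK64
    return bb & MASK64

def south_fill(bb):
    # Smear every set bit down the board (towards rank 1).
    bb |= bb >> 8
    bb |= bb >> 16
    bb |= bb >> 32
    return bb

def adjacent_files(bb):
    # Shift one file left and one file right, guarding against wrap-around.
    return ((bb << 1) & (MASK64 ^ FILE_A)) | ((bb >> 1) & (MASK64 ^ FILE_H))

def white_passed_pawns(white_pawns, black_pawns):
    # Squares a black pawn blocks or could capture on as it advances.
    span = south_fill(black_pawns >> 8)
    return white_pawns & ~(span | adjacent_files(span)) & MASK64

def isolated_pawns(pawns):
    # A pawn is isolated if its side has no pawn on an adjacent file.
    files = north_fill(south_fill(pawns))
    return pawns & ~adjacent_files(files) & MASK64
```

Each returned bitboard can then be handed to the tuner just like a piece bitboard, yielding one coefficient per square and game phase.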
I tried to just use these tables directly and got mixed results. The performance suffered a lot, and only when I artificially slowed down the original version was the new one stronger. And even though the tuner operating on all the new features ended up with an MSE of only 0.239 instead of 0.246, looking at the tables made me suspicious that I was probably overfitting my evaluation to the dataset. The values per square looked quite noisy.
So I took these tables not literally but as guidance for writing a handcrafted evaluation for passed and isolated pawns: simple formulas with just 3 coefficients. I didn't bother thinking about how to auto-tune those. Instead I just tried different configurations and finally picked the one with which the tuner got the smallest MSE after a few hundred iterations of tuning. Compared with material-only evaluation I now had an MSE of 0.2417 instead of 0.2460... doesn't sound like a lot, but I was very hopeful.
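The post doesn't show the tuner's objective function, but a common formulation (often called Texel tuning) maps the static eval through a sigmoid to an expected game score and measures the squared error against actual results. A hedged sketch; the scaling constant k is illustrative and would normally be fitted to the data:

```python
import math

def expected_score(eval_cp, k=0.00475):
    """Map a centipawn eval to an expected game score in (0, 1).
    k is an assumed scaling constant, not Leorik's actual value."""
    return 1.0 / (1.0 + math.exp(-k * eval_cp))

def mse(positions, evaluate):
    """positions: list of (features, result) with result 1.0 for a
    white win, 0.5 for a draw and 0.0 for a black win."""
    total = sum((r - expected_score(evaluate(f))) ** 2 for f, r in positions)
    return total / len(positions)
```

The tuner then nudges the evaluation's coefficients to minimize this error, which is the kind of process behind numbers like 0.2460 vs. 0.2417.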
But to have the new version actually benefit from this potentially better evaluation I needed to add this additional term without sacrificing speed.
The good thing is that this new term only depends on the pawns. You only have to update it when there's a change to any of the pawns on the board, and even in that case it's likely that you've seen the exact same configuration before. So what I needed was another hash table. (Not my idea; the approach has been mentioned on the forum repeatedly.) But instead of maintaining a second Zobrist hash just for pawns (I'm still using copy/make, and the less I have to copy the faster that is) I resolved to just use the pawns bitboard directly as the key for the hashing. When using a prime-sized hash table (important!) the average hit rate was over 90%, and that's entirely fine I think.
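A minimal sketch of such a pawn hash table in Python (the table size and entry layout are made up, and Leorik's actual scheme for folding both colors' pawn boards into one key isn't spelled out in the post; a single 64-bit key is used here):

```python
TABLE_SIZE = 4999  # a prime, as the post stresses; not Leorik's actual size

# Each slot holds (key, score). The pawn bitboard itself serves as the
# key, so no separate Zobrist hash needs to be maintained for pawns.
pawn_table = [None] * TABLE_SIZE

def probe(pawn_key):
    entry = pawn_table[pawn_key % TABLE_SIZE]
    if entry is not None and entry[0] == pawn_key:
        return entry[1]  # cache hit: reuse the stored pawn eval
    return None

def store(pawn_key, score):
    # Always-replace scheme: the newest pawn structure wins the slot.
    pawn_table[pawn_key % TABLE_SIZE] = (pawn_key, score)
```

The prime modulus presumably matters because pawn bitboards are far from uniformly distributed, so a power-of-two modulus (which just takes the low bits) would map many common structures to the same slots.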
And so in the end the performance impact of the improved evaluation was barely measurable and I ended up with the above mentioned +60 Elo!
I guess that seems like an overly complicated process, but I'm just not good enough a chess player to know the rules for valuing passed or isolated pawns intuitively. So if I want to avoid looking at other engines' source code for inspiration I need to "extract" all this information from data. And I like the idea that by constantly reinventing the wheel Leorik may end up with a few novelties, unique or unusual implementation details, that justify its existence as yet another mediocre chess engine!
Code: Select all
# PLAYER : RATING POINTS PLAYED (%)
1 Leorik 2.0.13 : 2610.1 4132.5 7065 58
2 Leorik 2.0.2 : 2550.0 2932.5 7065 42
Code: Select all
ConnectedOrProtected - MG
0, 0, 0, 0, 0, 0, 0, 0,
23, -13, 0, -1, 57, 6, -9, -9,
-6, 23, 27, 9, 62, 30, 39, -10,
-6, 2, 3, 12, 26, 38, 12, 32,
8, 9, 13, 12, 18, 19, 27, 27,
-4, 17, 16, 11, 12, 15, 38, 10,
6, 3, 11, 1, 0, 11, 8, -6,
0, 0, 0, 0, 0, 0, 0, 0,
ConnectedOrProtected - EG
0, 0, 0, 0, 0, 0, 0, 0,
1, -7, -22, 2, 37, 6, -22, -24,
28, -8, 1, 35, -18, 19, -28, 37,
22, 16, 16, 5, -3, -37, -3, -32,
3, 5, -4, 5, -10, -9, -20, -23,
11, -20, -4, 2, -1, -13, -39, -8,
-18, 2, -12, 5, 6, -25, -1, -14,
0, 0, 0, 0, 0, 0, 0, 0,
DoubledPawns - MG
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
-32, -23, 32, -41, -20, -36, -31, 1,
-15, -6, -9, -6, -34, -13, -15, -9,
-1, 30, 6, 13, -10, 22, 6, -12,
0, 10, -3, 0, 8, -1, 26, 3,
1, 26, -3, -2, 6, 2, 15, 7,
0, 0, 0, 0, 0, 0, 0, 0,
DoubledPawns - EG
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
-24, -16, 20, -22, -15, -30, -21, -1,
-9, -25, -8, -1, 10, -28, -9, -39,
-26, -45, -37, -33, 1, -52, -29, -12,
-22, -14, -11, -8, -14, -12, -43, -31,
-32, -50, -9, -20, -34, -16, -39, -35,
0, 0, 0, 0, 0, 0, 0, 0,
PassedPawns - MG
0, 0, 0, 0, 0, 0, 0, 0,
37, 54, 18, 40, 31, 61, 8, -15,
52, 23, 23, 15, -18, 13, -30, -20,
10, 10, 4, 1, 1, 27, -28, -16,
15, -5, -10, -11, -15, -1, -2, 21,
2, -3, -9, -28, 8, 36, 30, 33,
5, 13, 10, 0, 9, 18, 32, -9,
0, 0, 0, 0, 0, 0, 0, 0,
PassedPawns - EG
0, 0, 0, 0, 0, 0, 0, 0,
66, 37, 71, 26, 37, -5, 78, 109,
72, 106, 72, 70, 75, 69, 122, 127,
60, 54, 48, 40, 28, 6, 89, 73,
16, 38, 33, 31, 36, 18, 36, 5,
6, 17, 20, 44, -4, -36, -16, -29,
-4, -6, -9, 13, -10, -17, -28, 12,
0, 0, 0, 0, 0, 0, 0, 0,
IsolatedPawns - MG
0, 0, 0, 0, 0, 0, 0, 0,
-29, -46, 0, -13, -33, -31, -48, -37,
6, 9, -18, 8, 13, -22, 18, -7,
6, -14, -9, -25, -20, 6, 13, 11,
-2, -12, -7, -17, -13, -11, 3, 8,
-17, -8, -15, -10, -11, -6, -5, -9,
-5, -10, 18, -24, -25, -2, -22, -17,
0, 0, 0, 0, 0, 0, 0, 0,
IsolatedPawns - EG
0, 0, 0, 0, 0, 0, 0, 0,
-2, 33, -24, 11, 33, 51, 42, 34,
-25, -38, 25, -6, -7, 42, -29, 7,
-11, 2, 2, 30, 26, -19, -29, -21,
10, 13, 7, 18, 5, 14, -13, -8,
25, -1, 14, 3, 2, 3, 2, 12,
-3, 1, -44, 16, 15, -8, 16, 17,
0, 0, 0, 0, 0, 0, 0, 0,
ConnectedPassedPawns - MG
0, 0, 0, 0, 0, 0, 0, 0,
26, 56, 34, 7, 18, 24, 39, 31,
14, 31, 22, -4, 17, 3, 41, 58,
24, 22, 6, 25, 37, 17, 11, 12,
5, -21, -6, -10, 3, 23, -8, 3,
25, -2, 17, 9, 12, -11, -1, -5,
7, 0, 16, -5, 6, 9, 1, 22,
0, 0, 0, 0, 0, 0, 0, 0,
ConnectedPassedPawns - EG
0, 0, 0, 0, 0, 0, 0, 0,
17, 48, 43, 15, 17, 21, 34, 26,
-4, -4, 14, 7, 22, 2, 25, 42,
-23, -5, -2, -4, -19, 13, 17, 0,
-10, 27, 21, 19, 17, -1, 11, 16,
-36, 9, -4, -12, -19, 6, -15, -2,
-13, -4, -34, -12, 14, -16, -9, -27,
0, 0, 0, 0, 0, 0, 0, 0,
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik - *New* Version 2.1
I've just released a "minor" new version that adds a pawn structure term to the evaluation.
https://github.com/lithander/Leorik/releases/tag/2.1
The pawn structure evaluation (including pawn hash table) turned out surprisingly simple yet effective! Or at least it feels like it's working well... I'm always struggling to judge when a feature is done enough that it's time to move on. I wish there was an easier way to assess how much the current implementation exhausts the theoretical potential. So (@all) what's your experience with pawn structure eval terms? How much Elo did you gain from adding it in your engines?

Mike Sherwin wrote: ↑Sun Apr 17, 2022 11:20 pm
This is probably the last version of Leorik that I'll be able to win against.

Leorik 2.1 is only about 50 Elo stronger, so maybe you can still win against it?

- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: Devlog of Leorik - *New* Version 2.1
lithander wrote: ↑Tue May 31, 2022 2:06 am
I've just released a "minor" new version that adds a pawn structure term to the evaluation.
https://github.com/lithander/Leorik/releases/tag/2.1

You've made very impressive progress these last couple of months, Thomas, congratulations!
I've recently found the motivation to begin improving Blunder again, from the ground up, and I'm trying a couple of new ideas I've never considered before (most notably a copy/make approach) in the hope of gaining simplicity and/or strength. So hopefully I'll be using Leorik more as a sparring partner in the coming days.

- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik - *New* Version 2.1
algerbrex wrote: ↑Tue May 31, 2022 5:25 am
You've made very impressive progress these last couple of months Thomas, congratulations! I've recently found the motivation to begin improving Blunder again, from the ground up, and I'm trying a couple of new ideas I've never considered before (most notably a copy/make approach) in the hope of gaining simplicity and/or strength. So hopefully I'll be using Leorik more as a sparring partner in the coming days.

Good to see you back in the game, Christian! And great news that you plan on developing Blunder further! Perfect timing for me, as Leorik 2.1 and Blunder 7.6 seem to be about equal in strength now.
Sadly the matchup didn't conclude, though. Blunder disconnected/crashed once on a drawn position. Let me know if you want more details, but I don't have a full log, just the PGN of the game, and it will probably not reproduce the bug.
Code: Select all
tc=5+0.5
Score of Leorik-2.1 vs blunder-7.6.0: 260 - 293 - 243 [0.479] 796
... Leorik-2.1 playing White: 148 - 130 - 120 [0.523] 398
... Leorik-2.1 playing Black: 112 - 163 - 123 [0.436] 398
... White vs Black: 311 - 242 - 243 [0.543] 796
Elo difference: -14.4 +/- 20.1, LOS: 8.0 %, DrawRatio: 30.5 %
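For reference, the Elo difference in that output follows from the logistic rating model, and can be recomputed from the scoreline (a sketch that ignores the error-bar and LOS math the match runner also reports):

```python
import math

def elo_from_score(p):
    # Logistic model: an expected score p in (0, 1) corresponds to a
    # rating difference of -400 * log10(1/p - 1) Elo.
    return -400.0 * math.log10(1.0 / p - 1.0)

# Leorik's score from the match above: 260 wins, 243 draws, 796 games.
p = (260 + 0.5 * 243) / 796
diff = elo_from_score(p)  # about -14.4, matching the reported value
```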
I'm curious how copy/make will turn out for you. It feels like it's especially effective for a simple and fast engine like Leorik was up to version 2.0... but for a better evaluation it would be helpful (or even necessary) to be able to store additional information besides the bitboards and update that incrementally. But that risks making the BoardState that needs to be copied bigger and bigger so the cost of copying compared to unmake will probably become less favorable the more stuff I add. In the present version it's still fine (For the pawn eval I just have to copy two more 16 bit integers compared to version 2.0) but it's motivated me to not use a dedicated pawn hash for example.
So I think this design choice is making Leorik maybe a more unique/interesting engine, and maybe also a simpler one, but it comes with its own set of challenges and might even hurt the strength in the end.
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: Devlog of Leorik - *New* Version 2.1
lithander wrote: ↑Tue May 31, 2022 10:13 am
Good to see you back in the game, Christian! And great news that you plan on developing Blunder further! Perfect timing for me as Leorik 2.1 and Blunder 7.6 seem to be about equal in strength now.
Sadly the matchup didn't conclude, though. Blunder disconnected/crashed once on a drawn position. Let me know if you want more details, but I don't have a full log. Just the PGN of the game, and it will probably not reproduce the bug.

Code: Select all
tc=5+0.5
Score of Leorik-2.1 vs blunder-7.6.0: 260 - 293 - 243 [0.479] 796
... Leorik-2.1 playing White: 148 - 130 - 120 [0.523] 398
... Leorik-2.1 playing Black: 112 - 163 - 123 [0.436] 398
... White vs Black: 311 - 242 - 243 [0.543] 796
Elo difference: -14.4 +/- 20.1, LOS: 8.0 %, DrawRatio: 30.5 %

Hmmm, strange. I haven't heard about Blunder crashing in a while. Sure, if you don't mind, you can PM me the PGN. Maybe I'll be able to work something out with it.
Perft is where I've been doing most of the comparisons. I've run perft(7) from the start position several times with the new copy/make code and the original make/unmake code, and after optimizing the former a good bit, they're now about equal. I'd still need to do some more testing to see if copy/make is still a little worse or not, but right now things look pretty promising.
By the way, I couldn't quite figure out how to run perft for Leorik from the command line, as I was interested in doing some comparisons. Is this possible, or would I need to re-compile to enable perft testing?
lithander wrote: ↑Tue May 31, 2022 10:13 am
It feels like it's especially effective for a simple and fast engine like Leorik was up to version 2.0... but for a better evaluation it would be helpful (or even necessary) to be able to store additional information besides the bitboards and update that incrementally. But that risks making the BoardState that needs to be copied bigger and bigger, so the cost of copying compared to unmake will probably become less favorable the more stuff I add.

Right, right. That's definitely a fair concern, and it might be the case that I'll be switching between copy/make and make/unmake for a while before I'm satisfied with either approach.
With that said, for some reason I've never seen a significant speed difference in Blunder when I switched to incrementally updating evaluation stuff like material balance, or piece-square table values. Even in Blunder 7.6.0, I still calculate all of that from scratch. I remember once I tried calculating stuff incrementally, and I didn't really find any sort of Elo gain from it. So I scrapped the code.
I do worry about adding extra information, though, like a pawn hash, as copying positions is already pretty expensive. I don't see a good solution to this right now besides just testing. So it may be that before Blunder 8.0.0 is released it'll switch back to make/unmake.
I did some profiling last night of my copy/make code, and one of the biggest bottlenecks was my `GetPiece` function, which basically loops through each bitboard to determine what type of piece is sitting on a given square. I optimized it down more thanks to some research, but I'm still working on trying to develop a branchless solution, which no doubt will rely on some bit-manipulation wizardry.
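To illustrate the bottleneck (Blunder itself is Go; this Python sketch uses hypothetical names): the scanning version touches up to six bitboards per lookup, while a redundant "mailbox" array, kept in sync on every make/unmake, turns the lookup into a single array read at the cost of extra bookkeeping:

```python
# Piece indices: 0=pawn, 1=knight, 2=bishop, 3=rook, 4=queen, 5=king.

def get_piece_loop(bitboards, square):
    # The bottleneck described above: scan every piece bitboard
    # until one has the square's bit set.
    mask = 1 << square
    for piece, bb in enumerate(bitboards):
        if bb & mask:
            return piece
    return None

class Mailbox:
    """Redundant square-indexed array mirroring the bitboards;
    lookups become a single indexed read."""
    def __init__(self):
        self.squares = [None] * 64

    def put(self, square, piece):
        self.squares[square] = piece

    def remove(self, square):
        self.squares[square] = None

    def get(self, square):
        return self.squares[square]
```

Note the tradeoff: with copy/make the mailbox has to be copied along with the rest of the BoardState, which is exactly the growing-state concern lithander raised above.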
Right now, I think the biggest reason I'm working from the ground up in building Blunder 8.0.0 is to be thorough with my testing. I can admit there are some features in Blunder right now that have way too little testing to justify them being there, but they're there anyway. And that doesn't really sit well with me, since I feel I'm missing out on Elo by not doing my due diligence.
I think this was mostly due to me being in college for the first time. Between trying to keep up in Calculus III, doing track, and trying to have a social life, Blunder's development was on the back burner. But I still wanted to keep making progress. So sometimes if a feature looked like it worked through some quick testing, I'd add it in.
So this time around, for each major feature I add beyond the basics (killer moves, TT, check extension, etc.), I'm going to require a minimum number of self-play test games before it gets added back into the codebase. I'd basically like to have a test result I can attach to the commit of each feature I add back in.