Devlog of Leorik
Moderator: Ras
-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
I just read my post again...
I thought the generic "you" could be misunderstood in a way where it sounds like I'm addressing someone specific. Didn't want anyone to feel offended so I decided to replace it with something *clearly* generic: people. But I forgot to delete the "you" and now I've made it sound like I'm an alien or robot or something.
English is hard!
-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
I managed to catch up in terms of playing strength with Leorik 2.2 (the last released version) a few days ago. But of course, whenever you reach a milestone, after 5 seconds of joy you set the goal higher: there's also the unreleased version Leorik 2.2.7, the strongest version I have, with a few fixes and improvements over 2.2 but still based on the Zurichess data.
Could my new dataset, relying only on pure selfplay, a great deal of randomization, and labels based solely on the game's outcome, hope to outperform one that is widely accepted as high quality and uses Stockfish playouts on each position for labeling? I thought it wouldn't be hopeless, because my data would be tailored to Leorik's capabilities. I don't care whether Stockfish can win from a certain position if my much simpler engine can't.
Been fiddling around for a few days and the test I ran last night seems to indicate I might have succeeded!!
But when I looked at a few games I noticed the playing style is nothing like old Leorik, or even the previous self-trained versions. What I did with this one (and it was just a stupid experiment I didn't expect to lead anywhere) is ignore all the drawn games when compiling my EPDs. That means every position in there is labeled as either winning or losing. The tuner had trouble converging; the MSE = 0.42024 is terrible compared with the MSE = 0.247370 on the Zurichess dataset. And still it works in actual play. Some moves looked very bold to me, so I used Stefan Pohl's EAS tool to compute the version's EAS score:
I also compared it with a bunch of other engines at my level...
...and not only did the new version do really well Elo-wise, it also plays short, high-risk games with a lot of sacrifices, and I think that is something humans like to watch in engine matches. So while I'm not quite ready to publish this as version 2.3, I'm going to try my best to keep this new, aggressive playstyle!
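The convergence trouble described above follows directly from how this style of tuning measures error: the eval is squashed through a sigmoid and compared against the game-result label. A minimal sketch (the scaling constant `k` and all names are conventional illustration, not Leorik's actual tuner code) shows why discarding draws inflates the MSE even when the eval is reasonable:

```python
import math

def tuning_mse(data, k=400):
    """Texel-style tuning error: mean squared difference between the
    sigmoid of the engine's eval (in centipawns) and the game-result
    label (1.0 = win, 0.5 = draw, 0.0 = loss)."""
    total = 0.0
    for eval_cp, result in data:
        predicted = 1.0 / (1.0 + 10 ** (-eval_cp / k))  # winning probability
        total += (predicted - result) ** 2
    return total / len(data)

# With draws discarded every label is 0 or 1, so balanced positions
# (eval near 0) keep a large residual error no matter how well the
# weights fit - mirroring the MSE jump described above.
with_draws = [(120, 1.0), (0, 0.5), (-80, 0.0)]
no_draws = [(120, 1.0), (0, 1.0), (-80, 0.0)]
print(tuning_mse(with_draws) < tuning_mse(no_draws))  # True
```

The takeaway is that a worse MSE on win/loss-only labels does not automatically mean a worse eval, which matches the observation that the weights still work in actual play.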
Code: Select all
Score of Leorik-2.2.8theta vs Leorik 2.2.7: 3018 - 2932 - 2424 [0.505] 8374
... Leorik-2.2.8theta playing White: 1711 - 1296 - 1181 [0.550] 4188
... Leorik-2.2.8theta playing Black: 1307 - 1636 - 1243 [0.461] 4186
... White vs Black: 3347 - 2603 - 2424 [0.544] 8374
Elo difference: 3.6 +/- 6.3, LOS: 86.8 %, DrawRatio: 28.9 %
Code: Select all
Rank EAS-Score sacs shorts draws moves Engine/player
-------------------------------------------------------------------
1 112519 13.71% 17.29% 13.33% 71 Leorik-2.2.8theta
2 62636 06.43% 12.54% 20.43% 76 Leorik 2.2.7
Code: Select all
Rank EAS-Score sacs shorts draws moves Engine/player
-------------------------------------------------------------------
1 169571 16.88% 30.78% 13.76% 68 Leorik-2.2.8theta
2 86431 09.35% 04.67% 10.68% 98 zahak-5.0
3 66035 08.41% 11.21% 16.96% 83 Inanis-1.1.1
4 51044 09.73% 06.19% 20.00% 85 odonata-0.6.2
5 48340 07.56% 11.76% 24.18% 82 dumb-1.9
6 42417 09.52% 12.38% 29.58% 87 Supernova-2.4
7 38761 04.76% 08.33% 22.68% 86 blunder-8.5.5
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
I hope Leorik is not vengeful. I might have to go into hiding. Tell him he can have his undead crown back. I only took it to give it a nice polish for him.

-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
I've used the 2.2.8theta version to create more selfplay games. Since then I have tried to create a stronger version. I did a few A-B tests to figure out which tuning parameters work best, but despite considerable effort and thousands of matches (run not for data generation but to get a solid estimate of their strength), my newly created versions are not a clear step up from 2.2.8theta.
Mike, who played a copy of 2.2.8theta, had nice things to say about how it played:
Mike Sherwin wrote: Bottom line is: this Leorik gave me nothing but problems to solve. This Leorik is far more anti-human than any of the others. The moves also looked like the moves a grandmaster would play. The difference is astonishing. Congrats!
Given his verdict, and since the version with this set of weights is stronger than my strongest build from the master branch, I think I can declare this experiment a success. I will honor 2.2.8theta for being the first version that I couldn't find a clear improvement on through tuning alone: instead of publishing any of the contenders that are roughly in the same ballpark strength-wise, I will release Leorik 2.3 with the original 2.2.8theta weights.
Thanks for the motivating posts, everybody!
I wasn't planning it at that time, but now there is going to be one. And hopefully not the last update ever - but even if it should be: with the eval completely derived from selfplay, I feel like Leorik is a well-rounded package now that I wouldn't have to feel regrets about "abandoning".
(I'll post again when the branch is merged and the builds are up on github.)
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
Even if you have satisfied all your goals with Leorik and decide to move on to something else, there is one more capability I'd like to see added: the ability to recognise ECO positions and load piece square tables trained on those positions. You might only include relatively few early positions with Leorik, but give the user the ability to add/train additional positions.
-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
Mike Sherwin wrote: ↑Tue Dec 20, 2022 3:10 pm Even if you have satisfied all your goals with Leorik and decide to move on to something else there is one more capability I like to see added. The ability to recognise ECO positions and load piece square tables trained on that position. You might only include a relatively few early positions with Leorik but give the user the ability to add/train additional positions.
Isn't a PSQT per ECO position overkill? If I recognize an individual position I could just play the known best move; this is basically just an opening book. Or do you mean that when a game starts with a certain opening, the entire game is then played on a set of PSQTs optimized for that opening? That could work, but sometimes an engine is asked to evaluate a position with no history of moves, and in that case I wouldn't know which PSQT to pick. It would be nice to have some more generic way of describing a position on different axes (e.g. open vs. closed) and use interpolation again, like with the 'phase' value that is used to interpolate between the midgame and endgame tables. (Tapered Eval)
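For readers unfamiliar with the 'phase' interpolation mentioned here, a tapered eval is commonly implemented like the following sketch; the same blending scheme could in principle be reused for other axes such as open vs. closed. The max_phase constant of 24 and all names are conventional illustration, not Leorik's actual code:

```python
def tapered_eval(mg_score, eg_score, phase, max_phase=24):
    """Blend midgame and endgame scores by game phase.
    phase counts remaining non-pawn material: max_phase with all pieces
    still on the board, 0 in a pure pawn endgame."""
    phase = max(0, min(phase, max_phase))  # clamp promotions etc.
    return (mg_score * phase + eg_score * (max_phase - phase)) // max_phase

print(tapered_eval(100, 20, 24))  # full board: pure midgame score -> 100
print(tapered_eval(100, 20, 0))   # pawn endgame: pure endgame score -> 20
print(tapered_eval(100, 20, 12))  # halfway between the two -> 60
```

The appeal of this scheme is that the eval changes smoothly as material comes off the board, with no hard switch between two sets of weights.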
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
I mean train starting from the most popular 2-ply positions like e4 e5, e4 c5, e4 c6 etc. In different starting pawn structures Leorik will do better if it has pstbl's optimised for them. A simple test would be to differentiate between only 1. e4 and 1. d4. That means only three tables: 1. generic, like you have now, 2. e4 and 3. d4.
Then give the user the ability to train from any starting position, start a game from that position, and use that pstbl.
Sorry if I'm not being very clear. I'm not feeling well at the moment.
-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Devlog of Leorik - *New* Version 2.3
I finally released Version 2.3!
Here is a gauntlet I ran with a few engines of similar strength. All Elo values except Leorik's are fixed.
The EAS tool shows some favorable stats for Leorik: short games, a healthy amount of sacrifices, and only a few bad draws.
If you have not read my previous posts, the small strength increase over version 2.2 may be a disappointment.
But the goal of this new version was to rebuild the evaluation from scratch, no longer relying on any 3rd party dataset or engine for labeling. I purged all knowledge borrowed from Zurichess and Stockfish, and that there still is a strength increase at all is more than I expected!
Also this bodes well for the future: Being able to train my weights on selfplay games means I can effortlessly create larger datasets so that in future versions the evaluation can be extended to pick up on rarer and rarer features. (e.g. in the way that Mike suggested)
For the human players, I want to encourage you to use the new UCI options Midgame Randomness and Endgame Randomness, which force the engine to assign a random cp bonus to each root move while retaining its usual speed and search depth. I originally added it for data generation, but it's also a really nice way to adjust the engine's difficulty level in a way that feels somewhat natural. (to me^^)
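The idea behind those randomness options can be sketched in a few lines: perturb each root move's search score by a random centipawn bonus and then pick the best as usual. Function and parameter names here are illustrative, not Leorik's actual implementation:

```python
import random

def pick_root_move(scored_moves, randomness_cp, rng=None):
    """Pick the root move after adding a random centipawn bonus to
    each move's score. The engine still searches at full speed and
    depth; only the final choice among root moves is perturbed.

    scored_moves: dict mapping move -> search score in centipawns."""
    rng = rng or random.Random()
    return max(scored_moves,
               key=lambda m: scored_moves[m]
               + rng.randint(-randomness_cp, randomness_cp))

# With randomness 0 the perturbation vanishes and the best move wins.
print(pick_root_move({"e2e4": 35, "d2d4": 30, "g1f3": 25}, 0))  # e2e4
```

Because the bonus is bounded, the engine only deviates between moves whose scores are within the randomness window, which is why the weakening feels natural rather than like random blunders.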
Code: Select all
# PLAYER : RATING POINTS PLAYED (%)
1 Inanis-1.1.1 : 2767.0 323.5 620 52
2 odonata-0.6.2 : 2744.0 298.5 618 48
3 Leorik-2.3 : 2741.3 1960.5 3716 53
4 zahak-5.0 : 2730.0 295.5 620 48
5 dumb-1.9 : 2703.0 325.0 620 52
6 blunder-8.5.5 : 2700.0 255.0 620 41
7 Supernova-2.4 : 2687.0 258.0 618 42
Code: Select all
Rank EAS-Score sacs shorts draws moves Engine/player
-------------------------------------------------------------------
1 132596 15.02% 22.27% 13.46% 68 Leorik-2.3
2 57194 11.00% 04.00% 18.79% 85 zahak-5.0
3 52975 06.99% 05.38% 17.33% 83 odonata-0.6.2
4 50228 09.28% 13.50% 27.03% 76 dumb-1.9
5 45085 06.67% 10.00% 24.06% 84 Supernova-2.4
6 44044 04.19% 10.23% 22.95% 78 Inanis-1.1.1
7 34561 06.85% 09.59% 26.24% 81 blunder-8.5.5
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
Congrats Thomas! I look forward to seeing how it does at the rating agencies. Be sure to post some interesting sacrificial games.
Happy Holidays.

-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
I read that Leorik's eval is sort of like a tiny NN (a poor man's NN). A full size NN also incorporates the square the kings are on. This may be done in Leorik by using 15x15 piece square tables where the king is always in the center. Then the pst is still used as an 8x8 table by pointing to the cell in the 15x15 table that places the kings on the 8x8 table at its current square. Jonathan tried this idea in Winter but only got positive result for the bishop table. Still it was worth +20 elo!