Devlog of Leorik
Moderator: Ras
-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
I just read my post again...
I thought the generic "you" could be misunderstood in a way where it sounds like I'm addressing someone specific. Didn't want anyone to feel offended so I decided to replace it with something *clearly* generic: people. But I forgot to delete the "you" and now I've made it sound like I'm an alien or robot or something.
English is hard!
-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
I managed to catch up in terms of playing strength with Leorik 2.2 (the last released version) a few days ago. But of course, whenever you reach a milestone, after 5 seconds of joy you set the goal higher: there's also the unreleased version Leorik 2.2.7, the strongest version I have, with a few fixes and improvements over 2.2 but still based on the Zurichess data.
Could my new dataset, relying only on pure selfplay, a great deal of randomization, and labels based solely on the game's outcome, hope to outperform one that is widely accepted as high quality and uses Stockfish playouts on each position for labeling? I thought it wouldn't be hopeless, because my data would be tailored to Leorik's capabilities. I don't care whether Stockfish can win from a certain position if my much simpler engine can't.
Been fiddling around for a few days and the test I ran last night seems to indicate I might have succeeded!!
But when I looked at a few games I noticed the playing style is nothing like old Leorik, or even the previous self-trained versions. What I did with this one (and it was just a stupid experiment I didn't expect to lead anywhere) is ignore all the drawn games when compiling my EPDs. That means every position in there is labeled as either winning or losing. The tuner had trouble converging; the MSE = 0.42024 is terrible compared with the MSE = 0.247370 on the Zurichess dataset. And still it works in actual play. Some moves looked very bold to me, so I used Stefan Pohl's EAS tool to compute the version's EAS score:
I also compared it with a bunch of other engines at my level...
...and not only did the new version do really well Elo-wise, it also plays short, high-risk games with a lot of sacrifices, and I think that is something humans like to watch in engine matches. So while I'm not quite ready to publish this as version 2.3, I'm going to try my best to keep this new, aggressive playstyle!
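The convergence trouble described above follows directly from how this style of tuning measures error: the eval is squashed through a sigmoid and compared against the game-result label. A minimal sketch (the scaling constant `k` and all names are conventional illustration, not Leorik's actual tuner code) shows why discarding draws inflates the MSE even when the eval is reasonable:

```python
import math

def tuning_mse(data, k=400):
    """Texel-style tuning error: mean squared difference between the
    sigmoid of the engine's eval (in centipawns) and the game-result
    label (1.0 = win, 0.5 = draw, 0.0 = loss)."""
    total = 0.0
    for eval_cp, result in data:
        predicted = 1.0 / (1.0 + 10 ** (-eval_cp / k))  # winning probability
        total += (predicted - result) ** 2
    return total / len(data)

# With draws discarded every label is 0 or 1, so balanced positions
# (eval near 0) keep a large residual error no matter how well the
# weights fit - mirroring the MSE jump described above.
with_draws = [(120, 1.0), (0, 0.5), (-80, 0.0)]
no_draws = [(120, 1.0), (0, 1.0), (-80, 0.0)]
print(tuning_mse(with_draws) < tuning_mse(no_draws))  # True
```

The takeaway is that a worse MSE on win/loss-only labels does not automatically mean a worse eval, which matches the observation that the weights still work in actual play.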
Code: Select all
Score of Leorik-2.2.8theta vs Leorik 2.2.7: 3018 - 2932 - 2424 [0.505] 8374
... Leorik-2.2.8theta playing White: 1711 - 1296 - 1181 [0.550] 4188
... Leorik-2.2.8theta playing Black: 1307 - 1636 - 1243 [0.461] 4186
... White vs Black: 3347 - 2603 - 2424 [0.544] 8374
Elo difference: 3.6 +/- 6.3, LOS: 86.8 %, DrawRatio: 28.9 %
Code: Select all
Rank EAS-Score sacs shorts draws moves Engine/player
-------------------------------------------------------------------
1 112519 13.71% 17.29% 13.33% 71 Leorik-2.2.8theta
2 62636 06.43% 12.54% 20.43% 76 Leorik 2.2.7
Code: Select all
Rank EAS-Score sacs shorts draws moves Engine/player
-------------------------------------------------------------------
1 169571 16.88% 30.78% 13.76% 68 Leorik-2.2.8theta
2 86431 09.35% 04.67% 10.68% 98 zahak-5.0
3 66035 08.41% 11.21% 16.96% 83 Inanis-1.1.1
4 51044 09.73% 06.19% 20.00% 85 odonata-0.6.2
5 48340 07.56% 11.76% 24.18% 82 dumb-1.9
6 42417 09.52% 12.38% 29.58% 87 Supernova-2.4
7 38761 04.76% 08.33% 22.68% 86 blunder-8.5.5
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
I hope Leorik is not vengeful. I might have to go into hiding. Tell him he can have his undead crown back. I only took it to give it a nice polish for him.

-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
I've used the 2.2.8theta version to create more selfplay games. Since then I have tried to create a stronger version. I did a few A-B tests to figure out which tuning parameters work best, but despite considerable effort and thousands of matches (run not for data generation but to get a solid estimate of their strength), my newly created versions are not a clear step up from 2.2.8theta.
Mike, who played a copy of 2.2.8theta, had nice things to say about how it played:
Mike Sherwin wrote: Bottom line is: this Leorik gave me nothing but problems to solve. This Leorik is far more anti-human than any of the others. The moves also looked like the moves a grandmaster would play. The difference is astonishing. Congrats!
Given his verdict, and since the version with this set of weights is stronger than my strongest build from the master branch, I think I can declare this experiment a success. I will honor 2.2.8theta for being the first version that I couldn't find a clear improvement on through tuning alone: instead of publishing any of the contenders that are roughly in the same ballpark strength-wise, I will release Leorik 2.3 with the original 2.2.8theta weights.
Thanks for the motivating posts, everybody!
I wasn't planning it at that time, but now there is going to be one. And hopefully not the last update ever - but even if it should be: with the eval completely derived from selfplay, I feel like Leorik is a well-rounded package now that I wouldn't have to feel regrets about "abandoning".
(I'll post again when the branch is merged and the builds are up on github.)
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
Even if you have satisfied all your goals with Leorik and decide to move on to something else, there is one more capability I'd like to see added: the ability to recognise ECO positions and load piece square tables trained on those positions. You might only include relatively few early positions with Leorik, but give the user the ability to add/train additional positions.
-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
Mike Sherwin wrote: ↑Tue Dec 20, 2022 3:10 pm Even if you have satisfied all your goals with Leorik and decide to move on to something else there is one more capability I like to see added. The ability to recognise ECO positions and load piece square tables trained on that position. You might only include a relatively few early positions with Leorik but give the user the ability to add/train additional positions.
Isn't a PSQT per ECO position overkill? If I recognize an individual position I could just play the known best move; this is basically just an opening book. Or do you mean that when a game starts with a certain opening, the entire game is then played on a set of PSQTs optimized for that opening? That could work, but sometimes an engine is asked to evaluate a position with no history of moves, and in that case I wouldn't know which PSQT to pick. It would be nice to have some more generic way of describing a position on different axes (e.g. open vs. closed) and use interpolation again, like with the 'phase' value that is used to interpolate between the midgame and endgame tables. (Tapered Eval)
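For readers unfamiliar with the 'phase' interpolation mentioned here, a tapered eval is commonly implemented like the following sketch; the same blending scheme could in principle be reused for other axes such as open vs. closed. The max_phase constant of 24 and all names are conventional illustration, not Leorik's actual code:

```python
def tapered_eval(mg_score, eg_score, phase, max_phase=24):
    """Blend midgame and endgame scores by game phase.
    phase counts remaining non-pawn material: max_phase with all pieces
    still on the board, 0 in a pure pawn endgame."""
    phase = max(0, min(phase, max_phase))  # clamp promotions etc.
    return (mg_score * phase + eg_score * (max_phase - phase)) // max_phase

print(tapered_eval(100, 20, 24))  # full board: pure midgame score -> 100
print(tapered_eval(100, 20, 0))   # pawn endgame: pure endgame score -> 20
print(tapered_eval(100, 20, 12))  # halfway between the two -> 60
```

The appeal of this scheme is that the eval changes smoothly as material comes off the board, with no hard switch between two sets of weights.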
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
I mean train starting from the most popular 2-ply positions like e4 e5, e4 c5, e4 c6 etc. In different starting pawn structures Leorik will do better if it has pstbl's optimised for them. A simple test would be to differentiate between only 1. e4 and 1. d4. That means only three tables: 1. generic, like you have now, 2. e4 and 3. d4.
Then give the user the ability to train from any starting position, start a game from that position, and use that pstbl.
Sorry if I'm not being very clear. I'm not feeling well at the moment.
-
lithander
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Devlog of Leorik - *New* Version 2.3
I finally released Version 2.3!
Here is a gauntlet I ran with a few engines of similar strength. All Elo values except Leorik's are fixed.
The EAS tool shows some favorable stats for Leorik: short games, a healthy amount of sacrifices, and only a few bad draws.
If you have not read my previous posts, the small strength increase over version 2.2 may be a disappointment.
But the goal of this new version was to rebuild the evaluation from scratch, no longer relying on any 3rd party dataset or engine for labeling. I purged all knowledge borrowed from Zurichess and Stockfish, and that there still is a strength increase at all is more than I expected!
Also this bodes well for the future: Being able to train my weights on selfplay games means I can effortlessly create larger datasets so that in future versions the evaluation can be extended to pick up on rarer and rarer features. (e.g. in the way that Mike suggested)
For the human players, I want to encourage you to use the new UCI options Midgame Randomness and Endgame Randomness, which force the engine to assign a random cp bonus to each root move while retaining its usual speed and search depth. I originally added it for data generation, but it's also a really nice way to adjust the engine's difficulty level in a way that feels somewhat natural. (to me^^)
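The idea behind those randomness options can be sketched in a few lines: perturb each root move's search score by a random centipawn bonus and then pick the best as usual. Function and parameter names here are illustrative, not Leorik's actual implementation:

```python
import random

def pick_root_move(scored_moves, randomness_cp, rng=None):
    """Pick the root move after adding a random centipawn bonus to
    each move's score. The engine still searches at full speed and
    depth; only the final choice among root moves is perturbed.

    scored_moves: dict mapping move -> search score in centipawns."""
    rng = rng or random.Random()
    return max(scored_moves,
               key=lambda m: scored_moves[m]
               + rng.randint(-randomness_cp, randomness_cp))

# With randomness 0 the perturbation vanishes and the best move wins.
print(pick_root_move({"e2e4": 35, "d2d4": 30, "g1f3": 25}, 0))  # e2e4
```

Because the bonus is bounded, the engine only deviates between moves whose scores are within the randomness window, which is why the weakening feels natural rather than like random blunders.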
Code: Select all
# PLAYER : RATING POINTS PLAYED (%)
1 Inanis-1.1.1 : 2767.0 323.5 620 52
2 odonata-0.6.2 : 2744.0 298.5 618 48
3 Leorik-2.3 : 2741.3 1960.5 3716 53
4 zahak-5.0 : 2730.0 295.5 620 48
5 dumb-1.9 : 2703.0 325.0 620 52
6 blunder-8.5.5 : 2700.0 255.0 620 41
7 Supernova-2.4 : 2687.0 258.0 618 42
Code: Select all
Rank EAS-Score sacs shorts draws moves Engine/player
-------------------------------------------------------------------
1 132596 15.02% 22.27% 13.46% 68 Leorik-2.3
2 57194 11.00% 04.00% 18.79% 85 zahak-5.0
3 52975 06.99% 05.38% 17.33% 83 odonata-0.6.2
4 50228 09.28% 13.50% 27.03% 76 dumb-1.9
5 45085 06.67% 10.00% 24.06% 84 Supernova-2.4
6 44044 04.19% 10.23% 22.95% 78 Inanis-1.1.1
7 34561 06.85% 09.59% 26.24% 81 blunder-8.5.5
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
Congrats Thomas! I look forward to seeing how it does at the rating agencies. Be sure to post some interesting sacrificial games.
Happy Holidays.

-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
I read that Leorik's eval is sort of like a tiny NN (a poor man's NN). A full size NN also incorporates the square the kings are on. This may be done in Leorik by using 15x15 piece square tables where the king is always in the center. Then the pst is still used as an 8x8 table by pointing to the cell in the 15x15 table that places the kings on the 8x8 table at its current square. Jonathan tried this idea in Winter but only got positive result for the bishop table. Still it was worth +20 elo!