I managed to catch up in terms of playing strength with Leorik 2.2 (the last released version) a few days ago. But of course, whenever you reach a milestone, after 5 seconds of joy you set the goal higher: there's also the unreleased Leorik 2.2.7, the strongest version I have, with a few fixes and improvements over 2.2 but still based on the Zurichess data.
Could my new dataset, relying only on pure selfplay with a great deal of randomization and labeled only by each game's outcome, hope to outperform one that is widely accepted as high quality and uses Stockfish playouts on each position for labeling? I thought it wouldn't be hopeless because my data would be tailored to Leorik's capabilities. I don't care if Stockfish can win a certain position if my much simpler engine can't.
I've been fiddling around for a few days, and the test I ran last night seems to indicate I might have succeeded!
Score of Leorik-2.2.8theta vs Leorik 2.2.7: 3018 - 2932 - 2424  [0.505] 8374
...      Leorik-2.2.8theta playing White: 1711 - 1296 - 1181  [0.550] 4188
...      Leorik-2.2.8theta playing Black: 1307 - 1636 - 1243  [0.461] 4186
...      White vs Black: 3347 - 2603 - 2424  [0.544] 8374
Elo difference: 3.6 +/- 6.3, LOS: 86.8 %, DrawRatio: 28.9 %
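As a sanity check, the Elo line can be reproduced from the win/loss/draw counts alone. The sketch below uses the usual logistic Elo formula, a normal-approximation 95% error margin, and the common likelihood-of-superiority formula. That the match tool computes its numbers exactly this way is an assumption, and the function names are made up for illustration.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_stats(wins, losses, draws, z=1.959964):
    """Return (elo, margin, los) for a W/L/D record; z is the 95% normal quantile."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + d / 2.0  # average score per game
    # Per-game variance of the score (1, 0.5 or 0), then standard error of the mean.
    var = w * (1.0 - mu) ** 2 + l * mu ** 2 + d * (0.5 - mu) ** 2
    se = math.sqrt(var / n)
    # Half-width of the Elo interval obtained by mapping the score interval to Elo.
    margin = (elo_from_score(mu + z * se) - elo_from_score(mu - z * se)) / 2.0
    # Likelihood of superiority: draws carry no information about who is better.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_from_score(mu), margin, los

elo, margin, los = match_stats(3018, 2932, 2424)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}, LOS: {los * 100:.1f} %")
```

With the counts from the match above this lands close to the reported 3.6 +/- 6.3 Elo and 86.8 % LOS, which means the two versions are roughly equal in strength.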
But when I looked at a few games I noticed the playing style is nothing like old Leorik or even the previous self-trained versions. What I did with this one (and it was just a stupid experiment I didn't expect to lead anywhere) is that I ignored all the drawn games when compiling my EPDs. That means every position in there is labeled as either winning or losing. The tuner had trouble converging; the MSE of 0.42024 is terrible compared with the MSE of 0.247370 on the Zurichess dataset. And still it works in actual play. To me some moves looked very bold, so I used Stefan Pohl's EAS tool to compute both versions' EAS scores:
Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    112519  13.71%  17.29%  13.33%   71   Leorik-2.2.8theta  
   2     62636  06.43%  12.54%  20.43%   76   Leorik 2.2.7  
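The draw-filtering step mentioned earlier (dropping every drawn game when compiling the EPDs) could look roughly like the sketch below. It assumes each line carries its result as a `c9 "1-0";` style label, as in the Zurichess quiet-labeled set. The actual format used by Leorik's tooling isn't shown in this post, and both the sample positions and the `drop_draws` name are made up for illustration.

```python
import re

# Assumed label format: <FEN fields> c9 "<result>";
RESULT = re.compile(r'c9\s+"(1-0|0-1|1/2-1/2)"')

def drop_draws(epd_lines):
    """Yield only the EPD lines whose label is a decisive result."""
    for line in epd_lines:
        match = RESULT.search(line)
        # Lines without a recognizable label are skipped as well.
        if match and match.group(1) != "1/2-1/2":
            yield line

# Made-up sample lines, not taken from the real dataset.
sample = [
    'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - c9 "1/2-1/2";',
    '4k3/8/8/8/8/8/4P3/4K3 w - - c9 "1-0";',
    '4k3/4p3/8/8/8/8/8/4K3 b - - c9 "0-1";',
]

kept = list(drop_draws(sample))
for line in kept:
    print(line)
```

After this filter every remaining position is labeled 1-0 or 0-1, so the tuner never sees a 0.5 target.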
I also compared it with a bunch of other engines at my level...
Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    169571  16.88%  30.78%  13.76%   68   Leorik-2.2.8theta  
   2     86431  09.35%  04.67%  10.68%   98   zahak-5.0  
   3     66035  08.41%  11.21%  16.96%   83   Inanis-1.1.1  
   4     51044  09.73%  06.19%  20.00%   85   odonata-0.6.2  
   5     48340  07.56%  11.76%  24.18%   82   dumb-1.9  
   6     42417  09.52%  12.38%  29.58%   87   Supernova-2.4  
   7     38761  04.76%  08.33%  22.68%   86   blunder-8.5.5  
...and not only did the new version do really well Elo-wise, it also plays short, high-risk games with a lot of sacrifices, and I think that is something humans like to watch in engine matches. So while I'm not quite ready to publish this as version 2.3, I'm going to try my best to keep this new, aggressive playstyle!