Devlog of Leorik

lithander · Post by **lithander** » Tue Sep 06, 2022 10:12 am

algerbrex wrote: ↑Sun Sep 04, 2022 6:32 pm
lithander wrote: ↑Sun Sep 04, 2022 3:39 pm One class of blunders has to do with the endgame. For example, Leorik doesn't realize that a lone bishop or lone knight doesn't allow it to win. To Leorik it looks like one side is up by the value of a minor piece (200-300cp). It will trade pawns away to gain this "advantage". I suppose I should encode knowledge for piece combinations that Leorik should treat as drawn instead of running its PST-based eval on them.
Yea, I noticed that when I was playing some games for fun between Leorik and Blunder. Many times Leorik would be reporting a +300 cp score in a completely drawn endgame because it doesn't have the knowledge yet that king and minor vs king is a draw. Should be a pretty straightforward fix. Especially since those examples are not just theoretical draws, but actual draws. For theoretical draws where it's still possible for the opponent to blunder, I just divide the evaluation by a constant like 16 to indicate to Blunder that that part of the tree leads to a drawn endgame, so it'll prefer to leave pawns on the board when possible.

I've written some code that generates the Endgame Classification of a given position and then researched what kind of endgames are drawn despite a significant material advantage for the stronger side.

        public static HashSet<string> Drawn = new()
        {
            "KNvK", "KvKN",
            "KBvK", "KvKB",
            "KNNvK", "KvKNN",

            "KNNvKN", "KNvKNN",
            "KNNvKB", "KBvKNN",
            
            "KRvKN", "KNvKR",
            "KRvKB", "KBvKR"
        };

That's what I found. Did I miss anything?
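
The post does not show `Notation.GetEndgameClass`, so here is a minimal Go sketch of how such a class string could be built from piece counts. The `Material` type, the function names and the piece order K, Q, R, B, N, P are all assumptions made for illustration, not Leorik's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// Material holds one side's piece counts (the king is implied).
// Hypothetical stand-in for the engine's board state.
type Material struct {
	Queens, Rooks, Bishops, Knights, Pawns int
}

// sideClass renders one side's material, e.g. "KNN" for king and two knights.
// The piece order K, Q, R, B, N, P is an assumption.
func sideClass(m Material) string {
	var sb strings.Builder
	sb.WriteByte('K')
	sb.WriteString(strings.Repeat("Q", m.Queens))
	sb.WriteString(strings.Repeat("R", m.Rooks))
	sb.WriteString(strings.Repeat("B", m.Bishops))
	sb.WriteString(strings.Repeat("N", m.Knights))
	sb.WriteString(strings.Repeat("P", m.Pawns))
	return sb.String()
}

// EndgameClass joins both sides as "<white>v<black>".
func EndgameClass(white, black Material) string {
	return sideClass(white) + "v" + sideClass(black)
}

func main() {
	white := Material{Knights: 2}
	black := Material{Bishops: 1}
	fmt.Println(EndgameClass(white, black)) // KNNvKB
}
```

A string key like this is easy to read and to look up in a set, which matches the `Drawn` set above, but building it on every evaluation is costly; a precomputed integer material key would be the usual faster alternative.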

Whenever I normally use the static evaluation score I now pass it through a little function:

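        // Scales the static evaluation towards zero in endgames listed in Drawn.
        public int ScaleEndgame(int score)
        {
            // No entry in Drawn has more than 5 pieces (kings included), so skip the lookup otherwise.
            int cnt = Bitboard.PopCount(Black | White);
            if (cnt > 5)
                return score;

            // Arithmetic shift by 3 divides by 8; negative scores round away from zero (e.g. -7 >> 3 == -1).
            if (Drawn.Contains(Notation.GetEndgameClass(this)))
                return score >> 3;
            else
                return score;
        }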

The shift by 3 basically scales the score down towards zero but does not make every move look exactly equal.
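
One detail worth knowing about the shift: on signed integers `>>` is an arithmetic shift in C# (and in Go), so it rounds towards negative infinity, whereas integer division truncates towards zero. The small Go sketch below compares the two; the function names are made up for this illustration.

```go
package main

import "fmt"

// scaleShift mirrors `score >> 3`: an arithmetic shift that rounds
// towards negative infinity.
func scaleShift(score int) int { return score >> 3 }

// scaleDiv divides by 8; integer division truncates towards zero.
func scaleDiv(score int) int { return score / 8 }

func main() {
	for _, s := range []int{300, 7, -7, -300} {
		fmt.Printf("score=%5d  shift=%4d  div=%4d\n", s, scaleShift(s), scaleDiv(s))
	}
}
```

For positive scores both give the same result. For negative scores the shift lands one centipawn further from zero whenever the score is not a multiple of 8 (for example -7 becomes -1 instead of 0), so the scaling is very slightly asymmetric between the two sides.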

So this code should do what you suggested, right? And of course the implementation is horrendously inefficient. So I ran a selftest against the previous stable version where each engine gets unlimited time but is limited to 2M nodes max, which should factor the engine speed out of the equation.

Score of Leorik 2.2.8e vs Leorik 2.2.6: 692 - 521 - 901  [0.540] 2114
...      Leorik 2.2.8e playing White: 440 - 245 - 372  [0.592] 1057
...      Leorik 2.2.8e playing Black: 252 - 276 - 529  [0.489] 1057
...      White vs Black: 716 - 497 - 901  [0.552] 2114
Elo difference: 28.2 +/- 11.2, LOS: 100.0 %, DrawRatio: 42.6 %

This looks very promising indeed!

akanalytics · Post by **akanalytics** » Tue Sep 06, 2022 7:04 pm

Haha! I'm building something very similar. I've seen the "scaling" approach elsewhere, though it felt like dividing by 2 would be sufficient to prevent exchanges. What's the reasoning behind 1/8th?
The only difference is that I included some balances such as R+N vs R which although not strictly a draw, is probably a draw. I have some logic that tries to reduce the search depth too (or drops into QS) when it finds such positions, but this interacts horribly with the hash table, so not currently enabled...

The other approach I used was "Texel" tuning to give a specific value to certain material combinations, rather than the piecewise totals. I.e., if the material combination was found in a lookup table, the tuned centipawn value would be used (for the material component of eval); otherwise the piecewise total for white less black was used. The tuning process would, for example, give KNN v KN a centipawn value very close to zero.
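
A minimal Go sketch of that lookup-with-fallback idea follows. The piece values, the table entries and all names are placeholders invented for illustration; a real table would be filled by the tuner.

```go
package main

import "fmt"

// pieceValue holds illustrative centipawn values, not tuned ones.
var pieceValue = map[byte]int{'P': 100, 'N': 300, 'B': 320, 'R': 500, 'Q': 900}

// tunedMaterial maps a material signature to a score from White's view.
// The entries are placeholders standing in for tuner output.
var tunedMaterial = map[string]int{
	"KNNvKN": 5,
	"KNvKNN": -5,
}

// sideValue sums the piece values in one half of the signature.
func sideValue(pieces string) int {
	total := 0
	for i := 0; i < len(pieces); i++ {
		total += pieceValue[pieces[i]] // 'K' is not in the map and adds 0
	}
	return total
}

// MaterialScore prefers a tuned entry and falls back to white minus black.
func MaterialScore(white, black string) int {
	if v, ok := tunedMaterial[white+"v"+black]; ok {
		return v
	}
	return sideValue(white) - sideValue(black)
}

func main() {
	fmt.Println(MaterialScore("KNN", "KN")) // 5, from the tuned table
	fmt.Println(MaterialScore("KR", "KN"))  // 200, piecewise fallback
}
```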

There were too many other code changes to alternate between these two approaches so I never got to try them head-to-head unfortunately...

lithander · Post by **lithander** » Wed Sep 07, 2022 12:35 am

akanalytics wrote: ↑Tue Sep 06, 2022 7:04 pm Haha! I'm building something very similar. I've seen the "scaling" approach elsewhere, though it felt like dividing by 2 would be sufficient to prevent exchanges. What's the reasoning behind 1/8th?

Christian (algerbrex) said he's dividing by 16 and I thought that was maybe a little too much and so I used a right-shift by 3 instead of 4.

akanalytics wrote: ↑Tue Sep 06, 2022 7:04 pm The only difference is that I included some balances such as R+N vs R which although not strictly a draw, is probably a draw. I have some logic that tries to reduce the search depth too (or drops into QS) when it finds such positions, but this interacts horribly with the hash table, so not currently enabled...

A while ago I had downloaded a zip with 16 billion positions for NNUE generation. It came with a text file of stats, among which I found a "Distribution of endgame configurations (count W D L Perf%)", and I was scanning it for configurations that had a roughly 50% performance but a non-zero material balance. My set of drawn positions got pretty long, but on closer inspection the statistics were a little fishy, e.g.:

                         Count     Wins     Draws   Loss Perf%
KR   vK      (+5 ):   1192062   561894    630168      0   74%
KPR  vK      (+6 ):    175522    25088    150434      0   57%

...why should having a Pawn extra make it harder to win?!
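
As a sanity check, the Perf% column can be recomputed from the counts with wins scored as 1 and draws as 0.5. The Go sketch below assumes that is how the file derived the column; `Perf` is a name chosen for this example.

```go
package main

import "fmt"

// Perf returns the stronger side's score fraction: a win counts 1, a draw 0.5.
func Perf(wins, draws, losses int) float64 {
	games := float64(wins + draws + losses)
	return (float64(wins) + 0.5*float64(draws)) / games
}

func main() {
	fmt.Printf("KR  v K: %.0f%%\n", 100*Perf(561894, 630168, 0))
	fmt.Printf("KPR v K: %.0f%%\n", 100*Perf(25088, 150434, 0))
}
```

Both percentages come out as listed (74% and 57%), so the arithmetic in the file is consistent. The fishy part is the draw counts themselves.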

...so I chose to simplify and just use the most obvious configurations for a start. I'm now looking into the Syzygy tablebases instead of some random file I found on the internet. And for "R+N vs R" only 80% of positions are a draw. I guess I'll stick to my set for now, focus on a speedy implementation and see how close I can come to the 30 Elo my preliminary test promised me. I can still try to improve upon that baseline later.

Interesting idea to use tuning for that!

algerbrex · Post by **algerbrex** » Wed Sep 07, 2022 8:24 am

lithander wrote: ↑Tue Sep 06, 2022 10:12 am I've written some code that generates the Endgame Classification of a given position and then researched what kind of endgames are drawn despite a significant material advantage for the stronger side.
        public static HashSet<string> Drawn = new()
        {
            "KNvK", "KvKN",
            "KBvK", "KvKB",
            "KNNvK", "KvKNN",

            "KNNvKN", "KNvKNN",
            "KNNvKB", "KBvKNN",
            
            "KRvKN", "KNvKR",
            "KRvKB", "KBvKR"
        };
That's what I found. Did I miss anything?

Looks like you're covering the most important cases now.

My only note would be that you could separate the theoretical draws, like rook versus knight, from the actual draws, like bishop versus lone king. A rook can win against a knight in some cases, but there's never a case where a bishop could win against a lone king. So whereas for the latter a score of strictly 0 can be returned, for the former you would use the constant division approach.

lithander wrote: ↑Tue Sep 06, 2022 10:12 am Whenever I normally use the static evaluation score I now pass it through a little function:
        public int ScaleEndgame(int score)
        {
            int cnt = Bitboard.PopCount(Black | White);
            if (cnt > 5)
                return score;

            if (Drawn.Contains(Notation.GetEndgameClass(this)))
                return score >> 3;
            else
                return score;
        }
The shift by 3 basically scales the score down towards zero but does not make every move look exactly equal.

So this code should do what you suggested, right?

Yep, except as I mentioned I return a score of 0 for the actual (non-theoretical) draws:

// Evaluate a position and give a score from the perspective of the side to move
// (more positive if it's good for the side to move, otherwise more negative).
func EvaluatePos(pos *Position) int16 {
	if isDrawn(pos) {
		return Draw // = 0
	}

	...

	score := int16(((int32(mgScore) * (int32(256) - int32(phase))) + (int32(egScore) * int32(phase))) / int32(256))

	if isDrawish(pos) {
		return score / ScaleFactor // = score / 16
	}

	return score
}
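
The snippet above calls `isDrawn` and `isDrawish` without showing them. The following Go sketch is only a guess at what such a split could look like, based on the distinction described in this post; the `Counts` type and every function here are hypothetical and not Blunder's code.

```go
package main

import "fmt"

// Counts holds one side's non-king material. Hypothetical helper type.
type Counts struct {
	Pawns, Knights, Bishops, Rooks, Queens int
}

func (c Counts) total() int {
	return c.Pawns + c.Knights + c.Bishops + c.Rooks + c.Queens
}

func (c Counts) bare() bool      { return c.total() == 0 }
func (c Counts) loneMinor() bool { return c.total() == 1 && c.Knights+c.Bishops == 1 }
func (c Counts) loneRook() bool  { return c.total() == 1 && c.Rooks == 1 }

// isDeadDraw covers material with which no checkmate position exists:
// bare kings, or a single minor piece against a bare king.
func isDeadDraw(a, b Counts) bool {
	return (a.bare() && b.bare()) ||
		(a.loneMinor() && b.bare()) ||
		(a.bare() && b.loneMinor())
}

// isDrawish covers rook versus a single minor piece, which is usually
// drawn but can still be won if the defender blunders.
func isDrawish(a, b Counts) bool {
	return (a.loneRook() && b.loneMinor()) || (a.loneMinor() && b.loneRook())
}

const scaleFactor = 16

// ScaleDrawn returns 0 for dead draws and a reduced score for drawish material.
func ScaleDrawn(score int, white, black Counts) int {
	switch {
	case isDeadDraw(white, black):
		return 0
	case isDrawish(white, black):
		return score / scaleFactor
	}
	return score
}

func main() {
	fmt.Println(ScaleDrawn(300, Counts{Bishops: 1}, Counts{}))         // 0
	fmt.Println(ScaleDrawn(320, Counts{Rooks: 1}, Counts{Knights: 1})) // 20
	fmt.Println(ScaleDrawn(100, Counts{Pawns: 1}, Counts{}))           // 100
}
```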

lithander wrote: ↑Tue Sep 06, 2022 10:12 am And of course the implementation is horrendously inefficient. So I ran a selftest against the previous stable version where each engine gets unlimited time but is limited to 2M nodes max, which should factor the engine speed out of the equation.
Score of Leorik 2.2.8e vs Leorik 2.2.6: 692 - 521 - 901  [0.540] 2114
...      Leorik 2.2.8e playing White: 440 - 245 - 372  [0.592] 1057
...      Leorik 2.2.8e playing Black: 252 - 276 - 529  [0.489] 1057
...      White vs Black: 716 - 497 - 901  [0.552] 2114
Elo difference: 28.2 +/- 11.2, LOS: 100.0 %, DrawRatio: 42.6 %
This looks very promising indeed!

Nice! Glad to see you might've found a little more fruit down here to pick.

In retrospect I agree 16 may be a little bit too big of a constant. I think I'll experiment with 8 and see if that gains any Elo.

lithander · Post by **lithander** » Wed Sep 07, 2022 10:01 am

algerbrex wrote: ↑Wed Sep 07, 2022 8:24 am Nice! Glad to see you might've found a little more fruit down here to pick. In retrospect I agree 16 may be a little bit too big of a constant. I think I'll experiment with 8 and see if that gains any Elo.

First I have to make it fast enough to not tank my 'nps' more than what it provides in value.

I'd be curious to know (if you happen to have tested that): when you remove the isDrawn() and isDrawish() calls from your code, how does that affect your engine's speed, and how much net Elo do you lose?

Mike Sherwin · Post by **Mike Sherwin** » Thu Nov 03, 2022 11:02 pm

Are you planning a Christmas surprise for us?

lithander · Post by **lithander** » Fri Nov 04, 2022 6:31 pm

Mike Sherwin wrote: ↑Thu Nov 03, 2022 11:02 pm Are you planning a Christmas surprise for us?

That would be nice, wouldn't it? Leorik is definitely not abandoned, but development is not making much progress at the moment either. I have started to take a deep look into Syzygy tablebases with the goal of porting the probing code to C# and integrating it with Leorik, but found it to be very complicated material that takes willpower to plow through.

Also, my electricity provider just doubled the price of electricity to 0.53€ per kWh, making such a compute-heavy hobby literally quite expensive. I need to find someone to retrofit my house with solar panels or something before I can indulge in chess programming guilt-free again!

dangi12012 · Post by **dangi12012** » Fri Nov 04, 2022 8:37 pm

lithander wrote: ↑Fri Nov 04, 2022 6:31 pm
Mike Sherwin wrote: ↑Thu Nov 03, 2022 11:02 pm Are you planning a Christmas surprise for us?
That would be nice, wouldn't it? Leorik is definitely not abandoned, but development is not making much progress at the moment either. I have started to take a deep look into Syzygy tablebases with the goal of porting the probing code to C# and integrating it with Leorik, but found it to be very complicated material that takes willpower to plow through.

Also, my electricity provider just doubled the price of electricity to 0.53€ per kWh, making such a compute-heavy hobby literally quite expensive. I need to find someone to retrofit my house with solar panels or something before I can indulge in chess programming guilt-free again!

Get rich by producing more than you consume. I can recommend vertex s 405 panels. Good Quality - not cheap - not expensive.
With current energy prices, a solar investment in your own house can pay for itself in 3 years. No safe investment I have ever seen has a 33% return.
Could take longer if prices normalise suddenly, of course.

lithander · Post by **lithander** » Sun Dec 04, 2022 1:08 am

Another month has gone by with no updates. And frankly, since the release of version 2.2 in the summer I haven't made any significant progress with the engine. I made a lot of feature branches and tried ideas that other engines use but that led nowhere for me. I looked at Leorik's blunders against other engines, trying to find bugs. I literally tried to get an oracle involved by implementing Syzygy support, which for a C# engine means I either link to a native-code DLL or port the probing code, and the latter is a lot more work than I originally anticipated because of the sophisticated compression. Through all this I struggled to keep up my motivation, and there have been many weeks in which I didn't even think about chess programming at all.

In the 25 years I have been programming I have started and abandoned dozens of projects. Some I look back on proudly, and many I regret leaving too early. So I asked myself: what would I come to regret if I left Leorik in its current state?

Recently I was trying to climb the Elo ladder, to do better in tournament matches. But my regrets in hindsight wouldn't be about missing a certain Elo milestone. Instead the biggest flaw is a lack of "purity". Ever since I wrote my first tuner for the PSQTs in MinimalChess I have been using the same set of 725k annotated positions from Zurichess. And according to the Readme.txt that comes with these positions, each label was derived by playing the position to a conclusion with Stockfish.
I have always avoided looking at other engines' source code when implementing new ideas in Leorik (which I got from reading the forum or the wiki), but the tuner just transferred chess knowledge from Zurichess and Stockfish and encoded it into the weights of Leorik's evaluation. Nothing unethical about that. But I got interested in chess programming after hearing how AlphaZero learned chess from scratch purely by self-play.

Imagining myself looking back at Leorik as an abandoned project, I would really regret it if I hadn't made a serious attempt at doing something like that. All the weights and coefficients of the HCE are owed to the dataset my tuner is using. I need to create my own dataset! And I would have to start with a version of Leorik where all the borrowed knowledge is purged from the evaluation. Which means going back to material values!

Now I was excited again. This was radical enough to make me curious! Neutering the evaluation like that made Leorik play like an imbecile. In fact the games were very short!

...so I added randomness to the engine (as a UCI option) and now I get games like this from self-play.

[Event "?"] 
[Site "?"] 
[Date "2022.12.04"] 
[Round "?"] 
[White "Leorik-2.2.8a"] 
[Black "Leorik-2.2.8a"] 
[Result "1-0"] 
[ECO "A04"] 
[GameDuration "00:05:56"] 
[GameEndTime "2022-12-04T00:33:15.154 Mitteleuropäische Zeit"] 
[GameStartTime "2022-12-04T00:27:18.673 Mitteleuropäische Zeit"] 
[Opening "Reti Opening"] 
[PlyCount "239"] 
[TimeControl "40/60"] 
 
1. Nf3 {+0.09/22 0.87s} Nc6 {+0.19/22 0.91s} 2. Rg1 {+0.14/21 1.2s} 
b6 {+0.36/20 1.3s} 3. Na3 {+0.36/20 1.1s} h6 {+0.20/20 1.1s} 
4. b3 {+0.15/20 1.3s} Nb4 {+0.45/19 0.90s} 5. c4 {+0.38/19 1.7s} 
Ba6 {+0.32/18 0.89s} 6. d4 {+0.33/18 1.2s} d6 {+0.34/17 1.2s} 
7. Kd2 {+0.02/17 1.5s} Kd7 {+0.47/17 0.90s} 8. Bb2 {+0.01/18 1.3s} 
d5 {+0.20/16 1.3s} 9. c5 {+0.06/15 1.0s} Qc8 {+0.23/16 1.6s} 
10. Ke1 {+0.41/15 1.3s} Qb7 {+0.34/16 1.0s} 11. h4 {+0.09/15 1.1s} 
e6 {+0.22/16 1.4s} 12. Rh1 {+0.15/15 1.1s} Nf6 {+0.20/15 1.4s} 
13. Ne5+ {+0.46/15 1.0s} Ke7 {+0.46/15 1.5s} 14. Bc3 {+0.27/15 1.1s} 
bxc5 {+0.40/15 1.2s} 15. dxc5 {+0.43/15 0.96s} Nh7 {+0.29/14 0.84s} 
16. Qd2 {+0.48/15 1.3s} Nc6 {+0.35/16 1.4s} 17. Nf3 {+0.21/15 0.88s} 
Rb8 {+0.27/14 0.89s} 18. Qf4 {+0.14/15 0.97s} Qc8 {+0.24/15 1.6s} 
19. Kd2 {+0.28/15 1.6s} Nf6 {+0.33/15 1.1s} 20. Ne5 {+0.42/15 1.5s} 
Qe8 {+0.16/14 1.2s} 21. Nxc6+ {+0.34/14 0.99s} Qxc6 {+0.30/17 1.5s} 
22. Bxf6+ {+0.03/16 1.1s} gxf6 {+0.36/17 1.5s} 23. Rc1 {+0.47/16 1.0s} 
Re8 {+0.35/16 1.2s} 24. Ke1 {+0.35/16 1.2s} Rg8 {+0.03/17 1.7s} 
25. Nb1 {+0.15/16 0.99s} Rc8 {+0.28/17 1.2s} 26. Rh3 {+0.44/17 1.3s} 
Rd8 {+0.27/16 1.7s} 27. Qd4 {+0.34/17 1.8s} Rh8 {+0.35/16 0.98s} 
28. Qb2 {+0.35/18 1.2s} Bc8 {+0.12/18 1.3s} 29. Nd2 {+0.27/18 1.3s} 
Qe8 {+0.24/17 1.2s} 30. b4 {+0.36/17 1.8s} Qb5 {+0.29/16 1.3s} 
31. Re3 {+0.18/16 2.4s} Qc6 {+0.21/16 1.1s} 32. Ra1 {+0.44/16 1.5s} 
h5 {+0.30/16 1.3s} 33. Nb3 {+0.08/16 1.4s} Qd7 {+0.10/15 1.6s} 
34. Rg3 {+0.08/17 2.5s} Rh6 {+0.16/17 1.4s} 35. Qd4 {+0.22/17 1.8s} 
Qc6 {0.00/18 3.7s} 36. Qc3 {+0.21/17 1.4s} Qa6 {+0.43/18 3.8s} 
37. e3 {+0.31/17 2.4s} Qb7 {+0.02/18 1.6s} 38. Bd3 {+0.42/18 2.2s} 
Rh8 {+0.07/17 1.5s} 39. Bf1 {+0.35/18 2.7s} c6 {+0.25/19 2.0s} 
40. Kd1 {+0.15/19 2.9s} a6 {+0.24/19 2.5s} 41. Nc1 {+0.47/17 0.96s} 
Qa8 {+0.29/17 1.0s} 42. Rh3 {+0.20/18 1.4s} Bg7 {+0.11/18 1.0s} 
43. a3 {+0.22/17 1.1s} Kd7 {+0.01/18 1.0s} 44. Ke1 {+0.47/17 0.79s} 
Qb7 {+0.28/18 0.97s} 45. Kd2 {+0.43/17 1.0s} Rde8 {+0.32/18 1.5s} 
46. Rb1 {+0.14/18 1.5s} Qc7 {+0.01/17 1.1s} 47. Nd3 {+0.18/17 0.87s} 
Ref8 {+0.20/16 0.84s} 48. Rb3 {+0.39/16 1.3s} Rh6 {+0.23/18 1.00s} 
49. Rf3 {+0.33/16 0.97s} Rfh8 {+0.26/18 1.5s} 50. Qc2 {+0.21/17 1.1s} 
Ke8 {+0.13/18 2.9s} 51. Rb2 {+0.26/17 0.92s} Rf8 {+0.21/17 0.91s} 
52. Kc1 {+0.16/17 0.93s} Qe7 {+0.37/17 2.4s} 53. Ra2 {+0.18/17 2.1s} 
Bb7 {+0.46/17 1.7s} 54. Qd1 {+0.27/14 1.0s} f5 {+0.01/16 1.0s} 
55. Rg3 {+0.21/17 4.3s} Bf6 {+0.32/17 0.92s} 56. Nf4 {+0.29/17 1.7s} 
Bxh4 {+0.42/15 0.95s} 57. Rh3 {+0.26/17 1.8s} Rfh8 {+0.16/16 1.5s} 
58. Qa4 {+0.40/15 1.6s} Kd7 {+0.09/15 1.5s} 59. Bxa6 {+0.04/15 1.4s} 
Bxa6 {+0.37/16 1.4s} 60. Qxa6 {+0.38/16 0.97s} e5 {+0.09/17 1.5s} 
61. Qa7+ {+0.39/16 1.0s} Kc8 {+0.39/18 1.1s} 62. Qa8+ {+0.35/17 1.0s} 
Kc7 {+0.15/19 1.3s} 63. Qa5+ {+0.47/17 1.2s} Kd7 {+0.36/18 0.93s} 
64. g3 {+0.36/17 1.6s} exf4 {+0.04/18 1.6s} 65. gxh4 {+0.45/18 1.6s} 
Qe5 {+0.48/17 1.0s} 66. Rf3 {+0.26/16 1.4s} fxe3 {+0.26/16 1.3s} 
67. Rxe3 {0.00/17 1.3s} Qb8 {+0.22/17 2.6s} 68. Rd2 {+0.24/18 1.7s} 
f4 {+0.30/17 1.2s} 69. Rc3 {+0.19/17 1.2s} Rf6 {+0.32/17 2.1s} 
70. f3 {+0.41/17 2.1s} Re8 {+0.35/18 1.9s} 71. Kd1 {+0.05/18 2.0s} 
Kc8 {+0.43/18 1.5s} 72. Rg2 {+0.46/18 1.5s} Rh6 {+0.18/18 2.2s} 
73. Rg1 {+0.39/17 1.2s} Rf6 {+0.05/18 1.1s} 74. Rc2 {+0.14/18 1.3s} 
Rg6 {+0.33/19 1.8s} 75. Rh1 {+0.03/17 1.5s} Qb5 {+0.26/17 1.9s} 
76. Qxb5 {+0.04/20 2.1s} cxb5 {+0.49/20 2.0s} 77. Re1 {+0.39/21 1.8s} 
Kd8 {+0.13/20 1.3s} 78. Rxe8+ {+1.13/22 1.8s} Kxe8 {-0.77/23 2.0s} 
79. Rd2 {+1.38/24 2.7s} Rg1+ {-0.54/23 3.7s} 80. Kc2 {+1.15/24 4.3s} 
Rf1 {-0.72/22 1.9s} 81. c6 {+1.31/24 4.4s} Rxf3 {-0.58/21 0.90s} 
82. Rxd5 {+1.11/23 3.0s} Rf2+ {-0.79/20 1.1s} 83. Kd1 {+1.28/21 1.1s} 
Rf1+ {-0.53/22 1.2s} 84. Ke2 {+1.38/22 1.3s} Rc1 {-0.64/22 1.4s} 
85. Re5+ {+1.20/22 1.3s} Kf8 {-0.68/22 1.1s} 86. Rxb5 {+1.41/22 0.88s} 
Rc4 {-0.73/21 1.3s} 87. Kd3 {+1.26/20 0.71s} Rxc6 {-0.66/22 0.97s} 
88. Rxh5 {+1.48/21 0.76s} Ke8 {-0.74/21 1.6s} 89. Ra5 {+1.38/20 0.98s} 
Re6 {-0.79/21 0.86s} 90. Kc4 {+1.42/21 1.2s} Re1 {-0.82/20 1.2s} 
91. Kd4 {+1.35/20 0.83s} Rb1 {-0.73/21 0.87s} 92. Ke4 {+1.28/21 1.1s} 
Rh1 {-0.86/22 1.1s} 93. h5 {+1.10/21 1.0s} Rf1 {-0.83/21 0.91s} 
94. Rg5 {+1.07/22 1.4s} f3 {-0.67/21 0.97s} 95. Ke3 {+1.14/21 1.2s} 
Kf8 {-0.75/22 1.4s} 96. Rf5 {+1.11/22 0.97s} Ra1 {-0.57/24 1.7s} 
97. Ra5 {+1.36/22 0.86s} Rf1 {-0.99/23 0.96s} 98. Rg5 {+1.48/22 0.82s} 
Ke8 {-0.61/22 1.6s} 99. Rb5 {+1.21/22 1.2s} Kd7 {-0.99/22 1.7s} 
100. a4 {+1.27/19 1.3s} Kd6 {-0.77/20 1.1s} 101. a5 {+2.10/20 0.99s} 
Ke6 {-1.85/20 1.2s} 102. a6 {+3.04/21 3.6s} f2 {-1.67/20 1.0s} 
103. Ra5 {+5.41/21 2.7s} Rc1 {-1.86/21 2.3s} 104. Kxf2 {+5.07/21 2.4s} 
Rc2+ {-4.61/22 1.3s} 105. Ke3 {+5.26/22 1.0s} Rc3+ {-7.93/21 2.0s} 
106. Kd4 {+6.07/22 0.95s} Rc8 {-8.77/21 1.0s} 107. a7 {+6.39/22 4.6s} 
Rd8+ {-8.65/22 1.7s} 108. Kc5 {+8.36/19 0.80s} Rc8+ {-6.98/21 2.5s} 
109. Kb6 {+9.09/21 0.96s} f5 {-6.99/20 5.2s} 110. Kb7 {+14.39/21 3.5s} 
Rf8 {-8.76/20 1.5s} 111. a8=Q {+14.00/19 0.68s} Rxa8 {-12.53/18 1.1s} 
112. Kxa8 {+14.28/20 1.0s} Kf6 {-13.65/19 0.97s} 113. Ra1 {+15.39/20 1.6s} 
Kg5 {-16.68/18 1.9s} 114. Rh1 {+23.08/19 1.3s} f4 {-22.52/20 3.2s} 
115. h6 {+15.47/17 1.0s} Kg4 {-22.88/18 1.1s} 116. h7 {+22.15/16 0.87s} 
Kf3 {-M62/16 1.1s} 117. h8=Q {+M1/16 1.4s} Kg2 {-M56/16 1.5s} 
118. Qh3+ {0.00/15 1.8s} Kf2 {-M30/16 1.3s} 119. Rh2+ {0.00/14 1.5s} 
Kg1 {-M44/17 1.6s} 120. Qg2# {-M30/15 0.78s, White mates} 1-0
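
The post does not say how the randomness is implemented. One simple possibility, sketched here in Go purely as an assumption, is a small bounded bonus added to the static evaluation, with the bound exposed as an engine option. The early scores in the game above stay between 0.00 and +0.49 for both sides, which would fit a scheme like this, but that is a guess. `NoisyEval` and its fields are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// NoisyEval adds a uniform bonus in [0, maxNoise) centipawns to a score.
type NoisyEval struct {
	rng      *rand.Rand
	maxNoise int // hypothetical option value; 0 disables the noise
}

// NewNoisyEval seeds its own generator so that runs are reproducible.
func NewNoisyEval(seed int64, maxNoise int) *NoisyEval {
	return &NoisyEval{rng: rand.New(rand.NewSource(seed)), maxNoise: maxNoise}
}

// Apply returns the score plus a random bonus, or the score itself
// when the noise is disabled.
func (n *NoisyEval) Apply(score int) int {
	if n.maxNoise <= 0 {
		return score
	}
	return score + n.rng.Intn(n.maxNoise)
}

func main() {
	eval := NewNoisyEval(42, 50)
	for i := 0; i < 5; i++ {
		fmt.Println(eval.Apply(0)) // five values in the range 0..49
	}
}
```

With a material-only evaluation many moves score exactly the same, so even a small bonus like this is enough to make self-play games diverge from each other.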

I'm still a terrible chess player, but even I find it hard to watch. But after move 78 a proper engine says that what looks like an equal position (if you count only material) is actually winning for White. So I hoped that there would be something to learn even from games like this, and that I wouldn't need Stockfish to evaluate my positions for me. I just scored all positions leading to White's eventual win as winning for White. I thought that as long as the wrongly labeled positions cancel each other out to just random noise, and enough positions remain that are actually correctly labeled, like the ones after move 78, then training on these positions should produce weights that are better than the material values I started with.
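
The labeling scheme described here, where every position inherits the final result of its game, can be sketched in a few lines of Go. The types and function names are invented for this example, and positions are represented as plain FEN strings.

```go
package main

import "fmt"

// LabeledPosition pairs a position (as FEN text) with the game outcome
// from White's point of view: 1 = White won, 0.5 = draw, 0 = Black won.
type LabeledPosition struct {
	FEN    string
	Result float64
}

// resultValue maps a PGN result tag to a numeric label.
func resultValue(pgnResult string) (float64, bool) {
	switch pgnResult {
	case "1-0":
		return 1, true
	case "1/2-1/2":
		return 0.5, true
	case "0-1":
		return 0, true
	}
	return 0, false // "*" or anything unexpected: skip the game
}

// LabelGame gives every position of one game the same label.
func LabelGame(fens []string, pgnResult string) []LabeledPosition {
	value, ok := resultValue(pgnResult)
	if !ok {
		return nil
	}
	labeled := make([]LabeledPosition, 0, len(fens))
	for _, fen := range fens {
		labeled = append(labeled, LabeledPosition{FEN: fen, Result: value})
	}
	return labeled
}

func main() {
	fens := []string{"<position after move 1>", "<position after move 78>"}
	for _, p := range LabelGame(fens, "1-0") {
		fmt.Println(p.Result, p.FEN)
	}
}
```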

All the code is written: a PGN parser, a function that generates quiet positions from violent ones, and a new evaluation that no longer contains any handcrafted terms, just features and weights, but should be no less powerful if provided with a good set of weights. Now I'm playing batches of 10k games, then I add the new PGN files to the training set (and cull some old ones) and retrain all the weights from scratch. Despite taking PGN files as input (instead of annotated FENs), it takes only a few minutes to train all weights from scratch at the moment. I have repeated the process half a dozen times, around 150k games in total. So far I haven't plateaued, and the engine has just started to win a few games against Leorik 2.2 already! A happy milestone!

Let me know if you are interested in any details. The post is already long enough but I would love to elaborate in the next one.

JoAnnP38 · Post by **JoAnnP38** » Sun Dec 04, 2022 3:53 pm

lithander wrote: ↑Sun Dec 04, 2022 1:08 am Imagining myself looking back at Leorik as an abandoned project, I would really regret it if I hadn't made a serious attempt at doing something like that. All the weights and coefficients of the HCE are owed to the dataset my tuner is using. I need to create my own dataset! And I would have to start with a version of Leorik where all the borrowed knowledge is purged from the evaluation. Which means going back to material values!

Now I was excited again. This was radical enough to make me curious! Neutering the evaluation like that made Leorik play like an imbecile. In fact the games were very short!

This is exactly what inspired me to write a chess engine some 35-40 years after writing my first engine. Back then I used it as a suitable challenge to help me become proficient at C. Now I am excited about the prospect of a self-teaching chess engine. I have already built a component based on a genetic algorithm that will let my evaluation evolve over time. Essentially it will be a collection of "features" (as many as can reasonably be implemented), and I will encode the weights of these features into a "chromosome" so it can be scored and generate progeny with other promising chromosomes in my gene pool. My initial population of chromosomes can just be generated randomly, and I'll enjoy watching evolution in progress. I imagine there will be a threshold below which a feature's weight is so low that the feature drops out of the evaluation altogether, which could allow different genes to inspire different play. I'm quite excited about this part, as I don't think I will need a large database of games to "tune" my engine; rather, it will learn over time, and each chromosomal generation will converge on better and better chromosomes through survival of the fittest. I like this method because, if mutations are part of the process, there is more of a chance that the engine's learning won't get trapped in a local maximum.

BTW, I was really inspired by your implementation of MinimalChess. I thought your code was so clean it was beautiful. I am using it as a reference for developing my UCI interface as I'm not really much of a chess player so some of the time controls in the specification were eluding my understanding. Thanks for making that available.
