Labeled positions for Texel tuning

Robert Pope · Post by **Robert Pope** » Wed Jun 14, 2023 3:43 am

I have a dataset of 725K quiet labeled positions that were generated by Zurichess, but I was wondering if anyone knows of other good datasets that are publicly available? I'm working on generating my own, but I don't think it will be of as good quality. I also tried to follow some of the links on chessprogramming.org, but didn't find much.

AndrewGrant · Post by **AndrewGrant** » Wed Jun 14, 2023 5:15 am

https://talkchess.com/forum3/viewtopic.php?f=7&t=77502

lithander · Post by **lithander** » Wed Jun 14, 2023 3:43 pm

In my devlog I have written a few posts on how I weaned Leorik off the Zurichess set and created my own data from scratch via selfplay, starting with a completely dumb evaluation based on just material values. (100, 300, 500 and 900)
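For readers who want to try the same bootstrap, here is a minimal sketch of a material-only evaluation. It is not Leorik's code; it assumes the four values in the post map to pawn, minor piece (knight and bishop), rook and queen, and the function name `material_eval` is made up for illustration.

```python
# Material-only evaluation in centipawns, from White's point of view.
# Assumption: 100/300/500/900 = pawn / knight and bishop / rook / queen.
PIECE_VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}


def material_eval(fen: str) -> int:
    """Sum piece values from the piece-placement field of a FEN string."""
    placement = fen.split()[0]
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (empty squares) and '/' rank separators
        # Uppercase letters are White pieces, lowercase are Black pieces.
        score += value if ch.isupper() else -value
    return score
```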

It starts here if you're curious: https://talkchess.com/forum3/viewtopic. ... 40#p938897

Eventually I managed to exceed the performance I got out of the Zurichess set. Version 2.3 and later have been tuned on this growing repository of selfplay games (with randomization).

Be aware that these labels are far from perfect: just the outcome of the game, not involving Stockfish in the labeling as it was done (afaik) for the Zurichess set. That it worked so well was a surprise to me, to be honest. And it would be interesting to know if it works well for other engines, too.
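To make "labeled by game outcome" concrete, this is a rough sketch of the error a Texel-style tuner minimizes: the static eval is squashed into an expected score and compared with the game result. The scaling constant `k` and the function names are assumptions for illustration, not taken from Leorik or Zurichess.

```python
def expected_score(eval_cp: float, k: float = 1.0) -> float:
    """Map a centipawn eval (White's view) to an expected score in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * eval_cp / 400.0))


def mean_squared_error(samples, evaluate, k: float = 1.0) -> float:
    """samples: list of (fen, result) with result 1.0, 0.5 or 0.0 for White.

    evaluate: any function taking a FEN and returning centipawns (White's view).
    """
    total = 0.0
    for fen, result in samples:
        total += (result - expected_score(evaluate(fen), k)) ** 2
    return total / len(samples)
```

A tuner would adjust the eval parameters so that this error shrinks over the whole dataset; `k` is usually fitted once to the untuned eval before the parameters are touched.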

So, if you want I can upload a file with a few million labeled positions that I currently use for training version 2.4.X.

Robert Pope · Post by **Robert Pope** » Thu Jun 15, 2023 2:34 am

Thank you, both. I think Andrew's data has what I was looking for to start, so I'm going to try those while I work on generating my own data.

lithander · Post by **lithander** » Sat Jun 17, 2023 2:18 pm

Today I tuned a version of Leorik on the first 5M positions from Andrew's E12.52-1M-D12-Resolved.book (Leorik2.4.3d) to compare it with my current dev version tuned on 3.4M positions from Leorik selfplay games. I didn't change anything else except the dataset.

   # PLAYER           :  RATING  POINTS  PLAYED   (%)
   1 Leorik-2.4.3c    :  2334.7  1442.0    2413    60
   2 Leorik-2.4.3d    :  2265.3   971.0    2413    40

The result is quite decisive in favor of Leorik tuned on Leorik's own selfplay games. Now I'm curious if my dataset would work equally well for other engines. If you want to try it you can download it here: DATA-L26-3443372.zip
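As a sanity check on the table above, the score fraction can be converted to an Elo difference with the usual logistic formula. This is only a sketch; the rating tool that produced the table may compute its numbers differently, and `elo_diff` is a hypothetical helper name.

```python
import math


def elo_diff(points: float, games: int) -> float:
    """Elo difference implied by a match score (requires 0 < score < 1)."""
    score = points / games
    return -400.0 * math.log10(1.0 / score - 1.0)


print(round(elo_diff(1442.0, 2413), 1))
```

For 1442 points out of 2413 games this gives roughly 69 Elo, in line with the gap between the two ratings listed (2334.7 vs 2265.3).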

The format is the same as the one used in the Zurichess set:

8/8/8/8/2q4p/6k1/8/K7 b - - 1 1 c9 "0-1";
8/Q5pk/5p1p/1P3q2/8/8/3r4/K7 w - - 0 1 c9 "0-1";
4R3/p5p1/5rk1/3B3p/2P3bP/5pP1/PP3P2/K7 b - - 2 1 c9 "1-0";
8/8/1p4Q1/pq4pp/5p2/6k1/8/K7 b - - 5 1 c9 "0-1";
4b3/8/8/1k4P1/pp2p3/3pN3/8/K7 b - - 1 1 c9 "0-1";
5k2/4r3/5p1p/2Q4P/p3b3/P5P1/2P2P2/K7 b - - 2 1 c9 "1-0";
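A small reader for lines in this shape might look like the sketch below. It assumes every line is a FEN followed by `c9 "<result>";` exactly as in the sample, and that the result is the usual PGN result (so "1-0" is a White win); the names are made up for illustration.

```python
import re

# FEN, then the c9 opcode with a quoted game result, then a semicolon.
LINE_RE = re.compile(r'^(?P<fen>.+?)\s+c9\s+"(?P<result>1-0|0-1|1/2-1/2)";\s*$')
RESULT_TO_SCORE = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}


def parse_labeled_line(line: str):
    """Return (fen, score_for_white), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("fen"), RESULT_TO_SCORE[match.group("result")]


sample = '8/8/8/8/2q4p/6k1/8/K7 b - - 1 1 c9 "0-1";'
print(parse_labeled_line(sample))
```

Returning `None` for anything unexpected, instead of raising, makes it easy to count and skip malformed lines while loading a few million positions.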

Robert Pope · Post by **Robert Pope** » Sat Jun 17, 2023 9:57 pm

Interesting. I'll try yours and report back. I'm embarrassed to note that my FEN/EPD reader code is pretty brittle, and I'm having a little trouble with Andrew's data, so this will be an easy interim project.

Ras · Post by **Ras** » Sat Jun 17, 2023 10:06 pm

lithander wrote: ↑Sat Jun 17, 2023 2:18 pm Now I'm curious if my dataset would work equally well for other engines. If you want to try it you can download it here: DATA-L26-3443372.zip

I tried it against the Zurichess training set in my upcoming engine version. With your training data, the score was 49.5% at 10k games.
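To judge how far 49.5% is from an even score, a pessimistic error bar helps. The sketch below uses the fact that the per-game variance of a result in {0, 0.5, 1} is at most score * (1 - score), which is reached when there are no draws. The draw rate of the match is not given, so this is only an upper bound, and the function name is hypothetical.

```python
import math


def worst_case_std_error(games: int, score: float) -> float:
    """Upper bound on the standard error of a match score fraction."""
    return math.sqrt(score * (1.0 - score) / games)


se = worst_case_std_error(10_000, 0.495)
print(f"std error <= {se:.4f}, gap to 50% = {0.5 - 0.495:.4f}")
```

Under this bound the standard error is about 0.5 percentage points, so the gap to 50% is about one standard error. With draws the real error is smaller, so the gap would count for somewhat more than that.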

An interesting detail I noticed is that with your training set, the bishop pair gets a large advantage when there are no pawns. I suspected it might have something to do with Leorik struggling with KBN:K, which is why it would overrate the bishop pair compared to B+N. Here are two colour-swapped games at 10s/game from the same KBN:K starting position, no tablebases used, and Leorik doesn't win that endgame (unlike KBB:K).

[pgn][White "Leorik 2.4"]
[Black "CT800 V1.45 64 bit"]
[Result "1/2-1/2"]
[Termination "50 moves rule"]
[FEN "8/8/3k4/8/8/8/8/K2N3B w - - 0 1"]
[PlyCount "100"]

1. Kb2 {1350/18} Ke5 {-738/11} 2. Kc3 {1350/19} Kf5 {-741/11} 3. Ne3+ {1350/19} Ke6 {-744/10}
4. Kd4 {1382/16} Kf6 {-751/11} 5. Ke4 {1423/17} Ke6 {-751/11} 6. Nf5 {1465/17} Kf6 {-750/11}
7. Kf4 {1465/18} Ke6 {-755/11} 8. Bc6 {1465/17} Kf6 {-760/11} 9. Bd5 {1472/17} Kg6 {-760/9}
10. Nh4+ {1492/17} Kh5 {-750/11} 11. Nf3 {1500/18} Kg6 {-760/11} 12. Ke5 {1536/17} Kh6 {-769/11}
13. Kf6 {1576/17} Kh7 {-769/11} 14. Nd4 {1536/17} Kh8 {-762/9} 15. Nf5 {1567/17} Kh7 {-759/2}
16. Kf7 {1536/18} Kh8 {-747/2} 17. Nd6 {1536/19} Kh7 {-759/2} 18. Kf6 {1536/18} Kh8 {-769/11}
19. Nf7+ {1536/18} Kg8 {-779/11} 20. Ne5+ {1536/18} Kh7 {-779/11} 21. Nf3 {1565/17} Kh8 {-768/9}
22. Bf7 {1536/18} Kh7 {-757/2} 23. Nd4 {1552/19} Kh8 {-769/11} 24. Bg6 {1565/20} Kg8 {-757/2}
25. Nc6 {1565/19} Kh8 {-774/11} 26. Ne5 {1565/20} Kg8 {-774/10} 27. Nf3 {1567/20} Kh8 {-768/10}
28. Ne1 {1565/17} Kg8 {-757/2} 29. Nd3 {1565/19} Kh8 {-769/10} 30. Bf7 {1565/19} Kh7 {-757/2}
31. Ne5 {1567/18} Kh8 {-771/10} 32. Ng4 {1567/19} Kh7 {-757/2} 33. Bh5 {1565/18} Kh8 {-769/11}
34. Nh6 {1565/19} Kh7 {-757/2} 35. Nf7 {1565/20} Kg8 {-759/2} 36. Ne5 {1565/20} Kh8 {-772/10}
37. Nc4 {1565/19} Kh7 {-771/10} 38. Bg6+ {1565/19} Kh8 {-768/9} 39. Nb6 {1565/20} Kg8 {-768/9}
40. Nd7 {1565/19} Kh8 {-745/2} 41. Bf7 {1565/20} Kh7 {-757/2} 42. Nb6 {1565/19} Kh8 {-769/11}
43. Nc4 {1565/20} Kh7 {-757/2} 44. Ke7 {1565/19} Kh8 {-764/9} 45. Bh5 {1565/18} Kg7 {-761/10}
46. Be8 {1565/19} Kh8 {0/10} 47. Kf7 {1536/22} Kh7 {-759/2} 48. Ne5 {0/59} Kh6 {0/42}
49. Ke6 {0/99} Kg5 {0/42} 50. Bf7 {0/99} Kf4 {0/42} 1/2-1/2[/pgn]

[pgn][White "CT800 V1.45 64 bit"]
[Black "Leorik 2.4"]
[Result "1-0"]
[Termination "checkmate"]
[FEN "8/8/3k4/8/8/8/8/K2N3B w - - 0 1"]
[PlyCount "27"]

1. Kb2 {733/11} Ke5 {-1350/16} 2. Kc3 {739/11} Kf4 {-1350/17} 3. Kd4 {743/11} Kg3 {-1350/18}
4. Ke5 {778/10} Kh3 {-1350/19} 5. Kf4 {799/10} Kh2 {-1404/18} 6. Bf3 {807/11} Kh3 {-M8/18}
7. Ne3 {808/10} Kh4 {-M7/18} 8. Be2 {809/10} Kh3 {-M6/19} 9. Bg4+ {M6/10} Kh2 {-M5/20}
10. Kf3 {M5/3} Kg1 {-M4/21} 11. Kg3 {M4/3} Kh1 {-M3/21} 12. Kf2 {M3/5} Kh2 {-M2/21}
13. Nf1+ {M2/3} Kh1 {-M1/23} 14. Bf3# {M1/3} 1-0[/pgn]

lithander · Post by **lithander** » Sat Jun 17, 2023 11:28 pm

Ras wrote: ↑Sat Jun 17, 2023 10:06 pm An interesting detail I noticed is that with your training set, the bishop pair gets a large advantage when there are no pawns. I suspected it might have something to do with Leorik struggling with KBN:K, which is why it would overrate the bishop pair compared to B+N. Here are two colour-swapped games at 10s/game from the same KBN:K starting position, no tablebases used, and Leorik doesn't win that endgame (unlike KBB:K).

Assuming that every engine has its unique strengths and weaknesses, tuning Leorik on Leorik selfplay games teaches it (besides general knowledge of chess) to avoid positions where it is weak and to play towards positions where it is stronger. That could explain why I get good tuning results on my own data, while for you it was a slight regression compared to Zurichess.

Ras wrote: ↑Sat Jun 17, 2023 10:06 pm Leorik struggling with KBN:K

I have no idea how to fix that without using tablebase. Do you have custom eval for certain endgames? In any case thanks for sharing that observation, I'll try to look into it!

Ras · Post by **Ras** » Sun Jun 18, 2023 12:03 am

lithander wrote: ↑Sat Jun 17, 2023 11:28 pm Assuming that every engine has its unique strengths and weaknesses, tuning Leorik on Leorik selfplay games teaches it (besides general knowledge of chess) to avoid positions where it is weak and to play towards positions where it is stronger. That could explain why I get good tuning results on my own data, while for you it was a slight regression compared to Zurichess.

Makes sense, and the miracle of the Zurichess set is how generic it is, even though the so-called "quiet" set isn't really quiet: there are some 20k positions where the side to move is in check. However, resolving that doesn't change the outcome anyway. I also noticed that with your training set, and Andrew's as well, there's a huge difference between the pawn MG and EG values, like 70/130 or so. The Zurichess data don't lead to this. So I conclude that both Ethereal and Leorik are strong attackers and hence like to sac a pawn for the initiative early in the game.
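For context on the MG/EG pawn values, here is a minimal sketch of how a tapered eval blends the two numbers. The phase scale of 0 to 24 is an assumption (a common convention, not necessarily what any engine in this thread uses), and `tapered` is an illustrative name.

```python
def tapered(mg: int, eg: int, phase: int, max_phase: int = 24) -> int:
    """Blend middlegame and endgame values by game phase.

    phase == max_phase means full material (pure MG value),
    phase == 0 means bare endgame (pure EG value).
    """
    phase = max(0, min(phase, max_phase))  # clamp, e.g. after promotions
    return (mg * phase + eg * (max_phase - phase)) // max_phase


for phase in (24, 12, 0):
    print(phase, tapered(70, 130, phase))
```

With 70/130 a pawn is worth almost twice as much in a bare endgame as in the opening, which is the gap being discussed here.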

lithander wrote: ↑Sat Jun 17, 2023 11:28 pm I have no idea how to fix that without using tablebase. Do you have custom eval for certain endgames?

Yes, and it's actually very easy. Basically some special PSTs for that endgame, depending on what colour the bishop is, which override the standard eval. See my eval.c, which is really messy, but you will have no trouble getting the idea on that one. Other special endgames I have are e.g. KP:K with a 24k bitbase, and also "wrong bishop" plus rim pawn, and code for KQ:KR. It doesn't really give noticeable Elo, but it was a good pretext for not addressing my king safety issues.
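The idea of a bishop-colour-dependent table can be sketched as follows. This is not taken from the eval.c mentioned above; it is a hypothetical stand-in that rewards the attacker the closer the lone king is to a corner of the bishop's colour, which is where KBN:K mates are delivered. The bonus scale of 10 per step is arbitrary.

```python
def _coords(square: str):
    """Convert algebraic notation like 'h1' to (file, rank), both 0..7."""
    return ord(square[0]) - ord("a"), int(square[1]) - 1


def kbnk_corner_bonus(bishop_sq: str, lone_king_sq: str) -> int:
    """Bonus for the attacking side in KBN:K, in centipawns (0..70)."""
    bf, br = _coords(bishop_sq)
    # a1 is a dark square, so (file + rank) even means dark.
    bishop_is_dark = (bf + br) % 2 == 0
    # Mating corners share the bishop's colour: a1/h8 dark, a8/h1 light.
    corners = [(0, 0), (7, 7)] if bishop_is_dark else [(0, 7), (7, 0)]
    kf, kr = _coords(lone_king_sq)
    # Chebyshev distance = number of king moves to reach the corner.
    dist = min(max(abs(kf - cf), abs(kr - cr)) for cf, cr in corners)
    return (7 - dist) * 10


print(kbnk_corner_bonus("h1", "h8"), kbnk_corner_bonus("h1", "h1"))
```

In the starting FEN of the two games the bishop is on h1, a light square, so with this sketch a lone king on h8 scores 0 while one on h1 or a8 scores the maximum.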

Robert Pope · Post by **Robert Pope** » Sun Jun 18, 2023 4:56 am

I also noticed that it looks like the Leorik data is only decisive games - 1-0 or 0-1, no draws. I wonder if that affects the results, too.
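Checking the label distribution of a file takes only a few lines. This sketch assumes the `c9 "<result>";` layout shown earlier in the thread; `count_results` is an illustrative name.

```python
import re
from collections import Counter

RESULT_RE = re.compile(r'c9\s+"([^"]+)"')


def count_results(lines) -> Counter:
    """Count how often each result label occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    '8/8/8/8/2q4p/6k1/8/K7 b - - 1 1 c9 "0-1";',
    '8/Q5pk/5p1p/1P3q2/8/8/3r4/K7 w - - 0 1 c9 "0-1";',
    '4R3/p5p1/5rk1/3B3p/2P3bP/5pP1/PP3P2/K7 b - - 2 1 c9 "1-0";',
]
print(count_results(sample))
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.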
