Training positions from gm2600 pgn

maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Training positions from gm2600 pgn

Post by maksimKorzh »

Hi guys

I've generated a dataset for Texel tuning of my engine.
I've used gm2600.pgn:
4818922 positions:
1000000 positions: https://github.com/maksimKorzh/wukongJS ... itions.txt

Data sample (inspired by the datasets from the Ethereal data dump thread):

Code:

8/p5p1/1pnnkp2/4p2p/4P3/P2K1P2/1PN3PP/4B3 b - - 0 37 [0.5]
8/4k3/7R/7p/p6P/1bK5/8/8 b - - 40 60 [0.5]
8/8/5n2/5p2/3N2k1/6P1/6K1/8 w - - 12 120 [0.5]
rn1qk2r/pb2bppp/1pp2n2/3pN3/B2P4/2N1P3/PP3PPP/R1BQK2R b KQkq - 1 9 [0.0]
r1bqr1k1/1p3pp1/3p1n1p/2p5/2Pp4/PQ1P2PP/3NPPB1/R3R1K1 b - - 1 17 [0.5]
1q1r1rk1/pb1pbppp/1p2p3/8/2PnPBn1/P1N2N2/1P1QBPPP/2R2RK1 b - - 11 15 [1.0]
5rk1/r2nbppp/pqp1p3/1p1n4/3PP3/2N2BP1/PP1B1P1P/R1QR2K1 b - - 0 17 [0.5]
8/6k1/3p1np1/b3pP1p/6P1/P3BK1P/8/2N5 b - - 2 43 [1.0]
4r1k1/r4p1p/2p1b1p1/3pP3/1p2nP2/5N1P/1PP3P1/3RRBK1 w - - 0 30 [0.5]
rn1qk2b/pb3p2/2p1pn2/4N1p1/2pPP3/2N3B1/PP3PP1/R2QK3 w Qq - 0 14 [0.0]
8/8/6kP/1pp5/4K1n1/1P6/P2B4/8 b - - 10 50 [1.0]
2Q5/4q1pk/7p/5p2/8/6R1/6KP/4r3 b - - 13 52 [0.0]
8/p2r1pk1/1np1b1p1/2p1p1b1/2P1P2p/qP3B1P/2R1NPP1/2NQ2K1 w - - 2 48 [0.0]
2kr1b2/p1q2p1p/4RQrp/2pp2N1/5p2/3P4/PPPN1P2/5K1R w - - 2 20 [0.5]
2r2rk1/4q1pp/p2n1pn1/3p3N/1p1Pp1PP/4P2B/PPP2QR1/5R1K w - - 1 26 [0.5]
1n1q1r1k/pp4pp/4Q2n/1B1p4/3P2P1/8/1R4KP/6B1 w - - 3 32 [0.0]
7k/6b1/4B2p/r7/1p2KP2/1P5P/P5R1/8 b - - 2 56 [1.0]
1r1q1r1k/1p4bp/p1npb3/3Npp2/P1N5/2P3P1/1P3PBP/R2QK2R w KQ - 0 19 [0.5]
2r5/4k3/5b1p/pp1R1Pp1/2n5/2P5/P3BPPP/2R3K1 b - - 1 32 [1.0]
3q2k1/5pp1/1b1r4/p1N5/PpQnp3/4B1P1/1P3PKP/R7 w - - 0 37 [1.0]
2r4k/1p1R2p1/p7/4p1rp/5n2/2P3NP/PP3PP1/5RK1 w - - 0 28 [1.0]
6k1/pp2Q3/2p1P2p/3p3P/3P2r1/8/P4rPK/8 w - - 3 44 [1.0]
8/5Qpk/1p5p/5p2/4q3/6P1/P4PKP/8 w - - 3 35 [0.5]
r5k1/2R2p1p/4p1pB/p1P1P3/6P1/8/2n2P1P/6K1 w - - 1 37 [0.5]
r4rk1/p2n2p1/1p2p2p/2qb3n/2N1B3/P4N2/1P3PPP/R2Q1RK1 w - - 4 20 [0.0]
r2q1rk1/pb3ppp/1pn1pn2/6B1/3P4/P2B1N2/1P3PPP/R2Q1RK1 w - - 2 19 [0.5]
4k3/p4pp1/4p2p/4P3/P3B3/1N5P/5KP1/7r w - - 1 32 [1.0]
The second link is a subset of the first one.
The data is already randomized.
The first 5 moves of the opening and checkmate positions are skipped.
Only positions where eval() == quiescence() are picked, so that eval() can be used for faster tuning (see the sketch below).
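
For reference, Texel tuning minimizes the mean squared error between the bracketed game result and a sigmoid of the static eval. Below is a minimal sketch of consuming lines in this format; evaluate(fen) is a hypothetical white-relative eval in centipawns, and K is a scaling constant that has to be fitted per engine.

Code:

K = 1.13  # illustrative value; K must be fitted for the engine being tuned

def sigmoid(score_cp):
    # Map a centipawn score to an expected game result in [0, 1].
    return 1.0 / (1.0 + 10.0 ** (-K * score_cp / 400.0))

def mean_squared_error(lines, evaluate):
    # lines: 'FEN [result]' strings; evaluate: hypothetical FEN -> white-relative cp.
    total = 0.0
    count = 0
    for line in lines:
        fen, _, tag = line.rstrip().rpartition(' ')
        result = float(tag.strip('[]'))  # 1.0 / 0.5 / 0.0, from White's side
        total += (result - sigmoid(evaluate(fen))) ** 2
        count += 1
    return total / count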

Details of data generation: https://github.com/maksimKorzh/wukongJS ... l_tuner.py
Tune material + PST assuming a tapered eval, independently of any particular engine: https://github.com/maksimKorzh/wukongJS ... eval_tuner
I'm still debugging it; a readme will come later.

This is my very first attempt)
Criticism is highly appreciated!

P.S. I know about gradient descent but am yet too dumb to implement it)
WinPooh
Posts: 267
Joined: Fri Mar 17, 2006 8:01 am
Location: Russia
Full name: Vladimir Medvedev

Re: Training positions from gm2600 pgn

Post by WinPooh »

In my experiments, datasets generated from gm2600.pgn usually perform worse for Texel tuning than datasets from CCRL games, for subsets of equal size.
Also, I usually skip positions in check and positions right after captures and promotions, and randomly select one position per game, to decrease correlations between samples.
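
A sketch of those filters using the python-chess library; the helper name and the exact ply threshold are illustrative, not GreKo's actual extraction code.

Code:

import random
import chess.pgn

def sample_position(game):
    # game: a chess.pgn.Game; returns one random quiet FEN, or None.
    candidates = []
    board = game.board()
    for ply, move in enumerate(game.mainline_moves()):
        was_capture = board.is_capture(move)      # checked before the move is made
        was_promotion = move.promotion is not None
        board.push(move)
        if ply < 10:                              # skip the first 5 full moves
            continue
        if board.is_check():                      # skip positions in check
            continue
        if was_capture or was_promotion:          # skip right after captures/promotions
            continue
        candidates.append(board.fen())
    return random.choice(candidates) if candidates else None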
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Training positions from gm2600 pgn

Post by maksimKorzh »

WinPooh wrote: Wed Jan 13, 2021 12:44 pm In my experiments, datasets generated from gm2600.pgn usually perform worse for Texel tuning than datasets from CCRL games, for subsets of equal size.
Also, I usually skip positions in check and positions right after captures and promotions, and randomly select one position per game, to decrease correlations between samples.
Hi Vladimir

I've been reading your Texel tuning article in Russian on Habr, and I'm aware of your experiments)
I've also been studying your learning code as implemented in GreKo.
My implementation is engine-independent and much more primitive than yours.
These are my very first experiments.
Bearing in mind how bad I am at math and machine learning, getting at least proof-of-concept results is good enough)
Unfortunately, for noobs like me, reinventing the wheel through trial and error is the only workable way of learning and improving.
Thanks for the feedback!

P.S. Your "learn" feature in GreKo is awesome.
WinPooh
Posts: 267
Joined: Fri Mar 17, 2006 8:01 am
Location: Russia
Full name: Vladimir Medvedev

Re: Training positions from gm2600 pgn

Post by WinPooh »

maksimKorzh wrote: Wed Jan 13, 2021 6:27 pmP.S. Your "learn" feature in GreKo is awesome.
Thank you! I plan to release a new version soon with better learning capabilities. For example, coordinate descent has been replaced with stochastic gradient descent, which is orders of magnitude faster. Some sort of TD-learning after each game is also implemented; however, it doesn't work very well and is mostly a "just for fun" feature.
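
To illustrate the difference: coordinate descent probes one parameter at a time with a full error pass per probe, while SGD nudges every parameter at once along the gradient of a single sample's error. Below is a minimal sketch of one SGD pass, assuming a linear eval (dot product of weights and cached feature counts); it illustrates the idea, not GreKo's implementation.

Code:

import math
import random

K = 1.13  # per-engine sigmoid scaling constant (illustrative)

def sgd_epoch(weights, data, lr):
    # data: (features, result) pairs, where eval = dot(weights, features);
    # one pass updates every weight after every sample.
    random.shuffle(data)
    for feats, result in data:
        score = sum(w * f for w, f in zip(weights, feats))
        p = 1.0 / (1.0 + 10.0 ** (-K * score / 400.0))
        # gradient of (result - p)^2 w.r.t. score: dp/dscore = p(1-p)*K*ln10/400
        g = 2.0 * (p - result) * p * (1.0 - p) * K * math.log(10) / 400.0
        for i, f in enumerate(feats):
            weights[i] -= lr * g * f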
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Training positions from gm2600 pgn

Post by maksimKorzh »

WinPooh wrote: Thu Jan 14, 2021 10:45 am
maksimKorzh wrote: Wed Jan 13, 2021 6:27 pmP.S. Your "learn" feature in GreKo is awesome.
Thank you! I plan to release a new version soon with better learning capabilities. For example, coordinate descent has been replaced with stochastic gradient descent, which is orders of magnitude faster. Some sort of TD-learning after each game is also implemented; however, it doesn't work very well and is mostly a "just for fun" feature.
I've already seen SGD in the latest version of GreKo that I downloaded from your site - it's awesome, but I'm too much of a noob to understand the math behind it, at least for now. Currently my main issue is the speed of calculating the mean squared error itself. So I had two options:
1. Create a PGN parser in my engine
2. Implement the evaluation function and quiescence in Python (using the python-chess lib)
Assuming the second is much easier, and believing Python to be faster than JavaScript, I decided to write the tuner completely in Python, but setting a new FEN on the internal board and evaluating it takes too long.
For instance, I'm getting only 5000 positions evaluated per second! For comparison, even my poor JavaScript search evaluates 1,000,000+ positions per 10 seconds. So I guess the most time-consuming routine is set_fen() from the python-chess lib. I understand that in C++ it would be much faster because C++ is compiled, but there should be some optimization for evaluating positions.
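
There is a standard optimization for exactly this situation: since a material + PST eval is linear in the weights once the piece placement is fixed, each FEN only needs to be parsed once. A sketch of the caching idea (the helper name is mine, not part of python-chess):

Code:

import chess

def extract_occupancy(fen):
    # Parse the FEN once; keep only what a material + PST eval needs.
    board = chess.Board(fen)
    return [(piece.piece_type, piece.color, square)
            for square, piece in board.piece_map().items()]

# Build the cache a single time:
#   cache = [(extract_occupancy(fen), result) for fen, result in dataset]
# Every later tuning iteration sums material and PST terms straight from
# the cached tuples, so set_fen() is never called inside the hot loop.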

How long does it take to evaluate a million positions in GreKo's errSq() routine? (the mean squared error calculator)

There's another thing that confuses me: no matter whether I use a dataset from self-play or from gm2600 - if I calculate the mean squared error just once for a set
of material weights and PSTs (opening/endgame), then my own PSTs have a slightly lower error than the PSTs of rofChade (a 3000+ engine by Ronald Friederich),
while rofChade's PSTs are at least 100 Elo points stronger than my own. I don't know, maybe I tested with too small a number
of positions (100K), or the dataset wasn't generated appropriately, but every time the mean squared error was lower for the weaker weights - that's very confusing.
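
One thing worth ruling out before trusting such comparisons (my guess, not a diagnosis): the sigmoid constant K has to be re-fitted for each set of weights, since different evals use different scales, and comparing errors computed with one fixed K can favor the wrong PSTs. A minimal grid scan, with mse(k, weights) standing for the error routine:

Code:

def best_k(mse, weights, lo=0.1, hi=3.0, steps=60):
    # Return the k in [lo, hi] that minimizes mse(k, weights) on the dataset.
    ks = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(ks, key=lambda k: mse(k, weights))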

Vladimir, I've been wondering for quite some time about this unrelated question - the Igel chess engine.
Its author said it's a GreKo derivative. It seems Igel is 600+ Elo points stronger.
I can't see another way to get 600+ Elo points other than copy-pasting Stockfish NNUE, but I guess that's not the case,
so how on earth is it possible that a derivative is much stronger than the original?
Could you please comment on it?
WinPooh
Posts: 267
Joined: Fri Mar 17, 2006 8:01 am
Location: Russia
Full name: Vladimir Medvedev

Re: Training positions from gm2600 pgn

Post by WinPooh »

maksimKorzh wrote: Thu Jan 14, 2021 5:41 pm How long does it take to evaluate a million positions in GreKo's errSq() routine? (the mean squared error calculator)
Below is the output of learning on a file with 1 million positions:

Code:
GreKo 210108 (08-Jan-2021)

White(1): learn CCRL_1000_K

Algorithm: stochastic gradient descent
Parameters: 204
Initial value: 0.361232

00:00:30 0.344990 -4.496285 % (5) LR = 1.0000000000
00:00:41 0.343568 -4.890070 % (2) LR = 0.1000000000
00:01:02 0.343303 -4.963350 % (4) LR = 0.0100000000
00:01:13 0.343271 -4.972200 % (2) LR = 0.0010000000
00:01:19 0.343271 -4.972200 % (1) LR = 0.0001000000
00:01:24 0.343271 -4.972200 % (1) LR = 0.0000100000
00:01:29 0.343271 -4.972200 % (1) LR = 0.0000010000
00:01:35 0.343271 -4.972200 % (1) LR = 0.0000001000
00:01:40 0.343271 -4.972200 % (1) LR = 0.0000000100
00:01:46 0.343271 -4.972200 % (1) LR = 0.0000000010
00:01:51 0.343271 -4.972200 % (1) LR = 0.0000000001
There were 20 passes through the file (the sum of the numbers in parentheses). So 20 million positions took ~111 seconds, a speed of ~180 K positions/sec. This includes file reading and FEN parsing.
One million positions is already a very large dataset for my number of parameters. Typically I observe saturation at 50-100 K positions.
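
The schedule visible in the LR column of the log works roughly like this (my reading of the output, reusing the hypothetical sgd_epoch from the earlier sketch and an mse(weights, data) helper): keep making passes at a fixed learning rate while the error drops, then divide the rate by 10.

Code:

def tune(weights, data, mse, lr=1.0, min_lr=1e-10):
    # Run SGD passes at a fixed learning rate while the error keeps
    # dropping; on a failed pass, revert and decay the rate by 10x.
    best = mse(weights, data)
    while lr >= min_lr:
        trial = list(weights)
        sgd_epoch(trial, data, lr)   # sgd_epoch as in the earlier sketch
        err = mse(trial, data)
        if err < best:
            best = err
            weights[:] = trial       # accept the improved pass
        else:
            lr /= 10.0               # decay and retry, as in the LR column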
WinPooh
Posts: 267
Joined: Fri Mar 17, 2006 8:01 am
Location: Russia
Full name: Vladimir Medvedev

Re: Training positions from gm2600 pgn

Post by WinPooh »

maksimKorzh wrote: Thu Jan 14, 2021 5:41 pm Vladimir, I've been wondering for quite some time about this unrelated question - the Igel chess engine.
Its author said it's a GreKo derivative. It seems Igel is 600+ Elo points stronger.
I can't see another way to get 600+ Elo points other than copy-pasting Stockfish NNUE, but I guess that's not the case,
so how on earth is it possible that a derivative is much stronger than the original?
Could you please comment on it?
Early versions of Igel were very close to GreKo in strength - weaker at first, slightly stronger later.
Then the author of Igel did a giant amount of work, and now Igel is in another league. I doubt its source code contains even 10% of the original GreKo code.
Obviously, Volodymyr was inspired by numerous ideas from different programs, not only from GreKo. And the large gain in strength took place before he started to use NNUE.

As for me, NNUE will hardly ever be incorporated into my engine. I don't like cut'n'paste programming... And I'm too lazy to re-implement it myself...
voffka
Posts: 288
Joined: Sat Jun 30, 2018 10:58 pm
Location: Ukraine
Full name: Volodymyr Shcherbyna

Re: Training positions from gm2600 pgn

Post by voffka »

Hello Vladimir,
WinPooh wrote: Thu Jan 14, 2021 10:45 am Thank you! I plan to release a new version soon with better learning capabilities. For example, coordinate descent has been replaced with stochastic gradient descent, which is orders of magnitude faster. Some sort of TD-learning after each game is also implemented; however, it doesn't work very well and is mostly a "just for fun" feature.
Amazing! Let me know if you need any help with testing or training for GreKo. I'm renting a 32-thread AMD EPYC machine for Igel training/dev work, so I can always spare resources for Igel's father :)
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Training positions from gm2600 pgn

Post by maksimKorzh »

voffka wrote: Thu Jan 14, 2021 10:38 pm Hello Vladimir,
WinPooh wrote: Thu Jan 14, 2021 10:45 am Thank you! I plan to release a new version soon with better learning capabilities. For example, coordinate descent has been replaced with stochastic gradient descent, which is orders of magnitude faster. Some sort of TD-learning after each game is also implemented; however, it doesn't work very well and is mostly a "just for fun" feature.
Amazing! Let me know if you need any help with testing or training for GreKo. I'm renting a 32-thread AMD EPYC machine for Igel training/dev work, so I can always spare resources for Igel's father :)
Guys, you're spending money to make an engine stronger... this is very cool, but WHY?
I'm really tempted to know what motivates you to do that?
voffka
Posts: 288
Joined: Sat Jun 30, 2018 10:58 pm
Location: Ukraine
Full name: Volodymyr Shcherbyna

Re: Training positions from gm2600 pgn

Post by voffka »

maksimKorzh wrote: Fri Jan 15, 2021 12:08 am Guys, you're spending money to make an engine stronger... this is very cool, but WHY?
I'm really tempted to know what motivates you to do that?
A man's got to have a hobby.