For example, suppose you start with param1=10 and param2=20 and an initial error. Increment param1 by 1; if the error is reduced, set param1 to 11. Next, increment param2 by 1 with param1 reset to its original value of 10; if the error is reduced compared to the initial error, set param2 to 21. Finally, calculate the error using the updated param1=11 and param2=21. The tuning process looks stable, because the param tuned first does not affect the params tuned after it.
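Here is a minimal sketch of that procedure in Python. The names `tune_independently` and `toy_error` are made up for illustration, and the quadratic error function is only a stand-in for the real error computed over the training positions.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, int]


def tune_independently(
    params: Params,
    error_fn: Callable[[Params], float],
    step: int = 1,
) -> Tuple[Params, float]:
    """Try +step on each param with all others held at their original values."""
    base_error = error_fn(params)
    tuned = dict(params)
    for name, value in params.items():
        trial = dict(params)          # every other param stays at its original value
        trial[name] = value + step
        if error_fn(trial) < base_error:
            tuned[name] = value + step  # accept only if better than the initial error
    # The combined error is computed once, after all params have been tested.
    return tuned, error_fn(tuned)


def toy_error(p: Params) -> float:
    # Hypothetical error surface with its minimum at param1=12, param2=25.
    return (p["param1"] - 12) ** 2 + (p["param2"] - 25) ** 2


tuned, final_error = tune_independently({"param1": 10, "param2": 20}, toy_error)
print(tuned, final_error)
```

Because each trial is compared against the same initial error, the order in which the params are visited does not change which increments are accepted.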
In my test using gradients or difference quotients, with the score taken from the evaluation function, it looks promising. The training positions, only around 28k, are from self-play games. A position is saved for training if the game move is not a capture, not a promotion and not a checking move, the side to move is not in check, and abs(score) <= 1000cp.
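The position filter could be written roughly like this. The `PositionRecord` fields and the function name are assumptions for the sketch; a real engine or PGN parser would supply these flags in its own way.

```python
from dataclasses import dataclass

MAX_ABS_SCORE_CP = 1000  # threshold from the post: abs(score) <= 1000cp


@dataclass
class PositionRecord:
    # Hypothetical record describing a position and the game move played from it.
    move_is_capture: bool
    move_is_promotion: bool
    move_gives_check: bool
    side_to_move_in_check: bool
    score_cp: int


def keep_for_training(rec: PositionRecord) -> bool:
    """Return True only for quiet positions with a score inside the threshold."""
    return (
        not rec.move_is_capture
        and not rec.move_is_promotion
        and not rec.move_gives_check
        and not rec.side_to_move_in_check
        and abs(rec.score_cp) <= MAX_ABS_SCORE_CP
    )


quiet = PositionRecord(False, False, False, False, 35)
capture = PositionRecord(True, False, False, False, 35)
lopsided = PositionRecord(False, False, False, False, -1200)
print(keep_for_training(quiet), keep_for_training(capture), keep_for_training(lopsided))
```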
Texel tuning ref is here.
TC=10+0.1
Score of new vs old: 445 - 429 - 1126 [0.504] 2000
... new playing White: 269 - 181 - 550 [0.544] 1000
... new playing Black: 176 - 248 - 576 [0.464] 1000
... White vs Black: 517 - 357 - 1126 [0.540] 2000
Elo difference: 2.8 +/- 10.1, LOS: 70.6 %, DrawRatio: 56.3 %
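For reference, the score, Elo difference and draw ratio above can be reproduced from the win/loss/draw counts with the usual logistic Elo formula. This is only a sketch with a made-up function name; it does not reproduce the error margin or LOS.

```python
import math
from typing import Tuple


def match_summary(wins: int, losses: int, draws: int) -> Tuple[float, float, float]:
    """Return (score fraction, Elo difference, draw ratio) for a match result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model: score = 1 / (1 + 10 ** (-elo / 400))
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo, draws / games


score, elo, draw_ratio = match_summary(445, 429, 1126)
print(f"score={score:.3f} elo={elo:.1f} draws={draw_ratio:.1%}")
```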
pp=passed pawn, mobB=bishop mobility
+---------+-------+-------+
| name | old | new |
+=========+=======+=======+
| ppR2En | 3 | 3 |
+---------+-------+-------+
| ppR3En | 5 | 46 |
+---------+-------+-------+
| ppR4En | 15 | 28 |
+---------+-------+-------+
| ppR5En | 29 | 48 |
+---------+-------+-------+
| ppR6En | 66 | 66 |
+---------+-------+-------+
| ppR7En | 100 | 100 |
+---------+-------+-------+
| mobB0Op | -22 | -22 |
+---------+-------+-------+
| mobB1Op | -16 | -16 |
+---------+-------+-------+
| mobB2Op | -8 | 0 |
+---------+-------+-------+
| mobB3Op | 0 | -2 |
+---------+-------+-------+