Tapered Evaluation and MSE (Texel Tuning)


Ferdy
Posts: 4836
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Desperado wrote: Sun Jan 17, 2021 9:35 am
Ferdy wrote: Sun Jan 17, 2021 8:59 am
Desperado wrote: Sun Jan 17, 2021 7:53 am
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. ...
Forgive me, but that sounds wrong.

if you have a dataset of 10 positions like

Code: Select all

0.409683 1r1b4/3q2pk/3p1p1p/1p1BpP2/1P2P1PP/Q4K2/2R5/8 w - -,1-0
0.057722 8/5p2/p5k1/3p4/b1pP1Bp1/6P1/5P2/3qQ1K1 w - -,0-1
0.000000 3q1rn1/1p4bk/pN1p2pp/P2Ppb2/1P2B3/4BP2/4Q1PP/2R3K1 b - -,1/2-1/2
0.000000 3q2k1/R1n2p2/1rrp3p/2p3p1/1p2P1P1/1P3PB1/2PR2PK/3Q4 w - -,1/2-1/2
0.067468 r1b2rk1/1p1p1ppp/p2qpn2/8/8/4P1P1/P1QB1PBP/2R2RK1 b - -,1/2-1/2
0.250000 r1b2rk1/pp3ppp/3bpn2/1P6/3p4/3P2P1/PP1N1PBP/R1B2RK1 b - -,1-0
0.129553 3nnk2/3r4/2p4p/1p1p4/1N1P1P1P/3K1B2/1P3P2/6R1 w - -,1-0
0.019618 r5k1/2p3bp/p3N1p1/4Pb2/Np1P4/4R3/PP5P/4K3 w - -,1/2-1/2
0.019618 8/4k3/4r3/2R3p1/6P1/5K1P/8/8 b - -,1/2-1/2
0.000000 8/3R4/p1p2B1k/2P2P2/8/PK6/3pr3/2b5 w - -,1/2-1/2

MSE: 0.095366
 
then you shuffle and get another 10 positions (using the same best vector)

Code: Select all

0.250000 3r2k1/2r2pp1/4nn1p/1q5b/Np1pP2N/1P3PP1/3R1QBP/1R5K b - -,0-1
0.000000 r5k1/ppq1ppbp/2n1b1p1/4P1Bn/8/2N1QN2/PP3PPP/3BR1K1 b - -,1/2-1/2
0.250000 2r1r1k1/3n1ppp/p1pq4/2pp1b2/4PR2/PP1P1Q2/1BP3PP/R5K1 w - -,1-0
0.250000 3R4/q4pQ1/2pkr3/3p4/8/P7/KPP5/8 b - -,1-0
0.021127 2B5/p1p1b1pp/1pP1p3/1P1k4/P7/6P1/4PPKP/8 b - -,1/2-1/2
0.577215 rb2r3/1p3pk1/8/pP3Np1/N1p5/P2nP1P1/B7/5RK1 b - -,1-0
0.000000 1r6/6pk/5b1p/2pp4/R2P3P/1n2BN2/1P3PP1/6K1 b - -,1/2-1/2
0.000000 2r2k2/5p1p/5BpP/p2R4/Pb3K1P/8/5P2/8 w - -,1/2-1/2
0.000000 r1b2rk1/3n1ppp/2N1p3/2bpP3/p7/5N2/PP1B1PPP/R1R3K1 w - -,1/2-1/2

MSE: 0.136909
the MSE for the current best vector changes! (in either direction).

In the example the new candidate vector then needs to beat 0.136909, not 0.095366 (it is the same best vector but different data).
You must recalculate the mse for the best vector on the new data and compare against that, otherwise you are comparing apples and oranges.
The current best is 0.095366. Any new mse has to beat that.
No, Ferdy, what you are saying there is definitely wrong!

Your current best vector simply does not correspond to the mse 0.095366 on the new dataset.
This vector will certainly produce a different mse, because it is different data.

You want to know whether there is a vector better than the current best vector for the data you are measuring.
So you must re-measure the mse of the current best vector on the current data set and use that as the reference!

In the example your best solution so far would produce 0.136909 for the new data, so if you want to know whether another vector is better, you have to compare against that number. This mse is produced by the same vector, but for the current data. Your reference vector no longer produces 0.095366, not for the new data.

It is not about comparing two numbers from two different vectors. It is the same vector but different data.
I understand what you are saying; re-read my previous post here carefully to see what this is all about.
Ferdy
Posts: 4836
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

@Desperado, let's see how your tuner would behave on this. I generated sample material training positions, around 14k positions; most have material imbalance. They come from games, and a position is only saved if the game move is not a capture, not a promotion, not a checking move, and the side to move is not in check.
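
A minimal sketch of such a position filter; the MoveInfo struct is a hypothetical stand-in for the real move/position data, not Ferdy's actual code:

Code: Select all

// Keep a position for the training set only if the game move is "quiet"
// in the sense described above and the side to move is not in check.
struct MoveInfo {
    bool isCapture;
    bool isPromotion;
    bool givesCheck;
    bool sideToMoveInCheck;
};

bool keepForTraining(const MoveInfo& m)
{
    return !m.isCapture && !m.isPromotion && !m.givesCheck && !m.sideToMoveInCheck;
}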

This is my result using a material-only eval. There is no randomization done per iteration since the number of training positions is small. K=1; the initial piece values are [100, 300, 300, 500, 1000] for both mg/eg, which can also be seen in the plot. It uses dynamic steps (the step is not varied per parameter), starting from 8, then 4, 2 and 1. Most mse improvements come at step 8, step 4 brings no improvement, steps 2 and 1 each bring a single improvement, and after 2 iterations without mse improvement the tuning was aborted.

[Image: plot of the mse per iteration for the material-only tuning run]


Total iterations done: 30
Best mse: 0.13605793198602772
Best parameters:

Code: Select all

+----------+--------+---------+
| par      |   init |   tuned |
+==========+========+=========+
| PawnOp   |    100 |      68 |
+----------+--------+---------+
| PawnEn   |    100 |     110 |
+----------+--------+---------+
| KnightOp |    300 |     357 |
+----------+--------+---------+
| KnightEn |    300 |     340 |
+----------+--------+---------+
| BishopOp |    300 |     356 |
+----------+--------+---------+
| BishopEn |    300 |     388 |
+----------+--------+---------+
| RookOp   |    500 |     428 |
+----------+--------+---------+
| RookEn   |    500 |     612 |
+----------+--------+---------+
| QueenOp  |   1000 |     990 |
+----------+--------+---------+
| QueenEn  |   1000 |    1104 |
+----------+--------+---------+
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ferdy wrote: Sun Jan 17, 2021 6:51 pm
Desperado wrote: Sun Jan 17, 2021 9:35 am
Ferdy wrote: Sun Jan 17, 2021 8:59 am
Desperado wrote: Sun Jan 17, 2021 7:53 am
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. ...
Forgive me, but that sounds wrong.

if you have a dataset of 10 positions like

Code: Select all

0.409683 1r1b4/3q2pk/3p1p1p/1p1BpP2/1P2P1PP/Q4K2/2R5/8 w - -,1-0
0.057722 8/5p2/p5k1/3p4/b1pP1Bp1/6P1/5P2/3qQ1K1 w - -,0-1
0.000000 3q1rn1/1p4bk/pN1p2pp/P2Ppb2/1P2B3/4BP2/4Q1PP/2R3K1 b - -,1/2-1/2
0.000000 3q2k1/R1n2p2/1rrp3p/2p3p1/1p2P1P1/1P3PB1/2PR2PK/3Q4 w - -,1/2-1/2
0.067468 r1b2rk1/1p1p1ppp/p2qpn2/8/8/4P1P1/P1QB1PBP/2R2RK1 b - -,1/2-1/2
0.250000 r1b2rk1/pp3ppp/3bpn2/1P6/3p4/3P2P1/PP1N1PBP/R1B2RK1 b - -,1-0
0.129553 3nnk2/3r4/2p4p/1p1p4/1N1P1P1P/3K1B2/1P3P2/6R1 w - -,1-0
0.019618 r5k1/2p3bp/p3N1p1/4Pb2/Np1P4/4R3/PP5P/4K3 w - -,1/2-1/2
0.019618 8/4k3/4r3/2R3p1/6P1/5K1P/8/8 b - -,1/2-1/2
0.000000 8/3R4/p1p2B1k/2P2P2/8/PK6/3pr3/2b5 w - -,1/2-1/2

MSE: 0.095366
 
then you shuffle and get another 10 positions (using the same best vector)

Code: Select all

0.250000 3r2k1/2r2pp1/4nn1p/1q5b/Np1pP2N/1P3PP1/3R1QBP/1R5K b - -,0-1
0.000000 r5k1/ppq1ppbp/2n1b1p1/4P1Bn/8/2N1QN2/PP3PPP/3BR1K1 b - -,1/2-1/2
0.250000 2r1r1k1/3n1ppp/p1pq4/2pp1b2/4PR2/PP1P1Q2/1BP3PP/R5K1 w - -,1-0
0.250000 3R4/q4pQ1/2pkr3/3p4/8/P7/KPP5/8 b - -,1-0
0.021127 2B5/p1p1b1pp/1pP1p3/1P1k4/P7/6P1/4PPKP/8 b - -,1/2-1/2
0.577215 rb2r3/1p3pk1/8/pP3Np1/N1p5/P2nP1P1/B7/5RK1 b - -,1-0
0.000000 1r6/6pk/5b1p/2pp4/R2P3P/1n2BN2/1P3PP1/6K1 b - -,1/2-1/2
0.000000 2r2k2/5p1p/5BpP/p2R4/Pb3K1P/8/5P2/8 w - -,1/2-1/2
0.000000 r1b2rk1/3n1ppp/2N1p3/2bpP3/p7/5N2/PP1B1PPP/R1R3K1 w - -,1/2-1/2

MSE: 0.136909
the MSE for the current best vector changes! (in either direction).

In the example the new candidate vector then needs to beat 0.136909, not 0.095366 (it is the same best vector but different data).
You must recalculate the mse for the best vector on the new data and compare against that, otherwise you are comparing apples and oranges.
The current best is 0.095366. Any new mse has to beat that.
No, Ferdy, what you are saying there is definitely wrong!

Your current best vector simply does not correspond to the mse 0.095366 on the new dataset.
This vector will certainly produce a different mse, because it is different data.

You want to know whether there is a vector better than the current best vector for the data you are measuring.
So you must re-measure the mse of the current best vector on the current data set and use that as the reference!

In the example your best solution so far would produce 0.136909 for the new data, so if you want to know whether another vector is better, you have to compare against that number. This mse is produced by the same vector, but for the current data. Your reference vector no longer produces 0.095366, not for the new data.

It is not about comparing two numbers from two different vectors. It is the same vector but different data.
I understand what you are saying; re-read my previous post here carefully to see what this is all about.
Please just point out what I am missing.
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. ...
And as you can see in the quote above, you answered that you would compare the mse of different data sets.
You cannot use the mse of the first data set to check whether the mse of a second data set beats it.
The current best is 0.095366. Any new mse has to beat that.
"All new training positions have to beat the best so far" means they need to be better than 0.136909, not 0.095366.
Both mse are related to the same vector. It is irrelevant what you measure afterwards.
If I am missing an important point, please just say clearly what it is. Thank you.

If you did not take the 0.136909 you would hit a stop condition of your iterations pretty early.
For small data sets you will quickly find a dataset that cannot be improved, simply because the stored best mse
cannot be reproduced on that data; even the same vector might not be able to match it.
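
To make the point concrete, here is a compilable toy sketch (the Sample struct and the linear "eval" are stand-ins, not the real tuner): the same vector yields different mse values on different batches, so the reference has to be re-measured on the current batch before a candidate is compared against it.

Code: Select all

#include <cstdio>
#include <vector>

struct Sample { double feature; double result; };   // hypothetical training sample
using Params = double;                              // single weight for brevity

double mseOn(Params p, const std::vector<Sample>& batch)
{
    double sum = 0.0;
    for (const Sample& s : batch) {
        double err = s.result - p * s.feature;      // stand-in for result - sigmoid(qs())
        sum += err * err;
    }
    return sum / batch.size();
}

// Accept a candidate only if it beats the current best vector on the *same* batch.
bool acceptCandidate(Params best, Params candidate, const std::vector<Sample>& batch)
{
    double reference = mseOn(best, batch);          // refresh the reference on this batch
    return mseOn(candidate, batch) < reference;     // never compare against an mse from other data
}

int main()
{
    Params best = 1.0;
    std::vector<Sample> batchA = { {0.5, 0.6}, {0.2, 0.1} };
    std::vector<Sample> batchB = { {0.9, 0.4}, {0.1, 0.3} };
    std::printf("mse on batch A: %f\n", mseOn(best, batchA));
    std::printf("mse on batch B: %f\n", mseOn(best, batchB)); // differs, same vector
    return 0;
}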
Last edited by Desperado on Sun Jan 17, 2021 7:45 pm, edited 1 time in total.
Ferdy
Posts: 4836
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Desperado wrote: Sun Jan 17, 2021 7:41 pm
Ferdy wrote: Sun Jan 17, 2021 6:51 pm
Desperado wrote: Sun Jan 17, 2021 9:35 am
Ferdy wrote: Sun Jan 17, 2021 8:59 am
Desperado wrote: Sun Jan 17, 2021 7:53 am
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. ...
Forgive me, but that sounds wrong.

if you have a dataset of 10 positions like

Code: Select all

0.409683 1r1b4/3q2pk/3p1p1p/1p1BpP2/1P2P1PP/Q4K2/2R5/8 w - -,1-0
0.057722 8/5p2/p5k1/3p4/b1pP1Bp1/6P1/5P2/3qQ1K1 w - -,0-1
0.000000 3q1rn1/1p4bk/pN1p2pp/P2Ppb2/1P2B3/4BP2/4Q1PP/2R3K1 b - -,1/2-1/2
0.000000 3q2k1/R1n2p2/1rrp3p/2p3p1/1p2P1P1/1P3PB1/2PR2PK/3Q4 w - -,1/2-1/2
0.067468 r1b2rk1/1p1p1ppp/p2qpn2/8/8/4P1P1/P1QB1PBP/2R2RK1 b - -,1/2-1/2
0.250000 r1b2rk1/pp3ppp/3bpn2/1P6/3p4/3P2P1/PP1N1PBP/R1B2RK1 b - -,1-0
0.129553 3nnk2/3r4/2p4p/1p1p4/1N1P1P1P/3K1B2/1P3P2/6R1 w - -,1-0
0.019618 r5k1/2p3bp/p3N1p1/4Pb2/Np1P4/4R3/PP5P/4K3 w - -,1/2-1/2
0.019618 8/4k3/4r3/2R3p1/6P1/5K1P/8/8 b - -,1/2-1/2
0.000000 8/3R4/p1p2B1k/2P2P2/8/PK6/3pr3/2b5 w - -,1/2-1/2

MSE: 0.095366
 
then you shuffle and get another 10 positions (using the same best vector)

Code: Select all

0.250000 3r2k1/2r2pp1/4nn1p/1q5b/Np1pP2N/1P3PP1/3R1QBP/1R5K b - -,0-1
0.000000 r5k1/ppq1ppbp/2n1b1p1/4P1Bn/8/2N1QN2/PP3PPP/3BR1K1 b - -,1/2-1/2
0.250000 2r1r1k1/3n1ppp/p1pq4/2pp1b2/4PR2/PP1P1Q2/1BP3PP/R5K1 w - -,1-0
0.250000 3R4/q4pQ1/2pkr3/3p4/8/P7/KPP5/8 b - -,1-0
0.021127 2B5/p1p1b1pp/1pP1p3/1P1k4/P7/6P1/4PPKP/8 b - -,1/2-1/2
0.577215 rb2r3/1p3pk1/8/pP3Np1/N1p5/P2nP1P1/B7/5RK1 b - -,1-0
0.000000 1r6/6pk/5b1p/2pp4/R2P3P/1n2BN2/1P3PP1/6K1 b - -,1/2-1/2
0.000000 2r2k2/5p1p/5BpP/p2R4/Pb3K1P/8/5P2/8 w - -,1/2-1/2
0.000000 r1b2rk1/3n1ppp/2N1p3/2bpP3/p7/5N2/PP1B1PPP/R1R3K1 w - -,1/2-1/2

MSE: 0.136909
the MSE for the current best vector changes! (in either direction).

In the example the new candidate vector then needs to beat 0.136909, not 0.095366 (it is the same best vector but different data).
You must recalculate the mse for the best vector on the new data and compare against that, otherwise you are comparing apples and oranges.
The current best is 0.095366. Any new mse has to beat that.
No, Ferdy, what you are saying there is definitely wrong!

Your current best vector simply does not correspond to the mse 0.095366 on the new dataset.
This vector will certainly produce a different mse, because it is different data.

You want to know whether there is a vector better than the current best vector for the data you are measuring.
So you must re-measure the mse of the current best vector on the current data set and use that as the reference!

In the example your best solution so far would produce 0.136909 for the new data, so if you want to know whether another vector is better, you have to compare against that number. This mse is produced by the same vector, but for the current data. Your reference vector no longer produces 0.095366, not for the new data.

It is not about comparing two numbers from two different vectors. It is the same vector but different data.
I understand what you are saying; re-read my previous post here carefully to see what this is all about.
Please just point out what I am missing.
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. ...
And as you can see in the quote above, you answered that you would compare the mse of different data sets.
You cannot use the mse of the first data set to check whether the mse of a second data set beats it.

"All new training positions have to beat the best so far" means they need to be better than 0.136909, not 0.095366.
Both mse are related to the same vector.
The current best is 0.095366. Any new mse has to beat that.
It is irrelevant what you measure afterwards. If I am missing an important point, please just say clearly what it is. Thank you.
This one:
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. Call it a variation of the Texel tuning method. Based on my experiments its results are not bad at all; results here mean game tests after the tuning. There is a minimum mini-batch size that has to be used, and it depends on the training positions too: if you are training for material, use training positions with material imbalance; if you are training for passed pawns, use training positions with passed-pawn imbalance. This is why I can afford to generate random positions from iteration to iteration and compare independent mse, because the feature I am training for is already present in all new training positions.
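
For contrast, a compilable toy sketch of this resampling variant as described; all helpers are stand-ins, not Ferdy's actual tuner. One global best mse is kept from the start, positions are regenerated every iteration, and each candidate is compared against that stored best rather than against a re-measured reference.

Code: Select all

#include <cstdlib>
#include <vector>

struct Sample { double feature; double result; };   // hypothetical training sample
using Params = double;                              // single weight for brevity

// Stand-in: draw a fresh random mini-batch each iteration.
std::vector<Sample> generateTrainingPositions(int n = 64)
{
    std::vector<Sample> batch(n);
    for (Sample& s : batch) {
        s.feature = std::rand() / double(RAND_MAX);
        s.result  = std::rand() / double(RAND_MAX);
    }
    return batch;
}

double mse(Params p, const std::vector<Sample>& batch)
{
    double sum = 0.0;
    for (const Sample& s : batch) {
        double err = s.result - p * s.feature;      // stand-in for result - sigmoid(qs())
        sum += err * err;
    }
    return sum / batch.size();
}

Params tuneWithResampling(Params best, int iterations)
{
    double bestMse = mse(best, generateTrainingPositions());    // best mse kept from the start
    for (int i = 0; i < iterations; ++i) {
        std::vector<Sample> batch = generateTrainingPositions(); // new positions every iteration
        Params candidate = best + (std::rand() % 3 - 1) * 0.01;  // small random step
        double m = mse(candidate, batch);
        if (m < bestMse) {       // compared against the stored best, not a refreshed reference
            bestMse = m;
            best = candidate;
        }
    }
    return best;
}

int main() { return tuneWithResampling(1.0, 100) >= 0.0 ? 0 : 1; }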
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ferdy wrote: Sun Jan 17, 2021 7:44 pm
Desperado wrote: Sun Jan 17, 2021 7:41 pm
Ferdy wrote: Sun Jan 17, 2021 6:51 pm
Desperado wrote: Sun Jan 17, 2021 9:35 am
Ferdy wrote: Sun Jan 17, 2021 8:59 am
Desperado wrote: Sun Jan 17, 2021 7:53 am
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. ...
Forgive me, but that sounds wrong.

if you have a dataset of 10 positions like

Code: Select all

0.409683 1r1b4/3q2pk/3p1p1p/1p1BpP2/1P2P1PP/Q4K2/2R5/8 w - -,1-0
0.057722 8/5p2/p5k1/3p4/b1pP1Bp1/6P1/5P2/3qQ1K1 w - -,0-1
0.000000 3q1rn1/1p4bk/pN1p2pp/P2Ppb2/1P2B3/4BP2/4Q1PP/2R3K1 b - -,1/2-1/2
0.000000 3q2k1/R1n2p2/1rrp3p/2p3p1/1p2P1P1/1P3PB1/2PR2PK/3Q4 w - -,1/2-1/2
0.067468 r1b2rk1/1p1p1ppp/p2qpn2/8/8/4P1P1/P1QB1PBP/2R2RK1 b - -,1/2-1/2
0.250000 r1b2rk1/pp3ppp/3bpn2/1P6/3p4/3P2P1/PP1N1PBP/R1B2RK1 b - -,1-0
0.129553 3nnk2/3r4/2p4p/1p1p4/1N1P1P1P/3K1B2/1P3P2/6R1 w - -,1-0
0.019618 r5k1/2p3bp/p3N1p1/4Pb2/Np1P4/4R3/PP5P/4K3 w - -,1/2-1/2
0.019618 8/4k3/4r3/2R3p1/6P1/5K1P/8/8 b - -,1/2-1/2
0.000000 8/3R4/p1p2B1k/2P2P2/8/PK6/3pr3/2b5 w - -,1/2-1/2

MSE: 0.095366
 
then you shuffle and get another 10 positions (using the same best vector)

Code: Select all

0.250000 3r2k1/2r2pp1/4nn1p/1q5b/Np1pP2N/1P3PP1/3R1QBP/1R5K b - -,0-1
0.000000 r5k1/ppq1ppbp/2n1b1p1/4P1Bn/8/2N1QN2/PP3PPP/3BR1K1 b - -,1/2-1/2
0.250000 2r1r1k1/3n1ppp/p1pq4/2pp1b2/4PR2/PP1P1Q2/1BP3PP/R5K1 w - -,1-0
0.250000 3R4/q4pQ1/2pkr3/3p4/8/P7/KPP5/8 b - -,1-0
0.021127 2B5/p1p1b1pp/1pP1p3/1P1k4/P7/6P1/4PPKP/8 b - -,1/2-1/2
0.577215 rb2r3/1p3pk1/8/pP3Np1/N1p5/P2nP1P1/B7/5RK1 b - -,1-0
0.000000 1r6/6pk/5b1p/2pp4/R2P3P/1n2BN2/1P3PP1/6K1 b - -,1/2-1/2
0.000000 2r2k2/5p1p/5BpP/p2R4/Pb3K1P/8/5P2/8 w - -,1/2-1/2
0.000000 r1b2rk1/3n1ppp/2N1p3/2bpP3/p7/5N2/PP1B1PPP/R1R3K1 w - -,1/2-1/2

MSE: 0.136909
the MSE for the current best vector changes! (in either direction).

In the example the new candidate vector then needs to beat 0.136909, not 0.095366 (it is the same best vector but different data).
You must recalculate the mse for the best vector on the new data and compare against that, otherwise you are comparing apples and oranges.
The current best is 0.095366. Any new mse has to beat that.
No, Ferdy, what you are saying there is definitely wrong!

Your current best vector simply does not correspond to the mse 0.095366 on the new dataset.
This vector will certainly produce a different mse, because it is different data.

You want to know whether there is a vector better than the current best vector for the data you are measuring.
So you must re-measure the mse of the current best vector on the current data set and use that as the reference!

In the example your best solution so far would produce 0.136909 for the new data, so if you want to know whether another vector is better, you have to compare against that number. This mse is produced by the same vector, but for the current data. Your reference vector no longer produces 0.095366, not for the new data.

It is not about comparing two numbers from two different vectors. It is the same vector but different data.
I understand what you are saying; re-read my previous post here carefully to see what this is all about.
Please just point out what I am missing.
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. ...
And as you can see in the quote above, you answered that you would compare the mse of different data sets.
You cannot use the mse of the first data set to check whether the mse of a second data set beats it.

"All new training positions have to beat the best so far" means they need to be better than 0.136909, not 0.095366.
Both mse are related to the same vector.
The current best is 0.095366. Any new mse has to beat that.
It is irrelevant what you measure afterwards. If I am missing an important point, please just say clearly what it is. Thank you.
This one:
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. Call it a variation of the Texel tuning method. Based on my experiments its results are not bad at all; results here mean game tests after the tuning. There is a minimum mini-batch size that has to be used, and it depends on the training positions too: if you are training for material, use training positions with material imbalance; if you are training for passed pawns, use training positions with passed-pawn imbalance. This is why I can afford to generate random positions from iteration to iteration and compare independent mse, because the feature I am training for is already present in all new training positions.
We were both writing at the same time... please look at the latest post again. And that is not a point, that is a story :).
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Long story short, you compared mse from different data sets when choosing 0.095366.
All new training positions have to beat the best so far
means they need to be better than 0.136909, not 0.095366.

Both mse are related to the same vector. It is irrelevant what you measure afterwards.
You will quickly find a dataset that cannot be improved by any vector, simply because the stored best mse
cannot be reproduced on that data; even the same vector might not be able to match it.
Ferdy
Posts: 4836
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Desperado wrote: Sun Jan 17, 2021 7:47 pm
Ferdy wrote: Sun Jan 17, 2021 7:44 pm
Desperado wrote: Sun Jan 17, 2021 7:41 pm
Ferdy wrote: Sun Jan 17, 2021 6:51 pm
Desperado wrote: Sun Jan 17, 2021 9:35 am
Ferdy wrote: Sun Jan 17, 2021 8:59 am
Desperado wrote: Sun Jan 17, 2021 7:53 am
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. ...
Forgive me, but that sounds wrong.

if you have a dataset of 10 positions like

Code: Select all

0.409683 1r1b4/3q2pk/3p1p1p/1p1BpP2/1P2P1PP/Q4K2/2R5/8 w - -,1-0
0.057722 8/5p2/p5k1/3p4/b1pP1Bp1/6P1/5P2/3qQ1K1 w - -,0-1
0.000000 3q1rn1/1p4bk/pN1p2pp/P2Ppb2/1P2B3/4BP2/4Q1PP/2R3K1 b - -,1/2-1/2
0.000000 3q2k1/R1n2p2/1rrp3p/2p3p1/1p2P1P1/1P3PB1/2PR2PK/3Q4 w - -,1/2-1/2
0.067468 r1b2rk1/1p1p1ppp/p2qpn2/8/8/4P1P1/P1QB1PBP/2R2RK1 b - -,1/2-1/2
0.250000 r1b2rk1/pp3ppp/3bpn2/1P6/3p4/3P2P1/PP1N1PBP/R1B2RK1 b - -,1-0
0.129553 3nnk2/3r4/2p4p/1p1p4/1N1P1P1P/3K1B2/1P3P2/6R1 w - -,1-0
0.019618 r5k1/2p3bp/p3N1p1/4Pb2/Np1P4/4R3/PP5P/4K3 w - -,1/2-1/2
0.019618 8/4k3/4r3/2R3p1/6P1/5K1P/8/8 b - -,1/2-1/2
0.000000 8/3R4/p1p2B1k/2P2P2/8/PK6/3pr3/2b5 w - -,1/2-1/2

MSE: 0.095366
 
then you shuffle and get another 10 positions (using the same best vector)

Code: Select all

0.250000 3r2k1/2r2pp1/4nn1p/1q5b/Np1pP2N/1P3PP1/3R1QBP/1R5K b - -,0-1
0.000000 r5k1/ppq1ppbp/2n1b1p1/4P1Bn/8/2N1QN2/PP3PPP/3BR1K1 b - -,1/2-1/2
0.250000 2r1r1k1/3n1ppp/p1pq4/2pp1b2/4PR2/PP1P1Q2/1BP3PP/R5K1 w - -,1-0
0.250000 3R4/q4pQ1/2pkr3/3p4/8/P7/KPP5/8 b - -,1-0
0.021127 2B5/p1p1b1pp/1pP1p3/1P1k4/P7/6P1/4PPKP/8 b - -,1/2-1/2
0.577215 rb2r3/1p3pk1/8/pP3Np1/N1p5/P2nP1P1/B7/5RK1 b - -,1-0
0.000000 1r6/6pk/5b1p/2pp4/R2P3P/1n2BN2/1P3PP1/6K1 b - -,1/2-1/2
0.000000 2r2k2/5p1p/5BpP/p2R4/Pb3K1P/8/5P2/8 w - -,1/2-1/2
0.000000 r1b2rk1/3n1ppp/2N1p3/2bpP3/p7/5N2/PP1B1PPP/R1R3K1 w - -,1/2-1/2

MSE: 0.136909
the MSE for the current best vector changes! (in either direction).

In the example the new candidate vector then needs to beat 0.136909, not 0.095366 (it is the same best vector but different data).
You must recalculate the mse for the best vector on the new data and compare against that, otherwise you are comparing apples and oranges.
The current best is 0.095366. Any new mse has to beat that.
No, Ferdy, what you are saying there is definitely wrong!

Your current best vector simply does not correspond to the mse 0.095366 on the new dataset.
This vector will certainly produce a different mse, because it is different data.

You want to know whether there is a vector better than the current best vector for the data you are measuring.
So you must re-measure the mse of the current best vector on the current data set and use that as the reference!

In the example your best solution so far would produce 0.136909 for the new data, so if you want to know whether another vector is better, you have to compare against that number. This mse is produced by the same vector, but for the current data. Your reference vector no longer produces 0.095366, not for the new data.

It is not about comparing two numbers from two different vectors. It is the same vector but different data.
I understand what you are saying; re-read my previous post here carefully to see what this is all about.
Please just point out what I am missing.
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. ...
And as you can see in the quote above, you answered that you would compare the mse of different data sets.
You cannot use the mse of the first data set to check whether the mse of a second data set beats it.

"All new training positions have to beat the best so far" means they need to be better than 0.136909, not 0.095366.
Both mse are related to the same vector.
The current best is 0.095366. Any new mse has to beat that.
It is irrelevant what you measure afterwards. If I am missing an important point, please just say clearly what it is. Thank you.
This one:
I keep the best mse from the start. All new training positions have to beat the best so far. For the next iteration, generate new training positions, get the mse, and this mse has to beat the current best. Call it a variation of the Texel tuning method. Based on my experiments its results are not bad at all; results here mean game tests after the tuning. There is a minimum mini-batch size that has to be used, and it depends on the training positions too: if you are training for material, use training positions with material imbalance; if you are training for passed pawns, use training positions with passed-pawn imbalance. This is why I can afford to generate random positions from iteration to iteration and compare independent mse, because the feature I am training for is already present in all new training positions.
We were both writing at the same time... please look at the latest post again. And that is not a point, that is a story :).
The point could be in the story, but I won't tell it :)
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ferdy wrote: Sun Jan 17, 2021 7:18 pm @Desperado, let's see how your tuner would behave on this. I generated sample material training positions, around 14k positions; most have material imbalance. They come from games, and a position is only saved if the game move is not a capture, not a promotion, not a checking move, and the side to move is not in check.

This is my result using a material-only eval. There is no randomization done per iteration since the number of training positions is small. K=1; the initial piece values are [100, 300, 300, 500, 1000] for both mg/eg, which can also be seen in the plot. It uses dynamic steps (the step is not varied per parameter), starting from 8, then 4, 2 and 1. Most mse improvements come at step 8, step 4 brings no improvement, steps 2 and 1 each bring a single improvement, and after 2 iterations without mse improvement the tuning was aborted.

[Image: plot of the mse per iteration for the material-only tuning run]


Total iterations done: 30
Best mse: 0.13605793198602772
Best parameters:

Code: Select all

+----------+--------+---------+
| par      |   init |   tuned |
+==========+========+=========+
| PawnOp   |    100 |      68 |
+----------+--------+---------+
| PawnEn   |    100 |     110 |
+----------+--------+---------+
| KnightOp |    300 |     357 |
+----------+--------+---------+
| KnightEn |    300 |     340 |
+----------+--------+---------+
| BishopOp |    300 |     356 |
+----------+--------+---------+
| BishopEn |    300 |     388 |
+----------+--------+---------+
| RookOp   |    500 |     428 |
+----------+--------+---------+
| RookEn   |    500 |     612 |
+----------+--------+---------+
| QueenOp  |   1000 |     990 |
+----------+--------+---------+
| QueenEn  |   1000 |    1104 |
+----------+--------+---------+
Algorithm: cpw-algorithm
Stepsize: 5
evaltype: qs() tapered - material only
initial vector: 100,100,300,300,300,300,500,500,1000,1000
param-content: P,P,N,N,B,B,R,R,Q,Q
anchor: none
K: 1.0
database: material.epd 13994 positions
batchsize: 13994
Data: no modification

Code: Select all

MG:  65 350 350 410 970
EG: 115 330 370 615 1080 best: 0.136443 epoch: 31
I used a fixed step of five. To use a variable stepsize I need to update my code now.
But the data looks comparable, I guess!
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Now with variable stepsize 8,4,2,1

Algorithm: cpw-algorithm
Stepsize: variable 8,4,2,1
evaltype: qs() tapered - material only
initial vector: 100,100,300,300,300,300,500,500,1000,1000
param-content: P,P,N,N,B,B,R,R,Q,Q
anchor: none
K: 1.0
database: material.epd 13994 positions
batchsize: 13994
Data: no modification

Code: Select all

MG:  64 348 344 404  952
EG: 124 384 436 696 1260 best: 0.135957 epoch: 56 (your mse: 0.13605793198602772)
I guess my basic setup gets a better mse, really?

Let me do it once :P :wink: - the moral of the story is, don't compare mse of different data sets, lol

You can feed your tuner with these values and look for an improvement. That would be interesting.
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Please don't hate me, but here is the tuner algorithm in two methods.

The minimizing code

Code: Select all

double Tuner::minimize(param_t* param, int stepsize)
{
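    // One coordinate-descent pass: for each parameter, try +stepsize and, if that
    // does not lower the mse over the whole batch, try -stepsize; keep whichever
    // change improves the mse, otherwise restore the original value.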
    double fitness;
    bool improved = TRUE;
    int backup;

    // mse for the current vector
    double bestFitness = mse();

    for(int i = 0; i < pcount; i++)
    {
        improved = FALSE;
        backup = *param[i].score;

        // modify param
        *param[i].score = backup + stepsize;
        fitness = mse();

        if(fitness < bestFitness)
        {
            bestFitness = fitness;
            improved = TRUE;
        }
        else
        {
            *param[i].score = backup - stepsize;
            fitness = mse();
            if(fitness < bestFitness)
            {
                bestFitness = fitness;
                improved = TRUE;
            }
        }

        // reset - keep original
        if(improved == FALSE)
            *param[i].score = backup;
    }

    return bestFitness;
}
The solver including the dynamic stepsize improvement

Code: Select all

void Tuner::solve()
{
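    // Repeat full minimize() passes with a shrinking step: after any improving
    // pass the step is reset to 8, otherwise it is halved, and tuning stops once
    // a pass with step 1 brings no further improvement.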
    double fitness, bestFitness = mse();
    int epoch = 0;
    int stepsize = 8;

    printf("\nStart with %.16f", bestFitness);
    getchar();

    for(epoch = 0; epoch < 10000; epoch++)
    {
        //fitness = minimize(param);
        fitness = minimize(param, stepsize);

        if(fitness < bestFitness)
        {
            bestFitness = fitness;
            stepsize = 8;
        }
        else
        {
            if(stepsize == 1) break;
            stepsize /= 2;
        }

        printParameters(param);
        printf(" best: %f epoch: %d", bestFitness, epoch);
    }

    printParameters(param);
    printf("Done: best: %f epoch: %d", bestFitness, epoch);
}
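
The mse() called by both methods is not shown in the thread. Below is a minimal sketch of a Texel-style implementation, assuming a hypothetical Entry record holding each position's game result and its qs() score from White's point of view, with the usual sigmoid and K = 1.0 as in the runs above; it is an illustration, not the exact code used here.

Code: Select all

#include <cmath>
#include <vector>

struct Entry {
    double result;   // 1.0 = White win, 0.5 = draw, 0.0 = Black win
    int    qscore;   // quiescence evaluation, White's point of view, in centipawns
};

// Texel-style mean squared error: error = result - sigmoid(qscore), scaled by K.
double mse(const std::vector<Entry>& entries, double K = 1.0)
{
    double sum = 0.0;
    for (const Entry& e : entries) {
        double sigmoid = 1.0 / (1.0 + std::pow(10.0, -K * e.qscore / 400.0));
        double err = e.result - sigmoid;
        sum += err * err;
    }
    return sum / entries.size();
}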