Cooking Cheese with Texel's tuning method

AlvaroBegue · Post by **AlvaroBegue** » Wed Jun 21, 2017 5:13 pm

jdart wrote: Some people have also used methods such as https://en.wikipedia.org/wiki/BOBYQA or L-BFGS (https://en.wikipedia.org/wiki/Limited-memory_BFGS), which are available in many optimization libraries. These require approximating the gradient and the Hessian (2nd derivative), which is expensive, but on the other hand they converge fast.

I didn't know about BOBYQA, but that Wikipedia page says it does not need a gradient function, so it will create a quadratic approximation to the function using only function evaluations.

L-BFGS requires a gradient function but, as I demonstrated with RuyTune, that can be done automatically from an existing function, with a bit of work and C++ template magic. The approximation of the Hessian is something internal to the workings of L-BFGS.

There are other related algorithms that are variants of the conjugate gradient method, which can use a function that computes the product of the Hessian and a vector. This can also be done using automatic differentiation, although I don't have a good reference that explains how. If anyone is interested, I can try to explain it.

Patrice Duhamel · Post by **Patrice Duhamel** » Wed Jun 21, 2017 6:33 pm

asanjuan wrote:I don't have time to work in my own engine, so let's help others instead.

Thanks, I hope you will find more time.

asanjuan wrote:
1. Try to have equal or close to equal distribution of draws, wins and loses results.
2. More varied openings
3. More positions
4. Collect positions having more common evaluation features i.e training positions
have passers, piece outpost, closed/open positions for mobility, pins, pawn weaknesses such
as isolated, doubled, backward, rook in open files, rook on 7th ranks and others.
Collect as many as you can for positions having these features.
The way that I achieve that is collecting games where one side is stronger (say 50 elo) than the other. It narrows the draw percentage, and you can consider them as real draws because the stronger side is not able to win it against a weaker player.

The ELO difference is really important or just to reduce the draw rate ?

I'm trying to do this :

- Generate a big number of games (50000 or more)
- load games and for each position detect "important features" for the evaluation (ex: rook on 7th, outpost, passed pawns, ....)
- for each feature, add a value to a game score
- sort all games by this score (in 3 arrays, win/lost/draw)
- convert pgn to epd and save the number of positions you want to have the same count of win/lost/draw

Do you think it's a good idea ?

jdart · Post by **jdart** » Thu Jun 22, 2017 1:45 am

The only reason gradient approximation (internal or otherwise) is a problem with these methods (BOBYQA, L-BFGS) is that they can require large numbers of function evaluations for complete convergence (down to many decimal places). But for chess parameters you don't have to get the error down to some tiny level. You just need to get reasonably close to the minimum. For these purposes and with this convergence condition they are ok and gradient approximation is not an issue.

Note also these algorithms are not designed for large dimensional problems although they may work ok for those.

--Jon

asanjuan · Post by **asanjuan** » Thu Jun 22, 2017 12:55 pm

Patrice Duhamel wrote:
asanjuan wrote:I don't have time to work in my own engine, so let's help others instead.

Thanks, I hope you will find more time.

asanjuan wrote:
1. Try to have equal or close to equal distribution of draws, wins and loses results.
2. More varied openings
3. More positions
4. Collect positions having more common evaluation features i.e training positions
have passers, piece outpost, closed/open positions for mobility, pins, pawn weaknesses such
as isolated, doubled, backward, rook in open files, rook on 7th ranks and others.
Collect as many as you can for positions having these features.
The way that I achieve that is collecting games where one side is stronger (say 50 elo) than the other. It narrows the draw percentage, and you can consider them as real draws because the stronger side is not able to win it against a weaker player.
The ELO difference is really important or just to reduce the draw rate ?

I'm trying to do this :

- Generate a big number of games (50000 or more)
- load games and for each position detect "important features" for the evaluation (ex: rook on 7th, outpost, passed pawns, ....)
- for each feature, add a value to a game score
- sort all games by this score (in 3 arrays, win/lost/draw)
- convert pgn to epd and save the number of positions you want to have the same count of win/lost/draw

Do you think it's a good idea ?

well, there are two important things with the elo criteria:

1. the outcome of the game has to be related to the positional advantage of the indivitual position that you are learning from. The Texel method works by correlating positional parameters with the outcome of the game. So you need that the side with that advantage is able to convert the current positional advantages into a winning endgame.
It is easier if the stronger side has the advantage.

2. You need to narrow the draws. If the training data has lots of draws (this is, evaluation = 0) the learning process will set the parameters close to 0. if the training set doesn't have draws, the resulting evaluation parameters will raise, or even exagerate. So you need a compromise: 33% win, 33% loses, 33% draws, for example.

Take into account that slightly exagerated positional parameters leads to an interesting or aggresive playing style.

Patrice Duhamel · Post by **Patrice Duhamel** » Sat Mar 12, 2022 10:46 am

Sorry to reopen this thread 5 years later...

I tried Texel's Tuning method again with a simple evaluation (material + PST), using Zurichess quiet_labeled.epd and this time it worked.

And I had a good result tuning all evaluation parameters at once (except king safety), 571 parameters.

* result tuned quiet-labeled.epd vs Cheese 3.0.1 (10s+0.1s) : ...

Code: Select all

Rank Name                     Elo    +    - games score oppo. draws 
   1 Cheese-301-64-dev-test    20    5    5  4000   56%   -20   34% 
   2 Cheese-301-64-dev        -20    5    5  4000   44%    20   34%

Then I tried to generate my own quiet positions :

- 150000 games at 1s+0.1s with Cheese 1.9.2, 2.0, 2.2, and 3.0, using 2moves_v1.epd
- take 10 random positions by games
- in games with 30 < maxply < 150
- discard book , mate, non quiet positions, and positions with less than 5 pieces
- remove duplicate positions
- run stockfish vs stockfish at 1s+0.1s to get a more accurate game result from these positions
=> milk_v1.epd : 915453 positions (34% win, 32% lost, 34% draw)

result tuned milk_v1.epd vs Cheese 3.0.1 (10s+0.1s)

Code: Select all

Rank Name                     Elo    +    - games score oppo. draws 
   1 Cheese-301-64-dev-test     6    6    6  4000   52%    -6   26% 
   2 Cheese-301-64-dev         -6    6    6  4000   48%     6   26%

To label positions with a game result, is 1s+0.1s too fast ?
this already takes lots of time for 1 million positions (took me more than 80 hours)

How to get better results ?
Should I use more positions or better quality positions ?

jdart · Post by **jdart** » Sat Mar 12, 2022 1:24 pm

- 150000 games at 1s+0.1s with Cheese 1.9.2, 2.0, 2.2, and 3.0, using 2moves_v1.epd
this already takes lots of time for 1 million positions (took me more than 80 hours)

I haven't done this for a while but IIRC I used 0.1+0.1 for producing labeled position sets. That is 6 seconds + 0.1, so slower than your time control. But I have a lot of cores, so can run many matches in parallel.

Patrice Duhamel · Post by **Patrice Duhamel** » Sun Mar 13, 2022 12:14 pm

jdart wrote: ↑Sat Mar 12, 2022 1:24 pm I haven't done this for a while but IIRC I used 0.1+0.1 for producing labeled position sets. That is 6 seconds + 0.1, so slower than your time control. But I have a lot of cores, so can run many matches in parallel.

6s+0.1s will takes to much time, I need almost 2 hours for 1000 games.

How important is the labeling part ?

I suppose that at very fast time controls using Stockfish + tablebases + cutechess game abjudication will work for most games that are already or almost decided ? but for other positions does it help to use more time ?

lithander · Post by **lithander** » Sun Mar 13, 2022 2:42 pm

Data generation is something I have not yet experimented with so maybe it's a stupid question but why can't you just use the static evaluation on the position to generate the labels? E.g. every position that stockfish rates above 100 cp is a win. Everything closer to zero is a draw. Sure that's not accurate but is *one* playout at such a fast time control as they are discussed here really more accurate?

op12no2 · Post by **op12no2** » Sun Mar 13, 2022 8:19 pm

lithander wrote: ↑Sun Mar 13, 2022 2:42 pm Data generation is something I have not yet experimented with so maybe it's a stupid question but why can't you just use the static evaluation on the position to generate the labels? E.g. every position that stockfish rates above 100 cp is a win. Everything closer to zero is a draw. Sure that's not accurate but is *one* playout at such a fast time control as they are discussed here really more accurate?

Experimentally I had success tuning sigmoid(eval) v sigmoid(sf eval) rather then sigmoid (eval ) v wdl label - but - it's kinda cheating and I don't use it in the wild. Currently trying sigmoid(eval) v sigmoid(search depth n) - which also seems to be working and no other engines involved and no labels needed. Can loop in reinforcement fashion then maybe.

Cooking Cheese with Texel's tuning method

Re: Cooking Cheese with Texel's tuning method

Re: Cooking Cheese with Texel's tuning method

Re: Cooking Cheese with Texel's tuning method

Re: Cooking Cheese with Texel's tuning method

Re: Cooking Cheese with Texel's tuning method

Re: Cooking Cheese with Texel's tuning method

Re: Cooking Cheese with Texel's tuning method

Re: Cooking Cheese with Texel's tuning method

Re: Cooking Cheese with Texel's tuning method