evaluation tuning - where to start?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

evaluation tuning - where to start?

Post by maksimKorzh »

Hi guys

Recently I tested the PeSTO piece-square tables with my engine, and if I did everything correctly, it gains about 200(!) Elo points compared to my handcrafted tables.

Now I want to pick up some of the most basic techniques for automated evaluation tuning. I've been reading posts about Texel's and Gaviota's tuning here on talkchess, but due to my lack of experience (and overall dumbness) I didn't get enough out of them to implement something similar in my engine.

I loved the idea of using existing games instead of my own for tuning purposes. I would like to get tables similar to those used in PeSTO but on my own and from scratch.

Is that possible?
What would you suggest?
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: evaluation tuning - where to start?

Post by jdart »

You could see https://www.chessprogramming.org/Automated_Tuning for starters.

The method most people are using is technically supervised learning using logistic regression. There is a very large literature on this, not limited to chess.
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: evaluation tuning - where to start?

Post by maksimKorzh »

I feel a bit overwhelmed after getting familiar with various options available, so here's a more concrete request:

Is it possible to obtain good material/PST values FROM SCRATCH using machine learning and existing data?
(e.g. running a script that processes a million PGNs and derives material/PST values from them)

I would like to implement it in python as a separate project.
Maybe something similar already exists?

Dead-simple, beginner-level implementations interest me the most.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: evaluation tuning - where to start?

Post by mvanthoor »

This might not help, but I have an opinion about moving too fast with development and trying to go places for which the program isn't ready yet. To be honest, I have a personal benchmark....

Fritz 11 on CCRL, Elo 2852

What does this mean?

Well, I've owned Fritz 11 (the program by Chessbase, engine by Frans Morsch) for 13 years, so I know this program pretty well. The GUI was my go-to program until very recently. (And the reason I'm switching to Fritz 17, reluctantly, is that the GUI and its Windows compatibility are starting to fall apart on newer computers.) The program is from 2007, so the engine was written sometime in 2006. The engine does not support multiple processors, and it is from before the time that magic bitboards, Texel evaluation tuning, and some of the newer search techniques were commonplace or even existed. Moreover, it's 32-bit, so compared to current programs, all other things being equal, it's slow.

Still, Fritz 11 holds a CCRL-rating of 2852.

For me, this means that I don't even have to look at some of the more advanced AI-like techniques until I reach at least 2750+ or even 2800+ with my engine. The only technique I can think of that Fritz 11 _may_ have used in some form is evaluation tuning.

However, doing this with a relatively new program is not of much use. The reason is that the evaluation tuning is done to make the program as strong as possible in its current state, so as soon as you add something new in either search or the evaluation function, you'll have to retune. This costs a lot of time. I think it would be more beneficial to first get all the 'classic' techniques right, tested, and verified, and THEN look into tuning, after your program hits 2750 Elo or thereabout.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: evaluation tuning - where to start?

Post by Daniel Shawul »

Yes, it is possible to get piece values from scratch using supervised learning.
In fact, the first step is to make sure you get reasonable values (the 1:3:5:9 ratio) before attempting to train piece-square tables.
So basically, write an eval() function based on piece values only, i.e. eval = sum(W_i * n_i, i=1..5), where the W_i are weights (the piece values).
Prepare a set of positions* for training that are labeled with scores, either centipawn values or winning percentages (W + D/2).
Then you basically do a logistic regression to find the piece values.
The regression tries to minimize a loss function, which is either the mean squared error (simpler) or the cross-entropy loss of your WDL probabilities. You can use gradient descent with minibatches to update the parameters iteratively.

* It is preferable if your training positions are "quiet" (have no immediate tactics).
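A minimal, self-contained Python sketch of the recipe above. The data here is synthetic, generated from known piece values purely to demonstrate the mechanics (real training data would be material counts and results extracted from games), it uses the cross-entropy loss rather than MSE, and all constants are illustrative:

```python
import math
import random

# Recover piece values (in pawn units) by logistic regression.
# Each "position" is reduced to material-difference counts
# n = (pawns, knights, bishops, rooks, queens), white minus black,
# labeled with the game result for white (1 = win, 0 = loss).
TRUE_VALUES = [1.0, 3.0, 3.0, 5.0, 9.0]  # ground truth used to fake the data
K = 0.5                                  # scales eval to a win probability

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def win_prob(weights, counts):
    return sigmoid(K * sum(w * c for w, c in zip(weights, counts)))

random.seed(42)
data = []
for _ in range(2000):
    counts = [random.randint(-2, 2) for _ in range(5)]
    label = 1.0 if random.random() < win_prob(TRUE_VALUES, counts) else 0.0
    data.append((counts, label))

# Full-batch gradient descent on the cross-entropy loss.
W = [1.0] * 5                            # start with every piece worth a pawn
lr = 5.0
for _ in range(200):
    grad = [0.0] * 5
    for counts, label in data:
        err = win_prob(W, counts) - label  # d(loss)/d(eval) for cross-entropy
        for i in range(5):
            grad[i] += err * K * counts[i]
    for i in range(5):
        W[i] -= lr * grad[i] / len(data)

print([round(w, 1) for w in W])  # should roughly recover the 1:3:3:5:9 ordering
```

Swapping the synthetic generator for a PGN-derived data set and adding minibatches instead of full-batch updates gives the scheme Daniel describes.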
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: evaluation tuning - where to start?

Post by maksimKorzh »

mvanthoor wrote: Wed Sep 30, 2020 12:00 am This might not help, but I have an opinion about moving too fast with development and trying to go places for which the program isn't ready yet. […]
Marcel, I'd love to, but I don't want to take things like a doubled-pawn penalty or passed-pawn bonus either from my head or blindly grab them from someone's code. I want to learn where the relations between evaluation parameters come from; that's it. Once I understand that, I'd like to compose my own instead of using someone else's.

I quite like the idea of PeSTO; it's in my spirit, but I want to derive my own PST values rather than grab them from Ronald's engine as I do now.
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: evaluation tuning - where to start?

Post by maksimKorzh »

Daniel Shawul wrote: Wed Sep 30, 2020 12:09 am Yes, it is possible to get piece values from scratch using supervised learning. […]
Thanks Daniel, that sounds like exactly what I want. I love the idea of training piece values first.
Could I get an example implementation of logistic regression applied to piece values? Preferably in Python.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: evaluation tuning - where to start?

Post by mvanthoor »

maksimKorzh wrote: Wed Sep 30, 2020 1:35 am Marcel, I'd love to, but I don't want to take things like a doubled-pawn penalty or passed-pawn bonus either from my head or blindly grab them from someone's code. I want to learn where the relations between evaluation parameters come from; that's it. Once I understand that, I'd like to compose my own instead of using someone else's.

I quite like the idea of PeSTO; it's in my spirit, but I want to derive my own PST values rather than grab them from Ronald's engine as I do now.
Ah, like that... I also don't know what values should be in there. Maybe I'm going to try some pre-made PSQT tables from another engine to get the search and evaluation going, and then alter them. Maybe I'm going to create some "generic" ones with my own values. For example, for a white king (in the opening, with A1 at the top left):

Code: Select all

0 20 15 -10 0 5 20 0
0  0 0   0 0 0  0 0
0  0 0   0 0 0  0 0
0  0 0   0 0 0  0 0
0  0 0   0 0 0  0 0
0  0 0   0 0 0  0 0
0  0 0   0 0 0  0 0
0  0 0   0 0 0  0 0
If you don't do anything else, in a PSQT like that, the king on G1 will have a +20 value compared to one on E1, and +15 on C1 compared to E1. So the program will try to castle short; if it can't, it will castle long. If it goes long, B1 has a +5 compared to C1, so if there's nothing else to do (nothing that gains the program more than +5), it will make the Kc1-b1 move.
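A sketch of how such a table might be consulted in an evaluation function; the names and indexing convention are illustrative, not Rustic's actual code. With A1 at index 0 (matching the table's top-left orientation), the same white-oriented table serves black by mirroring the square vertically:

```python
# Illustrative king piece-square table, A1 at index 0;
# only the first rank is non-zero here.
KING_PSQT = [
    0, 20, 15, -10, 0, 5, 20, 0,   # rank 1: A1..H1
] + [0] * 56                       # ranks 2-8: all zero

def psqt_score(square, is_white):
    """Square is 0..63 with A1 = 0, H8 = 63; black squares are mirrored."""
    return KING_PSQT[square if is_white else square ^ 56]

print(psqt_score(6, True))    # white king on G1 -> 20
print(psqt_score(62, False))  # black king on G8 mirrors to G1 -> 20
```

XOR-ing the square index with 56 flips the rank while keeping the file, which is the usual trick for sharing one table between both colors.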

Making your own PSQT's and piece values is not hard; and you can always start with something from another engine and then start tinkering with those to make them to your liking.

This is what I meant in other threads: by playing with these values, you can give your program personality. You can, for example, urge it to castle long instead of short. Of the two, castling short is the more common. I wouldn't be surprised if some engines are tuned to play well against a king that has castled short, and go down the drain if the king happens to be on the queenside, because those engines don't know what to do in that case.

Same for bishops and knights; you can make your program prefer knights, and then give it a bonus if those knights cover one another. This will probably create a pair of knights that, if one moves, the other moves next to cover it again. Is that good chess? I don't know, but it would give the engine a distinctive quirk that may lead to interesting games. As an experiment, you could try to make PSQT's for the bishops that avoid the center; you could make PSQT's for any piece to not BE in the center, but to control it from one of the wings, trying for a 1920's hypermodern style.

You could try to add an evaluation called "Steinitz mode", that looks where knights can move, and then give a bonus if such a square can be controlled by a pawn. (Also give it bonuses when you can bash a knight onto a weak square.) A very fun mode would be "Anderssen mode": You just give away all of your pieces, and then hope your opponent will give you the chance for a spectacular mate :D

If you tune your evaluation function automatically, your engine probably will become stronger, but it won't have any style or personality left.

That's the reason why I'm planning on having either two evaluation functions, or one evaluation that is tuned by default, but can load a personality file with its own values, just like Rodent can.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: evaluation tuning - where to start?

Post by jdart »

Could I get an example implementation of logistic regression applied to piece values? Preferably in Python.
For some (mostly) simple implementations of logistic regression see https://github.com/search?q=logistic+regression+python

For a library that implements this and other machine learning methods for you, see https://pytorch.org/.

The problem with Python is that the tuner has to implement your evaluation function, so unless your engine is in Python, you'd have to call your engine's code from Python. Therefore you might consider a C++ library such as mlpack (https://mlpack.org/), although implementing something yourself is not too hard; see for example https://github.com/jdart1/arasan-chess/ ... /tuner.cpp (the learn and adjust_params methods in particular).

--Jon
Tord
Posts: 31
Joined: Tue Feb 27, 2018 11:29 am

Re: evaluation tuning - where to start?

Post by Tord »

jdart wrote: Wed Sep 30, 2020 3:37 pm The problem with Python is that the tuner has to implement your evaluation function, so unless your engine is in Python, you'd have to call your engine's code from Python. Therefore you might consider a C++ library such as mlpack (https://mlpack.org/), although implementing something yourself is not too hard; see for example https://github.com/jdart1/arasan-chess/ ... /tuner.cpp (the learn and adjust_params methods in particular).
I use Julia rather than Python (the popularity of Python has always been a mystery to me). Calling C++ code from Julia is completely trivial. Can't Python be used the same way? I thought one of Python's selling points was to be a nice high-level frontend for C and C++ code.
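To Tord's question: yes, Python can do this the same way. The standard library's ctypes module loads a shared library and calls into it directly; the sketch below uses the system math library purely as a self-contained demonstration, but an engine's eval() exported with C linkage from a shared library would be wired up identically:

```python
import ctypes
import ctypes.util

# Load the system math library and call its sqrt() directly from Python.
# An engine's evaluation function, compiled into a shared library and
# exported with C linkage, would be bound the same way.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.restype = ctypes.c_double    # declare the C return type
libm.sqrt.argtypes = [ctypes.c_double]  # declare the C argument types

print(libm.sqrt(2.0))  # 1.4142135623730951
```

The overhead per call is noticeable, though, which is why tuners usually batch feature extraction on the native side rather than calling a C eval once per position per iteration.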