Why using the game result instead of evaluation scores

Discussion of chess software programming and technical issues.

Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Why using the game result instead of evaluation scores

Post by Desperado »

Hello everybody,

currently several tuning threads are ongoing. Here is another question I am interested in.

When creating the training data, it would not be a big deal to use a trainer (any engine) that computes a score for each position and stores it. Instead of computing the error from the game result, you could then compute the score difference as the error.

What would the major differences be? I think of smaller batch sizes and a better correlation between the training score and the position (think of middlegame positions). Any other ideas?

Who has experience with that kind of tuning?

As always, any feedback is welcome :-)
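
To make the idea concrete, here is a minimal sketch of the two error definitions side by side, assuming a Texel-style squared error; winProb, evalCp and the Sample layout are only illustrative names, not taken from any particular engine:

Code: Select all

#include <cmath>
#include <vector>

// Texel-style logistic mapping from centipawns to an expected score in [0,1].
// K = 400 is only a common default; in practice K is fitted to the data set.
double winProb(double cp, double K = 400.0) {
    return 1.0 / (1.0 + std::pow(10.0, -cp / K));
}

struct Sample {
    double result;    // game result: 1.0 / 0.5 / 0.0
    double trainerCp; // score the trainer engine stored for this position
    // position data omitted
};

// Classic Texel tuning: error against the game result.
double resultError(const std::vector<Sample>& data, double (*evalCp)(const Sample&)) {
    double e = 0.0;
    for (const Sample& s : data) {
        double d = s.result - winProb(evalCp(s));
        e += d * d;
    }
    return e / data.size();
}

// Proposed variant: error against the trainer's stored score instead.
double scoreError(const std::vector<Sample>& data, double (*evalCp)(const Sample&)) {
    double e = 0.0;
    for (const Sample& s : data) {
        double d = winProb(s.trainerCp) - winProb(evalCp(s));
        e += d * d;
    }
    return e / data.size();
}

Only the target changes; the optimisation loop around it stays the same.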
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Why using the game result instead of evaluation scores

Post by Gerd Isenberg »

Desperado wrote: Tue Jan 12, 2021 8:11 pm Hello everybody,

currently several tuning threads are ongoing. Here is another question I am interested in.

When creating the training data, it would not be a big deal to use a trainer (any engine) that computes a score for each position and stores it. Instead of computing the error from the game result, you could then compute the score difference as the error.

What would the major differences be? I think of smaller batch sizes and a better correlation between the training score and the position (think of middlegame positions). Any other ideas?

Who has experience with that kind of tuning?

As always, any feedback is welcome :-)
Hi Michael,

I guess that is called Temporal Difference Learning, as in TD(λ) and TDLeaf(λ). KnightCap and CilkChess were the most prominent examples, already quite some time ago.

Oops, not exactly - TD is pure reinforcement learning, while what you propose is supervised learning.

Cheers,
Gerd
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Why using the game result instead of evaluation scores

Post by Desperado »

Gerd Isenberg wrote: Tue Jan 12, 2021 8:46 pm
Desperado wrote: Tue Jan 12, 2021 8:11 pm Hello everybody,

currently several tuning threads are ongoing. Here is another question I am interested in.

When creating the training data, it would not be a big deal to use a trainer (any engine) that computes a score for each position and stores it. Instead of computing the error from the game result, you could then compute the score difference as the error.

What would the major differences be? I think of smaller batch sizes and a better correlation between the training score and the position (think of middlegame positions). Any other ideas?

Who has experience with that kind of tuning?

As always, any feedback is welcome :-)
Hi Michael,

I guess that is called Temporal Difference Learning, as in TD(λ) and TDLeaf(λ). KnightCap and CilkChess were the most prominent examples, already quite some time ago.

Oops, not exactly - TD is pure reinforcement learning, while what you propose is supervised learning.

Cheers,
Gerd
Hi Gerd,

nice to read from you! But I think that is not what I am looking for. The positions that carry a score are independent positions, not connected as a game. TD learning exploits exactly that dependency, at least I think so.

What I mean is replacing 1.0 / 0.5 / 0.0 with a real score like 0.65347, for example.
So the algorithm does not change in any way; only the reference value changes.
derjack
Posts: 16
Joined: Fri Dec 27, 2019 8:47 pm
Full name: Jacek Dermont

Re: Why using the game result instead of evaluation scores

Post by derjack »

Isn't this what people do for NNUE already? On millions of positions they let the engine search to some depth, so that the eval can be trained to be closer to the search score at that depth. I know the original Texel tuning method used WDL instead of eval, but personally I had better results using the latter.
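
A rough sketch of such a labelling pass; Position and the searchCp callback are just stand-ins for the engine's own interfaces, nothing here is NNUE-specific:

Code: Select all

#include <fstream>
#include <string>
#include <vector>

struct Position { std::string fen; };  // stand-in for the engine's position type

// Store a fixed-depth search score with each position, so the tuner can
// target "eval at depth" rather than the WDL result of the whole game.
void labelPositions(const std::vector<Position>& positions,
                    int (*searchCp)(const Position&, int),  // engine's search, centipawns
                    const std::string& outFile, int depth = 8) {
    std::ofstream out(outFile);
    for (const Position& pos : positions)
        out << pos.fen << " ce " << searchCp(pos, depth) << "\n";  // EPD-style "ce" opcode
}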
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Why using the game result instead of evaluation scores

Post by Gerd Isenberg »

I got that a little bit later. One needs to scale the engine-specific score with a win-percentage sigmoid to a 0-1 or -1,0,1 range.
Similar to TD(λ), one may even try to interpolate the final result into that score. I don't know whether this has been tried before.
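
A small sketch of that scaling, plus the optional interpolation with the result; K = 400 and the linear blend are arbitrary choices here:

Code: Select all

#include <cmath>

// Map an engine-specific centipawn score to an expected score in [0,1].
// The scaling constant K is engine dependent; 400 is just a common default.
double winProb(double cp, double K = 400.0) {
    return 1.0 / (1.0 + std::pow(10.0, -cp / K));
}

// TD(lambda)-like blend: lambda = 1 trusts only the scaled engine score,
// lambda = 0 trusts only the final game result.
double blendedTarget(double cp, double result, double lambda, double K = 400.0) {
    return lambda * winProb(cp, K) + (1.0 - lambda) * result;
}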
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Why using the game result instead of evaluation scores

Post by hgm »

The main point is that with this training you would teach your evaluation to mimic a certain engine. That would also make it copy the errors of that engine. There is no arguing with game results, though. Even engines that misevaluate a dead draw as a win will eventually draw when they have to finish the game, rather than just boast about how good they are.
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Why using the game result instead of evaluation scores

Post by Desperado »

Gerd Isenberg wrote: Tue Jan 12, 2021 9:20 pm I got that a little bit later. One needs to scale the engine-specific score with a win-percentage sigmoid to a 0-1 or -1,0,1 range.
Similar to TD(λ), one may even try to interpolate the final result into that score. I don't know whether this has been tried before.
Well, something like rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - ce 23
instead of rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - "1/2-1/2"

then using

sigmoidInverse(23,400) as error reference.
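
A minimal sketch of reading the "ce" opcode back and turning it into a 0..1 target, assuming the usual logistic scaling; the helper names are made up:

Code: Select all

#include <cmath>
#include <sstream>
#include <string>

// Extract the centipawn value behind the EPD "ce" opcode, e.g.
// "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - ce 23"
// (minimal sketch, not a full EPD parser).
bool parseCe(const std::string& epd, int& cp) {
    std::istringstream in(epd);
    std::string token;
    while (in >> token)
        if (token == "ce")
            return static_cast<bool>(in >> cp);
    return false;
}

// Convert the stored score into a 0..1 training target with constant K,
// here K = 400 as in the post.
double ceTarget(int cp, double K = 400.0) {
    return 1.0 / (1.0 + std::pow(10.0, -cp / K));
}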
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Why using the game result instead of evaluation scores

Post by Desperado »

hgm wrote: Tue Jan 12, 2021 9:47 pm The main point is that with this training you would teach your evaluation to mimic a certain engine. That would also make it copy the errors of that engine. There is no arguing with game results, though. Even engines that misevaluate a dead draw as a win will eventually draw when they have to finish the game, rather than just boast about how good they are.
I agree with you to some extent, but it is also true that good positions are lost because one participant is 100 Elo weaker than the other, or because a simple mistake is made 30 moves later, or for many other possible reasons. Many factors influence the result, so a score might be much closer to the truth of the position than the result is.

If you want to avoid mapping the scoring function of one engine onto another, you can also use many different engines; there are many other ideas as well. Just because a game is lost does not mean that the position was bad.

Transferring a value into your own evaluation function does not lead to the same style or even the same moves, as long as the functions are not identical.

Nice to read from you...
Regards
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Why using the game result instead of evaluation scores

Post by Pio »

Desperado wrote: Tue Jan 12, 2021 10:11 pm
hgm wrote: Tue Jan 12, 2021 9:47 pm The main point is that with this training you would teach your evaluation to mimic a certain engine. That would also make it copy the errors of that engine. There is no arguing with game results, though. Even engines that misevaluate a dead draw as a win will eventually draw when they have to finish the game, rather than just boast about how good they are.
I agree with you to some extent, but it is also true that good positions are lost because one participant is 100 Elo weaker than the other, or because a simple mistake is made 30 moves later, or for many other possible reasons. Many factors influence the result, so a score might be much closer to the truth of the position than the result is.

If you want to avoid mapping the scoring function of one engine onto another, you can also use many different engines; there are many other ideas as well. Just because a game is lost does not mean that the position was bad.

Transferring a value into your own evaluation function does not lead to the same style or even the same moves, as long as the functions are not identical.

Nice to read from you...
Regards
Hgm is usually right (except when he disagrees with me 😁). The problem with mimicking another evaluation function is exactly what hgm says: you will learn the other engine’s faults, so you can never get an eval better than the engine you are mimicking (except by a tiny amount of pure luck). I guess mimicking a better engine is extremely good in the beginning, since you can learn things a lot faster, but you can never reach the same heights as you could by learning by yourself. It’s like having an okay teacher while never thinking for yourself. Having lots of okay teachers/engines makes the most sense if you are smart enough to see which teacher is good at what (though of course you can learn a little by just following the stream, reducing the big errors but also the big gains). One good middle way is to start out with lots of teaching and reduce it after a while. What I mean is that you could interpolate the teacher’s ideas/scoring with the real outcome. You could do it with a tapered blend of LEARNING_FROM_TEACHER and TRY_IT_YOURSELF, where you start with the teacher phase and then increase the trying part.

I think it is always good to focus on the end first, and that is why I had the idea of weighting positions close to the end a lot more, at least in the beginning of the training. It is hard to learn how to play an opening if you don’t know how to checkmate. After you learn how to checkmate you will learn that queens are good, so after a while you will learn that promotion is good, then that pawns higher up are better, and then that they are better if they come in pairs...
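
A sketch of the tapered teacher/result blend and the end-of-game weighting described above; the linear taper, the decay constant and the names are free choices, not a fixed recipe:

Code: Select all

#include <cmath>

// progress runs from 0.0 (start of training) to 1.0 (end of training).
// Early on the target follows the teacher's scaled score, later the game result.
double taperedTarget(double teacherProb, double result, double progress) {
    double teacherWeight = 1.0 - progress;  // linear taper, just one possible schedule
    return teacherWeight * teacherProb + (1.0 - teacherWeight) * result;
}

// Per-position weight that emphasises positions close to the game end early
// in training; pliesToEnd = plies between this position and the final move.
double endgameWeight(int pliesToEnd, double progress, double scale = 20.0) {
    double emphasis = std::exp(-pliesToEnd / scale);  // near the end -> close to 1
    return (1.0 - progress) * emphasis + progress;    // fades to uniform weighting
}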
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Why using the game result instead of evaluation scores

Post by Desperado »

Pio wrote: Wed Jan 13, 2021 12:03 am
Desperado wrote: Tue Jan 12, 2021 10:11 pm
hgm wrote: Tue Jan 12, 2021 9:47 pm The main point is that with this training you would teach your evaluation to mimic a certain engine. That would also make it copy the errors of that engine. There is no arguing with game results, though. Even engines that misevaluate a dead draw as a win will eventually draw when they have to finish the game, rather than just boast about how good they are.
I agree with you to some extent, but it is also true that good positions are lost because one participant is 100 Elo weaker than the other, or because a simple mistake is made 30 moves later, or for many other possible reasons. Many factors influence the result, so a score might be much closer to the truth of the position than the result is.

If you want to avoid mapping the scoring function of one engine onto another, you can also use many different engines; there are many other ideas as well. Just because a game is lost does not mean that the position was bad.

Transferring a value into your own evaluation function does not lead to the same style or even the same moves, as long as the functions are not identical.

Nice to read from you...
Regards
Hgm is usually right (except when he disagrees with me 😁). The problem with mimicking another evaluation function is exactly what hgm says: you will learn the other engine’s faults, so you can never get an eval better than the engine you are mimicking (except by a tiny amount of pure luck). I guess mimicking a better engine is extremely good in the beginning, since you can learn things a lot faster, but you can never reach the same heights as you could by learning by yourself. It’s like having an okay teacher while never thinking for yourself. Having lots of okay teachers/engines makes the most sense if you are smart enough to see which teacher is good at what (though of course you can learn a little by just following the stream, reducing the big errors but also the big gains). One good middle way is to start out with lots of teaching and reduce it after a while. What I mean is that you could interpolate the teacher’s ideas/scoring with the real outcome. You could do it with a tapered blend of LEARNING_FROM_TEACHER and TRY_IT_YOURSELF, where you start with the teacher phase and then increase the trying part.

I think it is always good to focus on the end first, and that is why I had the idea of weighting positions close to the end a lot more, at least in the beginning of the training. It is hard to learn how to play an opening if you don’t know how to checkmate. After you learn how to checkmate you will learn that queens are good, so after a while you will learn that promotion is good, then that pawns higher up are better, and then that they are better if they come in pairs...
Very well explained!

It is not about being right or wrong for me. I value HG's opinion very much. Sometimes I discuss in order to go deeper into the topic or to learn about another idea that had not come to my mind.
  • So, what do you think would happen with self-training, i.e. when the score is the result of a deeper search? Something like a better prediction of the search outcome; might that work? It feels somehow flawed.
  • Combining a result and a score to some degree seems like a nice idea too; I have already thought about it.
  • Beginning the training with endgame positions looks interesting, as you described. You could even try to learn from endgame databases with perfect results (see the sketch below)?!
Of course everything needs to be tested; it does not matter which arguments are right, as long as there is a comprehensible idea.
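
For the tablebase point (last bullet above), a tiny sketch of what a perfect endgame target could look like; the WDL convention (< 0 loss, 0 draw, > 0 win for the side to move) is an assumption about whatever probing code is already available:

Code: Select all

// Perfect endgame target from a tablebase probe. wdl is assumed to come from
// the engine's own probing code (Syzygy, Gaviota, ...), with < 0 = loss,
// 0 = draw, > 0 = win, from the side to move's point of view.
double tablebaseTarget(int wdl) {
    if (wdl > 0) return 1.0;
    if (wdl < 0) return 0.0;
    return 0.5;
}

// Such positions then feed the same squared error as result-labelled ones:
//   error += pow(tablebaseTarget(wdl) - winProb(evalCp(pos)), 2);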