
Unsupervised reinforcement tuning from zero

Posted: Fri Oct 16, 2020 4:20 pm
by Madeleine Birchfield
Is it possible to tune a handcrafted evaluation function whose values all start at zero using unsupervised reinforcement tuning? And if yes, how does this differ from training a neural network from zero using unsupervised reinforcement training?

Re: Unsupervised reinforcement tuning from zero

Posted: Fri Oct 16, 2020 4:52 pm
by maksimKorzh
Madeleine Birchfield wrote: Fri Oct 16, 2020 4:20 pm Is it possible to tune a handcrafted evaluation function whose values all start at zero using unsupervised reinforcement tuning? And if yes, how does this differ from training a neural network from zero using unsupervised reinforcement training?
You're asking about "start at zero using unsupervised reinforcement tuning" and "from zero using unsupervised reinforcement training"; is that a typo? Maybe you meant unsupervised reinforcement vs supervised reinforcement?

Anyway:
Is it possible to tune a handcrafted evaluation function whose values all start at zero using unsupervised reinforcement tuning?
Leela does it, but if we're dealing with unsupervised reinforcement learning it's hard to call it tuning, and in that case we don't need a handcrafted evaluation. On the other hand, "tuning" assumes a handcrafted evaluation, which itself assumes that the values in the eval are not zeros but some basic values, e.g. material and PST weights. In this case tuning makes those handcrafted values more precise.
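For concreteness, here is a minimal sketch (in Python, with made-up numbers and a simplified board representation) of the kind of handcrafted eval being talked about: hand-picked material weights plus piece-square tables, which are exactly the values a tuner would then adjust.

[code]
# Illustrative material weights in centipawns; typical hand-picked starting values.
MATERIAL = {'P': 100, 'N': 320, 'B': 330, 'R': 500, 'Q': 900}

# A PST is just a 64-entry square-bonus table per piece type (all zeros here for brevity).
PST = {piece: [0] * 64 for piece in MATERIAL}

def evaluate(board):
    """board: iterable of (piece, color, square) tuples; score from white's point of view."""
    score = 0
    for piece, color, square in board:
        value = MATERIAL[piece] + PST[piece][square]
        score += value if color == 'white' else -value
    return score
[/code]

Starting all of these numbers at zero is mechanically possible; the point is that "tuning" usually means refining sensible starting values like these.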

Maybe you want to create a handcrafted eval but, instead of taking values from other engines, try to obtain your own using supervised learning. In this case a large set of PGN games can be taken and used to obtain eval parameters via logistic regression. I tried this for piece weights and it more or less works, but for PST I didn't succeed and decided to consider NNUE instead.
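A rough sketch of that supervised approach, assuming positions sampled from a PGN collection have already been reduced to piece-count dictionaries. All names here are illustrative, and draws are simply dropped to keep the example to plain logistic regression.

[code]
import numpy as np
from sklearn.linear_model import LogisticRegression

def material_features(position):
    """position: dict of piece counts, e.g. {'P': 8, 'p': 7, 'N': 2, ...}.
    Returns white-minus-black material counts for P, N, B, R, Q."""
    return [position.get(w, 0) - position.get(w.lower(), 0) for w in 'PNBRQ']

def fit_piece_values(positions, results):
    """positions: iterable of piece-count dicts sampled from PGN games.
    results: 1.0 for a white win, 0.0 for a black win (draws dropped here for simplicity).
    Returns piece values scaled so that a pawn is worth 100."""
    X = np.array([material_features(p) for p in positions])
    y = np.array(results)
    model = LogisticRegression().fit(X, y)
    pawn = model.coef_[0][0]
    return {name: round(100 * w / pawn) for name, w in zip('PNBRQ', model.coef_[0])}
[/code]

The fitted coefficients are only defined up to a scale factor, so dividing by the pawn weight gives familiar centipawn-style numbers.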

Re: Unsupervised reinforcement tuning from zero

Posted: Fri Oct 16, 2020 5:42 pm
by Madeleine Birchfield
maksimKorzh wrote: Fri Oct 16, 2020 4:52 pm You're asking about "start at zero using unsupervised reinforcement tuning" and "from zero using unsupervised reinforcement training"; is that a typo? Maybe you meant unsupervised reinforcement vs supervised reinforcement?
Nope. I was wondering, on one hand, about the distinction between tuning and learning/training; in this case, whether one could start with a handcrafted evaluation function whose values all start at zero and use self-play to generate data to adjust the values of the evaluation function. But if I read what you wrote below correctly, tuning starts from non-zero values and training doesn't use handcrafted evaluation functions, in which case there doesn't yet exist a name for what I am trying to do.
maksimKorzh wrote: Fri Oct 16, 2020 4:52 pm Anyway:
Is it possible to tune a handcrafted evaluation function whose values all start at zero using unsupervised reinforcement tuning?
Leela does it, but if we're dealing with unsupervised reinforcement learning it's hard to call it tuning, and in that case we don't need a handcrafted evaluation. On the other hand, "tuning" assumes a handcrafted evaluation, which itself assumes that the values in the eval are not zeros but some basic values, e.g. material and PST weights. In this case tuning makes those handcrafted values more precise.

Re: Unsupervised reinforcement tuning from zero

Posted: Fri Oct 16, 2020 5:51 pm
by Hrvoje Horvatic
Madeleine Birchfield wrote: Fri Oct 16, 2020 4:20 pm Is it possible to tune a handcrafted evaluation function whose values all start at zero using unsupervised reinforcement tuning? And if yes, how does this differ from training a neural network from zero using unsupervised reinforcement training?
yes... it was done before, and it works... but you usually get better results when you start from good values, because it is easy to get trapped in local minima, so you get sub-optimal values for your eval function... the framework that is used most often right now is so-called "Texel" tuning and it is a form of supervised learning (you learn from game outcomes), and here nobody starts from zero either... :)
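For reference, the error that Texel tuning minimises is just the mean squared difference between the game results and a sigmoid of the static eval. A minimal sketch, with an illustrative scaling constant K (in practice K is fitted per engine and the positions are quiet ones extracted from games):

[code]
def texel_error(positions, results, evaluate, K=400.0):
    """positions/results: quiet positions and their game outcome (1, 0.5, 0 from white's view).
    evaluate: the handcrafted eval being tuned. K: illustrative scaling constant.
    Returns the mean squared error the tuner tries to minimise by adjusting eval weights."""
    total = 0.0
    for pos, result in zip(positions, results):
        sigmoid = 1.0 / (1.0 + 10.0 ** (-evaluate(pos) / K))
        total += (result - sigmoid) ** 2
    return total / len(positions)
[/code]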

but it is a completely different thing with neural networks: you don't tune existing features, you "discover" micro-features and tune them at the same time...

So it's apples and oranges.

Re: Unsupervised reinforcement tuning from zero

Posted: Fri Oct 16, 2020 6:02 pm
by Madeleine Birchfield
Hrvoje Horvatic wrote: Fri Oct 16, 2020 5:51 pm yes... it was done before, and it works... but you usually get better results when you start from good values, because it is easy to get trapped in local minima, so you get sub-optimal values for your eval function... the framework that is used most often right now is so-called "Texel" tuning and it is a form of supervised learning (you learn from game outcomes), and here nobody starts from zero either... :)
But Texel tuning as you said is supervised; is it possible to use Texel tuning and other tuning without supervision (hence the 'unsupervised' in the thread title)?

Also, I believe the local minima issue is an issue with training neural networks as well.

Re: Unsupervised reinforcement tuning from zero

Posted: Fri Oct 16, 2020 6:24 pm
by chrisw
Madeleine Birchfield wrote: Fri Oct 16, 2020 6:02 pm
Hrvoje Horvatic wrote: Fri Oct 16, 2020 5:51 pm yes... it was done before, and it works... but you usually get better results when you start from good values, because it is easy to get trapped in local minima, so you get sub-optimal values for your eval function... the framework that is used most often right now is so-called "Texel" tuning and it is a form of supervised learning (you learn from game outcomes), and here nobody starts from zero either... :)
But Texel tuning as you said is supervised; is it possible to use Texel tuning and other tuning without supervision (hence the 'unsupervised' in the thread title)?
Of course it is.
You need to start with some code that can distinguish features.
And a bunch of games with result.

Loop:
Train/tune (what’s the difference?)
Make loadsa self-play games
Round and round you go
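A minimal sketch of that loop, where the only supervision signal comes from the engine's own game results. play_selfplay_games() and texel_tune() are placeholders for engine-specific code, not real functions:

[code]
def reinforcement_loop(weights, iterations=10, games_per_iter=10_000):
    """Start from any weights (even all zeros), play engine-vs-engine games with the
    current eval, then retune the weights on those games' outcomes, and repeat.
    play_selfplay_games() and texel_tune() are hypothetical engine-specific helpers."""
    for _ in range(iterations):
        # 1. generate labelled data with the current weights
        positions, results = play_selfplay_games(weights, games_per_iter)
        # 2. fit the eval weights to the outcomes of its own games
        weights = texel_tune(weights, positions, results)
        # 3. round and round: the new weights produce the next batch of games
    return weights
[/code]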


Also, I believe the local minima issue is an issue with training neural networks as well.

Re: Unsupervised reinforcement tuning from zero

Posted: Fri Oct 16, 2020 7:07 pm
by Madeleine Birchfield
chrisw wrote: Fri Oct 16, 2020 6:24 pm Of course it is.
You need to start with some code that can distinguish features.
And a bunch of games with result.

Loop:
Train/tune (what’s the difference?)
Make loadsa self-play games
Round and round you go
And so the only real difference between handcrafted evaluation and neural networks is that handcrafted features are discovered/implemented by humans versus automated feature discovery/implementation.

Re: Unsupervised reinforcement tuning from zero

Posted: Fri Oct 16, 2020 8:39 pm
by chrisw
Madeleine Birchfield wrote: Fri Oct 16, 2020 7:07 pm
chrisw wrote: Fri Oct 16, 2020 6:24 pm Of course it is.
You need to start with some code that can distinguish features.
And a bunch of games with result.

Loop:
Train/tune (what’s the difference?)
Make loadsa self-play games
Round and round you go
And so the only real difference between handcrafted evaluation and neural networks is that handcrafted features are discovered/implemented by humans versus automated feature discovery/implementation.
Depends how you want to look at it and so on.
Handcrafted means identifying sub-features and writing functions that recognise them, then weighting them and adding them up. Whichever way you want to analyse it, it's not a holistic solution.

A neural network is non-linear and holistic. It's not chopping the thing into features and then adding them up. It's not even meaningful to talk about "features"; there aren't any, there's just the whole position.

Chess is inherently non-linear and definitely holistic, and all those tactical bean-countery people were wrong. At everything.

Re: Unsupervised reinforcement tuning from zero

Posted: Fri Oct 16, 2020 8:49 pm
by Hrvoje Horvatic
Madeleine Birchfield wrote: Fri Oct 16, 2020 7:07 pm
And so the only real difference between handcrafted evaluation and neural networks is handcrafted features discovered/implemented by humans vs automated feature discovery/implementation.
this is approximately true...

But it's a difference that makes the difference... a BIG difference...