zullil wrote: ↑Tue Mar 10, 2020 11:12 am
And what I intended was to train a neural net to play White from 1. g4.
One problem is that it might depend critically on complex endgames, where the NN engine has no clue. That would corrupt the training of the NN.
E.g. in the current game, there are 16 pieces on the board, but mmt is finding Leela useless now.
If we wanted to let SF take over at some point in each training game, that'd add further complications and decisions to make.
zullil wrote: ↑Tue Mar 10, 2020 11:12 am
And what I intended was to train a neural net to play White from 1. g4.
jp wrote: ↑Tue Mar 10, 2020 11:43 am
One problem is that it might depend critically on complex endgames, where the NN engine has no clue. That would corrupt the training of the NN.
E.g. in the current game, there are 16 pieces on the board, but mmt is finding Leela useless now.
If we wanted to let SF take over at some point in each training game, that'd add further complications and decisions to make.
Train a net (as White) against Stockfish, and call a score of -300 cp a win for Black. Perhaps use the duration of the game as a measure of success. I'm not looking to solve chess. I'm just curious what line(s) the net would converge toward. (Probably obvious flaws in my training design, which was thrown together in the amount of time needed to type it.)
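The adjudication-plus-endurance idea above can be sketched in a few lines. This is a hypothetical, dependency-free sketch of the scoring rule only (not any real Lc0 training code): the net plays White, the game is called a loss as soon as the eval drops below -300 cp, and the reward is shaped by how long the net lasted. The function names, the 200-ply cap, and the partial-credit formula are all my own illustrative choices.

```python
# Hypothetical adjudication rule for the idea above: the net plays White
# from 1. g4 against Stockfish, and the game counts as a Black win the
# moment the eval (White's point of view) falls below -300 cp. The
# reward is shaped by survival length, per the "duration of the game
# as a measure of success" suggestion. Names here are illustrative.

LOSS_THRESHOLD_CP = -300  # eval in centipawns, from White's POV

def adjudicate(evals_cp):
    """Return (result, plies_survived) for a list of per-ply evals.

    evals_cp: eval after each of White's moves, centipawns from
    White's POV. The game is adjudicated lost for White at the first
    eval below LOSS_THRESHOLD_CP; otherwise it counts as survived.
    """
    for ply, cp in enumerate(evals_cp, start=1):
        if cp < LOSS_THRESHOLD_CP:
            return "black_win", ply
    return "survived", len(evals_cp)

def endurance_reward(result, plies, max_plies=200):
    """Shape the reward by how long the net lasted, so early nets that
    always lose still get a gradient to climb (the duration idea)."""
    if result == "survived":
        return 1.0
    return plies / max_plies  # partial credit for lasting longer
```

In a real training loop, `evals_cp` would come from Stockfish analysing the game as it is played; the point of the sketch is only that the loss condition and the endurance reward are both trivial to compute once those evals exist.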
Ovyron wrote: ↑Tue Mar 10, 2020 6:00 pm
Training Leela isn't going to play 1.g4 better than an experienced centaur, so you're probably better off hiring one and saving your time
No, I'd much prefer no input from humans or from mythological creatures.
jp wrote: ↑Tue Mar 10, 2020 11:43 am
One problem is that it might depend critically on complex endgames, where the NN engine has no clue.
My theory is that SF outperforms LC0 significantly when its lines reach EGTBs for a good percentage of nodes, since LC0 could easily not look deep enough to reach them. I'll run some tests. Maybe a hybrid engine like this is worth making, but I think there are better ways to integrate the two.
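One cheap first test of that theory: measure what fraction of sampled positions are already within tablebase range (7 men for Syzygy). The piece count can be read straight off a FEN's board field, so no chess library is needed. This is only a rough proxy for "SF's lines can reach EGTBs" (it ignores search depth entirely), and the function names are my own.

```python
# Dependency-free check of how close a set of positions is to
# tablebase territory. Counts men directly from the FEN board field.

def men_on_board(fen):
    """Number of pieces (men) in a FEN string's board field."""
    board_field = fen.split()[0]
    return sum(1 for ch in board_field if ch.isalpha())

def tb_reach_fraction(fens, tb_men=7):
    """Fraction of positions at or below the tablebase piece limit
    (7 men for Syzygy)."""
    hits = sum(1 for f in fens if men_on_board(f) <= tb_men)
    return hits / len(fens)
```

Feeding this the leaf positions of SF's PVs versus LC0's would give a first number on how often each engine's search actually gets within probing range.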
mmt wrote: ↑Tue Mar 10, 2020 9:16 pm
My theory is that SF outperforms LC0 significantly when its lines reach EGTBs for a good percentage of nodes, since LC0 could easily not look deep enough to reach them.
There was an example in the past where Leela was hopeless in an endgame with one more piece than the TBs it had, but we'd hope it's improved since then.
zullil wrote: ↑Tue Mar 10, 2020 1:38 pm
Train a net (as White) against Stockfish, and call a score of -300 cp a win for Black. Perhaps use the duration of the game as a measure of success.
I'm not sure how the NN will improve at the start if it gets wiped out every game. Maybe train against SF with SF depth starting low and increasing as training continues.
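The "SF depth starting low and increasing" curriculum could be driven by a rule as simple as this. The promotion criterion (raise depth once the net averages at least 30% over the last 100 games) and all parameter values are my own illustrative choices, not anything specified above.

```python
# Sketch of a depth curriculum: Stockfish starts shallow and gets one
# ply deeper whenever the net's recent results clear a bar, so the
# opponent stays beatable while the net is still weak. The promotion
# rule and thresholds are hypothetical.

def next_depth(depth, recent_scores, promote_at=0.30, window=100,
               min_depth=1, max_depth=20):
    """Return the SF search depth for the next batch of training games.

    recent_scores: per-game scores in [0, 1] (1 = net win, 0.5 = draw).
    Depth rises one step once the net averages promote_at over the
    last `window` games; until then it stays put.
    """
    if len(recent_scores) < window:
        return max(min_depth, depth)
    avg = sum(recent_scores[-window:]) / window
    if avg >= promote_at:
        return min(depth + 1, max_depth)
    return max(min_depth, depth)
```

One design caveat with any such schedule: if the bar is set too high the net can stall at one depth forever, and if it is set too low the opponent outruns the net, which is exactly the wipe-out problem again.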
jp wrote: ↑Tue Mar 10, 2020 9:28 pm
I'm not sure how the NN will improve at the start if it gets wiped out every game. Maybe train against SF with SF depth starting low and increasing as training continues.
It won't get wiped out every game; I've seen some draws in self-play. But it's true that some scoring system that reflects how long it lasted could be better.
mmt wrote: ↑Tue Mar 10, 2020 9:38 pm
It won't get wiped out every game; I've seen some draws in self-play. But it's true that some scoring system that reflects how long it lasted could be better.
Self-play is a lot different. This would be training a NN from scratch with SF as supervisor and with the Black pieces. At the start, even SF with White would wipe out a fresh NN.
jp wrote: ↑Tue Mar 10, 2020 9:40 pm
Self-play is a lot different. This would be training a NN from scratch with SF as supervisor and with the Black pieces. At the start, even SF with White would wipe out a fresh NN.
You might have trouble getting enough training games this way. Also, if you give SF very little time per move, it won't win every time. Just my guess, though; worth testing.
mmt wrote: ↑Tue Mar 10, 2020 9:38 pm
It won't get wiped out every game; I've seen some draws in self-play. But it's true that some scoring system that reflects how long it lasted could be better.
jp wrote: ↑Tue Mar 10, 2020 9:40 pm
Self-play is a lot different. This would be training a NN from scratch with SF as supervisor and with the Black pieces. At the start, even SF with White would wipe out a fresh NN.
As I said, I certainly haven't thought carefully about a training strategy. Some reinforcement for endurance seems sensible. But I know very little about neural nets and reinforcement learning, so I'm mostly just floating an idea, maybe a dopey one.