So I just read up on the basic ideas of NNs and thought this might be a nice testbed to try it out in practice.
Sorry for the stupid question, but how do you input the position? (I got a bit lost in the code...) 64 * 13 input neurons, one for each possible piece on each square? (That would seem a bit weird to me, but I don't see anything cleaner.)
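For what it's worth, the 64 * 13 idea from the question can be sketched as a one-hot encoding: 12 planes for the piece types (6 white + 6 black) plus one "empty" plane, giving 832 inputs. This is just an illustration of the questioner's scheme, not the network's actual input format (AlphaZero-style nets use stacked 8x8 planes with history); all names here are made up.

```python
import numpy as np

# One-hot 64 * 13 encoding sketch (hypothetical, not lczero's real input).
PIECES = ["P", "N", "B", "R", "Q", "K", "p", "n", "b", "r", "q", "k"]

def encode_board(rows):
    """rows: 8 strings of length 8, '.' for empty squares (rank 8 first)."""
    x = np.zeros((64, 13), dtype=np.float32)
    for rank, row in enumerate(rows):
        for file, ch in enumerate(row):
            sq = rank * 8 + file
            if ch == ".":
                x[sq, 12] = 1.0                 # empty-square plane
            else:
                x[sq, PIECES.index(ch)] = 1.0   # piece plane
    return x.ravel()                            # 832 input values

start = [
    "rnbqkbnr",
    "pppppppp",
    "........",
    "........",
    "........",
    "........",
    "PPPPPPPP",
    "RNBQKBNR",
]
x = encode_board(start)
```

Each square lights up exactly one of its 13 inputs, so the vector always sums to 64.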
And another question: it's probably a bad idea, but could one also let the NN learn the rules by itself? By letting it try random square combinations and using a cost function like X minus the number of legal moves played at the start (so that by minimizing the cost function it learns to only play legal moves)?
That might be slow, but that way one doesn't force moves on it in any particular format, and instead lets it decide itself how it wants to generate them (and in what order, if that makes sense).
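The cost function suggested above could be sketched as follows: play the net's proposed moves until the first illegal one, and score the episode by how few legal moves were made. Everything here is hypothetical, just to make the idea concrete; `move_is_legal` stands in for whatever rules oracle the environment provides.

```python
# Toy sketch of the "learn the rules" cost idea (all names hypothetical).
def episode_cost(move_is_legal, moves, max_moves=40):
    """Cost = max_moves minus the number of legal moves played
    before the first illegal one."""
    legal_count = 0
    for mv in moves[:max_moves]:
        if not move_is_legal(mv):
            break                 # episode ends at the first illegal move
        legal_count += 1
    return max_moves - legal_count

# e.g. a sequence whose third proposed move is illegal:
cost = episode_cost(lambda mv: mv != "Ke9", ["e4", "e5", "Ke9", "Nf3"])
```

Minimizing this cost rewards sequences of legal moves, though in practice engines simply mask illegal moves instead of making the net learn them.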
Nice project! The opening of that game looks like a 5-year-old was playing it (not a 5-year-old Capablanca), but we are at the beginning! If I can contribute CPU power, let me know. Let's watch this grow. Is any of the implementation taken from the A0 papers, given that A0 is not open source?
A small progress update as well. From training so far, the network seems to be learning to draw incredibly well, but then nearly all the training samples end up being draws. This leads to a situation where the training only learns to predict a draw as the value of every position. One potential way around the draw problem is to build the training set from 33% wins, 33% losses, and 33% draws.
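The 33/33/33 idea could be implemented by resampling games per outcome before training. A minimal sketch, assuming games arrive as `(data, outcome)` pairs with outcome in {1, 0, -1}; these names are illustrative, not the project's actual data format.

```python
import random

def balanced_sample(games, n_per_class, rng=random):
    """Draw n_per_class games of each outcome (win/draw/loss) so the
    training batch is one-third wins, one-third losses, one-third draws."""
    buckets = {1: [], 0: [], -1: []}
    for g in games:
        buckets[g[1]].append(g)
    sample = []
    for outcome, bucket in buckets.items():
        if len(bucket) < n_per_class:
            raise ValueError(f"not enough games with outcome {outcome}")
        sample.extend(rng.sample(bucket, n_per_class))
    rng.shuffle(sample)           # avoid outcome-ordered batches
    return sample
```

The trade-off is that oversampling decisive games distorts the true outcome distribution, so the value head's raw output would be biased away from draws.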
First though, taking a step back and going to train a net based on Stockfish self-play games (thanks gcp!), and we'll see how strong the engine is. That will hopefully flush out any major bugs in the search.
Here are three games against ScorpioMCTS, which uses Monte Carlo tree search but replaces the value network with a qsearch. It doesn't have a policy network.
It seems lczero is far weaker at the moment, mainly due to tactical issues, I suppose.
It's working!! The first dozen moves are not random anymore! Proof is that White's dark-squared bishop retreated twice when threatened by Black's h6 and g5 pawn moves, and likewise the knight on f3 moved to e5 when threatened by g4.
Interestingly, the remaining moves go back to random, suggesting that the learning concentrates on the opening phase first and then moves forward, meaning it may master the endgame last.
Daniel Shawul wrote: Here are three games against ScorpioMCTS, which uses Monte Carlo tree search but replaces the value network with a qsearch. It doesn't have a policy network.
It seems lczero is far weaker at the moment, mainly due to tactical issues, I suppose.
Thanks for playing some games! Yes, it definitely doesn't have much tactical awareness yet. I think part of that comes from training on SF games: in self-play it would learn the probability of exploring each move from UCT, whereas with SF games it only has the single played move to learn from.
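The difference in training signal described above can be made concrete. With a supervised Stockfish game the policy target is a one-hot vector (only the played move), while self-play gives a full distribution from the MCTS visit counts. The move names and visit numbers below are made up for illustration.

```python
import numpy as np

moves = ["e4", "d4", "Nf3", "c4"]

# Supervised target from an SF game: only the move actually played.
sf_target = np.array([1.0, 0.0, 0.0, 0.0])

# Self-play target: UCT/MCTS visit counts normalized to probabilities.
visits = np.array([420.0, 310.0, 180.0, 90.0])
uct_target = visits / visits.sum()
```

The one-hot target carries no information about the relative quality of the other moves, which is exactly what the visit-count distribution provides and why self-play gives the policy head a richer signal.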