First success with neural nets

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Kieren Pearson
Posts: 70
Joined: Tue Dec 31, 2019 2:52 am
Full name: Kieren Pearson

Re: First success with neural nets

Post by Kieren Pearson »

jp wrote: Thu Sep 24, 2020 1:53 pm
jonkr wrote: Thu Sep 24, 2020 12:16 am So at this point I downloaded tensorflow and exported the training data as an inputs file and a targets file, then imported the weights back. It did turn out easier to use the already-debugged and optimized training; out of curiosity I may go back and try to make my own version work and use it, but it does seem likely that tensorflow would just be better in every way except the import/export workflow step.
So how much will tensorflow "do for you", if you want to get away with as little programming as possible?
You still need to write the code that uses the network in your evaluation function and that incrementally updates the input -> first hidden layer connections. It's not a trivial task.
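For concreteness, a minimal sketch of that incremental-update idea (sizes and names here are illustrative, not from any particular engine). Because the inputs are 0/1 piece-square features, the first layer's pre-activations can be kept in an accumulator and patched whenever a feature flips:

```python
# Minimal sketch of incrementally updating the input -> first hidden
# layer product. Sizes and names are illustrative assumptions.
import numpy as np

N_INPUTS, N_HIDDEN = 320, 184          # example sizes only
W1 = np.random.randn(N_INPUTS, N_HIDDEN).astype(np.float32)
b1 = np.zeros(N_HIDDEN, dtype=np.float32)

accumulator = b1.copy()                # hidden pre-activations for the current position

def add_feature(idx: int) -> None:     # a piece arrives on a square
    accumulator[:] += W1[idx]

def remove_feature(idx: int) -> None:  # a piece leaves a square
    accumulator[:] -= W1[idx]

# A quiet move is then two row additions (remove the from-square
# feature, add the to-square one) instead of a full matrix multiply.
```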
AndrewGrant
Posts: 1759
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: First success with neural nets

Post by AndrewGrant »

Is your network outputting a single centipawn score, or are you outputting both an MG and an EG score?

For some reason, outputting only one score in training, but then applying it only to the MG, works best for me. I'm trying desperately to remedy this, but have been failing over and over for the last week. I'm damn close to removing the network and taking the ~20 elo loss. I'm not willing to let incorrect mathematics sit in the engine.
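(For context, the MG/EG split here is the usual tapered-eval scheme. A sketch of how two outputs would conventionally be blended; the 24-point phase weighting is a common convention assumed here, not necessarily what any engine in this thread does:)

```python
# Conventional tapered blend of separate MG and EG scores.
# The 24-point phase scale is a common convention, assumed here.
def tapered_score(mg: int, eg: int, phase: int) -> int:
    phase = max(0, min(24, phase))   # 24 = all pieces on, 0 = bare endgame
    return (mg * phase + eg * (24 - phase)) // 24
```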
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
jonkr
Posts: 178
Joined: Wed Nov 13, 2019 1:36 am
Full name: Jonathan Kreuzer

Re: First success with neural nets

Post by jonkr »

jp :
Tensorflow provides a library that can be used to train the neural net weights. You still write a small python script to tell it what to do and to import/export your data, but the heavy lifting / lower-level stuff all just works. The other nice thing about tensorflow is that you can have it show the values for some of the training positions to make sure you aren't screwing something up in the implementation, which I did several times.
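For readers who want a concrete picture, the whole workflow can be a script of roughly this size (file names, net shape, and hyperparameters below are illustrative guesses, not my actual setup):

```python
# Minimal sketch of the TensorFlow workflow described above.
# File names, net shape, and hyperparameters are assumptions.
import numpy as np
import tensorflow as tf

inputs = np.load("train_inputs.npy")    # one row of 0/1 features per position
targets = np.load("train_targets.npy")  # one target score per position

model = tf.keras.Sequential([
    tf.keras.Input(shape=(inputs.shape[1],)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),           # single evaluation output
])
model.compile(optimizer="adam", loss="mse")
model.fit(inputs, targets, batch_size=256, epochs=10)

# Spot-check a few training positions, as suggested above, to catch
# mismatches between the engine's implementation and the trained net.
print(model.predict(inputs[:5]).flatten(), targets[:5])

# Export the raw weight matrices for the engine to import.
for i, w in enumerate(model.get_weights()):
    np.save(f"layer_{i}.npy", w)
```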

Implementing the learning of the weights seemed to me clearly the hardest part of making neural nets. (My own training eventually worked for some things some of the time, and I did thread it, so maybe it was almost there; but maybe it's a long way off. I won't know unless I finish it.)

Dann : Right now I'm not planning a GPU version; it's possible in the future, but I like the ease of use, and of testing with many concurrent games, when everything runs on the CPU. I doubt I'm doing anything smart right now, but I do think it's definitely more interesting to have some people explore the space rather than just pasting something in.

Andrew : I'm outputting a single score, just using it in rook endgames and starting simple. I'm not sure yet if I'll try to use multiple outputs, depends on how I try to fit neural nets into the rest of the game.

Yesterday I switched to fixed-point. I ended up making everything 32-bit for now, since I was getting bad values when trying to keep some parts 16-bit, probably from overflow. Now that it works, storing just the weights as 16-bit should be fine at least. Even without SIMD it's faster than floats on the CPU.
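A rough sketch of that conversion step (the scale factor is an illustrative choice; a too-large scale is exactly the kind of thing that produces 16-bit overflow):

```python
# Rough sketch of quantizing trained float weights to fixed point.
import numpy as np

SCALE = 64                               # fixed-point units per 1.0 (assumed)

def quantize(w_float: np.ndarray, bits: int = 16) -> np.ndarray:
    q = np.round(w_float * SCALE).astype(np.int32)
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if q.min() < lo or q.max() > hi:
        raise OverflowError("weights do not fit in the target width")
    return q.astype(np.int16 if bits == 16 else np.int32)

# Keeping accumulators 32-bit while storing weights 16-bit means each
# dot product must stay within int32 range: worth asserting at export.
w16 = quantize(np.random.uniform(-1, 1, (320, 184)).astype(np.float32))
```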

So the next step is to finish the optimizations, then test against SF11 in the rook & pawn endgame suite; I'm curious whether, after several training runs against it, my engine will be able to beat it. (SF12 seems too strong a goal.) After that I'll have to decide whether to make more special-purpose nets (or try more specific endgame types and see how quickly that goes) or try to make them more general. I want to experiment with some special ones in the eval before trying to generalize, e.g. what if I sent my king-safety or related features to a small neural net instead of just tuning a few weights with texel tuning?
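A rough sketch of that last idea, feeding a few hand-crafted king-safety features into a tiny net rather than linear texel-tuned weights (feature names and sizes are invented for illustration):

```python
# Tiny net over hand-crafted king-safety features (all names invented).
import numpy as np

def king_safety_net(features, W1, b1, w2, b2):
    # features: e.g. [attacker_count, open_files_near_king, shield_pawns, ...]
    h = np.maximum(0.0, W1 @ features + b1)   # one small ReLU layer
    return float(w2 @ h + b2)                 # scalar safety score

W1 = np.random.randn(8, 4); b1 = np.zeros(8)  # untrained placeholder weights
w2 = np.random.randn(8); b2 = 0.0
print(king_safety_net(np.array([3.0, 1.0, 2.0, 0.0]), W1, b1, w2, b2))
```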
AndrewGrant
Posts: 1759
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: First success with neural nets

Post by AndrewGrant »

jonkr wrote: Thu Sep 24, 2020 9:52 pm Andrew : I'm outputting a single score, just using it in rook endgames and starting simple. I'm not sure yet if I'll try to use multiple outputs, depends on how I try to fit neural nets into the rest of the game.
You are going down the very line I am planning. First Pawn+King, then Material.

Then special nets that get triggered at root for 20 different endgames. I continue to believe -- and want to believe -- that having NNUE replace the entire eval is suboptimal, and that an equal or better result can be achieved with a handful of small, quickly computed, well-hashed NNs, without the speed loss.

Piping the existing eval through an NN is an option I'm looking into as well. Safety is the primary place, as it is already done via non-linear methods.
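A hypothetical sketch of the "triggered at root" dispatch (every name here is invented for illustration):

```python
# Picking a specialized net at the root by material signature.
def material_key(white_pieces: str, black_pieces: str) -> str:
    return f"{white_pieces}v{black_pieces}"   # e.g. ("KRP", "KR") -> "KRPvKR"

def rook_endgame_net(position):               # stand-in for a trained net
    return 0

def classical_eval(position):                 # stand-in for the hand-crafted eval
    return 0

SPECIAL_NETS = {"KRPvKR": rook_endgame_net}

def evaluate(position, white_pieces, black_pieces):
    # Chosen once at the root, so there is no per-node dispatch cost.
    net = SPECIAL_NETS.get(material_key(white_pieces, black_pieces), classical_eval)
    return net(position)
```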
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
econo
Posts: 2
Joined: Thu Aug 06, 2020 8:48 pm
Full name: Andrew Metrick

Re: First success with neural nets

Post by econo »

I have also been thinking about using these NNUEs as a way to experiment with and learn about this exciting field. One question I have is about the length of time needed to train a net. For example, assuming you had already done the heavy lifting of all the code, how long does it take a fast machine to build a big CPU-sized net like Sergio's? I ask because I expect to make many mistakes and would still like to have time to try a lot of things before something ultimately works.

I originally wanted to do this for the gpu nets, but there it is prohibitive for an individual or small team. Only Google or the Leela project can do it. But it seems like normal people might be able to have fun with NNUEs.
AndrewGrant
Posts: 1759
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: First success with neural nets

Post by AndrewGrant »

econo wrote: Thu Sep 24, 2020 11:55 pm I have also been thinking about using these NNUEs as a way to experiment with and learn about this exciting field. One question I have is about the length of time needed to train a net. For example, assuming you had already done the heavy lifting of all the code, how long does it take a fast machine to build a big CPU-sized net like Sergio's? I ask because I expect to make many mistakes and would still like to have time to try a lot of things before something ultimately works.

I originally wanted to do this for the gpu nets, but there it is prohibitive for an individual or small team. Only Google or the Leela project can do it. But it seems like normal people might be able to have fun with NNUEs.
I cannot answer your question for NNUEs/Sergio, but I can answer it for Ethereal.

Ethereal's net right now is 224x32x1. It is trained on ~35 million samples. The training process takes ~8 hours using a 16-thread CPU. One could speed it up by using a GPU (the code would work just the same; I just don't happen to have a CUDA GPU).
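For scale, the parameter count of a net that shape is tiny:

```python
# Back-of-envelope size of a 224x32x1 fully connected net.
params = 224 * 32 + 32 + 32 * 1 + 1   # weights + biases per layer
print(params)                         # 7233 parameters: cheap to run on CPU
```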
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
econo
Posts: 2
Joined: Thu Aug 06, 2020 8:48 pm
Full name: Andrew Metrick

Re: First success with neural nets

Post by econo »

That is a useful benchmark, thank you. I have been googling around trying to find simple training-time formulas as a function of net parameters, but so many places just say “It depends” that I have given up on getting a general answer and have reached the data-gathering stage of just asking people with specific experience training chess-specific nets.
jonkr
Posts: 178
Joined: Wed Nov 13, 2019 1:36 am
Full name: Jonathan Kreuzer

Re: First success with neural nets

Post by jonkr »

I've had another successful result with the rook & pawn endgame neural net. It was touch and go with a random regression, but I finally got SlowDev scoring +4 elo versus Stockfish 11 in my rook endgame suite (so that's well outside the 95% error bars). So I can say with a good degree of confidence that if your engine is at SF11 level or weaker, doing something like this is a viable path to some amount of elo gain. My next step will be to generalize and improve the related code, and to do some other endgames.
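(For anyone curious about the error-bar arithmetic behind a claim like that, a sketch; the game counts below are invented purely for illustration:)

```python
# Elo and 95% error bars from a match score (invented game counts).
import math

def elo(score: float) -> float:
    return -400 * math.log10(1 / score - 1)

def elo_with_95ci(wins: int, draws: int, losses: int):
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1 - mean) ** 2 + draws * (0.5 - mean) ** 2 + losses * mean ** 2) / n
    se = math.sqrt(var / n)            # standard error of the mean score
    return elo(mean), elo(mean - 1.96 * se), elo(mean + 1.96 * se)

print(elo_with_95ci(wins=310, draws=400, losses=290))  # ~ +7 elo, bars ~ +/-17
```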

The progression so far:
- First try: -63.5 elo vs SF11
- Slightly smaller net and more training: -55
- Fixed-point speedup: -42.2
- More training: -44.6 (a random regression)
- More training: -32.4
- First pass of SIMD: -23
- More fully SIMD plus more training: -9
- stm-relative inputs for additional symmetry, plus more training: +4
I never finished the incremental updates; that's not the biggest deal when the nets are used just in endgames, but maybe it's more important than I think. The specific nets mean a bit more tracking is necessary, and I'll have to figure out the best way to handle the horizontal + vertical & stm symmetry. My current net is 320 inputs and 184x32x32x1.
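For reference, the stm-relative and mirror transforms are just square remappings (assuming squares numbered 0..63 with a1 = 0 and a8 = 56):

```python
# Square remappings behind the symmetry tricks mentioned above,
# assuming 0..63 numbering with a1 = 0 and a8 = 56.
def vertical_flip(square: int) -> int:
    return square ^ 56     # rank 1 <-> rank 8, etc.

def horizontal_flip(square: int) -> int:
    return square ^ 7      # file a <-> file h, etc.

def stm_relative(square: int, white_to_move: bool) -> int:
    # Flip when black is to move so the net always sees the position
    # from the mover's point of view, halving the cases to learn.
    return square if white_to_move else vertical_flip(square)
```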

Much less certain, but I would guess that if the Stockfish team wanted to squeeze some additional elo out, this might help even in SF12 (or any other strong program with computing resources). Given the huge SF search advantage, and that my CPU net computation is not as optimized, it's possible the eval itself might already be in the ballpark or better. The gain might not be significant enough to justify the added complication, though.

Andrew :
I do think adding specific nets may surprise some people with the amount of elo gain for programs not already at the very strongest level. Overall I'm resigned that, as a hobbyist, I will never get that close to the state of the art, so I won't know what's truly a good idea. One net does seem cleaner, but multiple shouldn't be a big issue once I clean up my code to generalize.

econo :
As for computer time, by far most of it was spent playing test games, which I feed back in to generate positions for training data (and of course to check results and see elo progress). I'd say maybe 5 days of computer time on a 12-core Ryzen if it had been running continuously. Since I was developing at the same time, some of that training was non-optimal; certainly the games where I was accidentally setting inputs for only one side of the board are questionable.

For the time to train the net in TensorFlow: loading the data takes a minute or two, then about 5 minutes to train the neural net, although when to stop is arbitrary; I could have stopped much sooner.
Extracting and exporting the training data (position inputs, position value) from the PGNs takes maybe 4 minutes (done in SlowDev).
I'm using 2 million positions to train.
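An illustrative sketch of that extraction step, here using the python-chess library and a generic 768-feature piece-square encoding (SlowDev does this internally with its own 320-input scheme):

```python
# Extract (inputs, value) pairs from PGNs. The encoding below is a
# generic 12x64 one-hot scheme, not the thread's actual 320 inputs.
import chess
import chess.pgn
import numpy as np

def encode(board: chess.Board) -> np.ndarray:
    x = np.zeros(768, dtype=np.float32)
    for square, piece in board.piece_map().items():
        plane = (piece.piece_type - 1) + (0 if piece.color else 6)
        x[plane * 64 + square] = 1.0
    return x

inputs, targets = [], []
with open("games.pgn") as f:
    while (game := chess.pgn.read_game(f)) is not None:
        result = {"1-0": 1.0, "0-1": 0.0}.get(game.headers.get("Result"), 0.5)
        board = game.board()
        for move in game.mainline_moves():
            board.push(move)
            inputs.append(encode(board))
            targets.append(result)

np.save("train_inputs.npy", np.array(inputs))
np.save("train_targets.npy", np.array(targets))
```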
jonkr
Posts: 178
Joined: Wed Nov 13, 2019 1:36 am
Full name: Jonathan Kreuzer

Re: First success with neural nets

Post by jonkr »

After thinking about this more and watching some games, I'm slightly less excited about it, although I think my conclusions still hold. The initial excitement was from the process working well enough to meet my goal and from beating SF11 in any test at all.

I was thinking about how many nets would be needed: I'd need to get test positions and run training for all of them, generalize the code in many places, and fix issues fitting everything together, and then that's only the late endgame. Easily doable, but the amount of work remaining compared to the amount of elo gain is less exciting. There's probably a balance, though, where some nets could be more generic. It's a good learning experience either way and still what I plan to do.

Also, in one game against a weaker opponent, Slow traded from a worse position into a lost rook endgame. I still need to make sure the net's eval fits nicely with the rest, so that it recognizes won/lost positions and scales their eval higher, and knows better when to trade into them, but not so high that I get neural-net trolling. On the positive side, there was one game where it reported near 0 for a drawn rook ending where the opponent was up an outside pawn, so it was better there. But actually reaching rook endgames on the board was very rare, although the big elo difference means this wasn't a good test of that. Better play for all late-endgame positions plus balanced evals should help earlier in the game too.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: First success with neural nets

Post by mvanthoor »

Just for my own clarity about this: you are not using NNUE, but have written your own neural network code and trained it yourself? I can write a neural network if I put my mind to it (I know how it's done) for something such as an image classifier, but in the case of chess, I have absolutely no *** clue how to train it.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL