Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by mar »

David Carteau wrote: Tue Dec 08, 2020 9:30 am I also use tabs for all the good reasons you mentioned, but I must admit that the source code on GitHub wasn't so readable; that's why I made the change. Is there a way to configure GitHub to visually shorten tabs?
Not that I'm aware of - anyway, thank you for sharing the code.
Martin Sedlak
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by Madeleine Birchfield »

mar wrote: Tue Dec 08, 2020 9:12 am Noo :) Don't listen to this troll. I prefer tabs, because this way everybody can set any indentation level he wants - some indent to 2, 4 or 8 characters.
Tabs on GitHub are rendered 8 characters wide by default, with no way of changing it that I know of, so to keep code readable on GitHub I usually have to replace them manually with four spaces. Of course, if David Carteau doesn't care about the readability of his NNUE code on GitHub, he could leave the tabs as is. But my comment was in response to somebody who commented specifically on the readability of the code, and with tabs it didn't look very readable to me.
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by David Carteau »

I think (hope) I solved the problem. One has to put a special file (".editorconfig") in the repository to specify the tab format.

So I pushed the tabs back, and they should now be displayed as 4 spaces!

https://github.com/david-carteau/cerebr ... itorconfig
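For reference, the file only needs a few lines; something along these lines is enough (the glob pattern and width below are just an example, not necessarily the exact content of the file in the repository):

# example .editorconfig (illustrative values)
root = true

[*.{c,h,py}]
indent_style = tab
indent_size = 4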

I wonder now if it's possible to "hide" this ugly file :)
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by Madeleine Birchfield »

David Carteau wrote: Tue Dec 08, 2020 6:13 pm I think (hope) I solved the problem. One has to put a special file (".editorconfig") in the repository to specify the tab format.

So I pushed the tabs back, and they should now be displayed as 4 spaces!

https://github.com/david-carteau/cerebr ... itorconfig

I wonder now if it's possible to "hide" this ugly file :)
That editorconfig file looks like a very useful thing to have around. I might have to start using it in the future.
User avatar
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by jshriver »

Which is better for positions.txt: more positions with evals at shorter depth, or fewer positions at deeper depth? I'm trying to find the trade-off; as of now I've spun up a server, and at depth 22 I can do about 89k positions a day.
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by David Carteau »

jshriver wrote: Wed Dec 09, 2020 10:12 pm Which is better for positions.txt: more positions with evals at shorter depth, or fewer positions at deeper depth? I'm trying to find the trade-off; as of now I've spun up a server, and at depth 22 I can do about 89k positions a day.
The idea is for sure to have as many positions as possible, to 1) be sure to cover all the 40960 combinations of king_square x piece_type x piece_square (in fact there are fewer combinations, because pawns can only be on 48 squares, not 64), and 2) multiply the evaluation examples, so that accuracy will increase.
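To illustrate where the 40960 figure comes from: it is simply 64 king squares x 10 piece types x 64 piece squares. A hypothetical HalfKP-style index computation (just an illustration, not necessarily the exact layout used by Cerebrum) looks like this:

# Hypothetical HalfKP-style feature indexing (illustration only;
# Cerebrum's actual layout may differ).
N_KING_SQ = 64       # square of the side-to-move king
N_PIECE_TYPES = 10   # {pawn, knight, bishop, rook, queen} x {own, opponent}
N_PIECE_SQ = 64      # square of the piece

def feature_index(king_sq, piece_type, piece_sq):
    return (king_sq * N_PIECE_TYPES + piece_type) * N_PIECE_SQ + piece_sq

print(N_KING_SQ * N_PIECE_TYPES * N_PIECE_SQ)   # 40960 combinations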

As I understand it, the Stockfish team is using at least... one billion positions to train their networks. I used 360 million for Orion. My feeling is that what is super important is to have a lot of variety (i.e. positions from all game phases). Apparently, and as stated on the Stockfish Discord, it is not clear whether or not high depths provide better results. My advice would be to generate at least 200-300 million positions at a lower depth than 22 (e.g. 8-10 for a first try).

A last important point (I will add a comment on that on GitHub) is that the text file you provide to the trainer should have its lines shuffled. Training is performed by splitting the given positions into small batches/chunks and then processing them in a random order, but the positions inside each batch are not shuffled (it's not useful, as they are processed together): thus, shuffling has to be done beforehand, on the entire file.
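A minimal way to do that shuffle (assuming the whole positions.txt fits in memory; the file names here are just an example):

import random

# Shuffle the training file once, before giving it to the trainer.
# Assumes positions.txt fits in memory; split and merge chunks otherwise.
with open("positions.txt", "r") as f:
    lines = f.readlines()

random.shuffle(lines)

with open("positions_shuffled.txt", "w") as f:
    f.writelines(lines)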
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by David Carteau »

[Warning]: if needed, resynchronize your local copy of the Cerebrum repository!

While adding a comment on the necessity of shuffling positions when preparing the text file used by the trainer, I noticed that the size of the network was improperly configured in it. The previous value was from an attempt to build a really small network. The new value is aligned with the inference code, and with what is actually used in Orion (of course, this can be changed for your own experiments/projects).
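For illustration, this is roughly what an NNUE-like model definition looks like in PyTorch; the layer sizes below are placeholder values for the example, not necessarily those configured in Cerebrum or used by Orion:

import torch
import torch.nn as nn

# Illustrative NNUE-like architecture; sizes are placeholder values,
# not necessarily those configured in the Cerebrum trainer.
N_FEATURES = 40960   # king_square x piece_type x piece_square (per perspective)
N_HIDDEN = 256       # size of the first ("accumulator") layer

class NnueLikeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.ft = nn.Linear(N_FEATURES, N_HIDDEN)  # shared feature transformer
        self.l1 = nn.Linear(2 * N_HIDDEN, 32)      # both perspectives concatenated
        self.l2 = nn.Linear(32, 32)
        self.out = nn.Linear(32, 1)

    def forward(self, own_features, opp_features):
        acc = torch.cat([self.ft(own_features), self.ft(opp_features)], dim=1)
        x = torch.clamp(acc, 0.0, 1.0)             # clipped ReLU
        x = torch.clamp(self.l1(x), 0.0, 1.0)
        x = torch.clamp(self.l2(x), 0.0, 1.0)
        return self.out(x)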

Some figures from the Orion v0.8 training:
- 360 million unique positions (quiet and non-quiet positions, no filtering except positions in check);
- 2 days for the first epoch to complete (due to the txt to bin conversion of the data);
- 50 minutes per epoch for epochs >= 2 (using an Nvidia GPU: GTX 1660 Ti);
- 5 days to complete 150 epochs and produce the network released with Orion v0.8;
- 40 GB of disk space used at the end of the first epoch (huge!);
- a few GB of RAM used while training (I don't remember precisely how much, but it worked without problems on a 16 GB machine).
elcabesa
Posts: 855
Joined: Sun May 23, 2010 1:32 pm

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by elcabesa »

What is the speed difference between CPU and GPU training for this kind of neural network?
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by David Carteau »

elcabesa wrote: Tue Dec 15, 2020 7:54 pm What is the speed difference between CPU and GPU training for this kind of neural network?
Hi elcabesa,

Sorry for the delay, I'm very busy these days. As I didn't remember precisely how big the gain was when using the GPU instead of the CPU, I've just run a small test: starting with epoch number 2 (the first epoch is much longer due to the txt -> bin conversion of the data), there is a factor of 10! On an i5 9400 + GTX 1660 Ti, a mini-batch is processed in 10 seconds on the CPU, and in less than 1 second on the GPU.
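For anyone wanting to reproduce such a comparison with a PyTorch-based trainer, a minimal timing sketch could look like this (dummy model, dense dummy data, sizes chosen only for illustration; this is not the Cerebrum code, and real NNUE trainers feed sparse features, so absolute numbers will differ):

import time
import torch
import torch.nn as nn

# Rough timing of one mini-batch on whichever device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(40960, 256), nn.ReLU(), nn.Linear(256, 1)).to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

x = torch.rand(1024, 40960, device=device)   # one dummy mini-batch
y = torch.rand(1024, 1, device=device)

start = time.time()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
if device.type == "cuda":
    torch.cuda.synchronize()                 # wait for GPU work to finish
print(f"{device}: {time.time() - start:.3f} s for one mini-batch")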

Important note: I've just pushed a fix to the GitHub repository for the trainer (I had left a variable name in capital letters when I cleaned up the Python script for release). Now everything should work, as I actually used the publicly available script to run this little test. Sorry about that!
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: Introducing the "Cerebrum" library (NNUE-like trainer and inference code)

Post by Madeleine Birchfield »

jdart wrote: Mon Dec 07, 2020 4:12 pm Thanks for making this available, especially under MIT License.
Do you plan on using a neural network for Arasan?