Orion 0.7 : NNUE experiment

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Orion 0.7 : NNUE experiment

Post by David Carteau »

Here is the result of my little experiment consisting of integrating the NNUE evaluation concept into my engine Orion :

Code: Select all

+--------------------+-------+-----------+-------+-------+-------+-------+
| ENGINE             |   ELO |       +/- | GAMES | SCORE |  AvOp | DRAWS |
+--------------------+-------+-----------+-------+-------+-------+-------+
| Orion 0.7.nnue x64 |  2953 |  +22  -22 |  1000 |   80% |  2712 |   19% |
| Orion 0.7 x64      |  2762 |  +19  -18 |  1000 |   57% |  2711 |   33% |
+--------------------+-------+-----------+-------+-------+-------+-------+
Around +190 elo !!

For this experiment, I used exactly the same test conditions as for my recent release (v0.7), with the same opponents (total: 10), the same number of games (total: 1000), the same opening book, the same time control (40/1), etc.

I didn't want to simply copy/paste the available C++ code, but rather to understand the network architecture and the way the final evaluation is computed, so I decided to write my own NNUE implementation in C, compatible with the current Stockfish networks.

Network loading and evaluation represent around 250 lines of code. The only thing not yet implemented is the "capture" feature. I must admit that, for the moment, I find this feature quite strange : how are the associated weights computed, since learning is performed using only FENs (i.e. from a static view of the board) ?
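
To give an idea of what these ~250 lines look like, here is a rough and naive sketch of the forward pass in C, assuming the current Stockfish architecture (HalfKP features, 256x2-32-32-1 layers). It is not Orion's actual code : names, sizes and weight layouts are purely illustrative, and a real implementation updates the feature-transformer accumulator incrementally instead of recomputing it from scratch.

Code: Select all

/* Naive NNUE forward pass, assuming the 2020-era Stockfish architecture   */
/* (HalfKP, 256x2-32-32-1). Names, sizes and weight layouts are purely     */
/* illustrative : real code reads them from the .nnue file and updates the */
/* accumulator incrementally instead of recomputing it from scratch.       */

#include <stdint.h>

#define FT_IN   41024      /* HalfKP features per perspective              */
#define FT_OUT  256        /* feature-transformer outputs per perspective  */
#define L1      32
#define L2      32

static int16_t ft_w[FT_IN][FT_OUT], ft_b[FT_OUT];    /* loaded from file   */
static int8_t  w1[2 * FT_OUT][L1];   static int32_t b1[L1];
static int8_t  w2[L1][L2];           static int32_t b2[L2];
static int8_t  w3[L2];               static int32_t b3;

static uint8_t clipped_relu(int32_t x)               /* clamp to [0, 127]  */
{
    return (uint8_t)(x < 0 ? 0 : x > 127 ? 127 : x);
}

/* active[p] lists the indices of the active HalfKP features for each      */
/* perspective p (0 = side to move, 1 = the other side)                    */
int nnue_evaluate(const int *active[2], const int n_active[2])
{
    uint8_t in[2 * FT_OUT];
    int32_t acc[FT_OUT];

    for (int p = 0; p < 2; p++) {                   /* feature transformer  */
        for (int j = 0; j < FT_OUT; j++) acc[j] = ft_b[j];
        for (int i = 0; i < n_active[p]; i++)
            for (int j = 0; j < FT_OUT; j++)
                acc[j] += ft_w[active[p][i]][j];
        for (int j = 0; j < FT_OUT; j++)
            in[p * FT_OUT + j] = clipped_relu(acc[j]);
    }

    uint8_t h1[L1], h2[L2];
    for (int j = 0; j < L1; j++) {                  /* hidden layer 1       */
        int32_t sum = b1[j];
        for (int i = 0; i < 2 * FT_OUT; i++) sum += in[i] * w1[i][j];
        h1[j] = clipped_relu(sum >> 6);             /* weight scale = 64    */
    }
    for (int j = 0; j < L2; j++) {                  /* hidden layer 2       */
        int32_t sum = b2[j];
        for (int i = 0; i < L1; i++) sum += h1[i] * w2[i][j];
        h2[j] = clipped_relu(sum >> 6);
    }

    int32_t out = b3;                               /* output layer         */
    for (int i = 0; i < L2; i++) out += h2[i] * w3[i];
    return out / 16;                                /* FV_SCALE = 16        */
}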

For the test, I used the current 'best' network available at https://tests.stockfishchess.org/nns (which is to date 'nn-82215d0fd0df.nnue').

So, what are my thoughts ?

I'm super happy ! I knew that Orion's evaluation was weak, so I'm not surprised. I now have a better idea of what remains to be done on the other parts of the engine, especially search, if I want to make progress. Having a 'fixed' evaluation (known as one of the best) should give a good basis for improving the rest of the engine.

But, wait, am I saying that the next releases of Orion will now use Stockfish evaluation networks ?!

No ! It wouldn't be satisfactory from an intellectual perspective. My goal has always been - and remains - to understand concepts, try to implement them on my side, and then start to play with them, in the sense of "try to improve them if possible" !

So what's next ?

The next weeks will be busy, and I cannot imagine releasing another version before Orion v0.7 has been tested on the CCRL 40/15 list (I put a lot of effort into this version !). And, as said above, I cannot imagine releasing a version relying on a network not built by myself !

If that were the case, what would be the benefit for me ? For sure, a better ranking. But it doesn't correspond at all to my wishes. I want to understand and experiment by myself !

Next steps for Orion will be : try to understand how to train networks, build my own trainer, try to mix concepts (the return of PBIL ?!), try to play with network architectures, implement SMP (!).

Then, a new version may see the light of day. I think that, now that the code exists, the next version will embed the capacity to use Stockfish networks, but not by default. This will offer testers the possibility to play with and compare both of Orion's evaluations, or to compare Orion with other engines - both using the same networks : this could be a new and unexplored way to test (and rank ?) engines ;-)

One thing is sure : Orion won't use the evaluation of another engine by default. It shall remain a 100% original work (with the notable exception - for the moment !? - of the Syzygy support introduced in the last release). What would be the point of a world where all engines used the same eval ?! Competition requires trying different and distinct approaches !

To all engine developers : what are your plans ? Are you also currently working on NNUE integration in your engine ? Do you plan to replace (or mix) your engine's evaluation with NNUE in the near future ? Does anyone have thoughts on the "capture" feature ?

Final note : for those who are interested, here is an idea of the nps drop (using Orion's built-in 'bench' command) :

Code: Select all

+--------------------+------------------+----------------------------+
| ENGINE             | popcount version | popcount+avx2+bmi2 version |
+--------------------+------------------+----------------------------+
| Orion 0.7.nnue x64 |     128 kn/s     |          160 kn/s          |
| Orion 0.7 x64      |     818 kn/s     |          861 kn/s          |
+--------------------+------------------+----------------------------+
Speed is around 15-18% of the 'classic' version, but the current implementation is simple and straightforward, i.e. it makes no use of intrinsics, so there is still room for improvement !
Last edited by David Carteau on Wed Aug 19, 2020 7:57 am, edited 1 time in total.
User avatar
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Orion 0.7 : NNUE experiment

Post by cdani »

Nice effort! Congratulations!
Gabor Szots
Posts: 1362
Joined: Sat Jul 21, 2018 7:43 am
Location: Szentendre, Hungary
Full name: Gabor Szots

Re: Orion 0.7 : NNUE experiment

Post by Gabor Szots »

David, that's splendid. :D

I very much like that you're going to develop your own networks although I don't see any objection to using SF networks giving due credit.

Looking forward to the next version. Regrettably, owing to hardware limitation I cannot contribute to the 40/15 list to accelerate testing, however much I would like.

Best wishes,
Gabor
Gabor Szots
CCRL testing group
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Orion 0.7 : NNUE experiment

Post by David Carteau »

cdani wrote: Wed Aug 19, 2020 7:53 am Nice effort! Congratulations!
Gabor Szots wrote: Wed Aug 19, 2020 8:21 am David, that's splendid. :D

I very much like that you're going to develop your own networks although I don't see any objection to using SF networks giving due credit.

Looking forward to the next version. Regrettably, owing to hardware limitation I cannot contribute to the 40/15 list to accelerate testing, however much I would like.

Best wishes,
Gabor
Thank you to both of you for your kind words !

@Gabor: no hurry for the CCRL 40/15 testing ! Please let me thank you - and all the other testers - for the resources and the time you offer to us, engine authors. Without all of you, I'm not sure we would invest the same effort in improving our engines !
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Orion 0.7 : NNUE experiment

Post by David Carteau »

I managed this morning to obtain significant speed-ups :

Code: Select all

+--------------------+------------------+----------------------------+
| ENGINE             | popcount version | popcount+avx2+bmi2 version |
+--------------------+------------------+----------------------------+
| Orion 0.7.nnue x64 |     202 kn/s     |          288 kn/s          |
| Orion 0.7 x64      |     818 kn/s     |          861 kn/s          |
+--------------------+------------------+----------------------------+
Currently, the nps of the 'nnue' version is around 25-33% of the 'classic' version ! I still don't use intrinsics, and rely only on the compiler's own optimisations. I hope I will gain more if I manage to manually add intrinsic instructions :)
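
To illustrate the "no intrinsics, rely on the compiler" approach, here is a hypothetical example (not Orion's actual code ; sizes and names are made up) of the kind of simple inner loop that GCC can auto-vectorize on its own with -O3 and the right -m flags :

Code: Select all

/* Hypothetical example of a plain-C affine layer written so that the      */
/* compiler can auto-vectorize it (e.g. gcc -O3 -mavx2, no intrinsics).    */
/* Sizes and names are made up, this is not Orion's actual code.           */

#include <stdint.h>

#define IN  512            /* inputs (clipped to [0, 127])                 */
#define OUT 32             /* outputs                                      */

void affine_layer(const uint8_t *restrict in,
                  const int8_t  *restrict w,      /* OUT rows of IN weights */
                  const int32_t *restrict bias,
                  int32_t       *restrict out)
{
    for (int o = 0; o < OUT; o++) {
        int32_t sum = bias[o];
        const int8_t *row = &w[o * IN];
        /* simple, dependency-free inner loop : ideal for auto-vectorization */
        for (int i = 0; i < IN; i++)
            sum += (int32_t)in[i] * row[i];
        out[o] = sum;
    }
}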
RubiChess
Posts: 584
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Orion 0.7 : NNUE experiment

Post by RubiChess »

David Carteau wrote: Wed Aug 19, 2020 7:35 am I didn't want to simply copy/paste the available C++ code, but rather to understand the network architecture and the way the final evaluation is computed, so I decided to write my own NNUE implementation in C, compatible with the current Stockfish networks.

...

But, wait, am I saying that the next releases of Orion will now use Stockfish evaluation networks ?!

No ! It wouldn't be satisfactory from an intellectual perspective. My goal has always been - and remains - to understand concepts, try to implement them on my side, and then start to play with them, in the sense of "try to improve them if possible" !
Congrats and kudos for that!

You already went the way I will try to go for Rubi: rewriting the complete NNUE code (into something I understand much better than the highly evolved C++ of the original) and, along the way, learning the basics of NN.
This will be a hard and long road because I don't have any knowledge about machine learning and NN yet (just a book that is waiting to be read), but it will be worth more than the "copy-and-paste in a rush" of some others.

Regards, Andreas
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Orion 0.7 : NNUE experiment

Post by mvanthoor »

RubiChess wrote: Fri Aug 21, 2020 9:45 am This will be a hard and long road because I don't have any knowledge about machine learning and NN yet (just a book that is waiting to be read), but it will be worth more than the "copy-and-paste in a rush" of some others.
Same here... but I even have to finish my engine first. Someday.

Because it's possible to reach at least 3400 Elo with a single-threaded alpha/beta search, I don't think I'll be looking into techniques such as SMP and neural networks until my engine reaches at least around 2850 or even 3000 in a single-threaded a/b search. That will take quite enough time already, after actually finishing it.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Orion 0.7 : NNUE experiment

Post by David Carteau »

Thanks Andreas and Marcel for your feedback. Trying to understand concepts and then to implement them is a super challenge !

In the meantime, I carefully looked at the Stockfish code to learn how intrinsics could speed up the dot product computations. This picture helped me a lot to understand what was behind the obscure terms used :

Image

I also implemented intrinsics for the ReLU layers, but with no real advantage in terms of speed (so I decided to leave that code commented out).
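
For those interested, here is a minimal sketch of such an AVX2 dot product between a clipped uint8 input vector and an int8 weight row, in the spirit of the Stockfish code (the function name and layer size are illustrative, not taken from Orion or Stockfish) :

Code: Select all

/* Sketch of an AVX2 dot product between a uint8 input vector (clipped to  */
/* [0, 127]) and an int8 weight row ; function name and size are           */
/* illustrative. Requires -mavx2.                                          */

#include <immintrin.h>
#include <stdint.h>

#define IN 512             /* must be a multiple of 32 for this sketch     */

int32_t dot_u8_i8_avx2(const uint8_t *in, const int8_t *w)
{
    const __m256i ones = _mm256_set1_epi16(1);
    __m256i acc = _mm256_setzero_si256();

    for (int i = 0; i < IN; i += 32) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(in + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(w  + i));
        /* u8 * i8 products, summed by pairs into i16 lanes (no overflow    */
        /* here since the inputs are clipped to [0, 127])                   */
        __m256i prod = _mm256_maddubs_epi16(a, b);
        /* i16 pairs summed into i32 lanes, then accumulated                */
        acc = _mm256_add_epi32(acc, _mm256_madd_epi16(prod, ones));
    }

    /* horizontal sum of the eight i32 lanes                                */
    __m128i sum = _mm_add_epi32(_mm256_castsi256_si128(acc),
                                _mm256_extracti128_si256(acc, 1));
    sum = _mm_add_epi32(sum, _mm_shuffle_epi32(sum, _MM_SHUFFLE(1, 0, 3, 2)));
    sum = _mm_add_epi32(sum, _mm_shuffle_epi32(sum, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(sum);
}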

I'm super excited by the results !!

Code: Select all

+--------------------+------------------+----------------------------+
| ENGINE             | popcount version | popcount+avx2+bmi2 version |
+--------------------+------------------+----------------------------+
| Orion 0.7.nnue x64 |     424 kn/s     |          578 kn/s          |
| Orion 0.7 x64      |     818 kn/s     |          861 kn/s          |
+--------------------+------------------+----------------------------+
The 'popcount' version now requires the -mssse3 GCC flag, which should not be a problem since the 'popcount' instruction came with the SSE4 instruction sets.
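
For completeness, here is a sketch of how the two builds can be separated at compile time, using the macros that GCC defines for these flags (the macro checks reflect standard GCC behaviour ; SIMD_WIDTH and the error message are illustrative) :

Code: Select all

/* Sketch of a compile-time dispatch between the two builds above. GCC     */
/* defines __AVX2__ when -mavx2 is passed and __SSSE3__ when -mssse3 is    */
/* passed ; SIMD_WIDTH and the error message are illustrative.             */

#if defined(__AVX2__)
#  include <immintrin.h>
#  define SIMD_WIDTH 32    /* 256-bit registers : 32 bytes per step        */
#elif defined(__SSSE3__)
#  include <tmmintrin.h>
#  define SIMD_WIDTH 16    /* 128-bit registers : 16 bytes per step        */
#else
#  error "Build with at least -mssse3 (e.g. gcc -O3 -mssse3 -mpopcnt)"
#endif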

The speed of the 'nnue' version is now around 51-67% of the 'classic' version ! In actual games, the nps is more or less halved between the two versions.

I'm going to launch a tournament (always with the same opponents) to see how it will impact the elo performance.
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Orion 0.7 : NNUE experiment

Post by David Carteau »

Here is the (spectacular !) result of the tournament :

Code: Select all

+--------------------+-------+-----------+-------+-------+-------+-------+
| ENGINE             |   ELO |       +/- | GAMES | SCORE |  AvOp | DRAWS |
+--------------------+-------+-----------+-------+-------+-------+-------+
| Orion 0.7.nnue x64 |  3091 |  +29  -27 |  1000 |   90% |  2712 |   10% |
| Orion 0.7 x64      |  2762 |  +19  -18 |  1000 |   57% |  2711 |   33% |
+--------------------+-------+-----------+-------+-------+-------+-------+
Which is... more than +300 elo !!

The rating must however be inflated due to the set of opponents, and the high score obtained against them (90%). Note that the tournament was run with the 'popcount' version, which is not the fastest implementation.

I could launch a new tournament with 3000+ elo engines, but for the moment I will consider that my NNUE implementation is fast enough to switch to the training part. I will try to implement my own trainer, with the (first) objective of training a neural network similar to, but smaller than, Stockfish's. The idea is not to get the highest possible ranking, but rather to see whether or not I can improve Orion's evaluation function by my own means :)

[ Edit to my previous post : source of image is here. ]
User avatar
Sylwy
Posts: 4466
Joined: Fri Apr 21, 2006 4:19 pm
Location: IASI - the historical capital of MOLDOVA
Full name: SilvianR

Re: Orion 0.7 : NNUE experiment

Post by Sylwy »