OliverBr wrote: ↑Thu Jun 12, 2025 11:23 am
First of all: Congratulations, because Rust is a very interesting programming language.
Thank you. Rust was one of the main reasons I started this project, as a learning experience, but I think it's also one of the reasons I was able to see it through to the extent I have. Programming in C/C++ is simply much less enjoyable to me, and the fact that Rust is able to provide (me) with such a good developer experience while still maintaining very comparable performance is very nice. Even inherently "unsafe" data structures, such as the lockless hashtable, don't add that much programmer overhead.
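To illustrate what a lockless hashtable entry can look like, here is a minimal sketch of the classic XOR verification trick (Hyatt's lockless hashing) in safe Rust atomics. This is an illustration of the general technique, not the engine's actual code, and the `Entry` type and its layout are assumptions for the example:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One transposition-table entry stored as two atomics. The key slot holds
/// `hash ^ data`, so a torn write (key from one thread, data from another)
/// fails the verification check on probe instead of returning corrupt data.
/// Hypothetical sketch, not the engine's real table.
struct Entry {
    key: AtomicU64,
    data: AtomicU64,
}

impl Entry {
    const fn new() -> Self {
        Entry {
            key: AtomicU64::new(0),
            data: AtomicU64::new(0),
        }
    }

    fn store(&self, hash: u64, data: u64) {
        // Store the key XOR-ed with the data; a reader must reproduce this
        // XOR relation for the entry to be accepted.
        self.key.store(hash ^ data, Ordering::Relaxed);
        self.data.store(data, Ordering::Relaxed);
    }

    fn probe(&self, hash: u64) -> Option<u64> {
        let key = self.key.load(Ordering::Relaxed);
        let data = self.data.load(Ordering::Relaxed);
        // Accept only if the XOR relation still holds for this hash.
        if key ^ data == hash { Some(data) } else { None }
    }
}

fn main() {
    let e = Entry::new();
    e.store(0xDEAD_BEEF, 42);
    assert_eq!(e.probe(0xDEAD_BEEF), Some(42));
    assert_eq!(e.probe(0x1234_5678), None); // mismatched hash is rejected
    println!("ok");
}
```

The appeal in Rust is that this needs no `unsafe` at all: two relaxed atomics per entry give a data-race-free table without any locking.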
OliverBr wrote: ↑Thu Jun 12, 2025 11:23 am
though there it was v0.1.0, which was about 600 Elo weaker, mainly due to the abysmal HCE (now using NNUE).
So "bad" HCE vs NNUE creates 600 ELO points? It is really amazing.
How strong can a self-written engine become that uses neither HCE, nor NNUE?
No, that is not quite correct. v0.1.0 to v0.2.0 gained about 600 Elo in total, and while NNUE was the single largest improvement (maybe 400-450 Elo total? Still a very good chunk!), I also added other improvements like futility pruning, reverse futility pruning, move-count-based pruning (or something similar to it, at least), aspiration windows, etc.
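Aspiration windows, for anyone unfamiliar: instead of searching with a full (-inf, +inf) window, you search a narrow window around the previous iteration's score and widen it on a fail. A minimal sketch, where `search` is a hypothetical stand-in for a real alpha-beta search (not the engine's code):

```rust
const INF: i32 = 30_000;

// Stand-in for a real alpha-beta search. It pretends the position is worth
// +120 centipawns and, fail-hard style, reports only the bound when the
// true score falls outside the (alpha, beta) window.
fn search(_depth: i32, alpha: i32, beta: i32) -> i32 {
    let true_score = 120;
    true_score.clamp(alpha, beta)
}

// Search with an aspiration window around the previous iteration's score,
// widening and re-searching on a fail-low or fail-high.
fn aspirated_search(depth: i32, prev_score: i32) -> i32 {
    let mut delta = 25; // initial half-window in centipawns (assumed value)
    let mut alpha = prev_score - delta;
    let mut beta = prev_score + delta;
    loop {
        let score = search(depth, alpha, beta);
        if score <= alpha {
            alpha = (alpha - delta).max(-INF); // fail-low: widen downward
        } else if score >= beta {
            beta = (beta + delta).min(INF); // fail-high: widen upward
        } else {
            return score; // score landed inside the window
        }
        delta *= 2; // widen faster on repeated failures
    }
}

fn main() {
    // A stale window around 0 first fails high, then re-searches until the
    // true score fits.
    assert_eq!(aspirated_search(8, 0), 120);
    println!("ok");
}
```

The payoff is that the narrower window produces more cutoffs in the common case where the score barely moves between iterations, at the cost of occasional re-searches.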
The HCE consisted of piece-square tables, plus some terms for passed pawns, doubled pawns, different queenside/kingside pawn tables, that sort of thing. The initial NNUE version was very barebones: CReLU with a 32-element accumulator and simple piece-square inputs. It was also trained on much less data than is typically encouraged, about 20 million positions gathered over a single day by self-play with the HCE version. This gained about 200 Elo and was the largest single jump. I think this is actually the most interesting result here, because it shows that even with relatively limited resources there is quite a good gain to be made. This end of the spectrum often gets overlooked, and the focus is on "how many billions of positions and GPU-hours do you need to train a Stockfish-quality net", but this shows that scaling it back by many orders of magnitude is still worthwhile.
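For scale, the whole forward pass of a net like that first one fits in a few lines. A sketch of a 32-element CReLU accumulator feeding one output neuron; the weights and the quantization constant are made-up illustration values, not the real net:

```rust
const HIDDEN: usize = 32;
const QA: i32 = 255; // activation quantization range (assumed)

// CReLU: clamp each accumulator value into [0, QA].
fn crelu(x: i16) -> i32 {
    (x as i32).clamp(0, QA)
}

// Output = bias + sum over hidden units of crelu(acc[i]) * w_out[i].
fn evaluate(acc: &[i16; HIDDEN], w_out: &[i16; HIDDEN], bias: i32) -> i32 {
    let mut sum = bias;
    for i in 0..HIDDEN {
        sum += crelu(acc[i]) * w_out[i] as i32;
    }
    sum
}

fn main() {
    let mut acc = [0i16; HIDDEN];
    acc[0] = 300; // clamps to 255
    acc[1] = -50; // clamps to 0
    acc[2] = 100;
    let mut w = [0i16; HIDDEN];
    w[0] = 2;
    w[1] = 5;
    w[2] = 1;
    // 255*2 + 0*5 + 100*1 + bias 10 = 620
    assert_eq!(evaluate(&acc, &w, 10), 620);
    println!("ok");
}
```

(The accumulator itself is maintained incrementally as pieces move, which is where the "efficiently updatable" part of NNUE comes from; only the cheap output layer runs per evaluation.)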
I then made incremental improvements while still running self-play with the latest version, so I now have a repository of about 180 million positions from depth-8 self-play, mostly from Chess 324 starting positions with a few plies of random moves applied. The current architecture is a 2-sided piece-square input, to a 512-element accumulator, to a single output (no output bucketing yet), with SCReLU activation.
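The two changes relative to the first net can be sketched the same way: SCReLU squares the clipped value, and the 2-sided input means the side-to-move and opponent perspectives each get their own accumulator half with separate output weights. Sizes and constants here are illustrative assumptions, not the real net:

```rust
const HIDDEN: usize = 512;
const QA: i32 = 255; // activation quantization range (assumed)

// SCReLU: square of the clipped ReLU.
fn screlu(x: i16) -> i32 {
    let c = (x as i32).clamp(0, QA);
    c * c
}

// Forward pass over the two perspective accumulators to a single output.
fn evaluate(
    us: &[i16; HIDDEN],   // accumulator from the side-to-move's perspective
    them: &[i16; HIDDEN], // accumulator from the opponent's perspective
    w_us: &[i16; HIDDEN],
    w_them: &[i16; HIDDEN],
    bias: i32,
) -> i32 {
    let mut sum = bias;
    for i in 0..HIDDEN {
        sum += screlu(us[i]) * w_us[i] as i32;
        sum += screlu(them[i]) * w_them[i] as i32;
    }
    sum
}

fn main() {
    let us = [1i16; HIDDEN]; // screlu(1) = 1 for every unit
    let them = [0i16; HIDDEN];
    let w_us = [1i16; HIDDEN];
    let w_them = [1i16; HIDDEN];
    // 512 units * 1 * 1 + bias 0 = 512
    assert_eq!(evaluate(&us, &them, &w_us, &w_them, 0), 512);
    println!("ok");
}
```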