rdhoffmann wrote: ↑Tue Jan 28, 2025 9:48 pm
The smallest network I trained was (768+64) x 28 x 1 and it is already much better than my HCE attempts.
Note the extra 64 inputs; I use them for the side to move (+/- 1) and for trying out various ideas.
One question though, why would such small networks need so many positions? I don't see how (or why) that will improve performance. There is only so much a small network can learn?
Interesting!
We've actually discussed a lot of this in Discord already,
starting with the fact that I wasn't applying a sigmoid when calculating the loss.
There are results and there are improvements, but I can't surpass my HCE eval yet; maybe there are still some errors somewhere.
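For context on the sigmoid point: this is not Zevra's actual trainer, just a minimal sketch of the commonly used sigmoid-based loss, where the eval in centipawns is squashed into a win-probability before taking the MSE. The scaling constant K here is an assumption, not a value from the post.

```python
import torch

# Assumed scaling constant: how many centipawns correspond to one unit
# inside the sigmoid. Typical choices are somewhere in the 200-400 range.
K = 400.0

def sigmoid_loss(pred_cp: torch.Tensor, target_cp: torch.Tensor) -> torch.Tensor:
    """MSE taken in win-probability space rather than in raw centipawns.

    pred_cp   - network output, interpreted as centipawns
    target_cp - training target (e.g. the search score from data generation)
    """
    pred_wdl = torch.sigmoid(pred_cp / K)      # map eval -> [0, 1]
    target_wdl = torch.sigmoid(target_cp / K)
    return torch.mean((pred_wdl - target_wdl) ** 2)
```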
One question though, why would such small networks need so many positions? I don't see how (or why) that will improve performance. There is only so much a small network can learn?
This number of positions is just an attempt to make sure that there are enough of them
Score of Zevra Self (Gen 7) vs Zevra Classic: 1641 - 1807 - 225 [0.477] 3673
... Zevra Self (Gen 7) playing White: 813 - 909 - 115 [0.474] 1837
... Zevra Self (Gen 7) playing Black: 828 - 898 - 110 [0.481] 1836
... White vs Black: 1711 - 1737 - 225 [0.496] 3673
Elo difference: -15.7 +/- 10.9, LOS: 0.2 %, DrawRatio: 6.1 %
SPRT: llr -2.95 (-100.2%), lbound -2.94, ubound 2.94 - H0 was accepted
But it is much faster! Well, maybe not by much, about 1.5x.
So I think tomorrow I'll adapt the search parameters to NNUE. Maybe that will be the solution, because at a longer TC I lost about 300 Elo points, and I feel my eval may be scaled too large for my search parameters. I could even do it another way: compute the average eval with HCE and with NNUE, compare the two, and derive the correct scale from that, roughly as in the sketch below.
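A rough sketch of that rescaling idea as an offline script; the evaluate_hce / evaluate_nnue callables and the position set are placeholders for however the two evals are actually exposed, not Zevra's real interface:

```python
def eval_scale_factor(positions, evaluate_hce, evaluate_nnue):
    """Estimate how much to scale NNUE output so its typical magnitude
    matches the old HCE, keeping the tuned search margins roughly valid.

    positions     - iterable of positions from a representative set
    evaluate_hce  - callable: position -> centipawn score (old eval)
    evaluate_nnue - callable: position -> raw NNUE output
    """
    hce_sum, nnue_sum, n = 0.0, 0.0, 0
    for pos in positions:
        hce_sum += abs(evaluate_hce(pos))
        nnue_sum += abs(evaluate_nnue(pos))
        n += 1
    # A factor > 1 means the NNUE scores are on average smaller than HCE scores
    # and should be multiplied up inside the engine.
    return (hce_sum / n) / (nnue_sum / n)
```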
Overall, it looks like I've more or less managed to catch up with HCE.
Tomorrow there will be a new training dataset, and I will train a new 768x64 net on data from a stronger version with a correctly working search.
Right now I am already training a 768x64 net on the current data (200M training positions + 50M validation positions).
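For reference, a 768x64x1 net like the one mentioned here is tiny; a minimal PyTorch sketch with only the layer sizes taken from the post (the activation and everything else are assumptions):

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    """768 inputs (12 piece types x 64 squares), one hidden layer of 64, scalar output."""
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(768, 64)
        self.output = nn.Linear(64, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clipped ReLU is the usual NNUE-style activation; plain ReLU would also work.
        h = torch.clamp(self.hidden(x), 0.0, 1.0)
        return self.output(h)
```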