The test run of Stockfish 16 HCE for my full UHO-Ratinglist has finished (14000 games: 1000 games against each of 14 different opponents), for a comparison with Stockfish 16 NNUE. Let's see how much progress the NNUE nets give compared with the old handwritten evaluation (HCE). The games are stored in the Archive006 pgn-file and the full list pgn-file.
Full UHO-Ratinglist, 3min+1sec, singlethread:
Rank  Program              Celo    +    -   Games   Score   Av.Op.   Draws
   6  Stockfish 16 NNUE  : 3779    3    3   39000   74.4%    3583    45.1%
 103  Stockfish 16 HCE   : 3484    4    4   14000   44.6%    3522    42.1%
Celo: Stockfish 16 NNUE to Stockfish 16 HCE: +295 Celo
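As a rough illustration of what a gap of this size means, here is a minimal Python sketch. It assumes the Celo numbers can be read like ordinary Elo ratings and plugged into the standard logistic Elo formula; that is my assumption, not something the ratinglist states. The function names are made up for this example.

```python
# Sketch only: treats the list's Celo values as ordinary Elo ratings (assumption).

def celo_gap(rating_a: int, rating_b: int) -> int:
    """Rating difference A minus B, as quoted in the list."""
    return rating_a - rating_b


def expected_score(gap: float) -> float:
    """Expected score of the higher-rated side under the standard
    logistic Elo model: 1 / (1 + 10^(-gap / 400))."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


if __name__ == "__main__":
    gap = celo_gap(3779, 3484)  # Stockfish 16 NNUE vs. Stockfish 16 HCE
    print(f"gap: {gap:+d} Celo")
    print(f"expected score in a direct match: {expected_score(gap):.1%}")
```

Note that this is only the model's prediction for a hypothetical head-to-head match. The two engines did not play each other in this list; both ratings come from games against the list's other opponents.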
                                   bad     avg.win
Rank  EAS-Score  sacs     shorts   draws   moves    Engine/player
-------------------------------------------------------------------
   8   193748    21.29%   26.40%   09.17%    70     Stockfish 16 NNUE
  21   132709    17.13%   29.89%   21.85%    71     Stockfish 16 HCE
Conclusion: Stockfish gains nearly +300 Celo by using the NNUE net evaluation instead of the old HCE. The EAS-Scores of the two Stockfish versions differ most (as I expected) in the bad-draw category: the deeper positional understanding of the NNUE net prevents a lot of bad draws.
pohl4711 wrote (Mon Mar 17, 2025 1:43 pm):
Conclusion: Stockfish gains nearly +300 Celo by using the NNUE net evaluation instead of the old HCE. The EAS-Scores of the two Stockfish versions differ most (as I expected) in the bad-draw category: the deeper positional understanding of the NNUE net prevents a lot of bad draws.
Incorrect conclusion, imho.
Better: Stockfish gains nearly +300 Celo by using the NNUE net evaluation, compared to the old HCE running with search improvements that were developed for the NNUE eval. It's quite questionable whether these search improvements are also beneficial with the old HCE ...
Btw., is the code of this Stockfish 16 HCE available?