The participants in this competition were the SF+NNUE builds that each shipped with a new NNUE net.
The results:
1. SF+NNUE (201005) net: nn-baeb9ef2d183 points = 205.5 (400 games)
2. SF+NNUE (201014) net: nn-04cf2b4ed1da points = 203.5 (400 games)
3. SF+NNUE (201018) net: nn-eba324f53044 points = 199.5 (400 games)
4. SF+NNUE (200921) net: nn-03744f8d56d8 points = 197.0 (400 games)
5. SF+NNUE (200902) net: nn-82215d0fd0df points = 194.5 (400 games)
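As a quick cross-check of the list above, the points can be converted to score percentages. This is only a minimal sketch: the dictionary keys and the helper name `score_percent` are illustrative choices, not something taken from the test itself.

```python
# Points scored by each SF+NNUE build, copied from the results list above.
GAMES_PER_ENGINE = 400
points = {
    "201005": 205.5,
    "201014": 203.5,
    "201018": 199.5,
    "200921": 197.0,
    "200902": 194.5,
}

def score_percent(pts, games=GAMES_PER_ENGINE):
    """Return the score as a percentage of the maximum possible points."""
    return 100.0 * pts / games

percentages = {build: score_percent(pts) for build, pts in points.items()}

# Each game hands out exactly one point and involves two engines,
# so the points must add up to the total number of games.
total_points = sum(points.values())
total_games = len(points) * GAMES_PER_ENGINE // 2

for build, pct in percentages.items():
    print(f"SF+NNUE ({build}): {pct:.3f} %")
print(f"total points = {total_points}, total games = {total_games}")
```

The total of the points equals the total number of games, which is a simple consistency check on the reported results.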
The playing strength of SF+NNUE depends mainly on its net, but the results also reflect the development of the Stockfish source code.
The gain since SF+NNUE 200902 (Stockfish 12) is about 15 Elo.
The PC and test conditions were:
OS - Windows 10, 64-bit
CPU - AMD Ryzen 9 3950X, 16 x 4.0 GHz, RAM 32 GB, HASH = 2 GB
TC - 1 min + 2 sec / move
Openings - My HalfMiniBook (50 positions) with alternating colors (100 games per pairing)
Endgame database - Syzygy 6-men + Nalimov 5-men (for Fritz 14 GUI only)
RoundRobin competition among SF+NNUEs
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
-
- Posts: 1971
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Round Robin competition among SF + NNUEs.
Hello:
The relative performance of each engine with respect to the winning engine should be:

Code:
Round Robin with 5 engines and 400 games per engine.
Total number of games: 1000 games.

Engines:                                 Performance (Elo):   Score:
SF+NNUE (201005) net: nn-baeb9ef2d183:          0.00          51.38 %
SF+NNUE (201014) net: nn-04cf2b4ed1da:         -2.61          50.88 %
SF+NNUE (201018) net: nn-eba324f53044:         -7.82          49.88 %
SF+NNUE (200921) net: nn-03744f8d56d8:        -11.08          49.25 %
SF+NNUE (200902) net: nn-82215d0fd0df:        -14.34          48.63 %

YMMV if you use EloSTAT, BayesElo or Ordo. My estimate should be very similar to EloSTAT. Regarding error bars, since all the engines are close to 50%, a quick hack is that the score within ± 2 standard deviations is 0.5 ± 2*sqrt{[score*(1 - score) - draw_ratio_of_the_engine/4]/games_per_engine} = 0.5 ± 0.05*sqrt(1 - draw_ratio_of_the_engine), which gives circa ± 7 Elo per ± 1% around 50% [± 16/ln(10) ~ ± 6.9487 Elo is a better estimate following Taylor series, but ± 7 Elo is more than enough for a quick hack]:

Code:
Supposing similar draw ratios (D) for each engine:

 D      Error bars (2-sigma, Elo)
36%     ± 28
64%     ± 21
84%     ± 14
96%     ± 7

I expect draw ratios over 75% or even over 85% with those scores, so error bars would still overlap. Anyway, thank you for running the RR.

Regards from Spain.

Ajedrecista.
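The quick hack for the 2-sigma error bars in the post above can be sketched in a few lines of Python. This is only an illustration under the post's own assumptions (score close to 50%, 400 games per engine, about 7 Elo per 1% of score); the function names are made up for this example.

```python
import math

# Rounded conversion used in the post; 16 / ln(10) ~ 6.9487 is the finer value.
ELO_PER_PERCENT = 7.0

def two_sigma_score(draw_ratio, games=400, score=0.5):
    """Half-width of the 2-sigma interval on the score, as a fraction."""
    # Per-game variance of a win/draw/loss result: s*(1 - s) - D/4
    variance_per_game = score * (1.0 - score) - draw_ratio / 4.0
    return 2.0 * math.sqrt(variance_per_game / games)

def two_sigma_elo(draw_ratio, games=400, score=0.5):
    """The same half-width converted to Elo with the rounded factor."""
    return 100.0 * two_sigma_score(draw_ratio, games, score) * ELO_PER_PERCENT

for d in (0.36, 0.64, 0.84, 0.96):
    print(f"D = {d:.0%}: ± {two_sigma_elo(d):.0f} Elo")
```

With these inputs the sketch should reproduce the ± 28, ± 21, ± 14 and ± 7 Elo of the table, since 0.05*sqrt(1 - D) gives 4%, 3%, 2% and 1% for the four draw ratios.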
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: RoundRobin competition among SF+NNUEs
Thanks for the addition.
Robert
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: RoundRobin competition among SF+NNUEs
I continued my test with a match between the winner of the round robin (SF+NNUE 201005) and the most recent build (SF+NNUE 201102).
The result was 3 : 1 in favor of SF+NNUE 201102, with 96 draws (100 games).
The hardware and test method were the same as above.
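To put the 3 : 1 result with 96 draws in perspective, the same 2-sigma estimate from earlier in the thread can be applied to this 100-game match. The sketch below is a rough illustration, not part of the original test: it assumes the usual logistic Elo formula, 400*log10(s/(1 - s)), and the 16/ln(10) Elo-per-percent factor quoted above, and all variable names are chosen for this example.

```python
import math

# Match result from the post, seen from SF+NNUE 201102's side.
wins, losses, draws = 3, 1, 96
games = wins + losses + draws

score = (wins + 0.5 * draws) / games   # fraction of points scored
draw_ratio = draws / games

# Per-game variance of a win/draw/loss result: s*(1 - s) - D/4
variance_per_game = score * (1.0 - score) - draw_ratio / 4.0
two_sigma = 2.0 * math.sqrt(variance_per_game / games)

# Logistic Elo difference implied by the score (assumed model).
elo_diff = 400.0 * math.log10(score / (1.0 - score))
# Linearised conversion near 50%: 16 / ln(10) Elo per 1% of score.
elo_margin = 100.0 * two_sigma * 16.0 / math.log(10)

print(f"score = {score:.2%}, draw ratio = {draw_ratio:.0%}")
print(f"Elo difference ~ {elo_diff:+.1f} ± {elo_margin:.1f} (2-sigma)")
```

Under these assumptions the margin is larger than the measured difference, so a 100-game match of this kind cannot separate the two builds on its own.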