RoundRobin competition among SF+NNUEs

Post by **corres** » Thu Oct 29, 2020 11:15 am

The participants in this competition were the SF+NNUE builds that each came with a new NNUE net.
The results:
1. SF+NNUE (201005) net: nn-baeb9ef2d183 points = 205.5 (400 games)
2. SF+NNUE (201014) net: nn-04cf2b4ed1da points = 203.5 (400 games)
3. SF+NNUE (201018) net: nn-eba324f53044 points = 199.5 (400 games)
4. SF+NNUE (200921) net: nn-03744f8d56d8 points = 197.0 (400 games)
5. SF+NNUE (200902) net: nn-82215d0fd0df points = 194.5 (400 games)
The playing strength of SF+NNUE depends mainly on its net, but the results also reflect the development of the Stockfish source code.
The gain over SF+NNUE 200902 (Stockfish 12) is about 15 Elo.
The test setup was:
OS - Win10 64 bits
CPU - AMD Ryzen 9 3950X 16 x 4.0 GHz, RAM 32 GB, HASH = 2 GB
TC - 1 min + 2 sec / move
Openings - My HalfMiniBook (50 positions), each played with both colors (100 games per pairing)
Endgame database - Syzygy 6-men + Nalimov 5-men (for Fritz 14 GUI only)

Post by **Ajedrecista** » Thu Oct 29, 2020 7:32 pm

Hello:

The relative performance of each engine with respect to the winning engine should be:

Round Robin with 5 engines and 400 games per engine.
Total number of games: 1000 games.
 
               Engines:                   Performance:     Score:
 
SF+NNUE (201005) net: nn-baeb9ef2d183:         0.00       51.38 %
SF+NNUE (201014) net: nn-04cf2b4ed1da:        -2.61       50.88 %
SF+NNUE (201018) net: nn-eba324f53044:        -7.82       49.88 %
SF+NNUE (200921) net: nn-03744f8d56d8:       -11.08       49.25 %
SF+NNUE (200902) net: nn-82215d0fd0df:       -14.34       48.63 %
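As a quick bookkeeping check on the table above, the score column and the total game count can be recomputed from the posted points. This is only a sketch: the dictionary keys are shortened build labels chosen here for illustration, and it does not attempt to reproduce the "Performance" column, whose exact method is not stated in the thread.

```python
# Points per engine as posted (build date -> points out of 400 games).
points = {
    "201005": 205.5,
    "201014": 203.5,
    "201018": 199.5,
    "200921": 197.0,
    "200902": 194.5,
}
GAMES_PER_ENGINE = 400


def score_fraction(pts, games=GAMES_PER_ENGINE):
    """Score as a fraction of the games played by one engine."""
    return pts / games


scores = {name: score_fraction(p) for name, p in points.items()}

# Every game involves two engines, so the sum of per-engine games counts each game twice.
total_games = len(points) * GAMES_PER_ENGINE // 2
# Exactly one point is distributed per game (1-0, 0-1 or 1/2-1/2).
total_points = sum(points.values())

for name, s in scores.items():
    print(f"SF+NNUE ({name}): {100 * s:.3f} %")
print("games:", total_games, "points:", total_points)
```

The points sum to the number of games, which is what a complete round robin should give.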

YMMV if you use EloSTAT, BayesElo or Ordo; my estimate should be very similar to EloSTAT's. Regarding error bars: since all the engines are close to 50%, a quick hack is that the score interval at ± 2 standard deviations is 0.5 ± 2*sqrt{[score*(1 - score) - draw_ratio_of_the_engine/4]/games_per_engine} = 0.5 ± 0.05*sqrt(1 - draw_ratio_of_the_engine), taking score = 0.5 and games_per_engine = 400. Converting at circa ± 7 Elo per ± 1% around 50% [± 16/ln(10) ~ ± 6.9487 Elo is a better estimate following Taylor series, but ± 7 Elo is more than enough for a quick hack] gives:

Supposing similar draw ratios (D) for each engine:

 D    Error bars (2-sigma, Elo)
36%           ± 28
64%           ± 21
84%           ± 14
96%           ±  7
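
The table above can be reproduced with a few lines of Python. This is a minimal sketch of the quick hack described in the post, assuming a score of exactly 50%, 400 games per engine and the rounded factor of 7 Elo per 1%; the function names are made up here for illustration.

```python
import math

# Slope of the logistic Elo curve at a 50% score, in Elo per 1% of score.
ELO_PER_PERCENT = 16 / math.log(10)  # about 6.9487


def two_sigma_score(draw_ratio, games=400, score=0.5):
    """Half-width of the 2-sigma interval, as a score fraction."""
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = score * (1 - score) - draw_ratio / 4
    return 2 * math.sqrt(variance / games)


def two_sigma_elo(draw_ratio, games=400, elo_per_percent=7.0):
    """Same half-width converted to Elo with a linear factor around 50%."""
    return 100 * two_sigma_score(draw_ratio, games) * elo_per_percent


for d in (0.36, 0.64, 0.84, 0.96):
    print(f"{d:.0%}  ± {two_sigma_elo(d):.0f} Elo")
```

Passing `elo_per_percent=ELO_PER_PERCENT` instead of 7 gives the slightly tighter Taylor-series values.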

I expect draw ratios over 75% or even over 85% with those scores, so error bars would still overlap. Anyway, thank you for running the RR.

Regards from Spain.

Ajedrecista.

Post by **corres** » Fri Oct 30, 2020 10:54 am

Thanks for the addition.
Robert

Post by **corres** » Sun Nov 01, 2020 1:38 pm

You can download the .pgn from
http://www.wikisend.com
File ID: 321306
Password: nnue

Post by **corres** » Wed Nov 04, 2020 10:10 am

I continued my test with a match between the winner of the round robin (SF+NNUE 201005) and the most recent build (SF+NNUE 201102).
The result was 3 : 1 in wins for SF+NNUE 201102, with 96 draws (100 games).
The hardware and the test method were the same as above.
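
For reference, the same 2-sigma quick hack from earlier in the thread can be applied to this 100-game match. This is a rough sketch, assuming the standard logistic conversion from score to Elo; `match_summary` is a name invented for this example.

```python
import math


def match_summary(wins, losses, draws):
    """Score, Elo difference and 2-sigma score half-width for one match."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    draw_ratio = draws / games
    # Logistic Elo model: expected score s corresponds to 400*log10(s/(1-s)) Elo.
    elo = 400 * math.log10(score / (1 - score))
    # Per-game variance of the result is score*(1 - score) - draw_ratio/4.
    half_width = 2 * math.sqrt((score * (1 - score) - draw_ratio / 4) / games)
    return score, elo, half_width


score, elo, half_width = match_summary(wins=3, losses=1, draws=96)
print(f"score = {100 * score:.1f} %, about {elo:+.1f} Elo")
print(f"2-sigma interval: {100 * (score - half_width):.1f} % to {100 * (score + half_width):.1f} %")
```

Under these assumptions the interval still contains 50%, so a 3 : 1 result over 100 games does not separate the two builds on its own.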
