SPCC: 3 topengines longtime tournament finished

pohl4711
Posts: 2435
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

SPCC: 3 topengines longtime tournament finished

Post by pohl4711 »

A huge round-robin tournament (450 games, 8 days) of the 3 top engines (SF 201022, KomodoDragon 1.0 and Lc0 0.26.3 J92-260), played with my new Unbalanced Human Openings (UHO 1.0), has finished. See the results and download the games in the "NN vs SF testing" section (scroll down to the bottom of the site!). Long thinking time (hexacore Intel CPU and RTX 2060 GPU): Lc0: 5'+3'' and Stockfish/KomodoDragon: 7.5'+4.5'' (which gives a perfect Leela-Ratio of 1.0). Average game duration: 20 minutes.

1 Stockfish 201022 bmi2           : 3656 300 (+111,=149,- 40), 61.8 %
  vs. KomodoDragon 1.0 x64        :      150 (+ 58,= 73,- 19), 63.0 %
  vs. Lc0 0.26.3 J92-260 (30x384) :      150 (+ 53,= 76,- 21), 60.7 %

2 Lc0 0.26.3 J92-260 (30x384)     : 3573 300 (+ 55,=156,- 89), 44.3 %
  vs. Stockfish 201022 bmi2       :      150 (+ 21,= 76,- 53), 39.3 %
  vs. KomodoDragon 1.0 x64        :      150 (+ 34,= 80,- 36), 49.3 %

3 KomodoDragon 1.0 x64            : 3571 300 (+ 55,=153,- 92), 43.8 %
  vs. Stockfish 201022 bmi2       :      150 (+ 19,= 73,- 58), 37.0 %
  vs. Lc0 0.26.3 J92-260 (30x384) :      150 (+ 36,= 80,- 34), 50.7 %

My Unbalanced Human Openings worked extremely well! The overall draw rate was only 50.9%. My earlier long-time test runs (Stockfish vs. Lc0) under the same conditions, but played with the Noomen lowdraw openings, had an overall draw rate of around 66% (!). A draw rate of only 50.9% is an extremely low value when 3 top engines play with such long thinking time! Using any classical opening set should give a draw rate somewhere above 70%-75% here (at least!).
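
For anyone who wants to re-check the numbers: the score percentages and the 50.9% draw rate follow directly from the win/draw/loss counts in the table above. Here is a minimal Python sketch; the helper name `score_percent` and the short engine labels are only for illustration and are not part of any tool mentioned in this thread.

```python
def score_percent(wins, draws, losses):
    """Score in percent: a win counts 1 point, a draw 0.5, a loss 0."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games

# One entry per pairing (150 games each), seen from the first engine:
# (wins, draws, losses), copied from the results table.
pairings = {
    ("Stockfish", "KomodoDragon"): (58, 73, 19),
    ("Stockfish", "Lc0"): (53, 76, 21),
    ("Lc0", "KomodoDragon"): (34, 80, 36),
}

total_games = sum(w + d + l for w, d, l in pairings.values())
total_draws = sum(d for _, d, _ in pairings.values())
draw_rate = 100.0 * total_draws / total_games

print(total_games, total_draws, round(draw_rate, 1))  # 450 229 50.9
print(round(score_percent(111, 149, 40), 1))          # Stockfish overall: 61.8
```

Each game appears once in `pairings`, so the draws are not double-counted the way they would be if the three 300-game summary rows were added up.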


https://www.sp-cc.de/nn-vs-sf-testing.htm

(You may have to clear your browser cache or reload the website.)
Ozymandias
Posts: 1535
Joined: Sun Oct 25, 2009 2:30 am

Re: SPCC: 3 topengines longtime tournament finished

Post by Ozymandias »

Nice, looking forward to more games. BTW, it still says
Openings: 150 Noomen lowdraws openings
Pi4Chess
Posts: 253
Joined: Mon Nov 16, 2020 12:13 pm
Full name: Manuel Rivera

Re: SPCC: 3 topengines longtime tournament finished

Post by Pi4Chess »

Nice tourney, and thanks for your work on books for engine tourneys with better Elo spreading!

I always use this unbalanced opening book to quickly compare engines that are close in Elo.
pohl4711
Posts: 2435
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: 3 topengines longtime tournament finished

Post by pohl4711 »

Ozymandias wrote: Sat Nov 21, 2020 10:02 am Nice, looking forward to more games. BTW, it still says
Openings: 150 Noomen lowdraws openings
I am still considering whether to keep using these openings for the "normal" long-time test runs of SF vs. Lc0, because they are closer to TCEC conditions...
pohl4711
Posts: 2435
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: 3 topengines longtime tournament finished

Post by pohl4711 »

Pi4Chess wrote: Sat Nov 21, 2020 4:19 pm . Nice tourney and Thx for your work on books for engine tourneys with better elo spreading !

I always use this unbalanced opening book to quickly compare near elo engines.
Good choice! (smile)

For doing so (quickly comparing engines that are close in Elo), I strongly recommend trying out my Advanced Armageddon scoring system. Tools for automatic rescoring are in the Armageddon_Tools folder of the UHO download. The "livescoring_advanced.bat" tool can be used "live", while an engine tournament is still running, if the GUI stores the played games in a PGN file (most GUIs do so, except the Fritz GUI)...
The Advanced Armageddon scoring at least doubles all Elo distances between engines, so you will see much more quickly which engine is better when they are close in strength.
Pi4Chess
Posts: 253
Joined: Mon Nov 16, 2020 12:13 pm
Full name: Manuel Rivera

Re: SPCC: 3 topengines longtime tournament finished

Post by Pi4Chess »

pohl4711 wrote: Sat Nov 21, 2020 4:37 pm
Good choice! (smile)

For doing so (quickly compare near elo engines), I strongly recommend to try out my Advanced Armageddon scoring system (tools for automatic rescoring are in the Armageddon_Tools-folder of the UHO-download - the "livescoring_advanced.bat"-tool can be used "live", when an engine-tournament is still running, if the GUI stores the played games in a pgn-file (except FritzGUI, most GUIs do so))...
The Advanced Armageddon scoring doubles all Elo-distances between engines (at least), so you will see much quicker, which engine is better, if they are close in strength.
Oh... I read about your Armageddon system but didn't know you had special utilities to handle the scoring. I did not try it because of the hassle of manually converting the scores 😇 but now I will look into it and give it a try (at least on my Windows PC, but I bet a script could be written for Linux users as well).

Thx again for your work and reliable results ☺️👍
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: SPCC: 3 topengines longtime tournament finished

Post by Alayan »

pohl4711 wrote: Sat Nov 21, 2020 8:06 am My Unbalanced Human Openings worked extremly well! The overall draw-rate was only 50.9%. The other longtime-testruns, I did before (Stockfish vs. Lc0) with the same conditions, but played with the Noomen lowdraw openings, had an overall draw-rate around 66% (!). A draw-rate of only 50.9% is an extremly low value, when 3 top-engines play with such long thinking-time! Using any classical opening set should give a draw-rate somewhere above 70%-75% here (at least!).
The best measure is not draw rate but "share of games won without losing the reverse". I expect UHO would still dominate the Noomen openings and the classical opening set in this measure, but I'd be interested in the numbers with all three opening sets...
pohl4711
Posts: 2435
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: 3 topengines longtime tournament finished

Post by pohl4711 »

Alayan wrote: Sat Nov 21, 2020 5:41 pm
pohl4711 wrote: Sat Nov 21, 2020 8:06 am My Unbalanced Human Openings worked extremly well! The overall draw-rate was only 50.9%. The other longtime-testruns, I did before (Stockfish vs. Lc0) with the same conditions, but played with the Noomen lowdraw openings, had an overall draw-rate around 66% (!). A draw-rate of only 50.9% is an extremly low value, when 3 top-engines play with such long thinking-time! Using any classical opening set should give a draw-rate somewhere above 70%-75% here (at least!).
The best measure is not draw rate but "share of games won without losing the reverse". I expect UHO would still dominate the Noomen openings and the classical opening set in this measure, but I'd be interested in the numbers with all three opening sets...
That's because, in my test runs comparing my opening sets with other sets, I measure the Elo spreading of the results. A 1:1 pair (both games of Engine A vs. Engine B with the same opening, repeated with reversed colors, are won by White, or both by Black) is as bad for Elo spreading as 2 draws: both engines get 1 point out of 2 (= 50%), so the Elo spreading is lowered. So the Elo spreading, not the draw rate, is the real measure of the quality of openings.
All the test games played are included in the UHO download. But I don't know how to automatically count the 1:1 pairs or double draws of one engine pairing playing one opening twice. If you want to look at these results, no problem: there are test games of classical opening sets, of the UHO sets...


Overview of the 9 possible results of Engine A vs. Engine B playing one opening in 2 games (repeated with reversed colors):

good result: not a 50%-50% score (not 1-1 points): increases the Elo spreading (or lowers it, if the weaker engine scores more than 50%)
bad result: 50%-50% score (1-1 points): always lowers the Elo spreading

1) 1-0, 1-0 : bad
2) 1-0, draw: good
3) 1-0, 0-1: good (very good! 2-0 for one Engine!)
4) draw, 1-0: good
5) draw, draw: bad
6) draw, 0-1: good
7) 0-1, 1-0: good (very good! 2-0 for one Engine!)
8) 0-1, draw: good
9) 0-1, 0-1: bad
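
The nine cases above can also be generated mechanically, which is a quick way to check the list. A minimal Python sketch, assuming Engine A has White in the first game and Black in the second (the names `pair_score` and `classify` are only illustrative):

```python
from itertools import product

RESULTS = ("1-0", "1/2-1/2", "0-1")
WHITE_POINTS = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}

def pair_score(first, second):
    """Points for Engine A: White in the first game, Black in the second."""
    return WHITE_POINTS[first] + (1.0 - WHITE_POINTS[second])

def classify(first, second):
    """'bad' if the pair ends 1-1 (50%-50%), otherwise 'good'."""
    return "bad" if pair_score(first, second) == 1.0 else "good"

table = {(g1, g2): classify(g1, g2) for g1, g2 in product(RESULTS, repeat=2)}

for (g1, g2), verdict in table.items():
    print(f"{g1:>7}, {g2:>7}: {verdict} (A scores {pair_score(g1, g2)} of 2)")
```

This gives 3 bad and 6 good combinations, in the same order as the list, and the two 2-0 sweeps are exactly the pairs where the two game results differ and neither is a draw.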