SPCC: Testruns of Caissa 1.15 and Obsidian 9.0 finished

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

pohl4711
Posts: 2698
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

SPCC: Testruns of Caissa 1.15 and Obsidian 9.0 finished

Post by pohl4711 »

My UHO-Top15 Ratinglist is the world's first engine rating list using UHO openings, and the world's first rating list that additionally offers gamepair statistics.

Ratinglist-testruns of Caissa 1.15 and Obsidian 9.0 finished.

https://www.sp-cc.de

Also take a look at the EAS-Ratinglist, the world's first engine rating list that measures not the strength of engines but their style of play:
https://www.sp-cc.de/eas-ratinglist.htm


(You may have to clear your browser cache (press CTRL+SHIFT+DEL) or reload the website.)
Witek
Posts: 87
Joined: Thu Oct 07, 2021 12:48 am
Location: Warsaw, Poland
Full name: Michal Witanowski

Re: SPCC: Testruns of Caissa 1.15 and Obsidian 9.0 finished

Post by Witek »

Thanks for testing!
Though I think Obsidian's progress is more impressive than Caissa's :D
Author of Caissa Chess Engine: https://github.com/Witek902/Caissa
Modern Times
Posts: 3699
Joined: Thu Jun 07, 2012 11:02 pm

Re: SPCC: Testruns of Caissa 1.15 and Obsidian 9.0 finished

Post by Modern Times »

Hi Stefan,

I'm interested in creating 6-move and 8-move PGNs with evals from -110 to -119 (advantage to Black) from your UHO data, but I'm not sure how to do it. Could you please give some basic instructions if you have time? Thanks.
pohl4711
Posts: 2698
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: Testruns of Caissa 1.15 and Obsidian 9.0 finished

Post by pohl4711 »

Modern Times wrote: Tue Dec 26, 2023 6:21 am
Hi Stefan,

I'm interested in creating 6-move and 8-move PGNs with evals from -110 to -119 (advantage to Black) from your UHO data, but I'm not sure how to do it. Could you please give some basic instructions if you have time? Thanks.
There are two ways to do this, both very easy and quick, starting from the UHO rawdata, which is included in the UHO_2022 download:

1) Use pgn-extract. Download pgn-extract and put a .bat file in the same folder. In this .bat file, write:
pgn-extract --quiet --tagsubstr -Taeval=-11 rawdata_2022_8mvs.pgn --output games_8mvs.pgn
pgn-extract --quiet --tagsubstr -Taeval=-11 rawdata_2022_6mvs.pgn --output games_6mvs.pgn
pause

(If you run this .bat file, it will create both files (for the 6- and 8-move sets) in just a few seconds.)
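For anyone not on Windows or without pgn-extract, the same filter can be sketched in Python. This is a minimal sketch, not pgn-extract's own logic. It assumes the rawdata PGN uses ordinary [Name "value"] tag lines and that the eval string sits in the Annotator tag (my reading of pgn-extract's -Ta criterion; open one game of the rawdata file to confirm). The helper names split_games, tag_value, filter_by_tag_substring and extract are hypothetical. Passing several substrings, for example ("eval=-11", "eval=-12"), would build a combined -110 to -129 set in one pass.

```python
import re
from pathlib import Path

# Matches a PGN tag pair line such as [Annotator "..."]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def split_games(text):
    """Split PGN text into games: a '[' line that follows movetext starts a new game."""
    games, current, in_moves = [], [], False
    for line in text.splitlines():
        if line.startswith("[") and in_moves:
            games.append("\n".join(current).strip())
            current, in_moves = [], False
        if line.strip() and not line.startswith("["):
            in_moves = True
        current.append(line)
    if any(l.strip() for l in current):
        games.append("\n".join(current).strip())
    return games


def tag_value(game, tag):
    """Return the value of the given tag, or '' if the game lacks it."""
    for line in game.splitlines():
        m = TAG_RE.match(line)
        if m and m.group(1) == tag:
            return m.group(2)
    return ""


def filter_by_tag_substring(games, tag, substrings):
    """Keep games whose tag value contains any of the substrings (similar to --tagsubstr)."""
    return [g for g in games if any(s in tag_value(g, tag) for s in substrings)]


def extract(in_path, out_path, tag="Annotator", substrings=("eval=-11",)):
    # ASSUMPTION: the eval string is stored in the Annotator tag of the rawdata files.
    text = Path(in_path).read_text(encoding="utf-8", errors="replace")
    kept = filter_by_tag_substring(split_games(text), tag, substrings)
    Path(out_path).write_text("\n\n".join(kept) + "\n", encoding="utf-8")
    return len(kept)


if __name__ == "__main__":
    for n in (8, 6):
        src = Path(f"rawdata_2022_{n}mvs.pgn")
        if src.exists():
            print(n, "moves:", extract(src, f"games_{n}mvs.pgn"), "games written")
```

The splitter is deliberately simple: it treats any line starting with "[" after movetext as the start of the next game, which is enough for engine-generated PGN but not a full PGN parser.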



2) Use the Fritz GUI or ChessBase and their search function (search for "eval=-11" in the annotations). Put the found games into a new PGN database.
Modern Times
Posts: 3699
Joined: Thu Jun 07, 2012 11:02 pm

Re: SPCC: Testruns of Caissa 1.15 and Obsidian 9.0 finished

Post by Modern Times »

Brilliant, that looks easy enough for me, I'll try a bit later.
Modern Times
Posts: 3699
Joined: Thu Jun 07, 2012 11:02 pm

Re: SPCC: Testruns of Caissa 1.15 and Obsidian 9.0 finished

Post by Modern Times »

pohl4711 wrote: Tue Dec 26, 2023 7:54 am
1) use pgn-extract. Download pgn-extract and put a .bat file in the same folder. In this .bat file you write:
pgn-extract --quiet --tagsubstr -Taeval=-11 rawdata_2022_8mvs.pgn --output games_8mvs.pgn
pgn-extract --quiet --tagsubstr -Taeval=-11 rawdata_2022_6mvs.pgn --output games_6mvs.pgn
pause
I created a UHO2022_8mvs-110_to_-129 book in two steps, using -Taeval=-11 and -Taeval=-12, and combined the results: 1460 lines.

I ran a small experiment: a round-robin with Stockfish, Torch, Dragon and Berserk, with 100 game pairs (reversed colours) per engine pairing, so 1200 games using 600 of the lines, randomly selected by cutechess.

I got:

White wins: 7 (0.6%)
Black wins: 511 (42.6%)
Draws: 682 (56.8%)
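The percentages follow directly from the three counts. A quick Python check (the helper name result_shares is just for illustration) reproduces them and also gives White's overall score, counting a draw as half a point, which comes out at 29%.

```python
def result_shares(white_wins, black_wins, draws):
    """Return total games, percentage shares of each result, and White's score in percent."""
    total = white_wins + black_wins + draws
    shares = {
        "White wins": 100 * white_wins / total,
        "Black wins": 100 * black_wins / total,
        "Draws": 100 * draws / total,
    }
    # A draw is worth half a point to each side
    white_score = 100 * (white_wins + draws / 2) / total
    return total, shares, white_score


total, shares, white_score = result_shares(7, 511, 682)
print(total, {k: round(v, 1) for k, v in shares.items()}, round(white_score, 1))
```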


Tiny sample of course, and with engines that have an Elo spread of only around 100 Elo.

I'm not sure what I expected to see. The number of White wins is tiny, but in a Top15 list with an Elo spread of around 250, that number would be bigger.
I guess it is probably a similar sort of outcome as with openings unbalanced in White's favour. I may extend the experiment a bit further out of curiosity.


   # PLAYER                  :  RATING  POINTS  PLAYED   (%)
   1 Stockfish 20230813      :  3840.4   376.5     600    63
   2 Torch v1                :  3783.0   313.0     600    52
   3 Dragon by Komodo 3.3    :  3742.8   268.0     600    45
   4 Berserk 12              :  3719.8   242.5     600    40


Head to head statistics:

1) Stockfish 20230813   3840.4 :    600 (+208,=337,-55),  62.8 %

   vs.                         :  games (   +,   =,  -),   (%) :    Diff
   Torch v1                    :    200 (  62, 112, 26),  59.0 :   +57.4
   Dragon by Komodo 3.3        :    200 (  63, 123, 14),  62.3 :   +97.6
   Berserk 12                  :    200 (  83, 102, 15),  67.0 :  +120.6

2) Torch v1             3783.0 :    600 (+139,=348,-113),  52.2 %

   vs.                         :  games (   +,   =,   -),   (%) :    Diff
   Stockfish 20230813          :    200 (  26, 112,  62),  41.0 :   -57.4
   Dragon by Komodo 3.3        :    200 (  63, 110,  27),  59.0 :   +40.2
   Berserk 12                  :    200 (  50, 126,  24),  56.5 :   +63.2

3) Dragon by Komodo 3.3 3742.8 :    600 (+97,=342,-161),  44.7 %

   vs.                         :  games (  +,   =,   -),   (%) :    Diff
   Stockfish 20230813          :    200 ( 14, 123,  63),  37.8 :   -97.6
   Torch v1                    :    200 ( 27, 110,  63),  41.0 :   -40.2
   Berserk 12                  :    200 ( 56, 109,  35),  55.3 :   +23.0

4) Berserk 12           3719.8 :    600 (+74,=337,-189),  40.4 %

   vs.                         :  games (  +,   =,   -),   (%) :    Diff
   Stockfish 20230813          :    200 ( 15, 102,  83),  33.0 :  -120.6
   Torch v1                    :    200 ( 24, 126,  50),  43.5 :   -63.2
   Dragon by Komodo 3.3        :    200 ( 35, 109,  56),  44.8 :   -23.0
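The POINTS and (%) columns follow from the win/draw/loss counts, with a draw worth half a point. The small Python sketch below (hypothetical score helper) reproduces, for example, Stockfish's 376.5 points from +208 =337 -55. It only checks the score arithmetic; the ratings and the Diff column come from the rating tool's fit and are not recomputed here.

```python
def score(wins, draws, losses):
    """Return games played, points (draw = 0.5) and score in percent."""
    games = wins + draws + losses
    points = wins + draws / 2
    return games, points, 100 * points / games


# Overall +W =D -L counts copied from the head-to-head table
overall = {
    "Stockfish 20230813": (208, 337, 55),
    "Torch v1": (139, 348, 113),
    "Dragon by Komodo 3.3": (97, 342, 161),
    "Berserk 12": (74, 337, 189),
}

for name, (w, d, l) in overall.items():
    games, points, pct = score(w, d, l)
    print(f"{name:22} {points:6.1f} / {games}  {pct:.2f} %")

# Every decisive game is exactly one player's win, so all wins together
# should equal White wins + Black wins (7 + 511).
print("total wins:", sum(w for w, _, _ in overall.values()))
```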

pohl4711
Posts: 2698
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: Testruns of Caissa 1.15 and Obsidian 9.0 finished

Post by pohl4711 »

Modern Times wrote: Fri Dec 29, 2023 2:46 am
I created a UHO2022_8mvs-110_to_-129 book, two steps -Taeval=-11 and -Taeval=-12 and combined the result. 1460 lines.
If you do so, use SCID to sort the new database by Elo. Then you can use the new PGN file normally, i.e. sequentially. There is no need for random selection by cutechess, because that can lead to duplicate games: cutechess does not check whether a randomly chosen line has already been chosen before.
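To put a rough number on the duplicate problem, here is a Python sketch. It assumes, as described above, that random selection draws each game pair's line independently and uniformly from the file (with replacement); the helper names are hypothetical. With 600 draws from 1460 lines, only about 492 distinct lines are expected, i.e. roughly 108 game pairs repeat an opening, while reading the sorted file sequentially uses 600 different lines.

```python
import random


def expected_distinct(n_lines, n_draws):
    """Expected number of distinct lines after n_draws uniform draws with replacement."""
    return n_lines * (1 - (1 - 1 / n_lines) ** n_draws)


def simulate_distinct(n_lines, n_draws, seed=1):
    """Count distinct lines in one simulated run of random selection with replacement."""
    rng = random.Random(seed)
    return len({rng.randrange(n_lines) for _ in range(n_draws)})


def sequential_lines(n_lines, n_draws):
    """Line indices used when the book is read in order, wrapping around at the end."""
    return [i % n_lines for i in range(n_draws)]


if __name__ == "__main__":
    n_lines, n_pairs = 1460, 600  # book size and number of game pairs in the experiment
    print("expected distinct (random):", round(expected_distinct(n_lines, n_pairs), 1))
    print("simulated distinct (random):", simulate_distinct(n_lines, n_pairs))
    print("distinct (sequential):", len(set(sequential_lines(n_lines, n_pairs))))
```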