SPCC: Drawkiller openings updated to V2.00

pohl4711
Posts: 2435
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Post by pohl4711 »

The Drawkiller Openings were updated to V2.00. What's new?

https://www.sp-cc.de/drawkiller-openings.htm

I added two sets: Drawkiller balanced and Drawkiller balanced small500. Both contain only lines whose end-position evals lie within a very small interval of [-0.09; +0.09]. This leads to slightly higher draw rates, but also to a wider Elo spread in the engine results.
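To illustrate the idea, here is a minimal sketch of such an eval filter in Python. This is not the actual build process of the sets; the input file name, its one-line-per-opening format, and the position of the eval field are all assumptions:

Code: Select all

# Minimal sketch: keep only opening lines whose end-position eval
# (in pawns, as judged by an engine) lies within [-0.09, +0.09].
# Assumed input format: one opening per line, with the eval as the
# last semicolon-separated field, e.g. "...moves...; 0.05".
LOW, HIGH = -0.09, 0.09

with open("openings_with_evals.txt") as src, \
     open("drawkiller_balanced.txt", "w") as dst:
    for line in src:
        line = line.strip()
        if not line:
            continue
        try:
            eval_pawns = float(line.rsplit(";", 1)[1])
        except (IndexError, ValueError):
            continue  # no parseable eval on this line, skip it
        if LOW <= eval_pawns <= HIGH:
            dst.write(line + "\n")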

Here is the test run of the new Drawkiller balanced set (plus, for comparison, test runs of the Drawkiller tournament, Stockfish Framework 8moves and GM-4moves sets).
Three engines (Stockfish 10, Houdini 6 and Komodo 12) played a round-robin with 500 games in each head-to-head pairing, so each engine played 1000 games. For each game, one opening line was chosen at random by the LittleBlitzerGUI.
Single core, 3'+1'', LittleBlitzerGUI, no ponder, no bases, 256 MB hash, i7-6700HQ 2.6 GHz notebook (Skylake CPU), Windows 10 64-bit
In the Drawkiller balanced sets, all end-position evals of the opening lines (analyzed by Komodo) lie in a very small interval of [-0.09; +0.09]. The idea is that this should widen the Elo spread of the engine ratings, which makes the engine rankings much more statistically reliable (or, put the other way around, far fewer games are needed to separate the results beyond their error bars). On the other hand, of course, this concept leads to slightly higher draw rates...
Let's see if it worked:

Drawkiller balanced:

Code: Select all

     Program                Elo    +    -   Games   Score   Av.Op.  Draws
   1 Stockfish 10 bmi2    : 3506   11   11  1000    70.9 %   3347   36.2 %
   2 Houdini 6 pext       : 3392   11   11  1000    48.5 %   3404   40.8 %
   3 Komodo 12 bmi2       : 3302   11   11  1000    30.6 %   3449   36.6 %

Elo spread (1st to last): 204 Elo
Draws: 37.9%

Drawkiller tournament:

Code: Select all

     Program                Elo    +    -   Games   Score   Av.Op.  Draws
   1 Stockfish 10 bmi2    : 3494   11   11  1000    68.9 %   3353   34.2 %
   2 Houdini 6 pext       : 3387   11   11  1000    47.3 %   3407   38.2 %
   3 Komodo 12 bmi2       : 3320   11   11  1000    33.8 %   3440   36.0 %
 
Elo spread (1st to last): 174 Elo
Draws: 36.1%

GM_4moves:

Code: Select all

     Program                Elo    +    -   Games   Score   Av.Op.  Draws
   1 Stockfish 10 bmi2    : 3475   11   11  1000    65.4 %   3363   53.2 %
   2 Houdini 6 pext       : 3381   10   10  1000    46.0 %   3410   59.9 %
   3 Komodo 12 bmi2       : 3345   10   10  1000    38.5 %   3428   55.9 %
 
Elo spread (1st to last): 130 Elo
Draws: 56.3%

Stockfish framework 8moves:

Code: Select all

     Program                Elo    +    -   Games   Score   Av.Op.  Draws
   1 Stockfish 10 bmi2    : 3463   11   11  1000    63.0 %   3369   59.7 %
   2 Houdini 6 pext       : 3388   10   10  1000    47.5 %   3406   64.2 %
   3 Komodo 12 bmi2       : 3349   10   10  1000    39.5 %   3425   60.1 %
 
Elo spread (1st to last): 114 Elo
Draws: 61.3%

Conclusions:

1) The Drawkiller balanced idea was a success. The draw rate is a little higher than with Drawkiller tournament (that is the price we pay for point 2), but look at point 2) and note that even this slightly higher draw rate is still much, much lower than the draw rate of any other, non-Drawkiller opening set...

2) The Elo spread with Drawkiller balanced was measurably wider than with any other opening set. That makes the engine rankings much more statistically reliable; alternatively, far fewer games are needed to separate the results beyond their error bars.
Example: Compared to the result with the Stockfish framework 8moves openings, the Elo spread of Drawkiller balanced is nearly doubled, which means the error bars can be twice as large for the same statistical reliability of the engine rankings in a tournament / rating list. Note that you have to play 4x as many games to halve an error bar! That means that with Drawkiller balanced openings you only need about 25%-30% of the games required with the Stockfish Framework 8moves openings for the same statistical quality of the engine rankings (!!!) - how awesome is that?!?
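For anyone who wants to check the arithmetic, here is a small back-of-the-envelope sketch (assuming the standard 1/sqrt(N) scaling of the error bar with the number of games N; the variable names are mine, the spread numbers are from the tables above):

Code: Select all

# Error bars shrink like 1/sqrt(N) with the number of games N.
# If the Elo spread between the engines widens by a factor r, the
# error bars may be r times as large for the same ranking reliability,
# so the required number of games shrinks by a factor of r^2.
spread_8moves   = 114   # Elo spread, Stockfish framework 8moves (table above)
spread_balanced = 204   # Elo spread, Drawkiller balanced (table above)

r = spread_balanced / spread_8moves    # ~1.79x wider spread
games_fraction = 1.0 / r ** 2          # fraction of games still needed
print(f"spread widened by {r:.2f}x")
print(f"games needed: {games_fraction:.0%} of the 8moves baseline")

With the measured spreads this comes out to roughly 31%, in the same ballpark as the 25%-30% estimate above (an exactly doubled spread would give exactly 25%).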
Example: Compared to the result of Stockfish framework 8moves openings, the Elo-spreading of Drawkiller balanced is nearly doubled, which means, you can have a doubled errorbar-array size for the same statistical reliability of the Engine rankings in a tournament / ratinglist. Mention, that you have to play 4x more games to half the size of an errorbar! That means, if you are using Drawkiller balanced openings, you have to play only 25%-30% amount of games, which you have to play, when using Stockfish Framework 8move openings for the same statistical result-quality of engine rankings (!!!) - how awesome is that?!?