Drawkiller Openings Project

pohl4711 · Post by **pohl4711** » Sat Nov 24, 2018 11:45 am

ThatsIt wrote: ↑Sat Nov 24, 2018 11:16 am I did nearly the same thing more than 10 years ago.
Not for an opening book, but for starting positions.
After a few days I cancelled the tests because
I got far too many similar ECO codes.
What is the spread of ECO codes in your test matches?

Best wishes,
G.S.
(CEGT team)

I am convinced that ECO codes have no meaning in computer chess, for engine ratings, or for testing. And in all Drawkiller openings, the kings are on a1 and h8, or on h1 and a8, so I think no ECO code from human chess applies at all. So what?

ThatsIt · Post by **ThatsIt** » Sat Nov 24, 2018 12:02 pm

pohl4711 wrote: ↑Sat Nov 24, 2018 11:45 am I am convinced that ECO codes have no meaning in computer chess, for engine ratings, or for testing. And in all Drawkiller openings, the kings are on a1 and h8, or on h1 and a8, so I think no ECO code from human chess applies at all. So what?

So what?
It's all a question of perspective.

Thanks for your answer anyway.

G.S.
(CEGT team)

Guenther · Post by **Guenther** » Sat Nov 24, 2018 12:14 pm

pohl4711 wrote: ↑Sat Nov 24, 2018 11:45 am
I am convinced that ECO codes have no meaning in computer chess, for engine ratings, or for testing. And in all Drawkiller openings, the kings are on a1 and h8, or on h1 and a8, so I think no ECO code from human chess applies at all. So what?

Sorry, I don't think it measures chess capabilities, but something else.
King safety and castling, especially when to castle or delay it, are a big part of chess knowledge and how a game develops.

I have no doubts about the lower draw rate, but IMHO it is a different game that is being measured.

lucasart · Post by **lucasart** » Sat Nov 24, 2018 12:35 pm

pohl4711 wrote: ↑Sat Nov 24, 2018 11:22 am
lucasart wrote: ↑Sat Nov 24, 2018 9:16 am
It was generated as follows:

Round(X%):
- generate all legal moves, and retain the resulting positions
- randomly weed out X% of the positions
- filter out the bad ones: play a quick blitz game of Critter 1.6 vs. itself, and keep the position only if the Critter score stays within +/- 0.5 for 5 moves (10 plies)

From the starting position, apply recursively:
- Round(0%) x 2
- Round(75%) x 3

This results in 5-ply positions (2.5 moves), which have nice properties:
- fairness: no position should clearly favor either side. At least good enough for the bullet games used in engine dev. There may be some shenanigans revealed by deeper search, but I don't care about that for engine dev (i.e. 4"+0.04" games where tens of thousands of games are played to test each patch, and it's all about statistics, not chess).
- coverage: due to the random nature of the positions, we get a very wide, shallow tree, which means we explore much more than the usual chess opening theory. We force the engines to go into deep woods, which is good for avoiding the overfitting problem.
- low draw rate, but not achieved by cheating like TCEC does. I don't allow heavily biased positions where one side has a practically won game out of the opening. Instead, this is achieved by keeping the book extremely shallow and forcing engines to explore deep woods of unknown openings.
- good information: recombination risk is kept to a minimum, thanks to the 3 rounds of 25% random selection.

Interesting approach! I started the testrun right now. Will take around 4 days. I will report the results here.

I think I added another condition in the rounds: each move played must not decrease the PST eval (PST = Piece Square Table). This is to ensure we are playing developing moves. It avoids pieces moving back. For example, without it a valid 5-ply random opening sequence could include some silly back and forth like 1. Nf3 (developing), followed by 2. Ng1 (un-developing). This also helps reduce the recombination risk.
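The procedure described above can be sketched roughly as follows. This is a minimal illustration, not lucasart's actual generator: `run_round`, `build_opening_set`, and the callbacks `successors` and `is_balanced` are hypothetical names. In a real setup `successors` would produce legal chess moves (and could also apply the PST condition), and `is_balanced` would stand in for the Critter self-play check. De-duplicating transposed positions is an assumption on my part.

```python
import random
from typing import Callable, Hashable, Iterable, List, Sequence, TypeVar

P = TypeVar("P", bound=Hashable)


def run_round(
    positions: Iterable[P],
    successors: Callable[[P], List[P]],
    is_balanced: Callable[[P], bool],
    drop_fraction: float,
    rng: random.Random,
) -> List[P]:
    """One Round(X%): expand, randomly weed out X%, then filter."""
    # 1. Generate all successor positions; dict.fromkeys drops duplicates
    #    (transpositions) while keeping order. De-duplication is an assumption.
    children = list(dict.fromkeys(c for p in positions for c in successors(p)))
    # 2. Randomly weed out drop_fraction of the positions.
    keep = round(len(children) * (1.0 - drop_fraction))
    sampled = rng.sample(children, keep)
    # 3. Keep only positions accepted by the (hypothetical) balance check.
    return [c for c in sampled if is_balanced(c)]


def build_opening_set(
    start: P,
    successors: Callable[[P], List[P]],
    is_balanced: Callable[[P], bool],
    schedule: Sequence[float] = (0.0, 0.0, 0.75, 0.75, 0.75),
    seed: int = 0,
) -> List[P]:
    """Apply Round(0%) x 2, then Round(75%) x 3, and return only the leaves."""
    rng = random.Random(seed)
    positions = [start]
    for drop_fraction in schedule:
        positions = run_round(positions, successors, is_balanced, drop_fraction, rng)
    return positions


# Toy usage: a "position" is a tuple of move indices, with 3 moves available everywhere.
toy_leaves = build_opening_set(
    start=(),
    successors=lambda p: [p + (i,) for i in range(3)],
    is_balanced=lambda p: True,
)
print(len(toy_leaves), "leaves, each", len(toy_leaves[0]), "plies deep")
```

Only the leaves are returned, which matches the point that the set is a collection of end positions rather than a book of moves.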

pohl4711 · Post by **pohl4711** » Sat Nov 24, 2018 2:46 pm

I really like your approach. The start of the test run with your openings EPD was really good...
Would it be possible to make the PGN of your openings downloadable? An opening book for Fritz, Shredder and Arena can only be built from a PGN that contains the moves leading to the end positions. And I think that would be very cool!

lucasart · Post by **lucasart** » Sat Nov 24, 2018 3:07 pm

pohl4711 wrote: ↑Sat Nov 24, 2018 2:46 pm I really like your approach. The start of the test run with your openings EPD was really good...
Would it be possible to make the PGN of your openings downloadable? An opening book for Fritz, Shredder and Arena can only be built from a PGN that contains the moves leading to the end positions. And I think that would be very cool!

No, I did not keep the inner nodes of the tree. All I want are the leaves. And I don't think it makes sense to use this procedure to build an opening book. It's really an opening set, that's it. The moves leading to the leaves are quasi random. They're just not game losing blunders, but they are otherwise random and not positionally precise or logical. An engine playing with such a book would likely have a disadvantage against a bookless engine (besides the small clock disadvantage that comes with booklessness).

pohl4711 · Post by **pohl4711** » Wed Nov 28, 2018 11:37 am

lucasart wrote: ↑Sat Nov 24, 2018 4:46 am
pohl4711 wrote: ↑Fri Nov 23, 2018 12:58 pm My Drawkiller Openings Project is finished.

Never before has any opening set given such low draw rates without squeezing the engines' scores towards 50%; instead, it pushes the scores away from 50%. The Drawkiller Normal and Tournament sets nearly halve the draw rate compared to FEOBOS or the Stockfish Framework 8-move openings. I would never have expected that this was possible. The Drawkiller project is really a breakthrough into another dimension of computer chess. Look at my testing results:

(asmFish 170426 vs. Komodo 10.4, 5'+3'' time-control, singlecore, no ponder, no endgame-bases, LittleBlitzerGUI, 1000 games each testrun(!) except Noomen Gambit-lines (only 246 positions, so 492 games were played) and Noomen TCEC Superfinal (only 100 positions, so 200 games were played))

Stockfish Framework standard 8 move openings: Score 60.3% – 39.7%, draws: 63.4%
FEOBOS v20 contempt 5 top 500 openings: Score 58.7% - 41.3%, draws: 64.1%
HERT 500 set: Score: 60.6% - 39.4%, draws: 60.4%
Noomen Gambit-Lines: Score 59.1% - 40.9%, draws: 59.3%
4 GM-moves short book: Score 60.5% - 39.5%, draws: 57.1%
Noomen TCEC Superfinal (Season 9+10): Score: 62.5% - 37.5%, draws: 50.0%
SALC V5 half-closed: Score 61.6% - 38.4%, draws: 49.2%
SALC V5 full-closed 500 positions: Score 66.5% - 33.5%, draws: 47.7%

NEW:

Drawkiller (Big set): Score 63.8% - 36.2%, draws: 39.5%
Drawkiller (Normal set): Score: 65.3% - 34.7%, draws: 33.5%
Drawkiller (Tournament set): Score: 65.3% - 34.7%, draws: 33.5%

(no mistake by me: the results of the normal-set and the tournament-set were exactly the same after 1000 played games in my testruns)

Learn more about Drawkiller openings in the "Drawkiller openings"- section on my website and download them!

https://www.sp-cc.de
Interesting. Opening sets that maximize the amount of information in this way are precious for engine development.

I use this one for developing Demolito (which I generated myself):
https://github.com/zamar/spsa/blob/master/book.epd

Would you mind giving it a spin to see how it compares with yours ?

PS: For engine dev you need big sets (ie. 30k at least), and you also need to verify recombination risk (how frequently do games from opening X transpose into games from opening Y, this reduces information and you won't notice it by just measuring scores and draw rates).

Done. 1000 games played with your openings. Exactly the same conditions as for the other testruns (above)

Lucasart short-openings set: 64.2% - 35.8% , draws: 52.2%

Quite a good result: the score of asmFish (64.2%) is really good, and the draw rate is OK. What is strange is the bad white score (a white score below 50% never happened in my test runs before):
White Wins : 210 (21.0 %)
Black Wins : 268 (26.8 %)
Draws : 522 (52.2 %)
White Score : 47.1 %
Black Score : 52.9 %
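As a quick check of the figures above, the color scores follow from the win and draw counts if a draw is counted as half a point. A minimal sketch (the function name `color_scores` is just for illustration):

```python
def color_scores(white_wins: int, black_wins: int, draws: int) -> tuple[float, float]:
    """Return (white score, black score) as fractions, counting a draw as half a point."""
    games = white_wins + black_wins + draws
    white = (white_wins + 0.5 * draws) / games
    return white, 1.0 - white


white, black = color_scores(white_wins=210, black_wins=268, draws=522)
# Reproduces the 47.1 % / 52.9 % split reported above.
print(f"White score: {white:.1%}, Black score: {black:.1%}")
```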

I sent you a PM with a download link to my Google Drive, where you can download the PGN of the 1000 played games.

Regards - Stefan (SPCC)

lucasart · Post by **lucasart** » Wed Nov 28, 2018 12:30 pm

Thanks.

Yes, white has no advantage here, by construction. It should be fairly equal for both sides, or even slightly favorable for black, who only played 2 dubious (quasi-random) moves, whereas white played 3. The moves don't matter. It's just the positions that are useful for testing: they give varied games with good coverage of the game space, and high information.

But information (in a statistical sense) is what matters to me, not just the draw rate. It's an important distinction to understand. TCEC openings are quite poor, despite the low draw rate. When an opening gives white a practically won game, the 2 games played from it provide zero information: engine A wins with white, then engine B wins with white, and nothing is learned about the relative strength of A vs. B, so 2 games are wasted. (The exception is when the elo difference between the engines is so large that you don't end up with a double 1-0 score, but in engine development this is never the case, since you want to measure small incremental elo gains.)

That being said, your Drawkiller draw rate is impressive. I'll definitely have a look at your openings.

pohl4711 · Post by **pohl4711** » Wed Nov 28, 2018 1:09 pm

lucasart wrote: ↑Wed Nov 28, 2018 12:30 pm

But information (in a statistical sense) is what matters to me, not just the draw rate. It's an important distinction to understand. TCEC openings are quite poor, despite the low draw rate. When an opening gives white a practically won game, the 2 games played from it provide zero information: engine A wins with white, then engine B wins with white, and nothing is learned about the relative strength of A vs. B, so 2 games are wasted. (The exception is when the elo difference between the engines is so large that you don't end up with a double 1-0 score, but in engine development this is never the case, since you want to measure small incremental elo gains.)

This is of course correct. That's why it is important to look at the scores of asmFish and Komodo in those test runs, not only at the draw rate. Two draws from one opening position have the same effect on the scores as two wins for white or two wins for black from that position: the score of both engines is pushed towards 50%, because each engine gets one win and one loss, which is the same as 2 draws = 1 point for each engine out of 2 games = 50% score.
With the "normal, classical" openings (FEOBOS, SF 8 moves, HERT), the asmFish score was around 58-60%. An improved opening set should give a higher score for asmFish (further away from 50%) and a lower score for Komodo (further away from 50%, too), because it produces fewer draws and not too many positions where the same color wins both games.
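The point about game pairs can be made concrete with a small sketch. It assumes each opening is played twice with colors reversed and that results are written from White's point of view; `pair_score` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
# Points for White for each possible game result.
WHITE_POINTS = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}


def pair_score(game_a_white: str, game_b_white: str) -> float:
    """Points engine A scores over the two games played from one opening.

    game_a_white: result of the game where A has White.
    game_b_white: result of the game where B has White (A has Black).
    """
    a_as_white = WHITE_POINTS[game_a_white]
    a_as_black = 1.0 - WHITE_POINTS[game_b_white]
    return a_as_white + a_as_black


# Two draws and a double white win both leave A at 1 point out of 2 (50%).
print(pair_score("1/2-1/2", "1/2-1/2"))  # two draws
print(pair_score("1-0", "1-0"))          # white wins both games
# Only pairs like these move the score away from 50%.
print(pair_score("1-0", "1/2-1/2"))      # A wins with White, holds with Black
print(pair_score("1-0", "0-1"))          # A wins both games
```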

And the score of asmFish of your set was 64.2% (really good), SALC V5 big set: 61.6% (okay...) and
Drawkiller (Big set): Score 63.8% - 36.2%, draws: 39.5%
Drawkiller (Normal set): Score: 65.3% - 34.7%, draws: 33.5%
Drawkiller (Tournament set): Score: 65.3% - 34.7%, draws: 33.5%

So Drawkiller Normal and Tournament not only have the lowest draw rate, they also have the scores of asmFish and Komodo that are farthest from 50%. And the Drawkiller Big set has 63.8%, which is good, but not overwhelming, and very close to your openings:
Lucasart short-openings set: 64.2% - 35.8% , draws: 52.2%

Regards - Stefan (SPCC)

PS: The Drawkiller openings were filtered with Komodo 11.2.2. Komodo checked all end positions (using the pgnscanner tool), running on an i7-6700HQ 2.6GHz notebook (Skylake CPU) with all 4 cores and 2048 Hash, Contempt=0.
For the big files and the normal files, the Komodo evaluation had to be in the following interval, otherwise the position was deleted:
eval: [-0.49;-0.10] or [+0.10;+0.49]
For the tournament files, the Komodo evaluation had to be in an even smaller interval:
eval: [-0.39;-0.20] or [+0.20;+0.39]
You can see that the eval intervals are quite small. No end position of any Drawkiller opening can give a huge advantage to white or black!
Thinking time for each end position was:
Normal/Tournament set: 45''
Big set: 30''
(It took several months to finish these calculations!)
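The eval windows above amount to a simple check on the absolute value of the evaluation. A minimal sketch, assuming the Komodo eval is already available as a number in pawns (`EVAL_WINDOWS` and `passes_filter` are illustrative names, and no engine is actually called here):

```python
# (lower, upper) bound on |eval| in pawns, taken from the intervals quoted above.
EVAL_WINDOWS = {
    "big": (0.10, 0.49),
    "normal": (0.10, 0.49),
    "tournament": (0.20, 0.39),
}


def passes_filter(eval_pawns: float, set_name: str) -> bool:
    """True if the end position stays in the set; either side may hold the small edge."""
    lower, upper = EVAL_WINDOWS[set_name]
    return lower <= abs(eval_pawns) <= upper


print(passes_filter(-0.15, "normal"))      # inside [-0.49;-0.10]
print(passes_filter(-0.15, "tournament"))  # edge too small for the tournament window
print(passes_filter(0.00, "big"))          # dead-equal positions are deleted as well
```

Note that the windows exclude evaluations close to 0.00, so every kept position gives one side a small edge.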

Michel · Post by **Michel** » Wed Nov 28, 2018 4:47 pm

IMHO the correct way to compare opening books (for not too large elo differences and assuming balanced openings so that the pentanomial model is not needed) is to calculate (w-l)/sqrt(w+l) (*), where w and l are the win and loss ratios respectively. This quantity has a standard deviation of 1/sqrt(N), so one can also easily verify whether differences are significant.

The formula (*) multiplied by sqrt(N) is an approximation of the z value. So it is a measure of how many games are needed (both fixed-length and SPRT) to separate two engines with a given level of significance.
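Formula (*) can be evaluated directly from the scores and draw rates posted earlier in the thread, since w - l = 2·score - 1 and w + l = 1 - draw rate when everything is expressed as a fraction of games. A minimal sketch under that assumption (`normalized_score` is an illustrative name, and only three of the posted results are included):

```python
import math


def normalized_score(score: float, draw_rate: float) -> float:
    """(w - l) / sqrt(w + l), computed from the score and draw rate as fractions."""
    w_minus_l = 2.0 * score - 1.0  # because score = w + d/2 and w + l + d = 1
    w_plus_l = 1.0 - draw_rate
    return w_minus_l / math.sqrt(w_plus_l)


# (asmFish score, draw rate) as posted in this thread, 1000 games each.
RESULTS = {
    "Stockfish Framework 8 moves": (0.603, 0.634),
    "Lucasart short-openings set": (0.642, 0.522),
    "Drawkiller (Normal set)": (0.653, 0.335),
}

games = 1000
std_dev = 1.0 / math.sqrt(games)  # standard deviation stated for (*)
for name, (score, draw_rate) in RESULTS.items():
    value = normalized_score(score, draw_rate)
    print(f"{name}: {value:.3f} (+/- {std_dev:.3f}), z ~ {value * math.sqrt(games):.1f}")
```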
