New opening-sets for a lower draw-rate in engine-testing

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Uri Blass
Posts: 10411
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: New opening-sets for a lower draw-rate in engine-testing

Post by Uri Blass »

For the question "Who says the engines must not disagree?": step number 6 takes care that the engines cannot disagree by a big margin.

Step 6: Checked the 10moves-database with Komodo 8, Houdini 4, Gull 3 (using PGNscanner (eval-interval
of +/-0.40)) and deleted all games if one engine-evaluation was outside the eval-interval.

Based on step 6 it is clear that there cannot be more than a 0.8 pawn difference between any two of the engines Komodo, Houdini and Gull, because in that case at least one of the engines would have an evaluation outside +/-0.4.

In many cases even a difference of 0.4 is not accepted, because a difference of 0.4 does not always mean both engines see 0.2 for themselves; it can also be that one engine sees 0.5 for itself while the opponent sees only 0.1 against itself, and that case is filtered out.
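To make that concrete, here is a minimal sketch of the step-6 filter logic and the disagreement bound it implies (plain Python, not PGNscanner; the function names and the example evaluations are invented):

```python
# Minimal sketch of the step-6 idea: keep a position only if every
# engine's evaluation lies inside +/-0.40 pawns. The dict layout and
# example numbers are invented, not PGNscanner output.
EVAL_LIMIT = 0.40

def passes_step6(evals):
    """True if all evaluations are inside the +/-0.40 interval."""
    return all(abs(e) <= EVAL_LIMIT for e in evals)

def max_disagreement(evals):
    """Largest pairwise difference between the engines' evaluations."""
    return max(evals) - min(evals)

# If a position passes, the largest possible disagreement is
# 0.40 - (-0.40) = 0.80 pawns, which is the bound mentioned above.
example = {"Komodo 8": 0.35, "Houdini 4": -0.30, "Gull 3": 0.10}
print(passes_step6(example.values()))      # True
print(max_disagreement(example.values()))  # 0.65 (modulo float rounding)
```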
lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: New opening-sets for a lower draw-rate in engine-testing

Post by lkaufman »

Limiting the openings to ones with kings castled on opposite sides is highly unrepresentative of real chess, and drastically changes what is being measured. I don't know who it favors, but Komodo would be a very different program if our goal was to optimize for openings with castling on opposite sides.

I agree with your goal of reducing draw percentage, but I think Uri's solution of just picking openings played by strong human players, with relatively low draw percentages, is the way to do this.
Komodo rules!
pohl4711
Posts: 2460
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: New opening-sets for a lower draw-rate in engine-testing

Post by pohl4711 »

lkaufman wrote:Limiting the openings to ones with kings castled on opposite sides is highly unrepresentative of real chess, and drastically changes what is being measured. I don't know who it favors, but Komodo would be a very different program if our goal was to optimize for openings with castling on opposite sides.

I agree with your goal of reducing draw percentage, but I think Uri's solution of just picking openings played by strong human players, with relatively low draw percentages, is the way to do this.
Our tests didn't show measurable advantages for any of the top engines (not even for Stockfish, the most aggressive engine), and we selected positions from all ECO codes and tried to get a good mixture of opening systems.

Regards - Stefan
peter
Posts: 3187
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: New opening-sets for a lower draw-rate in engine-testing

Post by peter »

lkaufman wrote: I agree with your goal of reducing draw percentage, but I think Uri's solution of just picking openings played by strong human players, with relatively low draw percentages, is the way to do this.
Picking openings only from high-rated over-the-board games with a low percentage of draws could introduce much more bias favoring certain engines than additionally evaluating those positions with engines.
I don't think Stefan's collections (Adam Hair's ones, reduced by Stefan, as far as I understood) are about any other opening theory than the one human masters play.
:)
Peter.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: New opening-sets for a lower draw-rate in engine-testing

Post by Adam Hair »

peter wrote:
lkaufman wrote: I agree with your goal of reducing draw percentage, but I think Uri's solution of just picking openings played by strong human players, with relatively low draw percentages, is the way to do this.
Picking openings only from high-rated over-the-board games with a low percentage of draws could introduce much more bias favoring certain engines than additionally evaluating those positions with engines.
I don't think Stefan's collections (Adam Hair's ones, reduced by Stefan, as far as I understood) are about any other opening theory than the one human masters play.
:)
Yes. The 10 move and 12 move PGNs that Stefan started with come from games that I collected about 4 years ago from the various rating agencies. The only things I did to those games were to truncate them and remove the duplicates. If we compared those PGNs with similarly treated GM games, we would find a very large overlap (since the overwhelming majority of the opening books and PGNs used by the rating agencies were developed from GM games).
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: New opening-sets for a lower draw-rate in engine-testing

Post by Adam Hair »

lkaufman wrote: I agree with your goal of reducing draw percentage, but I think Uri's solution of just picking openings played by strong human players, with relatively low draw percentages, is the way to do this.
Simply using low GM draw rates would leave a significant number of games outside the range (+10cp, +50cp) according to Stockfish, Komodo, and Houdini at higher depths, if my work with the TCEC openings is any indication. I think that engine evaluations are needed in addition to the statistics.
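For what such an engine-side check could look like, here is a rough sketch assuming python-chess with a local UCI engine; the engine path, the 5-second limit and the file name are placeholders, and only the (+10cp, +50cp) window comes from the post above:

```python
# Rough sketch: keep an opening line only if a UCI engine evaluates its
# final position inside a centipawn window (+10..+50 for White, as above).
# python-chess is assumed; engine path, time limit and file name are
# placeholders.
import chess.engine
import chess.pgn

LOW_CP, HIGH_CP = 10, 50

def opening_in_window(game, engine, seconds=5.0):
    board = game.board()
    for move in game.mainline_moves():
        board.push(move)
    info = engine.analyse(board, chess.engine.Limit(time=seconds))
    cp = info["score"].white().score(mate_score=100000)
    return LOW_CP <= cp <= HIGH_CP

engine = chess.engine.SimpleEngine.popen_uci("./stockfish")
with open("openings.pgn") as pgn:
    while (game := chess.pgn.read_game(pgn)) is not None:
        if opening_in_window(game, engine):
            print(game.headers.get("ECO", "?"), "kept")
engine.quit()
```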
peter
Posts: 3187
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: New opening-sets for a lower draw-rate in engine-testing

Post by peter »

Adam Hair wrote:
lkaufman wrote: I agree with your goal of reducing draw percentage, but I think Uri's solution of just picking openings played by strong human players, with relatively low draw percentages, is the way to do this.
Simply using low GM draw rates would leave a significant number of games outside the range (+10cp, +50cp) according to Stockfish, Komodo, and Houdini at higher depths, if my work with the TCEC openings is any indication. I think that engine evaluations are needed in addition to the statistics.
+1
Peter.
lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: New opening-sets for a lower draw-rate in engine-testing

Post by lkaufman »

Adam Hair wrote:
lkaufman wrote: I agree with your goal of reducing draw percentage, but I think Uri's solution of just picking openings played by strong human players, with relatively low draw percentages, is the way to do this.
Simply using low GM draw rates would leave a significant number of games outside the range (+10cp, +50cp) according to Stockfish, Komodo, and Houdini at higher depths, if my work with the TCEC openings is any indication. I think that engine evaluations are needed in addition to the statistics.
Adding a filter by engine score might be okay, but it's a minor detail compared to the restriction to opposite-castling games. Chess is not just about playing for mate; it's more about playing to win a pawn and promote it. Limiting openings to opposite castling gives overwhelming importance to specific attacking skills and de-emphasizes positional chess. If it doesn't seem to change results much for the top engines, that just indicates that they are well balanced and good at both attack and positional play. But it's easy to imagine a change that would weaken an engine in normal chess but help in opposite-castling games, or the converse.
Komodo rules!
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: New opening-sets for a lower draw-rate in engine-testing

Post by bob »

pohl4711 wrote:Hi all,

In the last month, Hauke Lutz and I created 2 new opening-sets for engine-testing. We called them SALC (Short and Long Castling), because in all positions, white and black castled in opposite directions.

From now on, I will use the SALC-sets for all of my testwork, and you can download them from my website

http://spcc.beepworld.de


Below is the ReadMe file with further information about SALC:

This folder contains 2 PGN-databases and 2 EPD-databases:
- 10moves_SALC_500.pgn = 500 opening positions, well edited and mixed for serious testwork.
- 12moves_SALC_10k.pgn = 10000 opening positions for big tournaments or randomized opening selection.

- 10moves_SALC_500.epd and 12moves_SALC_10k.epd: Same (final) positions as in the .pgn-files but as EPD (final
board-positions only, without moves).
Use these files when you use the LittleBlitzerGUI for testing, because the LittleBlitzerGUI has an
en-passant-bug (captured en-passant-pawns are not deleted by the LittleBlitzerGUI, if an en-passant-move
appears in the moves of the opening-PGN file (!!!))
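A purely illustrative sketch of what such a PGN-to-EPD export amounts to (python-chess is an assumption here; the real files were produced with PGNscanner and EXCEL):

```python
# Illustrative only: replay each opening line and write its final board
# position as one EPD line. python-chess is assumed; the original EPD
# files were produced with PGNscanner and EXCEL.
import chess.pgn

with open("10moves_SALC_500.pgn") as pgn, \
     open("10moves_SALC_500.epd", "w") as out:
    while (game := chess.pgn.read_game(pgn)) is not None:
        board = game.board()
        for move in game.mainline_moves():
            board.push(move)
        out.write(board.epd() + "\n")
```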



Goal: Reduce the draw rate in engine-engine matches/testruns/tournaments (castling in opposite directions
with queens still on the board makes nice king-attacks possible...), because the faster the computers get,
the higher the quality of computerchess gets and the higher the draw-rate in engine-engine matches gets... so
computerchess is in danger of dying the "draw-death" in the near future.
But we didn't want to go the simple way of using strange gambit-openings or positions with great (material)
imbalance. Take a look at the working-steps protocol below, where you can see which filter-methods Hauke Lutz
used in order to get only (nearly) balanced positions.



Idea and testwork/verification: Stefan Pohl
All work (editing, sorting) done by Hauke Lutz (using PGNscanner 0.92 (a really nice tool by Gabriel
Guillory) and EXCEL)


All games taken from Adam Hair's 12-moves-PGN openings database and his 10-moves-PGN openings-database.

So a big THANX to Adam Hair and Gabriel Guillory!!!




Here is the protocol of the working steps:


Step 1 (by Stefan Pohl using the FritzGUI): Filter all games where (at move 12 / 10) both sides still
have a queen and both sides castled in opposite directions.

= 17665 positions (12 moves deep) (out of 397457 games)
= 4602 positions (10 moves deep) (out of 199041 games)


Working steps 2-10 by Hauke Lutz (PGNscanner: thinking-time/position: 5 seconds (singlecore,
4.5 GHz (i7-4930k, Fritzmark 3367))):

Step 2: Checked both databases for duplicate games with the PGNscanner. Found nothing (nice work, Adam Hair!)

Step 3: Checked the 12moves-database with Komodo 8 (using PGNscanner (eval-interval of +/-0.50)) and deleted
all games with an evaluation outside the eval-interval.

Step 4: Deleted some games of the 12moves-database with ECO-code B and some games with white long castlings
for a better balance. Reduced the number of games to 10000. Used EXCEL for this.

Step 5: Mixed the games of the 12moves-database (by hand) by the castling-direction (we didn't want
some thousand games with white long castlings in a row, followed by some thousand games with white short
castlings...)

Step 6: Checked the 10moves-database with Komodo 8, Houdini 4, Gull 3 (using PGNscanner (eval-interval
of +/-0.40)) and deleted all games if one engine-evaluation was outside the eval-interval.

Step 7: Checked the 10moves-database with Komodo 8 and Stockfish 5 (using PGNscanner (eval-interval
of +/-0.20)) and deleted all games if one engine-evaluation was inside the eval-interval, because we didn't
want positions which are too drawish (a small illustrative sketch of how this combines with step 6 follows after step 11).

Step 8: Counted/Analyzed the ECO-codes of the 10moves-database with EXCEL and deleted some ECO B+C positions
for a better ECO-code balance (and reduced the number of games/positions to 500).

Step 9: Mixed the 10moves-database (by hand) for a (nearly) uniform mixture of ECO-codes for better results,
if only a part of the database is used for an engine-testrun.

Step 10: 5 Bullet-testruns (singlecore, 20''+200ms, Stockfish 5 against Gull 3), using the complete 500
positions of the 10moves-database, and mixed the 10moves-database a second time, based on the
testrun-results (in blocks of 50 positions).

Step 11 (by Stefan Pohl): Changed the results of all games in the PGN-files to 1/2-1/2, deleted all
annotations (created by the PGNscanner) and created the EPD-files for using the SALC-openings in the
LittleBlitzerGUI.
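An illustrative sketch (plain Python; the dict layout and function name are invented, not PGNscanner behaviour) of how the eval-filters of steps 6 and 7 combine:

```python
# Illustrative sketch of the combined eval-filters of steps 6 and 7.
# evals maps engine name -> evaluation (in pawns) of the final opening
# position; the layout is invented, not PGNscanner behaviour.
def keep_position(evals):
    # Step 6: Komodo 8, Houdini 4 and Gull 3 must all stay inside +/-0.40
    inside_040 = all(abs(evals[name]) <= 0.40
                     for name in ("Komodo 8", "Houdini 4", "Gull 3"))
    # Step 7: Komodo 8 and Stockfish 5 must both lie outside +/-0.20,
    # otherwise the position is considered too drawish and is deleted
    outside_020 = all(abs(evals[name]) > 0.20
                      for name in ("Komodo 8", "Stockfish 5"))
    return inside_040 and outside_020
```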



A final gauntlet-testrun (singlecore, 70''+700ms) of Stockfish 140928 (1000 games against Houdini 4,
Komodo 7a, Gull 3, Fire 3 and Rybka 4.1 (=5000 games)) using the 10moves_500_SALC opening-positions-set
lowered the draw-rate to 39.0% (the original testrun (same conditions, but using a "normal"
opening-positions-set (fq500n.pgn) with 500 positions) had a draw-rate of 47.9%).
So the number of draws was more than 18.5% lower with the SALC-set (!) - a relative reduction, not percentage
points. And the overall score of Stockfish was nearly the same (SALC-set: 1% lower (-7 Elo) = clearly inside
the error bars). And the aggressively playing Stockfish engine did not benefit from the SALC-positions
(we were not sure about that...).
So the goal of the creation of the SALC-opening-positions-set was reached: a significantly lower draw-rate,
while keeping the overall score nearly the same.
And - as a nice side effect - the testrun with the 10moves_500_SALC opening-positions-set took only 93 hours,
instead of the 100 hours which the testrun with the "normal" opening-positions-set (fq500n.pgn) took.
That means around 7% less time- and power-consumption for the same number of played games... And all games
were adjudicated as draws at move 120. With all games played to the end (technical draw), the timesaving would
be definitely higher (around 10%, we guess)...



Enjoy this next step of chess-engine matchplay and testwork. Fewer draws, more spectacular games/mates,
without distorting the test-results and scores!
Seems a bit dangerous to "use this for all my testing." It will train your eval to push pawns at the enemy king whenever possible, which might not be the optimal plan in games where both sides castle to the same side. Also it only tests 1/2 of the king safety evaluation, which might favor A over B if A pushes way too aggressively overall, but in these positions it is actually a good idea...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: New opening-sets for a lower draw-rate in engine-testing

Post by bob »

Adam Hair wrote:
lkaufman wrote: I agree with your goal of reducing draw percentage, but I think Uri's solution of just picking openings played by strong human players, with relatively low draw percentages, is the way to do this.
Simply using low GM draw rates would leave a significant number of games outside the range (+10cp, +50cp) according to Stockfish, Komodo, and Houdini at higher depths, if my work with the TCEC openings is any indication. I think that engine evaluations are needed in addition to the statistics.
I think +50 or whatever as a cutoff is not a particularly good idea. In fact, we (Cray Blitz, et al.) gave up on this years ago. We found it better to drop out of book in unbalanced positions where there was play. If you look at Crafty's book learning code, that is the reason why I don't use the first move out of book to decide whether an opening is playable or not; I use the first 10 moves to see what is going on and use that information instead, because in some openings you can drop out at -50, but the eval is going to climb no matter what the opponent does, or vice versa. Evals of +/- 0.20 really ought to mean the game is drawish, which means that screening the positions to within +/-20 while trying to avoid drawish positions makes the methodology seem diametrically opposed to the stated goal.
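A very rough sketch of that idea (plain Python, not Crafty's actual book-learning code; the numbers are invented): judge a line by the evaluation trend over the first 10 moves out of book rather than by the single score at the moment of leaving book:

```python
# Rough sketch, not Crafty's code: judge an opening by the eval trend
# over the first N moves after leaving book, not by the first eval alone.
def opening_trend(evals_out_of_book, n=10):
    """evals_out_of_book: evals in pawns (own point of view), one per
    move after leaving book. Returns (first eval, average over n)."""
    window = evals_out_of_book[:n]
    return window[0], sum(window) / len(window)

# Invented example: the line leaves book at -0.50 but the score climbs
# no matter what, so the average says the opening is playable.
first, avg = opening_trend([-0.50, -0.35, -0.20, 0.00, 0.10,
                            0.15, 0.20, 0.25, 0.25, 0.30])
print(first, round(avg, 2))  # -0.5 0.02
```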

Given the potential unfairness with pure opposite castling games, and then the score screening, I am not quite sure what this test would actually measure other than how two programs compare in nothing but opposite castling positions.