With Houdini 3 at 2.5''+0.05'', Komodo 5.1 at 5''+0.1'', Rybka 4.1 at 5''+0.1'' and Stockfish 3 at 7.5''+0.15'' to roughly equalize their strengths, I had them play from opening lines of different lengths (4, 16 and 30 plies) taken from the SWCR.PGN and GM2006.PGN game collections. I expected a more spectacular result, with Houdini performing better from openings of only 4 plies, since it is very good at Chess 960, but that is not the case: all four engines show a similar dependence on opening length. The longer the opening lines, the more draws there are (9% more draws going from 4 to 30 plies), and the ranking is compressed (from a 76-point difference at 4 plies to 56 points at 30 plies). Apart from that compression, the relative strengths remain very stable, so there is no danger of the ranking order being distorted by testers and testing groups using longer or shorter openings.
Laskos wrote:...(from 76 points difference at 4 plies to 56 points at 30 plies openings...
What is the average game length for 4 and 30 plies? If there is a big difference in the number of "self calculated moves", it acts like a longer or shorter time control, which might increase the draw rate.
But it is interesting that, besides the higher draw rate, nothing changes in the ranking ...
Interesting, thx
Ingo
I looked in SCID for the game length: for 4 plies the average was about 70, for 16 about 74, for 30 about 77. And the time control was with increment, so the total effective TC didn't increase with longer openings by more than 2-3%. That cannot account for such an increase in draw rate (from 31% to 40%). Yes, I found it surprising that the ratings just compress due to the higher draw rate, while the relative ratings remain almost unchanged. Sure, this is with many games and neutral openings; if one begins to set all sorts of traps in the openings, the picture might look different, but I was not interested in that. My gut feeling was that Houdini deals better with the opening phase of the game by itself, without a book, but as it turns out, all 4 top engines behave the same. I also somewhat mistrusted tests with openings longer than 12 moves, but it turns out that they are valid if the openings are neutral, at least for these 4 engines (only that we should expect a higher draw rate).
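To illustrate why a higher draw rate alone compresses rating differences, here is a minimal sketch using the standard logistic Elo formula. The 60% share of decisive games won by the stronger engine is a hypothetical number chosen for illustration, not taken from these matches, and the assumption that this share stays fixed while the draw rate rises is a simplification.

```python
import math


def elo_diff(score):
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_from_draw_rate(draw_rate, decisive_win_share):
    """Score of the stronger side: half a point per draw plus
    its share of the decisive (non-drawn) games."""
    return 0.5 * draw_rate + (1.0 - draw_rate) * decisive_win_share


# Hypothetical: the stronger engine wins 60% of the decisive games.
WIN_SHARE = 0.60

# Draw rates quoted in the post: 31% at 4 plies, 40% at 30 plies.
for draw_rate in (0.31, 0.40):
    score = score_from_draw_rate(draw_rate, WIN_SHARE)
    print(f"draw rate {draw_rate:.0%}: score {score:.3f}, "
          f"Elo difference {elo_diff(score):.1f}")
```

With the win share held constant, the implied Elo gap shrinks as the draw rate goes up, while the sign of the gap (and hence the order) is unchanged.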
If the average game length is basically not influenced by the length of the opening, the only cause of a higher draw rate I can think of is that a longer but equal opening simply leaves less room for the engines, as it IS already more drawish ...
For a tester this means that he has to find openings which are:
1. as short as possible (to let an engine decide and not a book or position)
2. as different as possible (to have a variety of different chess openings)
Games are won when one player or the other makes a blunder or inaccuracy which, with accurate play, can be converted to a win. In long openings that exit into "neutral" positions you have substantially reduced the number of move-opportunities where an engine could potentially make such an inaccurate move, and the achievement of a neutral position after so many moves is inherently very drawish.
It turns out that in a surprising number of games one side is already losing by the 10th or 15th move and never equalizes. You eliminate such cases by exiting to neutral positions.
It is a well known fact that opening books are mostly useless Elo-wise past the first 3-4 moves.
They are only useful to bring diversity to the games (without an opening book the engines would play the same game over and over, as the algorithm is deterministic).
In fact, the real risk is that using a large opening book (compiled automatically from a large database of unverified moves) forces mistakes that the engine wouldn't play. In other words, engines generally play better than large books (which doesn't mean that every move they play is better; remember that Elo is determined more by your 1% worst moves than by your 1% best ones...)
Personally, I use 8 moves, in order to reach a good balance between diversity and engine creativity. There's nothing more boring than these extremely long book lines that effectively start the game in an already drawn endgame.
An engine that I really hated for that was Spike 1.2: you cannot disable the book and it's hardcoded inside the executable! And the book is just endless and always tries to play for boring, blocked, symmetric, drawish positions... It really killed the creativity and the fun.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
We do all our testing with very shallow and wide opening books - we have only 5 moves (10 ply).
Obviously, if the opening book is playing most of the game you are not really testing the engine as much and you give much greater chances to the weaker engine.
Some people use these highly developed opening books that go very deep and that is fine for maximizing the strength on-line or in a tournament setting but not for computer vs computer objective testing.
Yes, it seems that a shallow, but still diverse book is optimal. What is curious is that I expected some engines to overperform or underperform with only a 4-ply book compared to 16- or 30-ply books. It's not the case; it seems that engines play openings by themselves at a level similar to their overall strength. For openings you have to encode many things in the evaluation, similarly to endgames. But in endgames the engines do shift their strength compared to overall play, and it is well known that some engines under- or over-perform in the endgames.
Endgame balanced suite at same TC as in the first post:
The ratings are even more compressed, the draw ratio is very high, but one can say that in the endgames Stockfish overperforms, Houdini underperforms compared to overall rating. No such thing in the openings.