I disagree with you on many points. "Extremely Strong" has a clear meaning in my terminology and understanding, and it is that absolute strength, not "similar endgame evaluation" or a "similar paradigm", that is to blame for the 98.8% draw rate in drawish endgames. It might be the paradigm, but the paradigm works. ELO (logistic or otherwise) is not as abstract and volatile as it might seem to you. The difference between a random player and a non-losing player from the standard opening position seems to be capped at about 4000 logistic ELO points. Top engines have already covered 3500-3600 ELO points of this 4000 ELO span. A strong human amateur or Zurichess_00 sits at only about 1900 ELO points; they are closer to the random player than to the non-losing player. This is even more dramatic in "Endgame Chess". The ELO span there from random to non-losing is only about 2000 ELO points from a drawn position, and Stockfish is already at the 1950 mark or so. It plays drawn endgames almost perfectly (in the sense of not losing from a drawn position). A strong amateur or Zurichess is at the 1600 mark or so. In this sense my "Extremely Strong" has a meaning: it is extremely close to a non-losing player from a drawn position. That is why the _absolute_strength_ gives that 98.8% draw rate from drawish endgame positions between Stockfish dev and Stockfish 7. There is nothing left for them to play but a draw, and it is not "similar evaluation" or a "similar paradigm". As for the full game of Chess, if the capping at 400-500 more ELO points is real, it means that top engines already play about 90% of all moves as non-losing moves from the standard opening position, there are whole sequences of non-losing moves, and about 10% of games between top engines at LTC are perfect games in the sense of non-losing from the standard opening position.

Sven wrote:
What does "extremely strong" really mean, other than "much stronger than almost all other (human or computer) chess players"? There is no absolute playing strength, only a comparison.

Laskos wrote:
I am more of the opinion of Greg, that Chess is too easy for computers. You list deficiencies of chess engines, which are true deficiencies, but the bulk is that they are extremely strong.

Sven wrote:
One way to avoid "Draw Death" would be the appearance of one single engine that plays significantly stronger than the current top engines, say 100 or 200 Elo points. As of today this seems to be unlikely, but can we really exclude it? In the past there were already times of stagnation, then suddenly a heavy improvement came up when nobody thought it was possible.
I think today we are still far away from knowing what the best openings really are, and there are also some endgame types that are not always evaluated correctly by top engines. Think about fortresses, for instance, or about complex endgames with rooks, minor pieces and pawns, for which we still lack any proven theoretical knowledge due to the lack of EGTBs for more than 6 or 7 pieces. For me these are two good reasons for not believing that we are already close to "perfect play" in computer chess. It might be "perfect-looking play" only. All we know, in my opinion, is that we have a couple of very strong engines that are really hard to beat with today's chess programming knowledge and hardware.
Once that new mega engine appears (and as a programmer I hope this will happen one day, although it will probably not be my own engine ...) I expect that parts of these mathematical models may become outdated, or at least will have to be reviewed.
These results are not surprising at all. But my explanation is different from yours: the strength difference between the two Stockfish versions in the "sub-discipline" of playing balanced chess endgame positions is obviously much smaller than their Elo rating difference for playing whole games, perhaps because their endgame evaluations are more similar than their evaluation functions as a whole. You won't get a 99% draw outcome between these two SFs when playing 1000 normal games, since the expected result is about 660:340, so you should never get significantly more than 68% draws.

Laskos wrote:
I took very balanced endgame positions (0cp-5cp unbalance) from real games and made the following experiments (no TBs):

Code: Select all
Score of Stockfish dev vs Zurichess_00: 297 - 1 - 702 [0.648] 1000
ELO difference: 106.01 +/- 10.81
Finished match

Stockfish dev is a recent version of Stockfish. Zurichess_00 is the first version of Zurichess I have, and it is about 1800 ELO human level (a pretty strong amateur). So 30% of these ultra-balanced (endgame) openings are still playable against strong amateurs. Then I took the same Stockfish dev and Stockfish 7, separated by a not-that-small 120 ELO points (on purpose not that small), and got:

Code: Select all
Score of Stockfish dev vs Stockfish 7: 8 - 4 - 988 [0.502] 1000
ELO difference: 1.39 +/- 2.35
Finished match

Now, only 1.2% of the endgames are playable. So, 96% of the endgames playable against strong amateurs are dead draws now.
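For reference, the arithmetic behind the "ELO difference" lines above and the 660:340 / 68% remark follows from the standard logistic ELO model. Here is a minimal Python sketch; the formulas are the standard ones, the function names are mine, and this is not necessarily the exact code the match tool runs:

Code: Select all
import math

def expected_score(elo_diff):
    # Expected score of the stronger side under the logistic ELO model.
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score):
    # Inverse: ELO difference implied by an average match score (0 < score < 1).
    return -400.0 * math.log10(1.0 / score - 1.0)

e = expected_score(120)        # ~0.665, i.e. roughly 660:340 over 1000 full games
max_draws = 2.0 * (1.0 - e)    # draw rate is at most 2*(1-E): ~67-68%

score = (8 + 0.5 * 988) / 1000.0   # the endgame match above: 8 wins, 4 losses, 988 draws
diff = elo_from_score(score)       # ~1.4, matching the "ELO difference: 1.39" line

print(e, max_draws, score, diff)

Whichever explanation one prefers, a genuine 120-ELO gap caps the draw rate at roughly 67-68%, while 988 draws out of 1000 correspond to an effective gap of only about 1.4 ELO in these endgame positions.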
Here I disagree. The draw rate between two players A and B must also be a function of the strength difference between A and B. See my explanation regarding endgames above.

Laskos wrote:
And it is not the strength difference (a respectable 120 ELO points), but the strength itself which gives more draws and less sensitivity.
This "capping" theory seems to be based on the assumption that letting an engine search with "infinite speed", or "infinitely long", would (in theory) lead to solving chess by searching deep enough to get a perfect analysis of a given position, and therefore may serve as a justification for stating that "infinite speed" defines an upper bound for Elo ratings. However, I am not convinced of it. Today's heavy pruning and reductions often cause even a 50-ply search to be incomplete and imperfect. We only know that searching 50 plies deep is almost always better than searching only 40 plies and usually leads to a higher rating, but we still don't know whether the resulting analysis is "perfect". Now we could think of switching off all the pruning and reduction stuff, and repeat the test - but then I guess we will get very different results for the same test that was done by Andreas, i.e. there might be a much smaller effect of diminishing returns.Laskos wrote:The paradigm of Computer Chess is unchanged since Knuth of 40 years ago. The same alpha-beta, which reduces effective branching factor to 4-5, then ever more pruning and reductions to EBF 1.5 of today. According to this paradigm, the Computer Chess is capped to 400-800 more ELO points compared to today's top engines. The most comprehensive induction is from Andreas Strangmüller results here (and the discussion):
http://www.talkchess.com/forum/viewtopic.php?t=61784 ,
which points to lower values.
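To illustrate how diminishing returns per speed doubling can translate into a finite ELO cap, here is a toy Python sketch assuming the gain shrinks by a constant factor with every further doubling. The decay model and the numbers are placeholders of mine, not the fit used in the linked thread:

Code: Select all
def remaining_elo(first_gain, decay, doublings=1000):
    # Total ELO still available if each further speed doubling is worth
    # `decay` times the previous one (illustrative geometric model).
    total, gain = 0.0, first_gain
    for _ in range(doublings):
        total += gain
        gain *= decay
    return total  # approaches first_gain / (1 - decay) for decay < 1

# Placeholder numbers: a next doubling worth ~60 ELO, each one ~10% weaker than the last:
print(remaining_elo(60, 0.9))   # ~600 ELO of total headroom despite unlimited speed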
Here I mostly agree, although nobody really knows what will happen during the next 10-20 years.

Laskos wrote:
That a new paradigm, replacing one unchanged for 40 years, will appear and revolutionize things under these conditions is almost as unlikely in the near future as a solution to Chess. It will happen, but in the pretty far future. I believe that in 10-20 years we will still be talking in the same terms and the same paradigm, but with a 95%+ draw rate among top engines even in fast tests (with balanced positions). Sure, it might happen that an engine appears which is significantly superior, but this superiority diminishes _objectively_ (as separation power) with general strength, longer TC and more hardware.
As a general remark, what I am really missing is a test where it is not time, speed, the number of threads, or any similar resource that is doubled successively, but knowledge ... I hope someone can do that at some point in the future?!
And returning to the toy "Endgame Chess": I have large files of 5-men Draws and 5-men Wins occurring in games. Against a Syzygy-enabled opponent, Stockfish misses 2 draws out of 1000, Zurichess 72 out of 1000. Out of 1000 Wins, Stockfish misses 87 Wins, Zurichess 524 Wins. (On a side note, observe that it is much easier for engines to miss a Win than to miss a Draw.) These results, too, show the dramatic absolute strength of Stockfish in drawn endgames and in endgames generally. Results of this sort will occur in normal Chess in maybe 20 years, with 95%+ draw rates among top engines from balanced positions even in fast games, keeping the same paradigm, which seems to work and which seems hard to improve. The parallel with Checkers seems hard to avoid.
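As an aside, counts of missed Wins and missed Draws of this kind can be scripted directly against the 5-men tablebases. A minimal Python sketch using the python-chess Syzygy probing API; the tablebase path and the input format are assumptions for illustration, not a description of the exact test above:

Code: Select all
import chess
import chess.syzygy

def tb_verdict(board, tb):
    # Syzygy WDL for the side to move: 2 = win, 0 = draw, -2 = loss;
    # +/-1 are wins/losses spoiled by the 50-move rule, counted here as draws.
    wdl = tb.probe_wdl(board)
    if wdl == 2:
        return "win"
    if wdl == -2:
        return "loss"
    return "draw"

# Hypothetical input: (FEN, result the engine actually achieved from that position).
positions = [
    ("8/8/8/4k3/8/4K3/4P3/8 w - - 0 1", "draw"),   # placeholder data
]

missed_wins = 0
with chess.syzygy.open_tablebase("/path/to/syzygy") as tb:
    for fen, achieved in positions:
        if tb_verdict(chess.Board(fen), tb) == "win" and achieved != "win":
            missed_wins += 1

print("missed wins:", missed_wins)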
As for your proposal of "doubling knowledge", I don't know how to measure "knowledge". By its efficiency ELO-wise? Its size? Its scaling with time control? In the current paradigm, progress in "knowledge" does not come uniformly with progress in ELO. Rybka 1.0 had less knowledge than Shredder 9, but was significantly ahead in strength and was real progress. Andscacs might have more "knowledge" than Stockfish, but it is behind. It seems, though, that the more knowledgeable engines scale better to longer TC and more hardware.