The future of chess and elo ratings

lkaufman · Post by **lkaufman** » Sun Sep 20, 2015 6:47 am

There is widespread concern, with good reason, that the combination of ever-expanding opening books together with an improved level of play of both humans and engines will lead to an increasing percentage of draws, eventually to the point of killing interest in chess. For engine play, the first point about opening books has been largely mitigated by requiring the engines to think for themselves after some small number of moves, usually with reversed colors for a replay. This idea has not yet spread to high-level human competition, but it seems only natural that it be tried soon: openings would be selected at random from a list of those that are popular in GM play, using a database of only decisive games, with two-game matches for each pairing. This would probably solve the draw problem in human competition for the foreseeable future. But what about engines? Does the increasing draw percentage mean that even with randomized and short opening books there isn't much more room to improve?
My answer is that as things are done now, this is indeed the case, though we may still have a couple hundred elo or more to go. But there is a solution. Here is how I see the situation. Let's suppose that a score of over +60 centipawns means a theoretically won game, 60 or less means a theoretically drawn game. That's just a guess, and of course evals aren't perfect, but let's skip over those points. Let's assume that White's opening advantage is 20 centipawns, roughly correct, and that books leave White with this much advantage. Finally, let's guess that the two top engines at TCEC time controls will make cumulative errors in a game of anywhere from zero to 80 centipawns with equal probability, again a guess and a gross simplification of reality, but close enough to make my points. Then even if White plays his worst and Black plays perfectly, the score will only drop to -60, so he will never lose. But if White plays perfectly and Black plays his worst, the score will rise to +100, enough to win. So White will win a quarter of the games and draw the rest, roughly where we probably stand now. But once the maximum error drops from 80 to 40, even Black will never lose and all games will be drawn. Even if Black is a weaker player and his error rate max is 40 while White's is 30, he still won't ever lose.
The solution is that the openings chosen must result in scores much closer to the win/draw line. If all games were started with a +60 opening, then even if the error rate drops to just a couple centipawns White will win half the games and draw the rest. Then even if one player is just modestly stronger than the other, he will rack up a huge plus score. For example if my max error is 3 centipawns while yours is 5, I should win with White 5/8 of the time and draw with Black 5/8 of the time.
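The arithmetic in this toy model can be checked with a short sketch. It assumes, as the quoted fractions (1/4 and 5/8) imply, that the final score is spread uniformly between the two extremes (White at his worst, Black at his worst). The function name `white_win_fraction` and the constant `WIN_THRESHOLD` are made up for illustration.

```python
from fractions import Fraction

WIN_THRESHOLD = 60  # centipawns; a final score strictly above this counts as a White win


def white_win_fraction(opening, white_max_err, black_max_err, threshold=WIN_THRESHOLD):
    """Fraction of games White wins, assuming the final score is uniformly
    distributed between the worst case for White and the worst case for Black."""
    low = opening - white_max_err   # White plays his worst, Black plays perfectly
    high = opening + black_max_err  # White plays perfectly, Black plays his worst
    if high <= threshold:
        return Fraction(0)          # the score can never exceed the win line
    if low >= threshold:
        return Fraction(1)          # the score is always at or above the win line
    return Fraction(high - threshold, high - low)


print(white_win_fraction(20, 80, 80))  # +20 opening, max error 80 each
print(white_win_fraction(20, 40, 40))  # max error 40 each: no wins at all
print(white_win_fraction(60, 3, 5))    # +60 opening, stronger side has White
print(white_win_fraction(60, 5, 3))    # +60 opening, stronger side has Black
```

Under this assumption the four calls give 1/4, 0, 5/8 and 3/8, so the stronger side draws 5/8 of its Black games. If the two sides' errors are instead treated as independent uniform variables, the same cases come out at 1/8 and 7/10 rather than 1/4 and 5/8. The conclusion is the same either way: openings near the win/draw line keep decisive games alive even when errors are tiny.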
So the key to keeping chess interesting and to seeing large rating gains for decades to come is to use more unbalanced opening books, and also to avoid mismatches since eventually, once engines stop losing with White, they will never lose a match by more than 75% or 192 elo points. This problem can be gotten around by only rating wins, or by rating the result of the two game matches as win/loss/draw, or perhaps by other means.
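The 75% ceiling comes from winning every White game and drawing every Black game. As a rough check, the sketch below converts that score to a rating difference using the standard logistic Elo expectation formula; the helper name `elo_diff` is made up for illustration.

```python
import math


def elo_diff(score):
    """Rating difference implied by an expected score (0 < score < 1)
    under the logistic Elo model: score = 1 / (1 + 10 ** (-diff / 400))."""
    return 400 * math.log10(score / (1 - score))


# Win every game with White (1 point), draw every game with Black (0.5 points).
max_score = (1.0 + 0.5) / 2
print(max_score, round(elo_diff(max_score), 1))
```

The logistic formula gives a little under 191 points for a 75% score, in line with the figure of about 192 quoted above. Other conversion tables differ by a point or two.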
To get the openings with larger than normal White advantage, it will probably be necessary to pick from amateur games rather than GM games, so this is of course not a perfect solution. But I predict it will be necessary. This should keep engine chess lively for at least the next century. If it is adopted, the sky is the limit for elo ratings.

carldaman · Post by **carldaman** » Sun Sep 20, 2015 8:09 am

Interesting ideas, Larry. I think HG Muller has been proposing more imbalanced books for some time now, and with good reason.

I wonder why you assume Black will 'never' win if the out-of-book eval is 0.60 or more for White. I see Black wins from (evenly matched) strong engines in the classical King's Indian Defense (KID) often enough, even though the exit eval is usually (and probably wrongly) more than -.60 for Black.

Perhaps we need to see more engine games played using a classical KID suite, since closed positions are an area where top engines still struggle to various degrees and could use a lot of improvement. Of course, this would imply longer opening lines in this case, to force the engines deeper into the KID lines.

Regards,
CL

Laskos · Post by **Laskos** » Sun Sep 20, 2015 8:41 am

In practice, the distributions of performance, wins and draws are fairly smooth functions, so the gain in resolution will be there, but it will be a moderate one.

Take as an example a logistic Win distribution as a function of eval (the same can be done with the Gaussian erf or with the drawelo model used in BayesElo):

Win = 1/(1+a*exp{-b*eval})

A particular (and practical) case is shown in the plot:
a = 3
b = 2

We are interested in maximizing its first derivative, so that small variations in eval (as you exemplified, say an infinitesimal 3 cp and 5 cp) give large variations in decided games.

The first derivative of the Win distribution is shown here:

We see that the optimal opening is around an eval of 0.55, or 55 cp. But the snag is that the resolution, compared to 0.00 (0 cp) openings, increases by only about a third, and not as dramatically as you presented.

For the general Win = 1/(1+a*exp{-b*eval}), the optimum is at (log a)/b, or in our case a=3 and b=2, (log 3) /2 ~ 0.5493. One has to fit empirically the Win performance of engines to find a and b, then use (log a)/b to find the optimum eval of openings, and I bet usually it's not too far from your 0.60 or so.
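A quick numerical check of these figures, as a minimal sketch: it uses the logistic Win curve with a = 3 and b = 2 from the example, with eval in pawns and `log` taken as the natural logarithm. The function names are made up for illustration.

```python
import math


def win_prob(eval_pawns, a=3.0, b=2.0):
    """Logistic Win probability: 1 / (1 + a * exp(-b * eval))."""
    return 1.0 / (1.0 + a * math.exp(-b * eval_pawns))


def win_slope(eval_pawns, a=3.0, b=2.0):
    """First derivative of win_prob with respect to eval."""
    e = a * math.exp(-b * eval_pawns)
    return b * e / (1.0 + e) ** 2


def optimal_eval(a=3.0, b=2.0):
    """Eval at which the slope is largest: (ln a) / b."""
    return math.log(a) / b


x_opt = optimal_eval()
gain = win_slope(x_opt) / win_slope(0.0)
print(round(x_opt, 4), round(gain, 3))  # 0.5493 1.333
```

At the optimum the Win probability is exactly 1/2 and the slope is b/4 = 0.5, against 0.375 at eval 0. That is a gain of 4/3 in resolution for these particular parameters. Fitted values of a and b for real engines would move both numbers.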

Jhoravi · Post by **Jhoravi** » Sun Sep 20, 2015 8:44 am

I like "Seirawan Chess" because the initial position is not different from normal chess.

Modern Times · Post by **Modern Times** » Sun Sep 20, 2015 9:30 am

I like Chess960, but I don't know if that is a solution to the draw problem.

hgm · Post by **hgm** » Sun Sep 20, 2015 10:02 am

Indeed, this is exactly what I proposed to make engine testing more sensitive. If all booklines end in positions that are on the win/draw boundary, the draw percentage between equal opponents would be 50%, but these would all be draws the engine would have to fight hard for. Draws are not bad per se. It is the 'freewheeling draws' you want to get rid of.

Jhoravi wrote:I like "Seirawan Chess" because the initial position is not different from normal chess.

That of course depends on whether you consider holding pieces 'in hand' to be different. 'Pocket Knight' is another variant that has the same initial position as normal Chess, in this sense, and could be considered a less drastic modification because it does not use any unorthodox pieces.

sandro Necchi · Post by **sandro Necchi** » Sun Sep 20, 2015 11:51 am

Hi Larry,
If the target is to see which engine is the best, I agree with you.
The reasons for so many draws are the following:
- Improved opening books that play only the best lines, with deep variations.
- Increased engine strength.
- The Elo strength of the best engines is very close.

My suggestion to improve this is the following:

- Limit book moves to a maximum of 14 plies (7 moves for each side).
- Reduce the time control to 10 minutes per player.
- Instead of a single game, play a 10-game match, as is done in the Candidates.

This would make the Championship more interesting!

Of course this will result in the development of books specific to this Championship and "suitable" to the engines, but that would not change the fun.

Best regards,

Sandro

Ozymandias · Post by **Ozymandias** » Sun Sep 20, 2015 12:17 pm

lkaufman wrote:There is widespread concern, with good reason, that the combination of ever-expanding opening books together with an improved level of play of both humans and engines will lead to an increasing percentage of draws, eventually to the point of killing interest in chess.

You may want to consult this thread, about a similar problem:

http://rybkaforum.net/cgi-bin/rybkaforu ... pid=552902

lkaufman wrote:For engine play, the first point about opening books has been largely mitigated by requiring the engines to think for themselves after some small number of moves, usually with reversed colors for a replay.

Has it been "largely mitigated"? We can still see some pretty high draw rates at TCEC.

lkaufman wrote:This idea has not yet spread to high-level human competition, but it seems only natural that it be tried soon: openings would be selected at random from a list of those that are popular in GM play, using a database of only decisive games, with two-game matches for each pairing. This would probably solve the draw problem in human competition for the foreseeable future.

I don't think that the same solution can be applied to different problems. Check the aforementioned thread. Engine play is different from core-freestyle, but the cause of OTB draws isn't what fuels a high draw rate here, either.

lkaufman wrote:But what about engines? Does the increasing draw percentage mean that even with randomized and short opening books there isn't much more room to improve?
My answer is that as things are done now, this is indeed the case, though we may still have a couple hundred elo or more to go.

For this season, keeping the draw rate below 85% should be satisfactory. Just one hundred Elo points from now, we'd be looking at 95% without any countermeasures.

lkaufman wrote:But there is a solution. Here is how I see the situation. Let's suppose that a score of over +60 centipawns means a theoretically won game, 60 or less means a theoretically drawn game. That's just a guess, and of course evals aren't perfect, but let's skip over those points. Let's assume that White's opening advantage is 20 centipawns, roughly correct, and that books leave White with this much advantage. Finally, let's guess that the two top engines at TCEC time controls will make cumulative errors in a game of anywhere from zero to 80 centipawns with equal probability, again a guess and a gross simplification of reality, but close enough to make my points. Then even if White plays his worst and Black plays perfectly, the score will only drop to -60, so he will never lose. But if White plays perfectly and Black plays his worst, the score will rise to +100, enough to win. So White will win a quarter of the games and draw the rest, roughly where we probably stand now. But once the maximum error drops from 80 to 40, even Black will never lose and all games will be drawn. Even if Black is a weaker player and his error rate max is 40 while White's is 30, he still won't ever lose.
The solution is that the openings chosen must result in scores much closer to the win/draw line. If all games were started with a +60 opening, then even if the error rate drops to just a couple centipawns White will win half the games and draw the rest. Then even if one player is just modestly stronger than the other, he will rack up a huge plus score. For example if my max error is 3 centipawns while yours is 5, I should win with White 5/8 of the time and draw with Black 5/8 of the time.

Evals are good for predicting a score only if the complexity of the position is "low" enough. But if ONE engine can cope with it, what's the point of using THAT position? You'd only be exaggerating strength differences (or getting draws, if the other engine can also play it correctly). Complex positions are the ones a TD should be looking for in the quest to reduce draw rates.

lkaufman wrote:So the key to keeping chess interesting and to seeing large rating gains for decades to come is to use more unbalanced opening books, and also to avoid mismatches since eventually, once engines stop losing with White, they will never lose a match by more than 75% or 192 elo points. This problem can be gotten around by only rating wins, or by rating the result of the two game matches as win/loss/draw, or perhaps by other means.
To get the openings with larger than normal White advantage, it will probably be necessary to pick from amateur games rather than GM games, so this is of course not a perfect solution. But I predict it will be necessary. This should keep engine chess lively for at least the next century. If it is adopted, the sky is the limit for elo ratings.

I don't see the interest in watching games played from a one-sided position (which you should avoid yourself), or from a drawn position, regardless of the engine's eval (I'm thinking of useless pawns here).

Yes, you'd be getting large "ELO" gains, but they would be fabricated.

syzygy · Post by **syzygy** » Sun Sep 20, 2015 12:33 pm

Imbalanced opening positions will result in (predictable) 1-0, 0-1 outcomes but are not a better way to measure the relative strength of two players.

What you would need are opening positions that are balanced but "sharp". There is probably a significant correlation between a position being "sharp" and the players not having much (opening) knowledge about that position. So Fischer Random is not a bad idea (for now).

Roger Brown · Post by **Roger Brown** » Sun Sep 20, 2015 12:46 pm

syzygy wrote:Imbalanced opening positions will result in (predictable) 1-0, 0-1 outcomes but are not a better way to measure the relative strength of two players.

What you would need are opening positions that are balanced but "sharp". There is probably a significant correlation between a position being "sharp" and the players not having much (opening) knowledge about that position. So Fischer Random is not a bad idea (for now).

Hello Ronald,

Your post makes good logical sense (as usual) and provokes me to make a few comments:

I am a total chess nitwit, but isn't the degree of sharpness measured by some perceived advantage from that opening for one side or the other, whether in material or tempo?

I mean, "dull and drawish" implies that neither side has an advantage from the particular opening.

Further, how is dynamism to be measured? As in, what evaluation?

Finally, human chess was thought to have been on its last legs too, partially as a result of advances in opening theory going deep into the game. Hasn't happened yet...
