The future of chess and elo ratings

kbhearn · Post by **kbhearn** » Mon Sep 21, 2015 3:00 am

Has the simple change 'stalemate = win' been investigated thoroughly? It adds a lot more wins in elementary endings and doesn't add complexity like the chase rules in xiang qi (in fact one could argue it makes the rules make more sense: you have no moves because your king would have to move into check, well then I guess your king is dead). If you took a couple of the top engines and a referee and modified them to respect those rules, how much would the draw rate change? (I'd recommend stripping out almost all draw-scaling except opposite bishops while at it, to make sure they don't avoid KPvK, KQvKP, KNNvK etc. stalemates from the now-winning side.)

With regard to human OTB chess, I think concern is overblown. And 960 is a sufficient solution if it ever does crop up - it may not be a problem for computers to have books for 960 starting positions but for humans, book prep would lose all value.

carldaman · Post by **carldaman** » Mon Sep 21, 2015 3:10 am

syzygy wrote:
lkaufman wrote:3. Choosing "sharp" openings is another way to increase resolution. The simplest way to do this is simply to select openings from a database of decisive GM games. But I think that pretty soon it will be difficult to find openings popular in GM praxis that offer Black any significant chance to win in a match between top engines. Going for the win/draw threshold is a solution that should last for centuries.
You think going for the win/draw threshold is going to give Black any chance to win??

A more reasonable approach would be to try to include imbalanced openings, but only doing this to add to a pool of dynamically balanced openings that give both sides a chance to win.

An example of a dynamically balanced opening is the Frankenstein-Dracula variation, where both sides have chances to win depending on how play goes:

https://en.wikipedia.org/wiki/Vienna_Ga ... _Variation

One has to be careful to remove the overly drawish opening lines, since it is these that typically contribute to the excess of draws between engines, and leave in the rich, dynamic and complex variations, whether balanced or unbalanced.

CL

bob · Post by **bob** » Mon Sep 21, 2015 4:00 am

syzygy wrote:Imbalanced opening positions will result in (predictable) 1-0, 0-1 outcomes but are not a better way to measure the relative strength of two players.

What you would need are opening positions that are balanced but "sharp". There is probably a significant correlation between a position being "sharp" and the players not having much (opening) knowledge about that position. So Fischer Random is not a bad idea (for now).

There are two kinds of unbalanced positions: those where one side usually wins (probably what you are thinking of) and those that are unbalanced in other ways, e.g. one side is an exchange down but has significant compensation for it. That gives lots of room for different playing styles to produce different results. These are the kinds of positions I have generally tried to find for specifically prepared tournament books. These are the kinds of positions that a good program can win from both sides against a weaker opponent.

Maybe this is what you mean by "sharp" but I don't take it that far. IE no very precise play is required to win or avoid losing, just knowledge about various facets of chess evaluation and a good search to go with it.

lkaufman · Post by **lkaufman** » Mon Sep 21, 2015 4:33 am

kbhearn wrote:Has the simple change 'stalemate = win' been investigated thoroughly? It adds a lot more wins in elementary endings and doesn't add complexity like the chase rules in xiang qi (in fact one could argue it makes the rules make more sense: you have no moves because your king would have to move into check, well then I guess your king is dead). If you took a couple of the top engines and a referee and modified them to respect those rules, how much would the draw rate change? (I'd recommend stripping out almost all draw-scaling except opposite bishops while at it, to make sure they don't avoid KPvK, KQvKP, KNNvK etc. stalemates from the now-winning side.)

With regard to human OTB chess, I think concern is overblown. And 960 is a sufficient solution if it ever does crop up - it may not be a problem for computers to have books for 960 starting positions but for humans, book prep would lose all value.

Mark and I tried an experiment a few months ago to see what the effect of calling stalemate a win (or say 3/4 point) would be. I am in favor of this change, but the result of our experiment was that it only made a small reduction in draws. Much more consequential would be to restore the old "bare king loses" rule, and even more helpful would be to bar perpetual check.

lkaufman · Post by **lkaufman** » Mon Sep 21, 2015 4:40 am

carldaman wrote:
syzygy wrote:
lkaufman wrote:3. Choosing "sharp" openings is another way to increase resolution. The simplest way to do this is simply to select openings from a database of decisive GM games. But I think that pretty soon it will be difficult to find openings popular in GM praxis that offer Black any significant chance to win in a match between top engines. Going for the win/draw threshold is a solution that should last for centuries.
You think going for the win/draw threshold is going to give Black any chance to win??
A more reasonable approach would be to try to include imbalanced openings, but only doing this to add to a pool of dynamically balanced openings that give both sides a chance to win.

An example of a dynamically balanced opening is the Frankenstein-Dracula variation, where both sides have chances to win depending on how play goes:

https://en.wikipedia.org/wiki/Vienna_Ga ... _Variation

One has to be careful to remove the overly drawish opening lines, since it is these that typically contribute to the excess of draws between engines, and leave in the rich, dynamic and complex variations, whether balanced or unbalanced.

CL

If the error rate in future engines becomes very low, then I think that even highly interesting, unbalanced positions will rarely result in fair winning chances for both sides. If chances for White and Black are really about equal, then it will be very hard to find positions that do not generally end in draws with such super-accurate engines. You might find positions that are 1% White wins, 1% Black wins, 98% draws, but I don't think you'll find many positions that are 30% White wins, 30% Black wins, and 40% draws. That's why choosing positions close to the win/draw line is a more lasting solution, because no matter how good the engines get to be, the percentage of draws stays near 50%. It doesn't matter whether the positions are near-wins for White or near-wins for Black, but as a practical matter you won't get near-wins for Black without really bad opening play by White, whereas you will get near-wins for White with just slightly questionable play for Black.

Jhoravi · Post by **Jhoravi** » Mon Sep 21, 2015 4:48 am

My idea is to switch White's King and Queen, resulting in a non-symmetrical starting position vs Black. The purpose is to promote opposite-side castling, resulting in instant imbalance!

Secondly, the unreasonable Stalemate Rule should be abolished.

lkaufman · Post by **lkaufman** » Mon Sep 21, 2015 6:56 am

Jhoravi wrote:My idea is to switch White's King and Queen, resulting in a non-symmetrical starting position vs Black. The purpose is to promote opposite-side castling, resulting in instant imbalance!

Secondly, the unreasonable Stalemate Rule should be abolished.

One way to combine your idea with mine would be to use the normal start position, but simply forbid castling on the same side as the opponent has castled on. This would not only make opposite castling usual, but would greatly increase White's advantage, since he would usually be able to make the choice as to which side each player will castle. This variant should be fairly close to the win/draw line in my opinion, further minimizing draws.

Laskos · Post by **Laskos** » Mon Sep 21, 2015 9:39 am

lkaufman wrote:Here are comments about several above posts.
1. My simplifications overstated the case, but the 35% benefit Kai mentions for using positions near the win/draw line is still pretty significant. I would conclude from his analysis that there is no need to go right up to the win/draw line, just approach it. Maybe +50 centipawn positions would be about ideal, probably still theoretically drawn in general but big chances of losing errors.
2. I have long been an advocate of banning perpetual check, as done in Chinese Chess and Japanese chess. I am also a fan of chess 960, having been US open champion of it, and I also like Seirawan chess, for which Don and I made a Komodo version. But I'm trying to focus on ideas in this thread that don't change the rules of chess, other than by mandating certain openings, for which there is ample historical precedent.
3. Choosing "sharp" openings is another way to increase resolution. The simplest way to do this is simply to select openings from a database of decisive GM games. But I think that pretty soon it will be difficult to find openings popular in GM praxis that offer Black any significant chance to win in a match between top engines. Going for the win/draw threshold is a solution that should last for centuries.
4. I have myself proposed (for human play) the idea of replaying games without resetting clocks until someone wins. But if you are testing two super strong engines at long tc that might take forever. Of course you can avoid draws by playing super-fast games, but that has obvious drawbacks. Note that this idea has very different consequences if you reverse colors than if you don't. If you reverse, play will be more like it is normally. If not, White will aim for just slightly better endgames, because even if they are drawn 95% of the time, as long as only White can win, he eventually will do so.

My theoretical result is confirmed by empirical data. I took a database of 3000 blitz games at 240'' + 2.4'' between two successive versions of Komodo, which therefore have a high draw rate:

Code:

Games: 3000  (+420, =2263, -317), 51.7%

Win rate:   14.0%
Draw rate:  75.4%
Loss rate:  10.6%

Then, from this database, I picked the games which had an eval between 0.60 and 0.80 at move 12. It turned out that the borderline of 50% draws is reached at an eval of around 0.70, rather than the 0.60 you guessed, at least on move 12 at this time control.

Code:

Move 12
Eval: 0.60-0.80

Games: 460  (+129, =228, -103), 52.8%

Win rate:  28.0%
Draw rate: 49.6%
Loss rate: 22.4%
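The percentages in the two tables can be rechecked from the raw W/D/L counts. The sketch below does that and also converts each score to an Elo difference. The plain logistic Elo formula used here is an assumption for illustration only, since the post does not say which Elo model was applied, and `summarize` is a hypothetical helper name.

```python
import math

def summarize(wins, draws, losses):
    """Return (win rate, draw rate, loss rate, score, Elo difference) for a W/D/L tally."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Plain logistic Elo model: an assumption, the post does not name its model.
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return wins / games, draws / games, losses / games, score, elo

all_games = summarize(420, 2263, 317)     # full 3000-game database
borderline_games = summarize(129, 228, 103)  # eval 0.60-0.80 at move 12

for label, (w, d, l, s, elo) in (("all", all_games), ("borderline", borderline_games)):
    print(f"{label:10s} W {w:.1%}  D {d:.1%}  L {l:.1%}  score {s:.1%}  Elo {elo:+.1f}")
```

Under this assumed model the Elo difference comes out at roughly 12 for the full database and roughly 20 for the borderline subset.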

We see that from borderline opening positions the ELO difference inflates significantly and the draw rate goes down from 75% to 50%. The trinomial w/d/l distributions for these two sets of openings show that using borderline openings requires about 30% fewer games than fairly balanced positions to reach the desired stopping point based on confidence intervals, LOS, p-value and SPRT. This is in reasonably good agreement with my previous result. Although the 25% difference in the draw rate goes pretty democratically to both wins and losses (Ronald's fear), it still favors the stronger engine, significantly enough to need 30% fewer games in order to decide a test. So borderline positions, compared with fairly balanced ones, achieved the following here:

1/ Reduced to ~50% the draw rate.
2/ Increased significantly the ELO difference.
3/ Reduced by ~30% the number of required games.
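A rough way to see where a saving of this size comes from: under a normal approximation, the number of games needed to reach a fixed confidence level scales with the per-game variance of the result divided by the squared distance of the score from 50%. The sketch below applies that to the two tallies above. It is a simplification and not the confidence-interval/LOS/SPRT calculation from the post, and `games_needed_factor` is a hypothetical helper name.

```python
def games_needed_factor(wins, draws, losses):
    """Per-game variance divided by the squared score offset from 50%.

    Under a normal approximation, the number of games needed to reach a fixed
    confidence level is proportional to this factor.
    """
    games = wins + draws + losses
    w, d = wins / games, draws / games
    score = w + 0.5 * d
    # Variance of a single game result x in {1, 0.5, 0}: E[x^2] - E[x]^2.
    variance = w + 0.25 * d - score ** 2
    return variance / (score - 0.5) ** 2

factor_balanced = games_needed_factor(420, 2263, 317)
factor_borderline = games_needed_factor(129, 228, 103)
ratio = factor_borderline / factor_balanced
print(f"borderline openings need about {ratio:.0%} of the games")
```

With these counts the ratio comes out at roughly three quarters, so this crude estimate lands in the same ballpark as the ~30% figure. With only 460 games in the borderline sample, the estimate itself carries a fair amount of noise.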

I guess that with the ever-inflating draw rate, in 10 years borderline positions will be even more beneficial than what was found here for Komodo, and your and HGM's idea seems pretty sound.

hgm · Post by **hgm** » Mon Sep 21, 2015 11:33 am

Ozymandias wrote:If you simplify the position enough, even I would be able to convert the win, irrespective of my opponent's strength. Of course, such a position would likely get a higher eval.

That is exactly the point. Of course you should not start from positions that are already decided, like KQK. The advantage should be so small that it is on the edge of being won, and that the slightest mistake of either side would make it either a win or a draw.

This could be guessed from engine eval, but also determined empirically by taking a set of positions roughly in the right eval range, letting a variety of engines play them against each other, and then keeping those where the win/draw ratio is close to 50%. This way you would improve the test over time.
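The empirical selection step could be sketched as follows. Everything here is hypothetical: the `select_borderline` name, the thresholds, and the tallies are made up for illustration, and "win/draw ratio close to 50%" is interpreted as "about half of the games are drawn".

```python
def select_borderline(results, target=0.5, tolerance=0.1, min_games=20):
    """Keep positions whose draw fraction is close to `target`.

    `results` maps a position id (e.g. a FEN string) to a (wins, draws, losses)
    tally for the side with the advantage, pooled over many engine pairings.
    """
    kept = []
    for position, (wins, draws, losses) in results.items():
        games = wins + draws + losses
        if games < min_games:
            continue  # too few games to trust the estimate
        draw_fraction = draws / games
        if abs(draw_fraction - target) <= tolerance:
            kept.append(position)
    return kept

# Made-up tallies, purely for illustration.
sample = {
    "position_a": (30, 52, 18),  # 52% draws: keep
    "position_b": (10, 85, 5),   # 85% draws: too drawish
    "position_c": (70, 25, 5),   # 25% draws: close to already decided
    "position_d": (3, 5, 2),     # too few games
}
print(select_borderline(sample))
```

Re-running the filter as more games accumulate is one way the position set could be refined over time.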

Michel · Post by **Michel** » Mon Sep 21, 2015 12:31 pm

Laskos wrote:2/ Increased significantly the ELO difference.

Which elo model were you using? Were you replaying positions with reversed colors?

I am always worried about statements that such and such testing method "increases elo differences". It is easy to increase elo differences. Just multiply by a large constant... The point is you should not increase elo differences but decrease the error bars.

When using very unbalanced positions the elo of the positions should be incorporated in the model I think (like WhiteAdvantage is now) to calculate correct error bars (via the inverse of the Hessian of the log likelihood function, or else by simulation).
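A minimal sketch of that idea, under loudly stated assumptions: a BayesElo-style draw model, a single opening advantage `adv` and a `draw_elo` value that are both treated as known (in a real fit they would be estimated jointly, which widens the error bar), and made-up tallies. With one free parameter, the inverse Hessian reduces to the negative inverse of the second derivative of the log-likelihood, taken numerically here.

```python
import math

def f(x):
    """Logistic expected score for an Elo offset x."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def log_likelihood(delta, tallies, adv, draw_elo):
    """Log-likelihood of engine A's results given its Elo edge `delta`.

    tallies: list of (sign, wins, draws, losses) from A's point of view, where
    sign is +1 when A plays the favoured side of the opening and -1 otherwise.
    """
    total = 0.0
    for sign, wins, draws, losses in tallies:
        x = delta + sign * adv
        p_win = f(x - draw_elo)
        p_loss = f(-x - draw_elo)
        p_draw = 1.0 - p_win - p_loss
        total += (wins * math.log(p_win) + draws * math.log(p_draw)
                  + losses * math.log(p_loss))
    return total

def fit_delta(tallies, adv, draw_elo, lo=-800.0, hi=800.0):
    """Return (delta estimate, standard error) by maximum likelihood."""
    # Ternary search: the log-likelihood is concave in delta for this model.
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, tallies, adv, draw_elo) < log_likelihood(m2, tallies, adv, draw_elo):
            lo = m1
        else:
            hi = m2
    delta = 0.5 * (lo + hi)
    # Central-difference second derivative (the 1x1 Hessian), step of 1 Elo.
    h = 1.0
    second = (log_likelihood(delta + h, tallies, adv, draw_elo)
              - 2.0 * log_likelihood(delta, tallies, adv, draw_elo)
              + log_likelihood(delta - h, tallies, adv, draw_elo)) / h ** 2
    return delta, 1.0 / math.sqrt(-second)

# Made-up, colour-reversed tallies: A scores 40-50-10 from the favoured side
# and 10-50-40 from the other side, so the two engines look equal.
tallies = [(+1, 40, 50, 10), (-1, 10, 50, 40)]
delta_hat, std_error = fit_delta(tallies, adv=100.0, draw_elo=150.0)
print(f"delta = {delta_hat:+.1f} Elo, standard error = {std_error:.1f} Elo")
```

Comparing `std_error` across opening sets, rather than the raw Elo difference, is the kind of comparison the post argues for.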
