Elo points gain from doubling time

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Elo points gain from doubling time

Post by Laskos »

Sven Schüle wrote:
hgm wrote:but in the latter situation the weaker player would have approximately 10 times as much chance to beat the strong one in a match over 10 games.
I don't understand that part. Do you mean "beat" in the sense of winning one of the games, or "beat" as in winning the match? The latter I won't believe until you show it. Perhaps you are talking about some 0.0x% vs. 0.00x%?

Sven
HGM is correct, which is why I consider his remark interesting. Observe that the probability that the weaker engine wins a 10-game match is proportional to the probability that it wins one game out of the 10, which is almost equal to 10 times the weaker engine's winning percentage. So yes, in a 10-game match the engine with a 1.1% win rate has almost 10 times the chance of beating the stronger engine that the engine with 0.1% has. LOS is more useful here.

Kai
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Elo points gain from doubling time

Post by hgm »

To be exact: the probability for 9 draws + 1 win of the weaker in the two cases is:

10 * 0.90^9 * 0.001 = 0.39724%

10 * 0.88^9 * 0.011 = 3.4813%
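
The two expressions above can be re-evaluated with a minimal sketch like the following. The function name is hypothetical, and the sketch assumes the games are independent with fixed per-game win and draw probabilities.

```python
def one_win_nine_draws(p_win: float, p_draw: float, games: int = 10) -> float:
    """Probability of exactly one win and (games - 1) draws, in any order."""
    # The single win can fall in any of the `games` positions in the match.
    return games * p_draw ** (games - 1) * p_win


# The two cases from the post: (win probability, draw probability) per game.
for p_win, p_draw in [(0.001, 0.90), (0.011, 0.88)]:
    p = one_win_nine_draws(p_win, p_draw)
    print(f"win={p_win:.3f} draw={p_draw:.2f} -> {100 * p:.5f}%")
```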
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Elo points gain from doubling time

Post by Laskos »

hgm wrote:To be exact: the probability for 9 draws + 1 win of the weaker in the two cases is:

10 * 0.90^9 * 0.001 = 0.39724%

10 * 0.88^9 * 0.011 = 3.4813%
And the likelihood of strict superiority in 10 games is 0.396% and 3.88% in these two cases (a quick Monte Carlo).

Kai
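
The Monte Carlo figures can also be cross-checked exactly by summing the trinomial probabilities of every outcome in which the weaker engine has more wins than losses. This is only a sketch under stated assumptions: games are independent, the loss probability is taken as 1 - win - draw (0.099 and 0.109 for the two cases, which is not stated explicitly in the posts), and the function name is hypothetical.

```python
from math import comb


def match_win_probability(p_win: float, p_draw: float, games: int = 10) -> float:
    """Probability of finishing a match with strictly more wins than losses."""
    p_loss = 1.0 - p_win - p_draw  # assumption: the three outcomes sum to 1
    total = 0.0
    for wins in range(1, games + 1):
        # Need losses < wins, and wins + losses cannot exceed the number of games.
        for losses in range(0, min(wins, games - wins + 1)):
            draws = games - wins - losses
            # Multinomial coefficient: choose the win slots, then the loss slots.
            ways = comb(games, wins) * comb(games - wins, losses)
            total += ways * p_win ** wins * p_loss ** losses * p_draw ** draws
    return total


for p_win, p_draw in [(0.001, 0.90), (0.011, 0.88)]:
    p = match_win_probability(p_win, p_draw)
    print(f"win={p_win:.3f} draw={p_draw:.2f} -> {100 * p:.3f}%")
```

The "one win, nine draws" term dominates both sums, which is why the match-win probability scales almost linearly with the per-game win probability.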
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Elo points gain from doubling time

Post by Sven »

hgm wrote:To be exact: the probability for 9 draws + 1 win of the weaker in the two cases is:

10 * 0.90^9 * 0.001 = 0.39724%

10 * 0.88^9 * 0.011 = 3.4813%
Correct, apart from a minor typo (the first value is 0.38742%).

So it is roughly around 3% vs. 0.3%. More than I expected but still very low.

Furthermore, this is purely virtual; in reality you never know such "probabilities", since you know nothing about the draw probability between two given players, and there is nothing you can derive it from. You are right, of course, that the rating difference alone is not sufficient to predict wins/draws/losses; it is only sufficient to predict scores. But I don't see why this fact would be misleading, and I don't see how a model could be set up that helps in predicting wins/draws/losses.

The observation that rating differences correspond to scores is the basis of the Elo model, but AFAIK there is no well-founded observation telling us anything definite about wins/draws/losses. One reason could be that the draw probability depends on complex conditions that are hard to formalize, e.g. "closeness of playing style", "individual choice of openings", "contempt factor" (in the case of engines), etc. This could also mean that "draw probability" is more of a bilateral issue than "expected score", due to those "individual" factors.

Sven
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Elo points gain from doubling time

Post by hgm »

The point is that it ceases to be a personality or style issue when you reach the level where the quality of play is such that you should always be able to hold the draw, and losing is a severe blunder that counts heavily against your quality as a player.

We want the rating to represent the quality of the player, not the drawishness of the game or of the initial position. Most users of chess engines rarely use the engine on the initial position, so who cares how drawish that is? They use engines to analyze critical positions from games.

If at a high-enough level eventually 99% of the games from the initial position will be draws, all engines will get nearly the same rating, as it becomes impossible to beat the opponent by more than 1% (= 7 Elo). Yet, the 4007-rated engine might beat the 4000-rated engine by 90% when started from a critical position near the edge of the draw sector. So it is of paramount importance for the user to believe the 4007-rated engine, and not the 4000-rated one when they disagree, or he would be heading for a 'sure loss'. The 7-point Elo difference sort of hides that.

That is what I consider misleading.
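
As a rough illustration of the "1% (= 7 Elo)" figure, the sketch below converts an expected score into an Elo difference. It assumes the usual logistic Elo model with a 400-point scale and reads "1%" as a 51% score; both are assumptions, and the function name is hypothetical.

```python
import math


def elo_difference(score: float) -> float:
    """Elo difference implied by an expected score (0 < score < 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


# A 51% score corresponds to roughly 7 Elo under this model.
print(f"{elo_difference(0.51):.2f}")
```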
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Elo points gain from doubling time

Post by Don »

hgm wrote: The point is that it ceases to be a personality or style issue when you reach the level where the quality of play is such that you should always be able to hold the draw, and losing is a severe blunder that counts heavily against your quality as a player.
I am not so clear on that point. 2 perfect players, why cannot each have a different style?

In checkers we have a small preview of how this might work, because the programs are much closer to perfection than in chess. Jonathan Schaeffer believes that style has become a very important part of winning, at least against weaker players. Presumably you wouldn't change your style against any particular opponent unless you wanted to get into opponent modeling.

The primary point is that you want to play "provocatively." I think that modern programs playing at long time controls are strong enough to get the occasional draw against a perfect player playing a "random perfect" move, but that it might be close to impossible for them to draw if the style of the perfect player is chosen appropriately.

A trivial example to illustrate the point is that the perfect player, knowing the position is a draw anyway, forces a repetition - which against a fallible opponent is a "blunder."

Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Elo points gain from doubling time

Post by hgm »

Don wrote:I am not so clear on that point. 2 perfect players, why cannot each have a different style?
The issue is not so much whether they can have a different style, but whether the draw rate would be affected by style.

Style is a relative notion anyway. What at a low level seems a style, like going for highly risky sacrificial combinations into murky positions, to come out on top more often than not, can at a higher level be simply poor play, leading to a certain loss.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Elo points gain from doubling time

Post by Don »

hgm wrote:
Don wrote:I am not so clear on that point. 2 perfect players, why cannot each have a different style?
The issue is not so much whether they can have a different style, but whether the draw rate would be affected by style.
To me it's self-evident that the draw rate is affected by style; only when BOTH players are perfect does that cease to be the case.

Style can be defined as any recognizable tendency in a player. Does the player like to "mix it up", or is he one who is always looking to simplify? Will he take chances to win?

A perfect player does not need to take chances in order to win, but he can play in such a way that it APPEARS that he is. When Tal or some other great player makes a highly speculative sacrifice in order to win, or avoids the obvious draw to play for a possible win, it's a manifestation of that style. In the sacrifice case, if the move actually comes out as a draw we still say it was a good try, and a perfect player could go for the same strategy.

In the face of ever-increasing draws in computer chess, if you were given a 32-man database, how would you use it? I think I would not just push out moves, but would work hard on building in a strategy model that tried to make it as difficult as possible for fallible opponents to draw. I would avoid trading pieces, I would keep the legal move count as high as possible, and probably other things as well. So within the perfect-play strategy you could have many different styles.

Style is a relative notion anyway. What at a low level seems a style, like going for highly risky sacrificial combinations into murky positions, to come out on top more often than not, can at a higher level be simply poor play, leading to a certain loss.
In another thread I am trying to show that some aspects of style can actually be measured. But I agree that in general style is subject to interpretation. In humans it's difficult to define "ugly", but we mostly agree on it when we see it. However, it's also true that beauty is in the eye of the beholder.

With chess, all the great players have recognizable styles and yet they all play more or less soundly. A willingness to take risks does imply more losses too but this is compensated by more wins. Of course if a player is just plain stupid you are right. It's not difficult to find ways to make unsound sacrifices but that's not what I am talking about here.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Elo points gain from doubling time

Post by lkaufman »

Adam Hair wrote:I did a similar study with Houdini 2.0c.

Ply X+1 - Ply X    Elo Diff
 3 -  2            256.3
 4 -  3            195.2
 5 -  4            183.2
 6 -  5            180.0
 7 -  6            175.1
 8 -  7            173.1
 9 -  8            235.2
10 -  9            190.5
11 - 10            154.4
12 - 11            132.8
13 - 12            112.4
14 - 13            104.0
15 - 14             98.1
16 - 15             86.4
17 - 16             86.7
I did a similar partial study for Critter 1.2 and 1.6, to see if the pattern is much different. Each match at least 2000 games. I got (Critter 1.2 first):

Ply X+1 - Ply X    Critter 1.2    Critter 1.6
 6 - 5             192.2          191.3
 7 - 6             180.5          167.0
 8 - 7             173.6          179.1
 9 - 8             236.9          233.1
10 - 9             175.2          176.0

Note first the remarkable agreement between Critter 1.2 and 1.6, closer than I would expect even if they were the same program! I'm sure this is mostly coincidence, as Critter 1.6 is significantly stronger than 1.2, but the search must be pretty similar.
Next note the remarkable agreement between the average of the Critter values and the Houdini values obtained by Adam. Only at ten ply do we see a non-trivial difference, perhaps due to the search feature that Robert reported kicking in at that depth. But the jump up from 8 to 9 ply (which was accompanied by a larger time ratio, so it was not a case of getting something for nothing) is virtually identical in all three engines. As far as I know, Critter 1.2 does not have any search feature other than Singular Extension that kicks in at 9 ply, so this would appear to fully account for the observed jump in Critter, and so should be enough to do the same for Houdini. If anyone with knowledge of Critter and/or Houdini can confirm or deny any of this, please do so.
Maybe I'll do a similar analysis for other engines, as it only takes me about an hour to do this on my 16-core, 32-thread machine. This looks like a good way to do a "similarity test" for search, to complement Don's "similarity test" for eval.
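
The comparison between the Critter average and the Houdini values can be reproduced with a short sketch. The numbers are copied from the tables in this post; the dictionary keys and the function name are hypothetical, chosen only for illustration.

```python
# Elo gain per extra ply, copied from the tables above (depths 5 through 10 only).
houdini = {"6-5": 180.0, "7-6": 175.1, "8-7": 173.1, "9-8": 235.2, "10-9": 190.5}
critter_12 = {"6-5": 192.2, "7-6": 180.5, "8-7": 173.6, "9-8": 236.9, "10-9": 175.2}
critter_16 = {"6-5": 191.3, "7-6": 167.0, "8-7": 179.1, "9-8": 233.1, "10-9": 176.0}


def compare(reference, engine_a, engine_b):
    """For each ply step, return (average of the two engines, average minus reference)."""
    rows = {}
    for step in reference:
        average = (engine_a[step] + engine_b[step]) / 2
        rows[step] = (average, average - reference[step])
    return rows


for step, (average, delta) in compare(houdini, critter_12, critter_16).items():
    print(f"{step:>5}  Critter avg {average:6.2f}  minus Houdini {delta:+6.2f}")
```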
rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

Re: Elo points gain from doubling time

Post by rvida »

lkaufman wrote: I did a similar partial study for Critter 1.2 and 1.6, to see if the pattern is much different.
lkaufman wrote: As far as I know, Critter 1.2 does not have any search feature other than Singular Extension that kicks in at 9 ply, so this would appear to fully account for the observed jump in Critter
Easy to verify. Retry the 8 vs 9 ply match with singular extensions disabled and compare the results.