What can be said about 1 - 0 score?

Laskos
Posts: 9417
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

What can be said about 1 - 0 score?

It happens that I have only one game between two engines. Worse, the engines differ in strength by an unknown amount, and I generally don't know anything about them: no theoretical or empirical prior. But I have the moves of the game. Here I use only the number of moves until one engine mates the other in that single game (no adjudication). What can I say about the general relative strength of the engines?

1/ Expectation value of the score.

Expectation value for win: (1+1)/(3+1) = 1/2
Expectation value for draw: (0+1)/(3+1) = 1/4
Expectation value for loss: (0+1)/(3+1) = 1/4

Expectation value for Score: 1/2+1/8 = 5/8

Likelihood of Superiority (LOS): (1!0!/1! + 1/2*1!0!/1!)/2 = 3/4
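The numbers in part 1/ can be reproduced with a short sketch. It assumes the expectation values come from Laplace's rule of succession over the three outcomes (add one to each count), and that the LOS uses a uniform prior on the win probability with draws ignored. The function names are mine, for illustration only.

```python
from fractions import Fraction
from math import comb

def laplace_probs(wins, draws, losses):
    """Rule-of-succession estimates (count + 1) / (games + 3) for win, draw, loss."""
    n = wins + draws + losses
    return tuple(Fraction(c + 1, n + 3) for c in (wins, draws, losses))

def expected_score(wins, draws, losses):
    """Expected score per game: P(win) + P(draw) / 2."""
    p_win, p_draw, _ = laplace_probs(wins, draws, losses)
    return p_win + p_draw / 2

def los_uniform(wins, losses):
    """P(p > 1/2) for p ~ Beta(wins + 1, losses + 1), i.e. a uniform prior.

    Uses the identity P(Beta(a, b) > x) = P(Binomial(a + b - 1, x) <= a - 1).
    Draws are ignored here (assumption).
    """
    n = wins + losses + 1
    return Fraction(sum(comb(n, k) for k in range(wins + 1)), 2 ** n)

print(expected_score(1, 0, 0))  # 5/8
print(los_uniform(1, 0))        # 3/4
```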

2/ Maximum Likelihood Estimation of the Elo difference by number of moves.

Here I plot likelihoods for the number of moves to the mate in Stockfish-Zurichess games (1200 Elo points difference) and Stockfish-Stockfish games (0 Elo points difference):

We see that if the game ended in mate in 40 moves, the likelihood that one engine is stronger by 1200 Elo points is an order of magnitude larger than the likelihood that the engines are equal. The likelihood peaks at 32 moves for a 1200 Elo point difference and at 77 moves for a 0 Elo point difference. And the linear interpolation is:

So, if our single game ended in mate in 40 moves, the maximum likelihood estimate would be that the engines are separated by something like 1000 Elo points. I think everybody rightly feels that this is a very rough estimate, only useful when we don't know anything about the engines.
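As a minimal sketch of that interpolation: a straight line through the two peaks quoted above, (32 moves, 1200 Elo) and (77 moves, 0 Elo). The function name is mine, and nothing here says the line is valid outside the 32 to 77 move range.

```python
def elo_from_length(moves, short=(32, 1200.0), long=(77, 0.0)):
    """Linear interpolation of Elo difference from game length.

    `short` and `long` are (moves, elo) anchor points; the defaults are the
    two likelihood peaks quoted in the post.
    """
    (m0, e0), (m1, e1) = short, long
    return e0 + (e1 - e0) * (moves - m0) / (m1 - m0)

print(elo_from_length(40))  # about 986.7, i.e. "something like 1000 Elo points"
```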

hgm
Posts: 23631
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Re: What can be said about 1 - 0 score?

Doesn't the length of the game go up when the players get stronger? (E.g. at larger TC.) If so, the length of the game might not hold information on the Elo difference, but on the absolute strength. Perhaps it reflects the absolute strength of the weakest player more than the difference between the players.

Laskos
Posts: 9417
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: What can be said about 1 - 0 score?

hgm wrote:Doesn't the length of the game go up when the players get stronger? (E.g. at larger TC.) If so, the length of the game might not hold information on the Elo difference, but on the absolute strength. Perhaps it reflects the absolute strength of the weakest player more than the difference between the players.
Yes, I will check in Zurichess-Zurichess games. I think it is both absolute strength and difference. Maybe I can separate them by some criterion.

Laskos
Posts: 9417
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: What can be said about 1 - 0 score?

The pure strength scaling is mild; the strength difference is much more important. The Zurichess-Zurichess match, although 1200 Elo points weaker, has its maximum likelihood won-game length at 70 moves, against 77 moves for SF-SF. The SF-Zu match has it at 32 moves.

So, my result seems to hold, although there is a weak dependence on absolute strength. But the estimate is very rough anyway, only useful if we don't really have any prior information.

cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra
Re: What can be said about 1 - 0 score?

But can the confidence of a rating be increased by taking into account the length of the games?

Laskos
Posts: 9417
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: What can be said about 1 - 0 score?

cdani wrote:But can the confidence of a rating be increased by taking into account the length of the games?
This prior is weak; it quickly vanishes over many games in a well-connected rating list. But for unconnected clusters with few games, it can be useful. With this "length of games" prior, a 3-0 score in 35-move games is treated differently from a 3-0 score in 80-move games. Again, if the error margins are reasonable, the prior is very weak. It only affects very large uncertainties. Maybe Miguel with Ordo can see what happens if he implements some of this knowledge.

cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra
Re: What can be said about 1 - 0 score?

Thanks. Maybe the level of confidence can be increased by taking as the number of moves the first move that shows +3 or +4 or whatever, thus avoiding, for example, endgames that take long to win.

Laskos
Posts: 9417
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: What can be said about 1 - 0 score?

Laskos wrote:It happens that I have only one game between two engines. Worse, the engines differ in strength by an unknown amount, and I generally don't know anything about them: no theoretical or empirical prior. But I have the moves of the game. Here I use only the number of moves until one engine mates the other in that single game (no adjudication). What can I say about the general relative strength of the engines?

1/ Expectation value of the score.

Expectation value for win: (1+1)/(3+1) = 1/2
Expectation value for draw: (0+1)/(3+1) = 1/4
Expectation value for loss: (0+1)/(3+1) = 1/4

Expectation value for Score: 1/2+1/8 = 5/8

Likelihood of Superiority (LOS): (1!0!/1! + 1/2*1!0!/1!)/2 = 3/4

Here I should add a Bayesian analysis:

If we know that the engines must be close in strength, we can take this prior probability distribution (unnormalized):

For a 1-0 score we get LOS=51%, so not much can be said to separate the engines based on this one game.

If we know that the strength difference between engines is large, we might take this prior (unnormalized):

For a 1-0 score we get LOS=95%, and from this single game we are pretty confident that the engine that scored is significantly stronger than the other.
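A minimal sketch of how such a prior-dependent LOS can be computed numerically. It assumes the prior is a density over the win probability p, that the likelihood of the single win is simply p (draws ignored), and that LOS means P(p > 1/2 | one win). The two non-uniform priors below are hypothetical shapes chosen only to show the mechanism; they are not the priors from the plots and are not meant to reproduce the 51% and 95% figures exactly.

```python
import math

def los_one_win(prior, steps=20_000):
    """P(p > 1/2 | one win) for an unnormalized prior density on p in (0, 1).

    Midpoint-rule integration of posterior(p) proportional to p * prior(p).
    """
    h = 1.0 / steps
    total = upper = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        w = p * prior(p)  # likelihood of one win times prior
        total += w
        if p > 0.5:
            upper += w
    return upper / total

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

uniform = lambda p: 1.0
# Hypothetical "engines are close" prior: narrow peak at p = 0.5.
close = lambda p: gauss(p, 0.5, 0.02)
# Hypothetical "engines are far apart" prior: symmetric peaks near 0 and 1.
far = lambda p: gauss(p, 0.05, 0.03) + gauss(p, 0.95, 0.03)

for name, prior in (("uniform", uniform), ("close", close), ("far", far)):
    print(name, round(los_one_win(prior), 3))
```

With the uniform prior this recovers the 3/4 from the first post; the narrow prior pulls the LOS toward 1/2 and the two-peaked prior pushes it toward 1, which is the qualitative behavior described above.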

Uri Blass
Posts: 8558
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: What can be said about 1 - 0 score?

hgm wrote:Doesn't the length of the game go up when the players get stronger? (E.g. at larger TC.) If so, the length of the game might not hold information on the Elo difference, but on the absolute strength. Perhaps it reflects the absolute strength of the weakest player more than the difference between the players.
Not always.
If the players are very weak and make random moves you can expect very long games.

Laskos
Posts: 9417
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: What can be said about 1 - 0 score?

I decided this week to do a more thorough study, and I played a dozen or so matches between engines differing wildly in strength (without adjudications), then analyzed their game lengths. One useful engine is the very first Zurichess, Zurichess Aargau, which is very stable and very weak (1730 CCRL 40/4). One analyzed match on a normalized histogram looks like this:

The maximum likelihood lengths of these 3 matches:

Stockfish - Zurichess: 25 moves
Zurichess - Zurichess: 68 moves
Stockfish - Stockfish: 75 moves
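For readers who want to extract such a mode from their own games, here is a small sketch. It assumes the "maximum likelihood length" is read off as the most populated bin of a histogram of won-game lengths; the bin width and the sample lengths are made up for illustration and are not data from these matches.

```python
from collections import Counter

def modal_length(lengths, bin_width=5):
    """Center of the most populated histogram bin of game lengths.

    Ties are broken in favor of the shorter bin.
    """
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    start, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return start + bin_width / 2

# Made-up lengths (in moves) of decisive games, for illustration only.
sample = [23, 26, 27, 28, 31, 40, 52]
print(modal_length(sample))  # 27.5
```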

Here too the strength difference is the main factor determining the mode of the game length. Based on the dozen matches played like this one, I derived a rule-of-thumb 1 standard deviation band of Elo difference versus game length:

We see that the band is very broad, and 2 standard deviations almost cover the entire range. So one would say that little can be inferred about the strength difference from the length of a single game.

But as broad and inconclusive as this result seems, it is information we have before playing that single game. Assuming normal distributions with the center value given by the black line and the standard deviation given by the red region, I built symmetric priors dependent on game length to be used in the calculation of Likelihood of Superiority (LOS) for a 1-0 result. With a uniform prior, LOS=75%.

Here are the priors derived from the previous plot, for game lengths of 45, 55, 65 and 75 moves:

With these priors, I computed LOS for 1-0 score depending on the length of that single game:

We see that for game lengths above ~55 moves, the priors and LOS are very close to the regular uniform prior and its LOS for a 1-0 result. So, not much more information was gained. But below that length, going towards 28-30 moves, the LOS increases pretty dramatically to 0.9999. So, on this single game, if it is shorter than 50-55 moves, we can gain some information, and for very short games one can be almost sure that the winning engine is better (this can be used as a stopping rule too).
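A rough sketch of this kind of computation, under assumptions of my own: the prior on the Elo difference d is a symmetric mixture of two normals at +center and -center, the win probability follows the standard logistic Elo curve with draws ignored, and LOS is P(d > 0 | one win). The band's center and width as a function of game length are only in the plot, so they are plain parameters here, and the values used below are placeholders rather than readings from the band.

```python
import math

def win_prob(elo_diff):
    """Standard logistic Elo expectation (draws ignored, an assumption)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def symmetric_prior(d, center, sigma):
    """Unnormalized mixture of N(+center, sigma) and N(-center, sigma)."""
    g = lambda x: math.exp(-0.5 * (x / sigma) ** 2)
    return g(d - center) + g(d + center)

def los_with_length_prior(center, sigma, lo=-3000.0, hi=3000.0, steps=60_000):
    """P(d > 0 | one win) by midpoint integration over the Elo difference d."""
    h = (hi - lo) / steps
    total = upper = 0.0
    for i in range(steps):
        d = lo + (i + 0.5) * h
        w = win_prob(d) * symmetric_prior(d, center, sigma)
        total += w
        if d > 0:
            upper += w
    return upper / total

# Placeholder (center, sigma) pairs: a short game suggesting a large gap,
# and a long game suggesting a small one.
print(round(los_with_length_prior(1000.0, 150.0), 4))
print(round(los_with_length_prior(100.0, 200.0), 4))
```

The qualitative picture matches the post: a prior centered far from zero drives the LOS for a single win close to 1, while a prior centered near zero leaves it only modestly above 1/2.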