Likelihood of superiority

Uri Blass
Posts: 8551
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Likelihood of superiority

bob wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote: The probability depends on the change that you make,
so you cannot give a general answer to it (different changes mean different a priori knowledge).

Here are 2 examples.

1) Suppose that you make some change in the order of moves and find, based on a test suite of 1000 positions, that the program is 5% faster in reaching depth 12.

I think that you can be almost sure that the change is an improvement even without games.
Classic mistake. Just change null-move R=5 and you will get that result (faster time to depth 12). Or change your reduction factor to 3 or 4 or 5. Ditto. Is the program actually stronger? Almost certainly not. Is it faster to depth 12? Significantly faster.
I talked about a change that is only in the order of moves.
R=5 in null move is a different type of change.

I did not claim that getting to depth 12 faster is always better, but if you get to depth 12 faster only by changing the order of moves (no change in extensions or reductions) then you probably have an improvement.

Uri
Doesn't matter. Change ordering so that you search checks first. That will help most tactical positions go by faster. But the program will play worse overall. The rule is that changing ordering can change the time, but not the score, with normal alpha/beta. Of course with today's programs the scores can change a bit too. But faster needs to be faster in game positions. Using non-game test positions will most likely distort the result, and in the wrong direction.
The test positions can clearly come from games.
I was not thinking about a tactical suite.

If the target is only to reach depth 12 and not to find a specific move, then it does not make sense to use a tactical test suite.

Zach Wegner
Posts: 1922
Joined: Wed Mar 08, 2006 11:51 pm
Location: Earth

Re: Likelihood of superiority

Rémi Coulom wrote:It is even simpler if you know the Gaussian joint distribution of the two ratings: any linear combination of ratings is also Gaussian, so the difference of ratings is Gaussian, and its variance can be computed from the joint variance. So estimating the LOS only consists in ERF((Y-X)/sigma) or something like that. This is how it is computed in bayeselo.
Really? CBradleyTerry::ComputeCovariance() seemed pretty complicated, so I stopped trying to understand it.

I see what you are saying though, that's much simpler. I was hoping you might chime in here.
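To make the Gaussian shortcut quoted above concrete, here is a minimal sketch. It is not the bayeselo code: the function name and parameters are hypothetical, and it assumes you already have the two rating estimates together with their variances and covariance. The "ERF((Y-X)/sigma) or something like that" is written here as the normal CDF, which is 0.5 * (1 + erf(z / sqrt(2))).

```python
import math


def los_from_gaussian(rating_x, rating_y, var_x, var_y, cov_xy=0.0):
    """Probability that Y is rated above X, assuming (X, Y) are jointly Gaussian.

    All names here are illustrative; the variances and covariance are assumed
    to come from whatever rating fit you already have.
    """
    # Variance of the difference Y - X for jointly Gaussian variables.
    var_diff = var_x + var_y - 2.0 * cov_xy
    if var_diff <= 0.0:
        raise ValueError("variance of the rating difference must be positive")
    z = (rating_y - rating_x) / math.sqrt(var_diff)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Equal ratings give 0.5 whatever the uncertainty is.
    print(los_from_gaussian(0.0, 0.0, 25.0, 25.0))
    # A 10-point lead with a standard deviation of 5 on each rating.
    print(los_from_gaussian(0.0, 10.0, 25.0, 25.0))
```

A positive covariance between the two ratings shrinks the variance of the difference, so ignoring it (cov_xy=0) would generally understate the LOS of the leader.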

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Likelihood of superiority

Rémi Coulom wrote:
Note 1: draws do not count
Thanks a lot! I was looking for something like this. But I have a big question: how can draws not count?

I mean, in A vs A' I could have 1 win, 0 losses, or I could have 1 win, 300 draws, 0 losses. What we can infer in the two cases is totally different.

In the first case we can say almost nothing, while in the second case we can say that the two engines should not (with good probability) be so far apart.

Could you please explain why the number of draws (which in the end amounts to the number of games played) does not count?

Thanks
Marco

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Likelihood of superiority

Uri Blass wrote:
The probability that A is stronger than A' is always 1 or 0
and the result of the match does not change this probability.

I guess that you want to know the probability that A' is better than A once you know the result of a 500-game match between them (where "A' is better than A" means that you can expect A' to beat A in a match).
Yes I mean that.
Uri Blass wrote: Without knowing a prior probability distribution it is impossible to calculate probability.
I have very big doubts about your last sentence. For instance, suppose you have two engines A and A' and you know nothing about them. You play, say, 100,000 games between them and at the end the first engine scores 100 Elo points more than the second.

I think you can quite clearly assume that A is stronger than A' even without knowing anything about a prior probability for them.
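A rough back-of-the-envelope check of this example, as a sketch only. It uses the usual logistic Elo formula for the expected score and a deliberately conservative standard error (the per-game score lies in [0, 1], so its standard deviation is at most 0.5). The function names are made up for illustration.

```python
import math


def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def z_score_vs_equal(elo_diff, games):
    """How many standard errors the observed score lies above 50%.

    Conservative: uses the largest possible per-game standard deviation (0.5),
    so draws can only make the real figure larger.
    """
    std_err = 0.5 / math.sqrt(games)
    return (expected_score(elo_diff) - 0.5) / std_err


if __name__ == "__main__":
    print(expected_score(100))            # about 0.64
    print(z_score_vs_equal(100, 100000))  # dozens of standard errors
```

With a gap of this size, any prior that does not rule out "A is stronger" in advance ends up at essentially the same conclusion.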

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Likelihood of superiority

Uri Blass wrote:<snipped>

It is possible to have 9 changes that each add 1 Elo and 1 change that loses 10 Elo, for a net change of -1 Elo (9*1 - 10).
Uri, I think that when we talk about probability and statistics we have to _forget_ terms like "it is possible", "could happen", "there is the case that" and so on.

In statistical terms what counts are not the exceptions but the average case or, to be more correct, the probability that a given event is true.

In your example of course it can happen, but the probability of your case is surely very low, much lower than that of the average case in which 10 "supposed" positive changes actually increase the strength of the engine.

If you read the original question I asked: "what is a probability that after applying 10 supposed positive patches the engine is stronger".

This does NOT mean that there couldn't be a very unlikely case where after 10 patches the engine is weaker, but that this case has a very low probability of happening.

BTW it is much easier to misjudge a patch that adds or removes 1 Elo point than one that adds 10 Elo points. So in your example, the probability that, among ten patches, the 9 that add only 1 point are correctly identified as positive while the one that loses 10 Elo is erroneously identified as positive as well is of course very low.

It does not mean that it could not happen. It only means that we don't care.

Rémi Coulom
Posts: 432
Joined: Mon Apr 24, 2006 6:06 pm

Re: Likelihood of superiority

mcostalba wrote:
Rémi Coulom wrote:
Note 1: draws do not count
Thanks a lot! I was looking for something like this. But I have a big question: how can draws not count?

I mean, in A vs A' I could have 1 win, 0 losses, or I could have 1 win, 300 draws, 0 losses. What we can infer in the two cases is totally different.

In the first case we can say almost nothing, while in the second case we can say that the two engines should not (with good probability) be so far apart.

Could you please explain why the number of draws (which in the end amounts to the number of games played) does not count?

Thanks
Marco
The two cases you mention are very different in terms of estimating the strength difference between the two programs: in the first case you know very little about their relative playing strength, in the second case you know that they are likely to be very close in strength. But LOS is still the same.

A draw will at the same time make estimated Elo ratings closer to each other, and reduce the width of confidence intervals. It does this in such a way that the LOS does not change.

More precisely, here is a mathematical proof. Probabilities of win, draw, and loss are p_1, p_0.5, and p_0. Number of wins, draws, losses are n_1, n_0.5, and n_0. With a uniform prior:
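The formula that followed is not preserved here. As a sketch only (standard algebra under the stated uniform prior, not necessarily the exact steps of the original proof; the substitution variables q and d are introduced here for illustration), the argument can go like this:

```latex
% Posterior under a uniform prior on the simplex p_1 + p_{0.5} + p_0 = 1:
f(p_1, p_{0.5}, p_0 \mid n_1, n_{0.5}, n_0) \propto p_1^{n_1}\, p_{0.5}^{n_{0.5}}\, p_0^{n_0}

% Substitute d = p_{0.5} and q = p_1 / (p_1 + p_0), i.e. p_1 = (1-d)\,q and
% p_0 = (1-d)(1-q). The Jacobian of (p_1, p_0) with respect to (q, d) is (1-d):
f(q, d) \propto q^{n_1} (1-q)^{n_0} \cdot d^{\,n_{0.5}} (1-d)^{n_1 + n_0 + 1}

% The density factorizes, and p_1 > p_0 holds exactly when q > 1/2, so the
% factor in d integrates out and n_{0.5} disappears:
\mathrm{LOS} = P(p_1 > p_0) =
  \frac{\int_{1/2}^{1} q^{n_1} (1-q)^{n_0}\, dq}{\int_{0}^{1} q^{n_1} (1-q)^{n_0}\, dq}
```

For 1 win and 0 losses this gives 3/4, with or without the 300 draws.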

Rémi

Uri Blass
Posts: 8551
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Likelihood of superiority

mcostalba wrote:
Uri Blass wrote:
The probability that A is stronger than A' is always 1 or 0
and the result of the match does not change this probability.

I guess that you want to know the probability that A' is better than A once you know the result of a 500-game match between them (where "A' is better than A" means that you can expect A' to beat A in a match).
Yes I mean that.
Uri Blass wrote: Without knowing a prior probability distribution it is impossible to calculate probability.
I have very big doubts about your last sentence. For instance, suppose you have two engines A and A' and you know nothing about them. You play, say, 100,000 games between them and at the end the first engine scores 100 Elo points more than the second.

I think you can quite clearly assume that A is stronger than A' even without knowing anything about a prior probability for them.
You can be practically sure that A is stronger than A' because every reasonable prior distribution leads to this conclusion.

In the same way, you can say that if you throw a die 100,000 times and it lands on 6 in 90,000 of them, the die is not fair, but you cannot calculate an exact probability.

You also cannot calculate the exact probability that the die is not fair if you know that it landed on 6 in each of the first 5 throws (in this case, if your a priori assumption is that you are almost sure that the die is fair, you may continue to believe that it is fair in spite of the fact that it landed 5 times on the same side).
You need some a priori probability about the die to calculate probabilities.

Uri
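To illustrate how the answer depends on the prior, here is a minimal sketch with only two hypotheses: the die is fair, or it is loaded so that it shows 6 with probability 0.9. The loaded-die model and all names are assumptions made up for this example; a real analysis would need a prior over many possible biases.

```python
import math


def posterior_fair(prior_fair, sixes, throws, p_six_loaded=0.9):
    """Posterior probability that the die is fair after `sixes` sixes in `throws`.

    Two-hypothesis toy model (fair vs. one specific loaded die), computed in
    log space so that 100,000 throws do not underflow. Requires
    0 < prior_fair < 1 and 0 < p_six_loaded < 1.
    """
    others = throws - sixes
    log_like_fair = sixes * math.log(1.0 / 6.0) + others * math.log(5.0 / 6.0)
    log_like_loaded = (sixes * math.log(p_six_loaded)
                       + others * math.log(1.0 - p_six_loaded))
    # Posterior log-odds = prior log-odds + log likelihood ratio.
    log_odds = (math.log(prior_fair) - math.log(1.0 - prior_fair)
                + log_like_fair - log_like_loaded)
    # Numerically stable logistic function.
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Five sixes in five throws: the conclusion depends heavily on the prior.
    print(posterior_fair(0.999, 5, 5))
    print(posterior_fair(0.999999, 5, 5))
    # 90,000 sixes in 100,000 throws: any reasonable prior is overwhelmed.
    print(posterior_fair(0.999999, 90000, 100000))
```

The same five sixes leave a strongly confident prior almost untouched but move a slightly weaker one a long way, while the 100,000-throw result gives the same verdict for both.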

Rémi Coulom
Posts: 432
Joined: Mon Apr 24, 2006 6:06 pm

Re: Likelihood of superiority

mcostalba wrote:
Rémi Coulom wrote:
Note 1: draws do not count
Thanks a lot! I was looking for something like this. But I have a big question: how can draws not count?

I mean, in A vs A' I could have 1 win, 0 losses, or I could have 1 win, 300 draws, 0 losses. What we can infer in the two cases is totally different.

In the first case we can say almost nothing, while in the second case we can say that the two engines should not (with good probability) be so far apart.

Could you please explain why the number of draws (which in the end amounts to the number of games played) does not count?

Thanks
Marco
Also, there is another intuitive explanation:

Suppose you change the rules of the game, and call it chess*: whenever two players draw, they have to start playing a new game until the outcome is not a draw.

Being stronger at chess is equivalent to being stronger at chess*. With your data, if we consider chess*, the outcome is the same: 1-0.

Rémi
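The chess* picture suggests computing LOS from decisive games only. Below is a small sketch under the uniform-prior assumption mentioned earlier in the thread; it is not taken from bayeselo, and the function names are hypothetical. The closed form uses the standard identity between the Beta distribution with integer parameters and a binomial tail, and a Monte Carlo check samples the three-outcome posterior directly (draws included) to show that the draw count does not move the answer. It needs Python 3.8+ for math.comb.

```python
import random
from math import comb


def los(wins, losses):
    """P(p_win > p_loss | data) under a uniform prior; draws are not needed.

    Equals P(q > 1/2) for q ~ Beta(wins + 1, losses + 1), written as a
    binomial sum so that only integer arithmetic is involved.
    """
    n = wins + losses + 1
    return sum(comb(n, j) for j in range(wins + 1)) / 2 ** n


def los_monte_carlo(wins, draws, losses, samples=20000, seed=0):
    """Estimate the same probability by sampling the Dirichlet posterior."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # A Dirichlet sample is a vector of independent gammas, normalised.
        g_win = rng.gammavariate(wins + 1, 1.0)
        g_draw = rng.gammavariate(draws + 1, 1.0)
        g_loss = rng.gammavariate(losses + 1, 1.0)
        total = g_win + g_draw + g_loss
        if g_win / total > g_loss / total:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    print(los(1, 0))                   # 0.75
    print(los_monte_carlo(1, 0, 0))    # close to 0.75
    print(los_monte_carlo(1, 300, 0))  # still close to 0.75
```

The 300 draws shrink both the win and the loss probabilities in every sample, but they shrink them by the same factor, so the comparison between the two is unaffected.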

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Likelihood of superiority

Rémi Coulom wrote: A draw will at the same time make estimated Elo ratings closer to each other, and reduce the width of confidence intervals. It does this in such a way that the LOS does not change.
Wow, this is an absolutely counterintuitive result (at least for me). I already knew statistics could be very counterintuitive sometimes, but this really surprised me!

Thanks again for the explanation. I have learnt something this evening.

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Likelihood of superiority

Rémi Coulom wrote: Also, there is another intuitive explanation:

Suppose you change the rules of the game, and call it chess*: whenever two players draw, they have to start playing a new game until the outcome is not a draw.

Being stronger at chess is equivalent to being stronger at chess*. With your data, if we consider chess*, the outcome is the same: 1-0.

Rémi
Yes, but you can get a 1-0 score after only 1 game, or after 1000 games where the first 999 were draws!