Likelihood Of Success (LOS) in the real world

Laskos · Post by **Laskos** » Fri May 26, 2017 2:53 am

LOS as usually understood and computed is a mathematical fiction. It uses a uniform prior (density 1 over the [0,1] range of the score in chess). Humans probably have uniform priors only at birth. In setups like the Stockfish Testing Framework and other development frameworks, the unconsciously assumed priors are so strong that LOSp (LOS with a non-uniform prior) is completely different from LOS with a uniform prior. LOSp depends on the prior and on the Draws, besides the Wins and Losses; LOS depends only on Wins and Losses.

An unnormalized prior for the Stockfish Testing Framework might look a bit scary:

[score*(1-score)]**1000

As scary as it seems, it assumes that the ELO differences between development versions are no larger than about 15 ELO points, which is a reasonable assumption for the Framework. LOS and LOSp for W=1, D=0, L=0 are as follows:

LOS = 0.75
LOSp = 0.517
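One way to read "no larger than about 15 ELO points" is as a two-standard-deviation band of the prior. A minimal Python sketch of that reading, assuming [score*(1-score)]**k is treated on its own as a Beta(k+1, k+1) density on the score and converted with the logistic Elo formula (the name prior_width_elo and the 2-sigma choice are illustrative, not from the post):

```python
import math


def prior_width_elo(k, n_sd=2.0):
    """Elo difference n_sd standard deviations above s = 0.5 for a prior ~ [s(1-s)]**k.

    Assumption: [s(1-s)]**k is treated on its own as a Beta(k+1, k+1) density
    on the score s; other factors in a win/draw/loss model are ignored.
    """
    a = k + 1
    sd = math.sqrt(1.0 / (4.0 * (2 * a + 1)))   # standard deviation of Beta(a, a)
    s = 0.5 + n_sd * sd
    return 400.0 * math.log10(s / (1.0 - s))     # logistic Elo for score s


print(round(prior_width_elo(1000), 1))  # 15.5
print(round(prior_width_elo(2)))        # 343
```

With k=2, the prior used further down for rating groups, the same reading gives about 343 ELO points, in line with "up to say 400". Any extra weighting that the full win/draw/loss model puts on the score is ignored here, which matters more for small k.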

One Win gives almost no information in the real Stockfish world. Suppose we now have 5 consecutive Draws: W=1, D=5, L=0:

LOS = 0.75 again (independent of Draws)
LOSp = 0.553

Given that Win, the 5 Draws moved LOSp more (from 0.517 to 0.553) than the Win itself did (from 0.5 to 0.517).
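The effect of the Draws can be worked out by hand for W=1, D=5, L=0. This is a sketch under an assumption the post does not spell out: the prior multiplies a flat density on the (w, d, l) simplex and depends only on the score s = w + d/2, with likelihood w^W d^D l^L. In the coordinates x = w - l and t = w + l the draws integrate out exactly:

```latex
% Assumed model: flat density on the (w, d, l) simplex times a prior on s = w + d/2.
% Coordinates: x = w - l, t = w + l, so d = 1 - t, s = (1 + x)/2, and |x| \le t \le 1.
% Integrating out t for W = 1, D = 5, L = 0:
\int_{|x|}^{1} \frac{t+x}{2}\,(1-t)^{5}\,dt =
\begin{cases}
  \dfrac{(1-x)^{6}\,(1+13x)}{84}, & 0 < x \le 1,\\[8pt]
  \dfrac{(1+x)^{7}}{84}, & -1 \le x < 0.
\end{cases}

% Uniform prior: integrate over the whole range of x.
\mathrm{LOS} = \frac{\int_{0}^{1} (1-x)^{6}(1+13x)\,dx}
                    {\int_{0}^{1} (1-x)^{6}(1+13x)\,dx + \int_{-1}^{0} (1+x)^{7}\,dx}
             = \frac{3/8}{3/8 + 1/8} = \frac{3}{4}.

% Near x = 0 both branches behave like 1 + 7x; with D = 0 they are
% (1 + 2x - 3x^2)/4 and (1 + x)^2/4, i.e. like 1 + 2x.
% A prior peaked at x = 0, such as (1 - x^2)^{k} = [4 s (1 - s)]^{k}, mostly sees this slope.
```

So the Draws leave the uniform-prior answer at 3/4, but they raise the slope of the posterior at x = 0 from 2 to 7, and a narrow prior around x = 0 sees mostly that slope.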

Rating groups allow more liberal ELO differences in direct matches, up to say 400 ELO points. A suitable prior is [score*(1-score)]**2. In this case the difference is less pronounced, but still visible in the W=1, D=0, L=0 case:

LOS = 0.75
LOSp = 0.698

With 5 Draws added, LOSp becomes 0.739, closer to the 0.75 of the uniform prior.
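For a low-degree prior such as [score*(1-score)]**2 the integrals are polynomial and can be done exactly with rational arithmetic. A sketch, assuming the model is a flat density on the (w, d, l) simplex times the prior on the score s = w + d/2, with likelihood w**W * d**D * l**L; the helper names are hypothetical. Writing x = w - l and t = w + l turns the prior into (1 - x**2)**k up to a constant, and each monomial t**a * x**b integrates in closed form over the half of the simplex where w > l:

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials stored as {(a, b): coeff}, meaning coeff * t**a * x**b."""
    out = {}
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in q.items():
            key = (a1 + a2, b1 + b2)
            out[key] = out.get(key, 0) + c1 * c2
    return out


def poly_pow(p, n):
    result = {(0, 0): Fraction(1)}
    for _ in range(n):
        result = poly_mul(result, p)
    return result


def los_p_exact(W, D, L, k):
    """Exact LOSp under the assumed model, prior ~ [s(1-s)]**k (k = 0 is uniform)."""
    half = Fraction(1, 2)
    w = {(1, 0): half, (0, 1): half}                     # w = (t + x) / 2
    l = {(1, 0): half, (0, 1): -half}                    # l = (t - x) / 2
    d = {(0, 0): Fraction(1), (1, 0): Fraction(-1)}      # d = 1 - t
    prior = {(0, 0): Fraction(1), (0, 2): Fraction(-1)}  # 4 s (1 - s) = 1 - x**2
    f = poly_mul(poly_mul(poly_pow(w, W), poly_pow(d, D)),
                 poly_mul(poly_pow(l, L), poly_pow(prior, k)))
    # On 0 < x < t < 1 (the w > l half), t**a * x**b integrates to 1 / ((b+1)(a+b+2)).
    # On the w < l half, substitute x -> -x: odd powers of x change sign.
    above = sum(c / ((b + 1) * (a + b + 2)) for (a, b), c in f.items())
    below = sum(c * (-1) ** b / ((b + 1) * (a + b + 2)) for (a, b), c in f.items())
    return above / (above + below)


for W, D, L in [(1, 0, 0), (1, 5, 0)]:
    p = los_p_exact(W, D, L, k=2)
    print(W, D, L, p, round(float(p), 3))
# 1 0 0 67/96 0.698
# 1 5 0 671/908 0.739
```

Setting k=0 returns exactly 3/4 in both cases, which is the Draw-independence of the uniform-prior LOS. The expansion grows with k, so for k=1000 numerical integration is the practical route.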

Many people still use LOS as an empirical stopping rule, and additional care must be taken, especially when one feels or knows that the engines are very close in strength.

My computations were done in Mathematica for a general prior and general W, D, L; here I just gave some example results. I could post the code, but it's not very illuminating.
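Not the original Mathematica code, but a minimal Python sketch of the same kind of computation, assuming the model is a flat density on the (w, d, l) simplex multiplied by a prior on the score s = w + d/2, with likelihood w**W * d**D * l**L. LOSp is then the posterior probability that w > l. The name los_p and the grid sizes are illustrative, and it uses a plain midpoint rule rather than anything adaptive:

```python
def los_p(W, D, L, prior, nx=2000, nt=100):
    """Approximate P(w > l | W, D, L) in a win/draw/loss model.

    Assumed model (a sketch): flat density on the simplex w + d + l = 1,
    multiplied by prior(s) with s = w + d/2, and likelihood w**W * d**D * l**L.
    prior = lambda s: 1.0 gives the usual uniform-prior LOS.
    """
    # Change variables: x = w - l and t = w + l, so s = (1 + x)/2 and d = 1 - t.
    # The simplex becomes |x| <= t <= 1; the constant Jacobian cancels.
    hx = 2.0 / nx                        # nx even, so no x-cell straddles x = 0
    above = below = 0.0
    for i in range(nx):
        x = -1.0 + (i + 0.5) * hx        # midpoint of the i-th x-cell
        lo = abs(x)
        ht = (1.0 - lo) / nt
        inner = 0.0
        for j in range(nt):              # midpoint rule in t
            t = lo + (j + 0.5) * ht
            w, l, d = 0.5 * (t + x), 0.5 * (t - x), 1.0 - t
            inner += w ** W * d ** D * l ** L
        mass = inner * ht * prior(0.5 * (1.0 + x))
        if x > 0.0:
            above += mass
        else:
            below += mass
    return above / (above + below)


uniform = lambda s: 1.0
# [s(1-s)]**1000 rescaled by 4**1000 so that it peaks at 1.
sf_prior = lambda s: (4.0 * s * (1.0 - s)) ** 1000

print(los_p(1, 0, 0, uniform))   # about 0.75
print(los_p(1, 0, 0, sf_prior))  # about 0.517
print(los_p(1, 5, 0, sf_prior))  # about 0.553
```

Rescaling the prior by a constant does not change the posterior, and here it keeps [s(1-s)]**1000, about 1e-602 at s=0.5, from underflowing to zero in floating point.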

AlvaroBegue · Post by **AlvaroBegue** » Fri May 26, 2017 3:11 am

I don't think of LOS in any Bayesian framework: LOS is what in other fields is called a p-value. It's a quantity that, under the null hypothesis that both players are equally strong (i.e., if the true Elo difference is 0), would be uniformly distributed in [0,1].

See https://en.wikipedia.org/wiki/P-value .

Laskos · Post by **Laskos** » Fri May 26, 2017 3:39 am

AlvaroBegue wrote:I don't think of LOS in any Bayesian framework: LOS is what in other fields is called a p-value. It's a quantity that, under the null hypothesis that both players are equally strong (i.e., if the true Elo difference is 0), would be uniformly distributed in [0,1].

See https://en.wikipedia.org/wiki/P-value .

LOS as a p-value is indeed defined with a uniform prior. So it is a pretty bad quantity for us to use.

AlvaroBegue · Post by **AlvaroBegue** » Fri May 26, 2017 3:43 am

Laskos wrote:
LOS as a p-value is indeed defined with a uniform prior. So it is a pretty bad quantity for us to use.

I don't understand what you are saying. LOS as a p-value doesn't use a prior at all, uniform or otherwise. It is a measure of statistical significance of a departure from the null hypothesis. There is really nothing wrong with it.

Laskos · Post by **Laskos** » Fri May 26, 2017 3:51 am

AlvaroBegue wrote:
I don't understand what you are saying. LOS as a p-value doesn't use a prior at all, uniform or otherwise. It is a measure of statistical significance of a departure from the null hypothesis. There is really nothing wrong with it.

The Bayes' formula used to derive LOS assumes the uniform prior (which is what gives that nice closed form and the erf approximation). I used the same Bayes' formula to derive LOSp (non-uniform prior), obtaining numerical results.
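For reference, a sketch of that closed form and the usual erf approximation, assuming integer W and L (Draws do not enter). It relies on the standard identity P(Beta(W+1, L+1) > 1/2) = P(Binomial(W+L+1, 1/2) <= W); the function names are illustrative:

```python
import math


def los_exact(W, L):
    """Uniform-prior LOS: P(Beta(W+1, L+1) > 1/2) = P(Binomial(W+L+1, 1/2) <= W)."""
    n = W + L + 1
    return sum(math.comb(n, i) for i in range(W + 1)) / 2 ** n


def los_erf(W, L):
    """Normal approximation 0.5 * (1 + erf((W - L) / sqrt(2 (W + L))))."""
    if W + L == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((W - L) / math.sqrt(2.0 * (W + L))))


print(los_exact(1, 0))            # 0.75
print(round(los_erf(1, 0), 3))    # 0.841: the approximation is poor for tiny samples
print(round(los_exact(60, 40), 3), round(los_erf(60, 40), 3))
```

At 100 decisive games the exact value and the approximation agree closely; at W=1, L=0 the approximation overstates LOS noticeably.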

AlvaroBegue · Post by **AlvaroBegue** » Fri May 26, 2017 4:13 am

Laskos wrote:
The Bayes' formula used to derive LOS assumes the uniform prior (which is what gives that nice closed form and the erf approximation). I used the same Bayes' formula to derive LOSp (non-uniform prior), obtaining numerical results.

Let me see if I understand what you are saying. We can consider one "heat" to be a small match between engine 1 and engine 2 where we continue playing games until we get a result that is not a draw. There is a true probability of engine 1 winning the heat, and we call it p.

We could discover something about p by using Bayesian statistics, where we start with some prior, we observe some results of heats and we then get a posterior probability. We might be interested in answering questions like "what is the probability that p is larger than 0.5?".

If we use a uniform prior, that probability is the LOS as it's usually defined. If we use a different prior (I think you suggest a Beta(1001,1001) distribution), we'll get an alternative definition (which will look a lot like assuming an initial tally of 1000 wins and 1000 losses).
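The heat formulation can be written down directly. A sketch, assuming a Beta(k+1, k+1) prior on p (k=1000 gives the Beta(1001,1001) mentioned above, k=0 the uniform prior): after W won and L lost heats the posterior is Beta(W+k+1, L+k+1), exactly as if k wins and k losses had been added to the tally. Draws are discarded by construction, so this is not the same quantity as the draw-dependent LOSp of the first post. The name heat_los is hypothetical:

```python
def heat_los(W, L, k=0):
    """P(p > 1/2) after W won and L lost heats, with a Beta(k+1, k+1) prior on p.

    Uses P(Beta(a, b) > 1/2) = P(Binomial(a + b - 1, 1/2) <= a - 1), with exact
    integer binomial coefficients so that large k does not overflow.
    """
    a, b = W + k + 1, L + k + 1
    n = a + b - 1
    term, total = 1, 0                       # term = C(n, i)
    for i in range(a):                       # i = 0 .. a - 1
        total += term
        term = term * (n - i) // (i + 1)     # C(n, i + 1), exact
    return total / 2 ** n


print(heat_los(1, 0))                    # 0.75, the usual LOS
print(round(heat_los(1, 0, k=1000), 4))  # 0.5089
```

With k=1000, a single won heat moves the probability only from 0.5 to about 0.509.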

Are we together so far?

What I am saying is that you can define LOS as a p-value of the results, which is a test of the plausibility of the null hypothesis. This is a frequentist approach to the problem, and not a Bayesian one. This is how I think of the meaning of LOS, and nothing else. It's still a very useful number, but it needs to be interpreted carefully, just like any p-value.
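The frequentist reading can be checked exactly. If draws are discarded and the decisive games are treated as n fair coin flips under the null hypothesis, then 1 - LOS (uniform prior) equals the one-sided mid-p-value of the exact binomial test for "engine 1 is stronger". This follows from Binomial(n+1, 1/2) being Binomial(n, 1/2) plus one fair coin; it is a standard identity, not something stated in the thread. Because the data are discrete, LOS is only approximately uniform under the null, but its mean is exactly 1/2. A sketch with exact fractions (function names are illustrative):

```python
import math
from fractions import Fraction


def los(W, L):
    """Uniform-prior LOS as an exact fraction: P(Binomial(W+L+1, 1/2) <= W)."""
    n = W + L + 1
    return Fraction(sum(math.comb(n, i) for i in range(W + 1)), 2 ** n)


def mid_p(W, n):
    """One-sided mid-p-value P(X > W) + P(X = W)/2 for X ~ Binomial(n, 1/2)."""
    pmf = [Fraction(math.comb(n, i), 2 ** n) for i in range(n + 1)]
    return sum(pmf[W + 1:]) + pmf[W] / 2


n = 20  # decisive games; draws are assumed to be discarded
assert all(1 - los(W, n - W) == mid_p(W, n) for W in range(n + 1))

# Average LOS over the null distribution W ~ Binomial(n, 1/2).
mean_los = sum(Fraction(math.comb(n, W), 2 ** n) * los(W, n - W) for W in range(n + 1))
print(mean_los)  # 1/2
```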

CPW wrote:The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.

WHAAAT??

Now I see what your beef is about! Your Bayesian interpretation of LOS with a uniform prior would give some meaning to that sentence, but assuming a uniform prior is unreasonable. The other possibility is that whoever wrote that is being tripped up by a very common misunderstanding of p-values. So common, in fact, that it has its own Wikipedia page: https://en.wikipedia.org/wiki/Misunders ... f_p-values

cdani · Post by **cdani** » Fri May 26, 2017 7:15 am

AlvaroBegue wrote:
CPW wrote:The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.
WHAAAT??

Can any of you give a non-mathematical, plain-language sentence explaining what LOS is? Thanks!

ZirconiumX · Post by **ZirconiumX** » Fri May 26, 2017 9:40 am

cdani wrote:
Can any of you give a non mathematical plain language sentence of what is LOS? Thanks!

I'm going to say this as I understand it; if I'm wrong then we've both learned something.

Let's say you have a match between engine A and engine B. The Likelihood of superiority is the probability that A will win the match. If A is clearly stronger (wins more games), then the LOS will increase to a limit of 1. If A is clearly weaker (loses more games), the LOS will decrease to a limit of 0. If the two are equally strong (games are about equal), the LOS will be around the 0.5 mark.

Some people then use LOS > 0.99 or whatever to conclude A is stronger and LOS < 0.01 to conclude B is stronger.

The mathematicians among us say this is a bad idea and you should use the SPRT instead, like Stockfish does.
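A quick Monte Carlo sketch of why a fixed threshold such as LOS > 0.99 is risky when it is checked after every game. It assumes equally strong engines, ignores draws, uses the erf approximation of LOS, and caps the match at 1000 decisive games; all parameters are illustrative. A single check at the end would cross 0.99 in only about 1% of such matches, while checking after every game makes false alarms much more common:

```python
import math
import random


def los_erf(W, L):
    """Normal approximation to the uniform-prior LOS (draws ignored)."""
    if W + L == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((W - L) / math.sqrt(2.0 * (W + L))))


def false_alarm_rate(trials=500, max_games=1000, threshold=0.99, seed=1):
    """Share of equal-strength matches in which LOS ever exceeds `threshold`
    when it is checked after every decisive game (assumed setup)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        W = L = 0
        for _ in range(max_games):
            if rng.random() < 0.5:   # equal strength: each decisive game is a coin flip
                W += 1
            else:
                L += 1
            if los_erf(W, L) > threshold:
                hits += 1
                break
    return hits / trials


print(false_alarm_rate())
```

This optional-stopping effect is what a sequential test such as the SPRT is designed to control.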

Laskos · Post by **Laskos** » Fri May 26, 2017 10:02 am

cdani wrote:
AlvaroBegue wrote:
CPW wrote:The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.
WHAAAT??
Can any of you give a non mathematical plain language sentence of what is LOS? Thanks!

In the Bayesian approach, P(w>l | W,D,L) is indeed the probability that w>l under a given prior, and our usual LOS is that probability under a usually wrong, uniform prior. In the frequentist approach, LOS measures the plausibility of the null hypothesis: an LOS of 50% gives 100% plausibility, and an LOS of 100% gives 0% plausibility. It gives no information on the probability that w>l, since it only tests the null hypothesis (a p-value). So if you want a (posterior) probability, choose a reasonable prior and use Bayes' formula to get LOSp, as I showed in the OP.

Laskos · Post by **Laskos** » Fri May 26, 2017 10:36 am

LOSp shows some universality with respect to the chosen prior: its width is all that matters. The priors (s*(1-s))**1000 and exp(-(s-0.5)**2 * 2500), both with widths of roughly 15 ELO points, give similar results for LOSp, and both are very different from the naive, uniform-prior LOS.
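A small check of that claim for the single-Win case W=1, D=0, L=0, assuming a flat density on the (w, d, l) simplex times a prior on the score s. After integrating out t = w + l by hand, the posterior weight of x = w - l is (1 + 2x - 3x**2)/4 for x > 0 and (1 + x)**2/4 for x < 0, times the prior at s = (1 + x)/2, so only a 1-D integral remains. The function name is hypothetical:

```python
import math


def single_win_los_p(prior, n=20000):
    """LOSp for W=1, D=0, L=0 under the assumed model, prior(s) with s = (1 + x)/2."""
    h = 1.0 / n
    above = below = 0.0
    for i in range(n):
        y = (i + 0.5) * h                                          # midpoint rule on 0 < y < 1
        above += (1 + 2 * y - 3 * y * y) * prior(0.5 * (1 + y))    # x = +y
        below += (1 - y) ** 2 * prior(0.5 * (1 - y))               # x = -y
    return above / (above + below)   # the common factors h and 1/4 cancel


priors = {
    "uniform": lambda s: 1.0,
    "[s(1-s)]**1000": lambda s: (4 * s * (1 - s)) ** 1000,   # rescaled to peak at 1
    "exp(-2500 (s-1/2)**2)": lambda s: math.exp(-2500 * (s - 0.5) ** 2),
}
for name, prior in priors.items():
    print(f"{name:>22}  {single_win_los_p(prior):.3f}")
# prints roughly 0.750, 0.517 and 0.522
```

Both narrow priors give about 0.52, against 0.75 for the uniform prior.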
