Likelihood Of Success (LOS) in the real world

AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Likelihood Of Success (LOS) in the real world

Post by AlvaroBegue »

Laskos wrote:
AlvaroBegue wrote:
Laskos wrote:
AlvaroBegue wrote:I don't think of LOS in any Bayesian framework: LOS is what in other fields is called a p-value. It's a quantity that, under the null hypothesis that both players are equally strong (i.e., if the true Elo difference is 0), would be uniformly distributed in [0,1].

See https://en.wikipedia.org/wiki/P-value .
LOS as a p-value is indeed defined with a uniform prior. So it is a pretty bad quantity for us to use.
I don't understand what you are saying. LOS as a p-value doesn't use a prior at all, uniform or otherwise. It is a measure of statistical significance of a departure from the null hypothesis. There is really nothing wrong with it.
The derivation of LOS from Bayes' formula uses the uniform prior (which is what gives that nice closed form and the erf approximation). I used the same Bayes' formula to derive LOSp (non-uniform prior), with numerical results.
Let me see if I understand what you are saying. We can consider one "heat" to be a small match between engine 1 and engine 2 where we continue playing games until we get a result that is not a draw. There is a true probability of engine 1 winning the heat, and we call it p.

We could discover something about p by using Bayesian statistics, where we start with some prior, we observe some results of heats and we then get a posterior probability. We might be interested in answering questions like "what is the probability that p is larger than 0.5?".

If we use a uniform prior, that probability is the LOS as it's usually defined. If we use a different prior (I think you suggest a Beta(1001,1001) distribution), we'll get an alternative definition (which will look a lot like assuming an initial tally of 1000 wins and 1000 losses).

Are we together so far?
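To make the two priors above concrete, here is a minimal sketch (not anyone's actual implementation) of the posterior probability that p > 0.5 after some heat wins and losses under a Beta(a, b) prior. It assumes integer prior parameters, so that the incomplete beta function at 1/2 reduces to an exact binomial sum; the name `los_beta_prior` and its parameters are made up for illustration.

```python
from math import comb


def los_beta_prior(wins, losses, prior_a=1, prior_b=1):
    """Posterior P(p > 0.5) for heat wins/losses under a Beta(prior_a, prior_b) prior.

    Assumes integer prior parameters. The posterior is Beta(a, b) with
    a = prior_a + wins and b = prior_b + losses, and for integer a, b
    P(p > 0.5) = sum_{j=0}^{a-1} C(n, j) / 2**n with n = a + b - 1.
    """
    a = prior_a + wins
    b = prior_b + losses
    n = a + b - 1
    favourable = sum(comb(n, j) for j in range(a))
    return favourable / 2**n  # exact integers, converted to float only here


# Uniform prior is Beta(1, 1); the alternative mentioned above is Beta(1001, 1001).
uniform_los = los_beta_prior(10, 0)
informed_los = los_beta_prior(10, 0, prior_a=1001, prior_b=1001)
```

For a 10-0 run of heats, the uniform prior gives 1 - 2**-11 (about 0.9995), while the Beta(1001, 1001) prior gives something a bit under 0.6, which is the "initial tally of 1000 wins and 1000 losses" effect.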

What I am saying is that you can define LOS as a p-value of the results, which is a test of the plausibility of the null hypothesis. This is a frequentist approach to the problem, and not a Bayesian one. This is how I think of the meaning of LOS, and nothing else. It's still a very useful number, but it needs to be interpreted carefully, just like any p-value.
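As a rough illustration of the frequentist reading, the sketch below computes the exact one-sided binomial p-value for W wins out of W + L decisive games under the null hypothesis of equal strength, next to the erf formula that is commonly quoted for LOS. Treating LOS as one minus that p-value is an assumption about the convention, and both function names are invented for this example.

```python
from math import comb, erf, sqrt


def one_sided_p_value(wins, losses):
    """P(at least `wins` wins in wins + losses decisive games) if both engines are equal."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2**n


def los_erf(wins, losses):
    """Commonly quoted normal approximation: 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L))))."""
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))


p = one_sided_p_value(60, 40)
print(1.0 - p, los_erf(60, 40))  # the two numbers should be close to each other
```

Neither number is "the probability that the engine is stronger"; both only say how surprising the result would be if the engines were equal.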

CPW wrote:The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.
WHAAAT??

Now I see what your beef is about! Your Bayesian interpretation of LOS with a uniform prior would give some meaning to that sentence, but assuming a uniform prior is unreasonable. The other possibility is that whoever wrote that is being tripped up by a very common misunderstanding of p-values. So common, in fact, that it has its own Wikipedia page: https://en.wikipedia.org/wiki/Misunders ... f_p-values
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Likelihood Of Success (LOS) in the real world

Post by cdani »

AlvaroBegue wrote:
CPW wrote:The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.
WHAAAT??
Can any of you give a non-mathematical, plain-language sentence explaining what LOS is? Thanks!
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Likelihood Of Success (LOS) in the real world

Post by ZirconiumX »

cdani wrote:
AlvaroBegue wrote:
CPW wrote:The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.
WHAAAT??
Can any of you give a non-mathematical, plain-language sentence explaining what LOS is? Thanks!
I'm going to say this as I understand it; if I'm wrong then we've both learned something.

Let's say you have a match between engine A and engine B. The Likelihood of superiority is the probability that A will win the match. If A is clearly stronger (wins more games), then the LOS will increase to a limit of 1. If A is clearly weaker (loses more games), the LOS will decrease to a limit of 0. If the two are equally strong (games are about equal), the LOS will be around the 0.5 mark.

Some people then use LOS > 0.99 or whatever to conclude A is stronger and LOS < 0.01 to conclude B is stronger.

The mathematicians among us say this is a bad idea and you should use the SPRT instead, like Stockfish does.
Some believe in the almighty dollar.

I believe in the almighty printf statement.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Likelihood Of Success (LOS) in the real world

Post by Laskos »

cdani wrote:
AlvaroBegue wrote:
CPW wrote:The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.
WHAAAT??
Can any of you give a non-mathematical, plain-language sentence explaining what LOS is? Thanks!
In the Bayesian approach, P(w>l | W,D,L) is indeed the probability that w>l under a given prior, and our usual LOS is the probability that w>l under a uniform prior, which is usually the wrong one. In the frequentist approach, LOS measures the plausibility of the null hypothesis: an LOS of 50% gives 100% plausibility, an LOS of 100% gives 0% plausibility. It gives no information on the probability that w>l, as it just tests the null hypothesis (p-value). So, if you want a (posterior) probability, use a reasonable prior and Bayes' formula to get LOSp, as I showed in the OP.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Likelihood Of Success (LOS) in the real world

Post by Laskos »

There is some universality of LOSp with respect to the chosen prior: its width is all that matters. The priors (s*(1-s))**1000 and exp(-(s-0.5)**2 * 2500), both with a width of 15 Elo points difference, give similar results for LOSp, and very different ones from the naive, uniform-prior LOS.
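A quick way to check this kind of claim is a grid integration of prior times likelihood. The sketch below is only an illustration under stated assumptions: it treats s as the win probability of decisive games (draws ignored), which may differ from the model used in the OP, and `los_with_prior` plus the three prior functions are hypothetical names.

```python
from math import exp, log


def los_with_prior(wins, losses, log_prior, steps=20000):
    """Posterior P(s > 0.5) on a midpoint grid, for an arbitrary (unnormalised) log prior.

    Assumption: the likelihood is s**wins * (1 - s)**losses, i.e. draws are ignored.
    """
    xs = [(i + 0.5) / steps for i in range(steps)]  # midpoints, never exactly 0, 0.5 or 1
    logs = [log_prior(s) + wins * log(s) + losses * log(1.0 - s) for s in xs]
    peak = max(logs)  # subtract the peak so exp() does not underflow everywhere
    weights = [exp(v - peak) for v in logs]
    upper = sum(w for s, w in zip(xs, weights) if s > 0.5)
    return upper / sum(weights)


def log_beta_like(s):
    return 1000.0 * (log(s) + log(1.0 - s))  # log of (s*(1-s))**1000


def log_gaussian(s):
    return -2500.0 * (s - 0.5) ** 2  # log of exp(-(s-0.5)**2 * 2500)


def log_uniform(s):
    return 0.0


for name, prior in [("beta-like", log_beta_like), ("gaussian", log_gaussian), ("uniform", log_uniform)]:
    print(name, los_with_prior(10, 0, prior))
```

For a 10-0 result the two narrow priors should land close to each other and far from the uniform-prior value, which is the pattern described above.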
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Likelihood Of Success (LOS) in the real world

Post by Dann Corbit »

LOS (Likelihood of superiority) can be interpreted as:
What are the chances that A is better than B?
If it is 50%, then a coin toss.
If it is 0% then no chance.
If it is 100% then certain.

Simple as that.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Likelihood Of Success (LOS) in the real world

Post by Michel »

The Bayesian "probabilities" are not true probabilities. Instead they represent your "belief" that something is true. The prior is your original belief and as new information comes in you adapt your position. A uniform prior is a so-called non-informative prior. It is what you choose if you know nothing about a subject.

Interpreted in this way Bayesian statistics is suitable for reasoning but unsuitable for making precise scientific statements.

By some fortunate accident, in the case of fixed length tests, the LOS for a uniform prior happens to be almost the same as the p-value which has a precise scientific interpretation.

This equality is no longer true in the case of sequential tests and in that case it is easy to see that the naive interpretation of LOS as a probability leads to disastrous results.
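The sequential-test problem can be seen in a small simulation. This is a hedged sketch, not a description of any real testing framework: two equally strong engines play decisive games, LOS (via the usual erf formula) is checked after every game, and the test stops as soon as it exceeds a threshold. The function names, the default parameters and the stopping rule are all assumptions made for illustration.

```python
import random
from math import erf, sqrt


def los_erf(wins, losses):
    """Usual erf approximation of LOS from decisive games only."""
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))


def peeking_false_positive_rate(threshold=0.95, max_games=500, trials=1000, seed=1):
    """Fraction of runs between EQUAL engines that ever show LOS > threshold."""
    rng = random.Random(seed)
    stopped = 0
    for _ in range(trials):
        wins = losses = 0
        for _ in range(max_games):
            if rng.random() < 0.5:  # null hypothesis: each decisive game is a coin flip
                wins += 1
            else:
                losses += 1
            if los_erf(wins, losses) > threshold:  # "peek" after every game
                stopped += 1
                break
    return stopped / trials


print(peeking_false_positive_rate())
```

With a single check after a fixed number of games, roughly 5% of such null runs would exceed LOS 0.95. Checking after every game and stopping at the first crossing flags a much larger share, so "LOS = 95%" read at the stopping point cannot mean a 95% chance of being stronger.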
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Likelihood Of Success (LOS) in the real world

Post by Laskos »

Michel wrote:The Bayesian "probabilities" are not true probabilities. Instead they represent your "belief" that something is true. The prior is your original belief and as new information comes in you adapt your position. A uniform prior is a so-called non-informative prior. It is what you choose if you know nothing about a subject.

Interpreted in this way Bayesian statistics is suitable for reasoning but unsuitable for making precise scientific statements.

By some fortunate accident, in the case of fixed length tests, the LOS for a uniform prior happens to be almost the same as the p-value which has a precise scientific interpretation.

This equality is no longer true in the case of sequential tests and in that case it is easy to see that the naive interpretation of LOS as a probability leads to disastrous results.
The p-value is often misused in the scientific literature. Ask a physicist why 5-sigma is required: he will talk of probabilities and will not be able to explain why such a high "confidence" as t=5 is needed. The p-value deals with the null hypothesis, which is more rigorous but less informative than a probability based on belief. Also, as reasoning goes, isn't the belief in a uniform prior not only also a belief, but a misplaced one? It's hard for me to reason about anything with a uniform prior. We take two Stockfishes, see a 10-0-0 result, and conclude that it has a p-value of <0.001, so the test passed? What would be a precise scientific statement here? In practice, believing that one is interpreting the p-value correctly is more dangerous than a belief expressed through a prior. That 10-0-0 is a 60% probability that the engine scoring 10 is stronger, given a reasonable prior for Stockfishes.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Likelihood Of Success (LOS) in the real world

Post by AlvaroBegue »

Michel wrote:The Bayesian "probabilities" are not true probabilities. Instead they represent your "belief" that something is true.
This is an old debate which mathematicians no longer have. If you have a set X and a function P that maps certain subsets of X to the interval [0,1] and certain axioms are satisfied, we call P a probability. That's a definition. Bayesian probabilities are probabilities, and the whole theory of probability applies to them.

I am generally a big fan of a Bayesian approach to modeling uncertainty. However, coming up with a reasonable prior in some situations (like this one) is tricky, and the prior can have a large influence on the result of the analysis, as Kai points out. So I would rather use a frequentist approach here.

In order to use a p-value rigorously, you need to design the experiment in advance (i.e., decide how many games you are going to play or -even better- how many non-draw results you need, and decide what threshold of LOS you are going to use to accept or reject the new version). For instance, if I say I will play 10,000 games and accept the change if the LOS is above 0.995, I know that the probability of a non-improvement getting through is at most 0.005. If in my process of trying changes I have an acceptance rate much higher than 0.005 (say, 10%), I can be pretty certain that the majority of my accepted changes are improvements.
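The arithmetic behind that last claim can be sketched as follows. If at most a fraction alpha of non-improvements passes the test, then non-improvements make up at most alpha of all tried changes that get accepted, so their share among accepted changes is at most alpha divided by the overall acceptance rate. The function name is hypothetical, and the bound assumes the experiment was fixed in advance as described.

```python
def max_share_of_bad_accepts(alpha, acceptance_rate):
    """Upper bound on the fraction of accepted changes that are non-improvements.

    alpha: probability that a non-improvement passes the pre-registered test.
    acceptance_rate: observed fraction of all tried changes that were accepted.
    """
    if not 0.0 < acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in (0, 1]")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return min(1.0, alpha / acceptance_rate)


# LOS threshold 0.995 means alpha = 0.005; 10% of tried changes get accepted.
print(max_share_of_bad_accepts(0.005, 0.10))
```

With the numbers from the post, the bound is 5%: at least 95% of the accepted changes would be genuine improvements, whatever the true fraction of good ideas is.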

This seems like a reasonable plan to make progress, so LOS is a very useful number to me.

[Full confession: I am not as systematic in my approach as I just described. It's a hobby, after all. ;) ]
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Likelihood Of Success (LOS) in the real world

Post by Michel »

AlvaroBegue wrote:This is an old debate which mathematicians no longer have. If you have a set X and a function P that maps certain subsets of X to the interval [0,1] and certain axioms are satisfied, we call P a probability.
The numbers that come out of Bayesian statistics could only be called probabilities in the common sense of the word if the prior were perfectly known, which is almost never the case.

The concept of "belief" is the foundation of Bayesian statistics.

https://en.wikipedia.org/wiki/Bayesian_statistics