mcostalba wrote:
bob wrote:
Other terms you use are ambiguous, making questions difficult to answer. What is a "match"? A set of games starting from a chosen set of positions, against an opponent (or group of opponents)? What is "matches"? Repeating the same set of positions/opponents again? Different positions?
In the simplest case it is a match between A and A' only, a direct match of, say, 500 games where the two engines play against each other.
I would not consider others in this context. I am interested to know with what probability engine A can be said to be stronger than A' _in those testing conditions_ (note that I didn't say "if A is stronger than A'"). I just need to know how to calculate, starting from the 500-game result in terms of wins, draws and losses, with what probability, say, A is stronger than A'.
This is a theoretical question; I am not arguing _if_ one-vs-one matches are reliable or effective in chess engine testing.
Regarding testing conditions, I don't think they are important because, as I said, I want to know with what probability A is stronger than A' in THOSE CONDITIONS, not in general terms, and not how well those conditions could be translated to the general case.
So I have removed the variable of different opponents, because I am not interested in it for this experiment, and the variable of conditions, because I accept that the difference in strength applies only under the same conditions in which the engines were tested.
How this LOS, calculated in this way, can be extrapolated to the general case is a _completely_ different problem. I would call it the 'problem of testing effectiveness', but it is not discussed here.
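For reference, the quantity mcostalba asks for is commonly approximated in engine testing by the "LOS" formula sketched below. This is a minimal sketch assuming a flat prior and a normal approximation, with draws ignored; the 160/140/200 split is a made-up example, not data from the thread:

```python
from math import erf, sqrt

def los(wins, losses):
    """Likelihood of superiority: P(A is stronger than A') given only
    the decisive games, under a flat prior and a normal approximation.
    Draws are ignored because, under this simple model, they carry no
    information about which engine is stronger."""
    if wins + losses == 0:
        return 0.5
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))

# Hypothetical 500-game result: 160 wins, 140 losses, 200 draws for A.
print(round(los(160, 140), 3))  # -> 0.876
```

Note that this formula already smuggles in a prior (a flat one), which is exactly the point Uri makes below.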
1) You ask the wrong question.

The probability that A is stronger than A' is always 1 or 0, and the result of the match does not change this probability. I guess that you want to know the probability that A' is better than A after you know the result of a 500-game match between them (where "A' is better than A" means that you can expect A' to beat A in a match).

Edit: The part that begins with "I guess" is also not a correct description; you want to know the probability that you make the right decision if you have some rule of choice to decide which program is better based on the result.
2) There is no easy answer for what you want to know, because the probability that you want to know depends on a prior probability distribution that may be different in different cases.
An extreme example is when you know at the beginning of the match that A' and A are exactly the same program; in this case the probability that the winner is better is 0 even after you know the result.
You usually have some knowledge, and I think that it is a mistake to ignore that knowledge.
You usually know, for example, that a change you make in Stockfish is not going to make the program 200 Elo better, but it may make the program 200 Elo weaker because of some bug, so you have an a priori distribution of the rating difference between A and A'.
If you assume some a priori distribution of the rating difference between A and A', then it is possible to calculate the probability that you make the right decision based on the result.
Without knowing a prior probability distribution it is impossible to calculate this probability. You can read about prior probability distributions at
http://en.wikipedia.org/wiki/Prior_probability
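Uri's suggestion can be sketched numerically. Everything below is illustrative: the discrete prior, the 160 wins / 140 losses split, and the likelihood model (logistic Elo curve, draws ignored) are assumptions for the sake of the example, not anything computed in the thread:

```python
from math import comb

def expected_score(elo_diff):
    """Logistic Elo model: expected score of A' against A."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def posterior_better(wins, losses, prior):
    """P(A' is stronger | match result) for a discrete prior
    {elo_diff: probability}.  Decisive games are modelled as
    Bernoulli trials with success probability from the Elo curve;
    ignoring draws is itself a modelling assumption."""
    weighted = {}
    for d, p in prior.items():
        e = expected_score(d)
        # binomial likelihood of the observed decisive-game split
        likelihood = comb(wins + losses, wins) * e ** wins * (1.0 - e) ** losses
        weighted[d] = p * likelihood
    total = sum(weighted.values())
    return sum(w for d, w in weighted.items() if d > 0) / total

# Made-up prior in the spirit of Uri's Stockfish example: a patch is
# probably neutral, may gain a little, but can lose 200 Elo to a bug.
prior = {-200: 0.10, -50: 0.20, 0: 0.40, 20: 0.20, 50: 0.10}
print(posterior_better(160, 140, prior))
```

With a prior that puts heavy mass on "no change", the same match result yields a noticeably lower probability than the flat-prior LOS formula, which illustrates Uri's point that the answer depends on the prior.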
Note that I did not ask Tord's opinion about this, but I come from a mathematical background and I expect every mathematician to agree with me, so I expect Tord to agree with me.
Uri