Dann Corbit wrote: ↑Tue Jun 30, 2020 7:35 pm
And you know that the program won the 450 games because the program was stronger and not because one of the two machines was one Hz faster and therefore won more games due to experimental error when the operator did not switch machines routinely during the contests?
And you know that in the eons it took to calculate all those games that the 450 games were actually won and we did not have a cosmic ray hit one of the accumulators (See Akan Ban's experience with cosmic rays in calculation of Perft(15))?
And you know that the program won the 450 extra games rather than an error in accumulation due to doubles eventually not accumulating a sum of 1 because the distance between consecutive numbers becomes larger at some distant point and we therefore had a numerical error?
Point 1 is tantamount to claiming the result is not valid in the first place. Of course you cannot draw any conclusion from invalid test scores. If the operator consciously cheats, he can easily make the stronger program lose. Perhaps the engine that won was actually 300 Elo weaker, and the operator just reset the computer in all the 3e100 games that it was on the way to losing or winning, to replay them until they happened to be draws. This isn't worth arguing about.
If this is a valid test result (no cheating involved) you would have to explain why it was always the same engine that suffered from 'accidental operator mistakes', cosmic rays, etc. That it happened 8 times to one, and never to the other by accident is quite unlikely.
A million games in a row that are drawn is absurdly strong evidence that the programs are exactly equal.
Not at all. There are infinitely more cases where they are not exactly equal, which could give exactly the same result (if they were nearly perfect, but not quite). Of course it begs for an explanation that they would always draw, and almost never win (approximately equally many times). This suggests they are playing near-perfect chess.
And 10^100-450 out of 10^100 draws is incredible evidence of identical strength, far greater than anyone could ever hope for in a real experiment.
By contrast 450 out of 10^100 is no evidence at all.
Nonsense. 450 is not nothing, and it will never become nothing just because something else is big. 450 wins are 450 wins, and will always stay 450 wins. Extreme ratios of the number of wins and losses do not become any more likely when the win+lose probability goes down. They only become more likely when the ratio of the win vs loss probability increases.
You can easily check that for yourself, if you don't believe it, with your coin-flip engine: give a win if r < epsilon, a loss if r > 1-epsilon, and a draw otherwise. For each epsilon, play matches until there are 8 non-draws, and keep going until the result 8-0 has occurred, say, a dozen times. Then look at which fraction of the matches had 8-0 for the non-draws (rather than 7-1 or 4-4 etc.). Then divide epsilon by 10 and do the same. Keep doing that until your patience runs out.
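For anyone who wants to try it, here is a minimal sketch of that experiment in Python. The function names (play_match, sweep_fraction) and the seed are my own choices for illustration, and for simplicity it plays a fixed number of matches per epsilon instead of waiting for a dozen 8-0 results. It counts a sweep in either direction (8-0 or 0-8), so for two equal engines the expected fraction is 2 * (1/2)^8 = 1/128, whatever epsilon is.

```python
import random

def play_match(epsilon, rng, decisive_target=8):
    """Play coin-flip games until `decisive_target` non-draws have occurred.

    Win if r < epsilon, loss if r > 1 - epsilon, draw otherwise.
    Returns (wins, losses); draws are not counted.
    """
    wins = losses = 0
    while wins + losses < decisive_target:
        r = rng.random()
        if r < epsilon:
            wins += 1
        elif r > 1 - epsilon:
            losses += 1
    return wins, losses

def sweep_fraction(epsilon, matches, seed=0):
    """Fraction of matches whose 8 decisive games all went to one side."""
    rng = random.Random(seed)
    sweeps = 0
    for _ in range(matches):
        wins, losses = play_match(epsilon, rng)
        if wins == 0 or losses == 0:
            sweeps += 1
    return sweeps / matches

# Analytic value for equal engines: each decisive game is a fair coin,
# so a sweep in either direction has probability 2 * (1/2)^8 = 1/128.
EXPECTED = 2 * 0.5 ** 8

if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        frac = sweep_fraction(eps, matches=2000)
        print(f"epsilon={eps}: sweep fraction {frac:.4f} (expected {EXPECTED:.4f})")
```

The draw rate goes up by a factor of ten at each step, but the sweep fraction should keep scattering around the same 1/128, because it depends only on the ratio of win to loss probability.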
The coin tossing experiment I showed proves that we should not expect exact equality in a long match amongst pure equals. In fact, it is lunacy to do so.
If it proved anything, it was that you never got a 100% vs 0% result. 8-0 was such a result, though. So you 'proved' that the engines that were drawing so much could not have been equal.