zamar wrote:
No idea. But you can get an answer by visiting any of the online blackjack forums and starting a discussion about "progressive betting strategies". There are world-class statisticians who will be quite happy to explain why the idea is broken. It happens every day.
Sorry Bob, but I don't get what you are trying to say.
I understand perfectly well why "progressive betting strategies" won't work, because I could prove it when I was a kid, but I can't see the connection between "progressive betting strategies" and "determining chess engine relative strength through sequential probability ratio test".
The idea being discussed was this. You start a test. And you note that the results are quite bad after 100 games, so you stop the test and throw the change out. That's wrong. Same thing when you start the test, and after 100 or 1000 games, the Elo is better than the old best version. So you stop the test and declare that the new version is better. Which is also wrong.
The results bounce around a lot early on, and slowly settle down as the number of games increases. If you want a short test, pick the number of rounds _first_ and then run the test. It will not be as accurate as more games, but it is better than waiting until you see some high point (or low point) and deciding at that instant in time.
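The inflation that comes from stopping at an observed high point can be shown with a short simulation (my own illustrative sketch, not from the thread): two equal-strength engines, no draws, and a naive rule that declares a winner the first time one side leads by 10 games. Checking the lead after every game triggers a false verdict far more often than a single look after the planned number of games.

```python
import random

def running_lead_hits(results, threshold):
    """True if the running (wins - losses) lead ever reaches +threshold."""
    lead = 0
    for won in results:
        lead += 1 if won else -1
        if lead >= threshold:
            return True
    return False

def simulate(trials=500, n_games=1000, threshold=10, seed=42):
    """Fraction of trials where two EQUAL engines produce a false verdict:
    once for peeking after every game, once for a single fixed-N look."""
    rng = random.Random(seed)
    peek_fp = fixed_fp = 0
    for _ in range(trials):
        results = [rng.random() < 0.5 for _ in range(n_games)]
        final_lead = sum(1 if won else -1 for won in results)
        if running_lead_hits(results, threshold):
            peek_fp += 1   # stopped the moment a 'high point' appeared
        if final_lead >= threshold:
            fixed_fp += 1  # one look, after the pre-chosen number of games
    return peek_fp / trials, fixed_fp / trials
```

Run with the defaults, the peeking rule convicts an innocent change several times as often as the fixed-length test, which is exactly the flaw being described.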
That was the idea I was describing as being flawed. I even posted some results here last week showing this very thing: you could have picked a stop point where the change looked very bad, or one where it looked very good, and neither would have been correct.
I didn't say "no way it can be done". And I gave a methodology where one could create a simple table indexed by elo difference in one axis and acceptable error on the other, and the intersection would give the number of games you need before terminating the test.
But just looking at the results after (say) 200 games, where the new version is doing better or worse than the old, and stopping at that point without some real statistical support behind you, will lead to errors.
So yes, it can be done. But the number of games is absolutely a function of the acceptable error range and the difference in elo between the two engines. The wider the gap, the fewer games you need. The higher the acceptable error can be, the fewer games you need. Just stopping because one is a bit (or a lot) ahead is not going to work.
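A rough version of such a table can be sketched with a standard normal approximation (my own illustration; the thread gives no formula). The Elo difference fixes the expected score, the acceptable error fixes the z-value, and draws are ignored, which understates the real requirement:

```python
import math

def elo_to_score(diff):
    """Expected score of the stronger engine at an Elo difference of `diff`."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def games_needed(elo_diff, z=1.96):
    """Rough normal-approximation sample size needed to distinguish an engine
    that is `elo_diff` Elo stronger from an equal one, at roughly the
    confidence implied by z (1.96 ~ 95%). Draws are ignored for simplicity."""
    p = elo_to_score(elo_diff)
    return math.ceil((z / (p - 0.5)) ** 2 * p * (1.0 - p))
```

The numbers behave exactly as described above: a 50-Elo gap needs far fewer games than a 20-Elo gap, which in turn needs far fewer than a 5-Elo gap, and lowering z (accepting more error) shrinks all of them.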
I'll explain my view once more: as the match between two engines goes on, every new result provides us with information, and whenever we get information, the probabilities change. Now the question is how we can use that information to adjust the match length so that we reach the wanted confidence level. You say that there is no way, and that is false and very easy to disprove. Here is a counter-example: suppose we want to play a 1000-game match and accept whatever confidence level we get from it. After 700 games the match A-B stands at 700-0. You claim "because progressive betting strategies won't work, you have to play the full 1000 games before we can know anything with wanted confidence level", although it can clearly be seen that the match could at worst end 700-300, A would still be the clear winner at the wanted confidence level, and we can stop the match immediately.
That's not what I said at all, unfortunately. Nowhere has anyone discussed 700-0 results. We are talking about results between two versions that may or may not be pretty close. With a small number of games, you might get 50-10 and conclude something dead wrong, had you gone to 500 games where the results may well even out. That's the flaw here.
Of course 700-0 is just an extreme, but then you start softening it step by step, 699-1, 698-2, ... So when can we stop? That is exactly the question we are looking to answer.