Zenmastur wrote:
Michel wrote:
Zenmastur wrote: So, in effect, their "High Tech" advantage has been largely squandered by poor framework design.
Ah ok. Good to know this.
Why is this good to know?
If it turns out that there isn't a good reason to do this from an efficiency standpoint, I would think it would be a TERRIBLE thing to know. It means that they have probably cut their efficiency roughly in half. Besides, there's a VERY easy fix for this that would probably take all of 10 seconds to implement.
Regards,
Zen
Even at half the efficiency, SPRT 5% 5% is still miles ahead of the usual "outside the error margins" (usually 2 SD) argument. SPRT 5% 5% brings not only fairly close to optimum efficiency, but some discipline too. What do even experienced testers/developers do? They rarely use SPRT; instead they run rating-tool scripts that show Elo and error margins (or LOS or p-value). And that is not even the main problem: they often do it negligently.
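For readers who haven't used it, here is a minimal sketch of Wald's SPRT, reading "5% 5%" as α = β = 0.05. It assumes a simplified win/loss model with no draws (practical testing frameworks usually model draws as well), and the Elo bounds elo0 = 0, elo1 = 5 and the function names are hypothetical choices for illustration. The test accumulates a log-likelihood ratio game by game and stops as soon as it crosses either bound; the stopping rule is fixed in advance, which is where the discipline comes from.

```python
import math
import random


def expected_score(elo):
    """Logistic Elo model: expected score of a side that is `elo` points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(results, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Wald SPRT on a stream of game results (1 = win, 0 = loss; no draws assumed).

    H0: true Elo difference is elo0, H1: it is elo1.
    Returns (decision, games_used), decision in {"H0", "H1", "continue"}.
    """
    p0, p1 = expected_score(elo0), expected_score(elo1)
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this LLR
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this LLR
    win_step = math.log(p1 / p0)           # LLR increment for a win
    loss_step = math.log((1 - p1) / (1 - p0))  # LLR increment for a loss
    llr = 0.0
    n = 0
    for n, r in enumerate(results, start=1):
        llr += win_step if r else loss_step
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "continue", n


# Usage: simulate an engine that is truly 20 Elo stronger (hypothetical setup).
rng = random.Random(7)
p_true = expected_score(20)
stream = (1 if rng.random() < p_true else 0 for _ in range(200000))
print(sprt(stream))
```

Fed a stream from an engine that is truly 20 Elo stronger, the test should normally stop with H1 after a few thousand games, without anyone deciding when to look.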
Say you plan a match of 12000 games. From time to time you glance at the intermediate result, and after 7000 games the long-expected outcome is apparent: it's "outside the error margins", so you stop. Inadvertently, this sort of negligence inflates the dangerous Type I error at a fast pace.
I ran simulations for a match of 12000 games using 2 SD (CI = 95.4%). If the tester glances at Elo and error margins after every 1000 games (at most 12 glances) in the hope of a clear outcome, the "bad" Type I error (the one-sided tail beyond +2 SD) climbs from the theoretical 2.3% to 9.1%. Here I plotted the efficiency of the "Sloppy 2 SD" method used by many developers/testers.
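The effect of peeking is easy to reproduce. The sketch below is a simplified Monte Carlo, not the simulation behind the numbers above: it assumes two equally strong engines (so any "improvement" is a false positive), no draws, and a one-sided test that declares success once the score is more than 2 SD above 50%. With `peek=True` it checks after every 1000 games and stops at the first success; with `peek=False` it checks only once, after all 12000 games. The function and parameter names are hypothetical.

```python
import random


def type1_rate(total_games=12000, look_every=1000, z_crit=2.0,
               trials=20000, peek=True, seed=1):
    """Monte Carlo estimate of the Type I error of a '2 SD' test.

    Assumes two equally strong engines (true score 50%) and no draws,
    so every declared improvement is a false positive.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        wins = 0
        games = 0
        while games < total_games:
            # Exact Binomial(look_every, 0.5) sample: count the set bits
            # of a random look_every-bit integer.
            wins += bin(rng.getrandbits(look_every)).count("1")
            games += look_every
            if not peek and games < total_games:
                continue  # disciplined tester: no glimpse before the end
            # z-score of the observed score vs 50% (SD of one game = 0.5)
            z = (wins - games / 2) / (0.5 * games ** 0.5)
            if z > z_crit:
                false_positives += 1
                break  # "outside error margins", stop the match
    return false_positives / trials


print("single look:", type1_rate(peek=False))
print("glimpse every 1000 games:", type1_rate(peek=True))
```

The single-look rate should land near the theoretical 2.3%, while the peeking rate comes out several times higher, in the same ballpark as the 9.1% quoted above; draws and other modelling details shift the exact figure.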
SPRT 5% 5% is 3-5 times faster in the relevant range, and "Sloppy 2 SD" breaks down at 16% Pgood. Even at half the efficiency, SPRT is a big step forward from the artful use of error margins (or LOS or p-value). In fact, for the artists, I would recommend using 3 SD instead of 2 SD, as the Type I error inflation due to indiscipline is much tamer.
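The gap between 2 SD and 3 SD is easier to see in numbers. This sketch computes the one-sided normal tail P(Z > k) for a single look, plus a crude union (Bonferroni) upper bound for 12 looks. The bound ignores the correlation between looks, so it overstates the real inflation (the simulated 9.1% is well below 12 × 2.3%), but at 3 SD even this worst case stays under 2%. The helper names are hypothetical.

```python
import math


def one_sided_tail(k):
    """P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def union_bound(k, looks):
    """Crude upper bound on the Type I error when testing at k SD after each
    of `looks` glimpses (Bonferroni; ignores correlation between looks)."""
    return min(1.0, looks * one_sided_tail(k))


for k in (2, 3):
    print(f"{k} SD: single look {one_sided_tail(k):.4%}, "
          f"12 looks at most {union_bound(k, 12):.2%}")
```

This prints roughly 2.28% for one look and at most 27.3% for 12 looks at 2 SD, versus 0.135% and at most 1.62% at 3 SD.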