
Testing resolution and combining results

Posted: Thu Jul 28, 2016 8:27 am
by cdani
Suppose you are running a test at your typical short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one. You do the same with it: it seems to be good, but you stop it before it has enough resolution. But you have stopped the two tests at a point where you can combine the two and have enough resolution, and you see that the test is good.

What do you think about this?

Re: Testing resolution and combining results

Posted: Thu Jul 28, 2016 8:30 am
by Dann Corbit
cdani wrote: Suppose you are running a test at your typical short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one. You do the same with it: it seems to be good, but you stop it before it has enough resolution. But you have stopped the two tests at a point where you can combine the two and have enough resolution, and you see that the test is good.

What do you think about this?
The result is shaky. Otherwise both tests would have confirmed it. I would say that if it clarifies or simplifies the code then go ahead and make the change. If it complicates the code then run both tests to a decision.

Re: Testing resolution and combining results

Posted: Thu Jul 28, 2016 8:35 am
by cdani
Dann Corbit wrote: The result is shaky. Otherwise both tests would have confirmed it. I would say that if it clarifies or simplifies the code then go ahead and make the change. If it complicates the code then run both tests to a decision.
I guess it can be proven that this way of testing is mathematically safe, which would save a lot of testing time. Surely some of the readers here have the knowledge required to validate this.

Re: Testing resolution and combining results

Posted: Thu Jul 28, 2016 10:53 am
by matthewlai
It's fine as long as you do the exact same thing every time, and don't make the decision to do the second test or not (or when to cut off either test) depending on how the test is going. If you do, you can introduce biases.

The mathematical model relies on the fact that no decision to stop/continue has been made based on results.
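The bias described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not from the thread: games are win/loss only (no draws), the patch is worth exactly zero Elo, and a result counts as "good" when its z-score exceeds 1.645 (about 5% one-sided). The function name and parameters are made up for illustration. It compares deciding once after a fixed number of games with stopping as soon as any periodic peek looks good, using the same games for both rules.

```python
import math
import random

def false_positive_rates(trials=1000, games=1000, peek_every=100,
                         z_crit=1.645, seed=1):
    """Simulate a zero-Elo patch and compare two stopping rules on the same games."""
    rng = random.Random(seed)
    fixed_hits = 0  # accepted only on the z-score after all games
    peek_hits = 0   # accepted if any peek along the way showed z > z_crit
    for _ in range(trials):
        score = 0          # wins minus losses so far
        peek_accept = False
        for n in range(1, games + 1):
            score += 1 if rng.random() < 0.5 else -1
            # With +1/-1 outcomes and p = 0.5, score / sqrt(n) is the z-score.
            if n % peek_every == 0 and score / math.sqrt(n) > z_crit:
                peek_accept = True
        if score / math.sqrt(games) > z_crit:
            fixed_hits += 1
        if peek_accept:
            peek_hits += 1
    return fixed_hits / trials, peek_hits / trials

if __name__ == "__main__":
    fixed, peek = false_positive_rates()
    print(f"fixed-length test accepts a useless patch: {fixed:.1%}")
    print(f"stop-when-it-looks-good accepts it:        {peek:.1%}")
```

Because the last peek coincides with the end of the run, the peeking rule can never accept fewer useless patches than the fixed-length rule, and in practice it accepts noticeably more.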

Re: Testing resolution and combining results

Posted: Thu Jul 28, 2016 11:01 am
by Michel
cdani wrote: Suppose you are running a test at your typical short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one. You do the same with it: it seems to be good, but you stop it before it has enough resolution. But you have stopped the two tests at a point where you can combine the two and have enough resolution, and you see that the test is good.

What do you think about this?
What you certainly cannot do is decide on the testing strategy during the test. This creates horrible bias. Some people claim this is not true if you are using a Bayesian framework, but believe me, it is.

If you design a testing strategy, the simplest way to evaluate its merits is by simulation.

For small Elo patches, you will find it very hard to do better than the traditional STC/LTC SPRTs.
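As a rough illustration of evaluating a strategy by simulation, the sketch below runs a Wald SPRT many times against a patch of known strength and reports how often it passes and how many games it needs. Everything specific here is an assumption for illustration, not something from the thread: games are win/loss only (no draws), Elo is mapped to a win probability with the logistic formula, the hypotheses are elo0 = 0 and elo1 = 20 (chosen so the example runs quickly), and the function names are hypothetical. Real testing frameworks handle draws and use their own likelihood-ratio formulas.

```python
import math
import random

def elo_to_win_prob(elo):
    """Logistic Elo model: expected score for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def simulate_sprt(true_elo, elo0=0.0, elo1=20.0, alpha=0.05, beta=0.05,
                  trials=200, max_games=100_000, seed=1):
    """Return (pass rate, mean games per test) for a win/loss-only SPRT."""
    rng = random.Random(seed)
    p_true, p0, p1 = (elo_to_win_prob(e) for e in (true_elo, elo0, elo1))
    win_step = math.log(p1 / p0)               # LLR change after a win
    loss_step = math.log((1 - p1) / (1 - p0))  # LLR change after a loss
    lower = math.log(beta / (1 - alpha))       # stop and reject at or below this
    upper = math.log((1 - beta) / alpha)       # stop and accept at or above this
    passes = 0
    total_games = 0
    for _ in range(trials):
        llr = 0.0
        games = 0
        # Runs that hit max_games without a decision count as not passed.
        while lower < llr < upper and games < max_games:
            llr += win_step if rng.random() < p_true else loss_step
            games += 1
        passes += llr >= upper
        total_games += games
    return passes / trials, total_games / trials

if __name__ == "__main__":
    for elo in (0.0, 10.0, 20.0):
        rate, mean_games = simulate_sprt(elo)
        print(f"true Elo {elo:+5.1f}: pass rate {rate:.2f}, mean games {mean_games:.0f}")
```

The same loop can be wrapped around any candidate strategy (for example a combined STC/LTC rule), as long as the rule is fixed before the simulated games are played.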

Re: Testing resolution and combining results

Posted: Thu Jul 28, 2016 2:13 pm
by cdani
Thanks all! It's clear now. Probably some testing strategies can be devised that rely on combining STC and LTC games.

Re: Testing resolution and combining results

Posted: Thu Jul 28, 2016 5:00 pm
by Raptor
What I do with Raptor is go ahead and finish the STC tests to validate my changes. My reasoning is:

It takes less time to complete more STC games than to introduce LTC games.
IMO it is better to finish one test and have enough resolution/confidence that a change actually works.

In my experience, a lot of my changes started off really well at STC (even midway through the test), only to fade and turn out to be regressions by the end of the test.

So maybe I have been hit with the wrong end of the 'confidence' stick, and as a result I feel better once I have enough resolution.

Once I have solid grounds that 'merit' an LTC verification, I do that.

Having said that, to be honest all my changes get tested at STC individually, and I run LTC bunching a few of them together.

This is just my preference/methodology, and by no means do I claim it to be optimal.

Re: Testing resolution and combining results

Posted: Thu Jul 28, 2016 7:11 pm
by brtzsnr
I'm assuming you use the SPRT stopping criteria.

For normal tests, since STC and LTC results are often correlated, you can increase alpha / beta at LTC. For example:

STC: alpha = 0.05, beta = 0.05
LTC: alpha = 0.10, beta = 0.10

For simplification tests you can often just run at STC, because the reverse patch would fail if you were to test it.

However, the best trick I learned from Kai is to use larger betas, e.g. beta = 0.15. This saves a lot of testing on bad patches.
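To see what these settings do, here is a small sketch that turns alpha and beta into the log-likelihood-ratio stopping bounds using Wald's standard approximation. The helper name is hypothetical, and pairing beta = 0.15 with alpha = 0.05 is an assumption, since the post only gives the beta value.

```python
import math

def sprt_bounds(alpha, beta):
    """Wald's approximate LLR bounds: (reject at or below, accept at or above)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

if __name__ == "__main__":
    settings = [
        ("STC, alpha=0.05 beta=0.05", 0.05, 0.05),
        ("LTC, alpha=0.10 beta=0.10", 0.10, 0.10),
        ("larger beta, alpha=0.05 beta=0.15 (alpha assumed)", 0.05, 0.15),
    ]
    for label, alpha, beta in settings:
        lower, upper = sprt_bounds(alpha, beta)
        print(f"{label}: reject at LLR <= {lower:.2f}, accept at LLR >= {upper:.2f}")
```

Raising beta moves the lower bound closer to zero, so a patch whose LLR drifts downward is rejected after fewer games. The price is a higher chance of discarding a patch that was actually good.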

Re: Testing resolution and combining results

Posted: Thu Jul 28, 2016 9:36 pm
by bob
cdani wrote: Suppose you are running a test at your typical short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one. You do the same with it: it seems to be good, but you stop it before it has enough resolution. But you have stopped the two tests at a point where you can combine the two and have enough resolution, and you see that the test is good.

What do you think about this?
Lousy idea. Why introduce ANOTHER variable into the equation?

Re: Testing resolution and combining results

Posted: Fri Jul 29, 2016 2:33 pm
by cdani
bob wrote:
cdani wrote: Suppose you are running a test at your typical short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one. You do the same with it: it seems to be good, but you stop it before it has enough resolution. But you have stopped the two tests at a point where you can combine the two and have enough resolution, and you see that the test is good.

What do you think about this?
Lousy idea. Why introduce ANOTHER variable into the equation?
It was just guesswork. Also, if I remember correctly, Mark Lefler at some point said something like they can run tests with fewer resources, and they were surprised that nobody else had found their system. So I'm trying to imagine it :-)