## Testing resolution and combining results

cdani
Posts: 2166
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

### Testing resolution and combining results

Suppose you are running a test at your usual short time control (STC). It starts to look good, but it does not yet have enough resolution. Instead of finishing it, you start the long time control (LTC) test and do the same: it looks good, but you stop it before it has enough resolution. However, you stopped the two tests at a point where the combined results do have enough resolution, and the combined result says the change is good.

Dann Corbit
Posts: 11238
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

### Re: Testing resolution and combining results

cdani wrote: Suppose you are running a test at your usual short time control (STC). It starts to look good, but it does not yet have enough resolution. Instead of finishing it, you start the long time control (LTC) test and do the same: it looks good, but you stop it before it has enough resolution. However, you stopped the two tests at a point where the combined results do have enough resolution, and the combined result says the change is good.

The result is shaky. Otherwise both tests would have confirmed it. I would say that if it clarifies or simplifies the code then go ahead and make the change. If it complicates the code then run both tests to a decision.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

cdani
Posts: 2166
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

### Re: Testing resolution and combining results

Dann Corbit wrote: The result is shaky. Otherwise both tests would have confirmed it. I would say that if it clarifies or simplifies the code then go ahead and make the change. If it complicates the code then run both tests to a decision.
I guess it can be proven mathematically that this way of testing is safe, which would save a lot of testing time. Surely some of the readers here have the knowledge required to validate this.

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

### Re: Testing resolution and combining results

It's fine as long as you do the exact same thing every time, and don't make the decision to do the second test or not (or when to cut off either test) depending on how the test is going. If you do, you can introduce biases.

The mathematical model relies on the fact that no decision to stop/continue has been made based on results.
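To illustrate the bias, here is a minimal Python sketch. It is a simplification under stated assumptions, not code from any real testing framework: decisive games only (no draws), a patch with no effect at all (true score exactly 50%), and a one-sided z-test at a nominal 5% level. The function name and its parameters are hypothetical.

```python
import math
import random

def false_positive_rate(peek, trials=2000, games=1000, batch=50,
                        z_crit=1.645, seed=1):
    """Fraction of no-effect tests (true score 50%) that get declared 'good'.

    peek=False: look at the result once, after all games.
    peek=True:  look after every batch and stop as soon as z > z_crit.
    """
    rng = random.Random(seed)
    passed = 0
    for _trial in range(trials):
        wins = 0
        for n in range(1, games + 1):
            wins += rng.random() < 0.5  # assumption: every game is a win or a loss
            at_checkpoint = (n % batch == 0) if peek else (n == games)
            if at_checkpoint:
                # z-score of the win count against the 50% null hypothesis
                z = (wins - n / 2) / math.sqrt(n / 4)
                if z > z_crit:
                    passed += 1
                    break
    return passed / trials

if __name__ == "__main__":
    print("single look :", false_positive_rate(peek=False))
    print("with peeking:", false_positive_rate(peek=True))
```

With a single look the rate stays near the nominal level; with peeking it comes out noticeably higher, even though the patch does nothing.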
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.

Michel
Posts: 2102
Joined: Sun Sep 28, 2008 11:50 pm

### Re: Testing resolution and combining results

cdani wrote:Suppose you are doing a test at your typical short time control that start to show to be good but it lacks enough resolution. Instead of finishing this test you start the LTC one. You do the same with it. It seems to be good but you stop it before having enough resolution. But you have stoped the two tests in a point that you can combine the two and have enough resolution, and you see that the test is good.

What you certainly cannot do is decide on the testing strategy during the test. This creates horrible bias. Some people claim this is not true if you are using a Bayesian framework, but believe me, it is.

If you design a testing strategy, then the simplest method to evaluate its merits is by simulation.

For small-Elo patches, you will find it very hard to do better than the traditional STC/LTC SPRTs.
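As a sketch of what evaluation by simulation can look like, the Python below estimates how often a plain SPRT accepts a patch of a given true strength. The assumptions are mine and should be read as such: decisive games only (no draws), the usual logistic Elo model, Wald's approximate bounds, and deliberately wide hypotheses (0 vs 20 Elo) so that it runs quickly. The function names and defaults are hypothetical.

```python
import math
import random

def expected_score(elo):
    """Logistic Elo model: expected score for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_pass_rate(true_elo, elo0=0.0, elo1=20.0, alpha=0.05, beta=0.05,
                   trials=300, max_games=100_000, seed=1):
    """Estimate how often an SPRT of H0: elo0 vs H1: elo1 accepts H1."""
    rng = random.Random(seed)
    p, p0, p1 = (expected_score(e) for e in (true_elo, elo0, elo1))
    win_step = math.log(p1 / p0)               # LLR change after a win
    loss_step = math.log((1 - p1) / (1 - p0))  # LLR change after a loss
    lower = math.log(beta / (1 - alpha))       # accept H0 at or below this
    upper = math.log((1 - beta) / alpha)       # accept H1 at or above this
    accepted = 0
    for _trial in range(trials):
        llr = 0.0
        for _game in range(max_games):
            llr += win_step if rng.random() < p else loss_step
            if llr >= upper:
                accepted += 1
                break
            if llr <= lower:
                break
    return accepted / trials

if __name__ == "__main__":
    for elo in (0.0, 10.0, 20.0):
        print(f"true Elo {elo:+.0f}: pass rate {sprt_pass_rate(elo):.2f}")
```

Any other strategy can be dropped into the inner loop in place of the SPRT stopping rule and compared on the same footing: pass rate at zero Elo, pass rate at the Elo gain you care about, and games used.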
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

cdani
Posts: 2166
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

### Re: Testing resolution and combining results

Thanks all! It's clear now. It is probably possible to design testing strategies that rely on combining STC and LTC games.

Raptor
Posts: 29
Joined: Mon Jan 28, 2013 9:18 am

### Re: Testing resolution and combining results

What I do with Raptor is I go ahead and finish the STC tests to validate my changes. My reasoning for it is:

It takes less time to complete more STC games than to start adding LTC games.
IMO it is better to finish one test and have enough resolution/confidence that a change actually works.

In my experience, a lot of my changes started off really well at STC (even midway through the test), only to fade and turn out to be regressions by the end of the test.

So maybe I am hit with the wrong end of the 'confidence' stick and as a result I feel better once I have enough resolution.

Once I have solid grounds that 'merit' a LTC verification, I do that.

Having said that, to be honest all my changes get tested at STC individually, and I run LTC bunching a few of them together.

This is just my preference/methodology, and by no means do I claim it to be optimal.

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 3:02 pm

### Re: Testing resolution and combining results

I'm assuming you use the SPRT stopping criteria.

For normal tests, since STC and LTC are often correlated, you can increase alpha / beta at LTC. For example:

STC: alpha = 0.05, beta = 0.05
LTC: alpha = 0.10, beta = 0.10

For simplification tests you can often just run at STC, because the reverse patch would fail if you were to test it.

However, the best trick I learned from Kai is to use larger betas, e.g. beta = 0.15. This saves a lot of testing on bad patches.
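To see what these settings do to the stopping thresholds, here is a small Python sketch using Wald's approximate SPRT bounds on the log-likelihood ratio. The helper name is hypothetical, and pairing beta = 0.15 with alpha = 0.05 is my assumption, since only the beta value is given above.

```python
import math

def sprt_bounds(alpha, beta):
    """Wald's approximate LLR bounds for an SPRT, returned as (lower, upper).

    The test fails (accepts H0) once the LLR drops to `lower`
    and passes (accepts H1) once it rises to `upper`.
    """
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper

if __name__ == "__main__":
    settings = [
        ("STC", 0.05, 0.05),
        ("LTC", 0.10, 0.10),
        ("larger beta", 0.05, 0.15),  # alpha here is an assumed value
    ]
    for label, alpha, beta in settings:
        lower, upper = sprt_bounds(alpha, beta)
        print(f"{label}: lower={lower:.3f}, upper={upper:.3f}")
```

Raising beta moves the lower bound closer to zero, so a failing patch hits it after fewer games. The cost is a higher chance of rejecting a patch that was actually good.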

bob
Posts: 20916
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

### Re: Testing resolution and combining results

cdani wrote:Suppose you are doing a test at your typical short time control that start to show to be good but it lacks enough resolution. Instead of finishing this test you start the LTC one. You do the same with it. It seems to be good but you stop it before having enough resolution. But you have stoped the two tests in a point that you can combine the two and have enough resolution, and you see that the test is good.

Lousy idea. Why introduce ANOTHER variable into the equation?

cdani
Posts: 2166
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

### Re: Testing resolution and combining results

bob wrote:
cdani wrote:Suppose you are doing a test at your typical short time control that start to show to be good but it lacks enough resolution. Instead of finishing this test you start the LTC one. You do the same with it. It seems to be good but you stop it before having enough resolution. But you have stoped the two tests in a point that you can combine the two and have enough resolution, and you see that the test is good.