Testing resolution and combining results

cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Post by cdani »

Suppose you are running a test at your usual short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one and do the same with it: it seems to be good, but you stop it before it has enough resolution. However, you have stopped the two tests at a point where you can combine them and have enough resolution, and you see that the combined result is good.

What do you think about this?
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Testing resolution and combining results

Post by Dann Corbit »

cdani wrote: Suppose you are running a test at your usual short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one and do the same with it: it seems to be good, but you stop it before it has enough resolution. However, you have stopped the two tests at a point where you can combine them and have enough resolution, and you see that the combined result is good.

What do you think about this?
The result is shaky. Otherwise both tests would have confirmed it. I would say that if it clarifies or simplifies the code then go ahead and make the change. If it complicates the code then run both tests to a decision.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Testing resolution and combining results

Post by cdani »

Dann Corbit wrote: The result is shaky. Otherwise both tests would have confirmed it. I would say that if it clarifies or simplifies the code then go ahead and make the change. If it complicates the code then run both tests to a decision.
I guess this way of testing can be proven to be mathematically safe, which would save a lot of testing time. Surely some of the readers here have the required knowledge to validate this.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Testing resolution and combining results

Post by matthewlai »

It's fine as long as you do the exact same thing every time, and don't make the decision to do the second test or not (or when to cut off either test) depending on how the test is going. If you do, you can introduce biases.

The mathematical model relies on the fact that no decision to stop/continue has been made based on results.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
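
To make the point about result-dependent stopping concrete, here is a minimal simulation sketch. It is not from the thread. It assumes a patch worth exactly 0 Elo, ignores draws (every game is a fair coin flip), and uses a simple one-sided z-test at the 5% level instead of SPRT. The names `z_score` and `false_positive_rates` and all parameter values are hypothetical. It compares deciding only at the planned end of the test with accepting the patch as soon as any interim look crosses the threshold.

```python
import math
import random

def z_score(wins, losses):
    """z score of (wins - losses) under the null of equal strength (draws ignored)."""
    n = wins + losses
    return (wins - losses) / math.sqrt(n) if n else 0.0

def false_positive_rates(trials=2000, games=400, check_every=20,
                         z_crit=1.645, seed=1):
    """Return (fixed_length_rate, peeking_rate) for a patch worth exactly 0 Elo."""
    rng = random.Random(seed)
    fixed = peeking = 0
    for _ in range(trials):
        wins = losses = 0
        looked_good = False
        for g in range(1, games + 1):
            if rng.random() < 0.5:      # assumption: every game is a coin flip
                wins += 1
            else:
                losses += 1
            # Peeking: remember whether any interim look crossed the threshold.
            if g % check_every == 0 and z_score(wins, losses) > z_crit:
                looked_good = True
        fixed += z_score(wins, losses) > z_crit   # decide only at the planned end
        peeking += looked_good                    # "stop as soon as it looks good"
    return fixed / trials, peeking / trials

if __name__ == "__main__":
    fixed_rate, peeking_rate = false_positive_rates()
    print(f"fixed length: {fixed_rate:.3f}  with peeking: {peeking_rate:.3f}")
```

Under these assumptions the fixed-length rate should stay near the nominal 5%, while the peeking rate comes out several times higher, even though the patch does nothing.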
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Testing resolution and combining results

Post by Michel »

cdani wrote: Suppose you are running a test at your usual short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one and do the same with it: it seems to be good, but you stop it before it has enough resolution. However, you have stopped the two tests at a point where you can combine them and have enough resolution, and you see that the combined result is good.

What do you think about this?
What you certainly cannot do is decide on the testing strategy during the test. This creates horrible bias. Some people claim this is not true if you are using a Bayesian framework, but believe me, it is.

If you design a testing strategy, then the simplest way to evaluate its merits is by simulation.

For small Elo patches, you will find it very hard to do better than the traditional STC/LTC SPRTs.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
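
As a rough illustration of evaluating a strategy by simulation, the sketch below runs a textbook Wald SPRT on simulated games and reports how often a patch passes and how many games that takes on average. It is an assumption-laden toy, not anyone's actual test framework: draws are ignored, the hypotheses are stated as win probabilities `p0` and `p1` rather than Elo bounds, and the function names and default values are hypothetical.

```python
import math
import random

def run_sprt(p_true, p0, p1, alpha, beta, rng, max_games=100_000):
    """Run one SPRT of H0: p = p0 against H1: p = p1; return (passed, games)."""
    lower = math.log(beta / (1 - alpha))      # accept H0 at or below this LLR
    upper = math.log((1 - beta) / alpha)      # accept H1 at or above this LLR
    win_step = math.log(p1 / p0)              # LLR change after a win
    loss_step = math.log((1 - p1) / (1 - p0)) # LLR change after a loss
    llr = 0.0
    for games in range(1, max_games + 1):
        llr += win_step if rng.random() < p_true else loss_step
        if llr >= upper:
            return True, games
        if llr <= lower:
            return False, games
    return False, max_games                   # unresolved runs count as failures

def evaluate(p_true, p0=0.50, p1=0.55, alpha=0.05, beta=0.05,
             trials=500, seed=1):
    """Return (pass_rate, mean_games) over many simulated SPRT runs."""
    rng = random.Random(seed)
    passes = total_games = 0
    for _ in range(trials):
        passed, games = run_sprt(p_true, p0, p1, alpha, beta, rng)
        passes += passed
        total_games += games
    return passes / trials, total_games / trials

if __name__ == "__main__":
    for p in (0.50, 0.55):
        rate, games = evaluate(p)
        print(f"true p={p:.2f}: pass rate {rate:.3f}, mean games {games:.0f}")
```

To compare a candidate strategy (for example, one that combines partial STC and LTC results) against this baseline, one would swap `run_sprt` for the candidate rule and compare pass rates and mean games at the same true strengths.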
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Testing resolution and combining results

Post by cdani »

Thanks all! It's clear now. Probably some testing strategies can be devised that rely on combining STC and LTC games.
Raptor
Posts: 29
Joined: Mon Jan 28, 2013 10:18 am

Re: Testing resolution and combining results

Post by Raptor »

What I do with Raptor is I go ahead and finish the STC tests to validate my changes. My reasoning for it is:

It takes less time to complete more STC games than to introduce LTC games.
IMO it is better to finish one test and have enough resolution/confidence that a change actually works.

In my experience, my changes have often started the STC test really well (sometimes even up to midway through), only to fade and turn out to be regressions by the end of the test.

So maybe I am hit with the wrong end of the 'confidence' stick and as a result I feel better once I have enough resolution.

Once I have solid grounds that 'merit' an LTC verification, I do that.

Having said that, to be honest all my changes get tested at STC individually, and I run LTC bunching a few of them together.

This is just my preference/methodology, and by no means do I claim it to be optimal.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: Testing resolution and combining results

Post by brtzsnr »

I'm assuming you use the SPRT stopping criteria.

For normal tests, since STC and LTC are often correlated, you can increase alpha / beta at LTC. For example:

STC: alpha = 0.05, beta = 0.05
LTC: alpha = 0.10, beta = 0.10

For simplification tests you can often run at STC only, because the reverse patch would fail if you were to test it.

However, the best trick I learned from Kai is to use larger betas, e.g. beta = 0.15. This saves a lot of testing on bad patches.
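
For readers who want to see what those alpha/beta choices do to the stopping rule, here is a small sketch using Wald's standard approximate SPRT bounds on the log-likelihood ratio (LLR). The helper name `sprt_bounds` is hypothetical, and pairing beta = 0.15 with alpha = 0.05 is an assumption, since the post does not state the alpha used with the larger beta.

```python
import math

def sprt_bounds(alpha, beta):
    """Wald's approximate LLR stopping bounds as (lower, upper).

    The test fails when the LLR drops to `lower` and passes when it
    reaches `upper`.
    """
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper

settings = [
    ("STC", 0.05, 0.05),
    ("LTC", 0.10, 0.10),
    ("larger beta", 0.05, 0.15),   # alpha here is an assumed value
]

for label, alpha, beta in settings:
    lower, upper = sprt_bounds(alpha, beta)
    print(f"{label}: alpha={alpha} beta={beta} -> LLR bounds ({lower:.2f}, {upper:.2f})")
```

Relaxing both error rates from 0.05 to 0.10 narrows the bounds from about ±2.94 to about ±2.20, so the test stops sooner in either direction. Raising only beta to 0.15 moves the lower bound from about -2.94 to about -1.85, so a failing patch is rejected after less evidence, at the cost of rejecting more good patches.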
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Testing resolution and combining results

Post by bob »

cdani wrote: Suppose you are running a test at your usual short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one and do the same with it: it seems to be good, but you stop it before it has enough resolution. However, you have stopped the two tests at a point where you can combine them and have enough resolution, and you see that the combined result is good.

What do you think about this?
Lousy idea. Why introduce ANOTHER variable into the equation?
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Testing resolution and combining results

Post by cdani »

bob wrote:
cdani wrote: Suppose you are running a test at your usual short time control (STC) that starts to look good but lacks enough resolution. Instead of finishing this test, you start the long time control (LTC) one and do the same with it: it seems to be good, but you stop it before it has enough resolution. However, you have stopped the two tests at a point where you can combine them and have enough resolution, and you see that the combined result is good.

What do you think about this?
Lousy idea. Why introduce ANOTHER variable into the equation?
It was just guesswork. Also, if I remember well, Mark Lefler at some point said something like they can run tests with fewer resources, and they were surprised that nobody else had found their system. So I'm trying to imagine it :-)