Testing resolution and combining results

Discussion of chess software programming and technical issues.

cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

Post by cdani » Thu Jul 28, 2016 6:27 am

Suppose you are running a test at your usual short time control (STC) that starts to look good but doesn't yet have enough resolution. Instead of finishing it, you start the long time control (LTC) test and do the same: it looks good, but you stop it before it has enough resolution. However, you stopped both tests at a point where combining them gives enough resolution, and the combined result shows the change is good.

What do you think about this?
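A minimal sketch of what "combining" could mean in practice, assuming the STC and LTC results are simply pooled as win/draw/loss counts and scored with the logistic Elo model plus a normal approximation for the error bar. The function names (`elo_from_score`, `estimate`, `pool`) and the game counts are hypothetical. Note that pooling like this silently assumes the patch is worth the same Elo at both time controls, and it is only meaningful if neither stopping point was chosen by looking at the results.

```python
import math


def elo_from_score(score):
    """Convert an expected score (0..1) to an Elo difference with the logistic model."""
    score = min(max(score, 1e-9), 1 - 1e-9)  # keep log10 finite at the extremes
    return -400.0 * math.log10(1.0 / score - 1.0)


def estimate(wins, draws, losses, z=1.96):
    """Return (elo, elo_low, elo_high) using a normal approximation of the mean score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score, where a game scores 1, 0.5 or 0
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))


def pool(*records):
    """Add up (wins, draws, losses) tuples from several tests."""
    return tuple(sum(r[i] for r in records) for i in range(3))


if __name__ == "__main__":
    stc = (1000, 2000, 950)  # hypothetical STC record
    ltc = (600, 1500, 560)   # hypothetical LTC record
    for name, rec in (("STC", stc), ("LTC", ltc), ("pooled", pool(stc, ltc))):
        elo, lo, hi = estimate(*rec)
        print(f"{name:7s} {elo:+6.1f} Elo  [{lo:+6.1f}, {hi:+6.1f}]")
```

The pooled interval is narrower only because it contains more games; it says nothing about whether the patch behaves the same at both time controls.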

Dann Corbit
Posts: 9986
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Testing resolution and combining results

Post by Dann Corbit » Thu Jul 28, 2016 6:30 am

cdani wrote: Suppose you are running a test at your usual short time control (STC) that starts to look good but doesn't yet have enough resolution. Instead of finishing it, you start the long time control (LTC) test and do the same: it looks good, but you stop it before it has enough resolution. However, you stopped both tests at a point where combining them gives enough resolution, and the combined result shows the change is good.

What do you think about this?
The result is shaky. Otherwise both tests would have confirmed it. I would say that if it clarifies or simplifies the code then go ahead and make the change. If it complicates the code then run both tests to a decision.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

Re: Testing resolution and combining results

Post by cdani » Thu Jul 28, 2016 6:35 am

Dann Corbit wrote: The result is shaky. Otherwise both tests would have confirmed it. I would say that if it clarifies or simplifies the code then go ahead and make the change. If it complicates the code then run both tests to a decision.
I suspect it can be proven that this way of testing is mathematically safe, and it would save a lot of testing time. Surely some of the readers here have the knowledge needed to validate this.

matthewlai
Posts: 791
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

Re: Testing resolution and combining results

Post by matthewlai » Thu Jul 28, 2016 8:53 am

It's fine as long as you do the exact same thing every time, and don't make the decision to do the second test or not (or when to cut off either test) depending on how the test is going. If you do, you can introduce biases.

The mathematical model relies on the fact that no decision to stop/continue has been made based on results.
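To see the bias described here, a small Monte Carlo sketch: it assumes two equally strong engines (so every "pass" is a false positive), a hypothetical 60% draw rate, and a simple one-sided z-test at 1.96 standing in for a real SPRT. One variant looks at the result only once, after a fixed number of games; the other peeks every 50 games and stops as soon as the result looks good. All names and parameters are illustrative.

```python
import math
import random


def play_game(rng, draw_rate=0.6):
    """One game between two equally strong engines: returns 1, 0.5 or 0."""
    r = rng.random()
    if r < draw_rate:
        return 0.5
    # The remaining probability is split evenly between wins and losses
    return 1.0 if r < draw_rate + (1.0 - draw_rate) / 2.0 else 0.0


def z_score(total, sq_total, n):
    """One-sample z statistic of the mean score against 0.5."""
    mean = total / n
    var = sq_total / n - mean * mean
    if var <= 0.0:
        return 0.0
    return (mean - 0.5) / math.sqrt(var / n)


def false_positive_rate(peek, trials=1000, max_games=1000, look_every=50, seed=1):
    """Fraction of runs that 'pass' even though there is no real improvement.

    peek=False: look once, after max_games games.
    peek=True:  look every look_every games and stop at the first z > 1.96.
    """
    rng = random.Random(seed)
    passes = 0
    for _ in range(trials):
        total = sq_total = 0.0
        for n in range(1, max_games + 1):
            s = play_game(rng)
            total += s
            sq_total += s * s
            looking = (n % look_every == 0) if peek else (n == max_games)
            if looking and z_score(total, sq_total, n) > 1.96:
                passes += 1
                break
    return passes / trials


if __name__ == "__main__":
    print("fixed length:", false_positive_rate(peek=False))
    print("peeking     :", false_positive_rate(peek=True))
```

With a nominal one-sided level of 2.5%, the fixed-length run should pass roughly 2.5% of the time, while the peeking run passes noticeably more often, even though the only difference is when the decision to stop was made.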
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.

Michel
Posts: 2040
Joined: Sun Sep 28, 2008 11:50 pm

Re: Testing resolution and combining results

Post by Michel » Thu Jul 28, 2016 9:01 am

cdani wrote: Suppose you are running a test at your usual short time control (STC) that starts to look good but doesn't yet have enough resolution. Instead of finishing it, you start the long time control (LTC) test and do the same: it looks good, but you stop it before it has enough resolution. However, you stopped both tests at a point where combining them gives enough resolution, and the combined result shows the change is good.

What do you think about this?
What you certainly cannot do is decide on the testing strategy during the test. This creates a horrible bias. Some people claim this is not true if you are using a Bayesian framework, but believe me, it is.

If you design a testing strategy, the simplest way to evaluate its merits is by simulation.

For small-Elo patches, you will find it very hard to do better than the traditional STC/LTC SPRTs.
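As an example of evaluating a strategy by simulation, the sketch below runs a plain SPRT many times for a given true Elo and reports how often it passes and how many games it needs on average. It assumes a normal approximation of the per-game score (a GSPRT-style log-likelihood ratio), a hypothetical 60% draw rate, and deliberately wide hypothetical bounds elo0 = 0, elo1 = 20 so that it runs quickly; these are not the settings any particular engine uses.

```python
import math
import random


def score_from_elo(elo):
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(rng, true_elo, elo0=0.0, elo1=20.0, alpha=0.05, beta=0.05,
         draw_rate=0.6, max_games=100_000):
    """Run one SPRT; return (passed, games_played). Inconclusive runs count as fails."""
    s0, s1 = score_from_elo(elo0), score_from_elo(elo1)
    lower = math.log(beta / (1.0 - alpha))   # reject the patch below this
    upper = math.log((1.0 - beta) / alpha)   # accept the patch above this
    # Win probability chosen so that win + draw_rate / 2 equals the true expected score
    win = score_from_elo(true_elo) - draw_rate / 2.0
    if not 0.0 <= win <= 1.0 - draw_rate:
        raise ValueError("true_elo is incompatible with this draw rate")
    total = sq_total = 0.0
    for n in range(1, max_games + 1):
        r = rng.random()
        x = 1.0 if r < win else (0.5 if r < win + draw_rate else 0.0)
        total += x
        sq_total += x * x
        if n < 20:  # wait for a usable variance estimate
            continue
        mean = total / n
        var = sq_total / n - mean * mean
        if var <= 0.0:
            continue
        # Log-likelihood ratio of two normal models with means s1 and s0
        llr = (s1 - s0) * (2.0 * total - n * (s0 + s1)) / (2.0 * var)
        if llr >= upper:
            return True, n
        if llr <= lower:
            return False, n
    return False, max_games


def simulate(true_elo, runs=200, seed=7, **kwargs):
    """Repeat the SPRT and return (pass_rate, average_games)."""
    rng = random.Random(seed)
    results = [sprt(rng, true_elo, **kwargs) for _ in range(runs)]
    pass_rate = sum(p for p, _ in results) / runs
    avg_games = sum(n for _, n in results) / runs
    return pass_rate, avg_games


if __name__ == "__main__":
    for elo in (-10, 0, 10, 20, 30):
        rate, games = simulate(elo)
        print(f"true Elo {elo:+3d}: pass rate {rate:.2f}, average games {games:.0f}")
```

A multi-stage scheme (STC then LTC, or a pooled variant) can be evaluated the same way by chaining calls and counting how often a patch of known strength gets through, and at what total cost.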
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

Re: Testing resolution and combining results

Post by cdani » Thu Jul 28, 2016 12:13 pm

Thanks all! It's clear now. Probably some testing strategy could be designed that relies on combining STC and LTC games.

Raptor
Posts: 29
Joined: Mon Jan 28, 2013 9:18 am

Re: Testing resolution and combining results

Post by Raptor » Thu Jul 28, 2016 3:00 pm

With Raptor, I go ahead and finish the STC tests to validate my changes. My reasoning is:

It takes less time to complete more STC games than to introduce LTC games.
IMO it is better to finish one test and have enough resolution/confidence that a change actually works.

In my experience, many of my changes started off really well at STC (even midway through the test), only to fade and turn out to be regressions by the end of the test.

So maybe I have been hit by the wrong end of the 'confidence' stick, and as a result I feel better once I have enough resolution.

Once I have solid grounds that 'merit' a LTC verification, I do that.

Having said that, to be honest, all my changes get tested individually at STC, and I run LTC with a few of them bunched together.

This is just my preference/methodology, and by no means do I claim it to be optimal.

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 3:02 pm

Re: Testing resolution and combining results

Post by brtzsnr » Thu Jul 28, 2016 5:11 pm

I'm assuming you use the SPRT stopping criterion.

For normal tests, since STC and LTC are often correlated, you can increase alpha / beta at LTC. For example:

STC: alpha = 0.05, beta = 0.05
LTC: alpha = 0.10, beta = 0.10
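For reference, a short sketch of the Wald stopping bounds on the log-likelihood ratio that these alpha/beta settings translate into. Raising both to 0.10 at LTC narrows the window from about ±2.94 to about ±2.20, so the LTC run tends to stop sooner. How the LLR itself is computed (BayesElo, trinomial, or a normal approximation) is a separate choice not shown here; `sprt_bounds` is a hypothetical helper name.

```python
import math


def sprt_bounds(alpha, beta):
    """Wald's SPRT stopping bounds on the log-likelihood ratio."""
    lower = math.log(beta / (1.0 - alpha))   # at or below: reject the patch (accept H0)
    upper = math.log((1.0 - beta) / alpha)   # at or above: pass the patch (accept H1)
    return lower, upper


if __name__ == "__main__":
    for name, (alpha, beta) in {"STC": (0.05, 0.05), "LTC": (0.10, 0.10)}.items():
        lo, hi = sprt_bounds(alpha, beta)
        print(f"{name}: alpha={alpha:.2f} beta={beta:.2f} -> LLR bounds [{lo:.3f}, {hi:.3f}]")
```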

For simplification tests, you can often just run at STC, because the reverse patch would fail if you were to test it.

However, the best trick I learned from Kai is to use larger betas, e.g. beta = 0.15. This saves a lot of testing on bad patches.
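Why a larger beta saves time on bad patches: in Wald's approximation, a patch that is clearly worse than elo0 drifts down towards the lower bound ln(β/(1−α)) at a roughly constant rate, so the expected test length is proportional to that bound. The sketch below assumes the normal-approximation LLR, a hypothetical per-game score variance of 0.1, and hypothetical SPRT bounds elo0 = 0, elo1 = 5. Going from beta = 0.05 to 0.15 cuts the expected length by about 37%, at the cost of rejecting more good patches.

```python
import math


def score_from_elo(elo):
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def expected_games_to_reject(true_elo, elo0, elo1, alpha, beta, var=0.1):
    """Wald approximation of the average SPRT length for a clearly bad patch.

    Assumes the test (almost) always ends at the lower bound, ignores overshoot,
    and uses a fixed per-game score variance `var` (hypothetical value).
    """
    s, s0, s1 = map(score_from_elo, (true_elo, elo0, elo1))
    drift = (s1 - s0) * (2.0 * s - s0 - s1) / (2.0 * var)  # mean LLR change per game
    if drift >= 0.0:
        raise ValueError("approximation only holds when the patch is clearly bad")
    lower = math.log(beta / (1.0 - alpha))
    return lower / drift


if __name__ == "__main__":
    for beta in (0.05, 0.15):
        n = expected_games_to_reject(-5.0, 0.0, 5.0, alpha=0.05, beta=beta)
        print(f"beta={beta:.2f}: about {n:.0f} games to reject a -5 Elo patch")
```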

bob
Posts: 20475
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Testing resolution and combining results

Post by bob » Thu Jul 28, 2016 7:36 pm

cdani wrote: Suppose you are running a test at your usual short time control (STC) that starts to look good but doesn't yet have enough resolution. Instead of finishing it, you start the long time control (LTC) test and do the same: it looks good, but you stop it before it has enough resolution. However, you stopped both tests at a point where combining them gives enough resolution, and the combined result shows the change is good.

What do you think about this?
Lousy idea. Why introduce ANOTHER variable into the equation?

cdani
Posts: 2104
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

Re: Testing resolution and combining results

Post by cdani » Fri Jul 29, 2016 12:33 pm

bob wrote:
cdani wrote: Suppose you are running a test at your usual short time control (STC) that starts to look good but doesn't yet have enough resolution. Instead of finishing it, you start the long time control (LTC) test and do the same: it looks good, but you stop it before it has enough resolution. However, you stopped both tests at a point where combining them gives enough resolution, and the combined result shows the change is good.

What do you think about this?
Lousy idea. Why introduce ANOTHER variable into the equation?
It was just guesswork. Also, if I remember correctly, Mark Lefler once said something to the effect that they could run tests with fewer resources, and that they were surprised nobody else had found their system. So I'm trying to imagine what it might be :-)
