Some musings about search

Posted: Fri Aug 14, 2015 5:40 pm
by Rebel
After a search change I am (was) used to getting a first impression by manually going through 40-50 positions before starting a self-play match.

And sometimes that first glimpse looked so good it was reasonable to assume the change to be an improvement. And then after playing 10,000 x 40/15s games it turned out not to be, say a 48% result as an example.
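To put the 48% figure in Elo terms, here is a minimal sketch using the usual logistic Elo formula. The function name is only illustrative; under this model a 48% score comes out at roughly -14 Elo.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a match score fraction (0 < score < 1).

    Uses the standard logistic Elo model; draws count as half a point
    in the score fraction.
    """
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Score of the changed version in the self-play match.
    print(round(elo_from_score(0.48), 1))
```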

Then stubbornly I couldn't believe it (not a bad attitude in CC) and started to play the match on 40/30s and even 40/1m.

And I have never seen such a change become an improvement after doubling or quadrupling the time control.

Is this a global experience?

Re: Some musings about search

Posted: Fri Aug 14, 2015 7:20 pm
by cdani
With king safety changes, yes, and many times, but my first testing time was about 7 seconds and the second one about 25 seconds, always against a gauntlet.

I don't try manually; the impression is just too shallow. If I try manually, it is to look for weaknesses.

My testing time control is always like this: 4-7 seconds at first, then 25 seconds for the second run.

Re: Some musings about search

Posted: Fri Aug 14, 2015 8:09 pm
by Bloodbane
I've seen some patches which were bad at short TC but good at long TC, but I only accept patches which are good at all time controls I use.

Re: Some musings about search

Posted: Fri Aug 14, 2015 9:02 pm
by Ferdy
Rebel wrote: After a search change I am (was) used to getting a first impression by manually going through 40-50 positions before starting a self-play match.

And sometimes that first glimpse looked so good it was reasonable to assume the change to be an improvement. And then after playing 10,000 x 40/15s games it turned out not to be, say a 48% result as an example.

Then stubbornly I couldn't believe it (not a bad attitude in CC) and started to play the match on 40/30s and even 40/1m.

And I have never seen such a change become an improvement after doubling or quadrupling the time control.

Is this a global experience?
If a change is interesting (a feeling that it would kick in at higher depths; after so many failed tests you also learn something from the program), I increase the testing time:
instead of testing first at short TC, I go directly to a longer TC, say 120s + 100ms inc. Sometimes it succeeded.
Probably the field is generally contested at typical CCRL 40/4 and 40/40, I try to look good at 40/4.

What do you mean by the following?
manually going through 40-50 positions before starting a self-play match

Re: Some musings about search

Posted: Sat Aug 15, 2015 12:03 am
by Rebel
Ferdy wrote: Probably the field is generally contested at typical CCRL 40/4 and 40/40, I try to look good at 40/4.
One thing I learned from my last active period (2012-2013) is that playing too few games can be disastrous. In principle playing 1000-2000 games in most cases is good enough, (say) 9 of 10 times. But it is the 10th time that is going to hurt, you get good results while if you had played more you would have known the change isn't an improvement at all. Yet you count the change as an improvement and the damage is done, from version to version.
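A rough way to see why 1000-2000 games can mislead is to compute the error margin of a match result. The sketch below assumes two equally strong engines, a draw ratio of 40% (my assumption, not a measured value) and a normal approximation; the function name is illustrative. Under these assumptions the 95% margin is on the order of ±12 Elo at 2000 games and about ±5 Elo at 12,000 games, so a small real gain or loss is easily hidden at the lower counts.

```python
import math


def elo_margin(games: int, draw_ratio: float = 0.4, z: float = 1.96) -> float:
    """Approximate +/- Elo margin around an even score after `games` games.

    Assumes equally strong engines and a normal approximation.
    draw_ratio is an assumed value; measure it for your own setup.
    """
    # Per-game variance of the score (win=1, draw=0.5, loss=0) at a 50% mean:
    # wins and losses each occur with probability (1 - draw_ratio) / 2.
    variance = (1.0 - draw_ratio) / 4.0
    std_error = math.sqrt(variance / games)
    upper_score = 0.5 + z * std_error
    # Convert the upper end of the score interval to Elo (logistic model).
    return -400.0 * math.log10(1.0 / upper_score - 1.0)


if __name__ == "__main__":
    for n in (1000, 2000, 12000):
        print(n, round(elo_margin(n), 1))
```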

It's (I think) the reason why in the 80's, 90's and early 00's less progress was made than nowadays, lack of sufficient hardware.

So unless you have access to 200-300 processors (or so) it's impossible to play on CCRL level. I currently have set the limit to 12,000 games.

What do you mean by the following?
manually going through 40-50 positions before starting a self-play match
Going through a (fixed) testset first to get an impression of the change you made. Isn't conclusive but at least it avoids obvious bugs.

Re: Some musings about search

Posted: Sat Aug 15, 2015 12:16 am
by cdani
Bloodbane wrote:I've seen some patches which were bad at short TC but good at long TC, but I only accept patches which are good at all time controls I use.
Really curious. I always accept those patches, as they are even better at longer time controls, not by much, of course.

Re: Some musings about search

Posted: Sat Aug 15, 2015 12:21 am
by cdani
Rebel wrote: One thing I learned from my last active period (2012-2013) is that playing too few games can be disastrous. In principle playing 1000-2000 games in most cases is good enough, (say) 9 of 10 times. But it is the 10th time that is going to hurt, you get good results while if you had played more you would have known the change isn't an improvement at all. Yet you count the change as an improvement and the damage is done, from version to version.
Some days ago it happened to me that three computers, each with something like 2000 games already played, so 6000 games in total, were at +10 Elo for a patch on every one of the three computers. I was tempted to stop the test and accept the patch as good, but I let it continue. When the total reached 20000 games, the patch clearly showed as a regression!

Re: Some musings about search

Posted: Sat Aug 15, 2015 12:52 am
by Rebel
cdani wrote:
Rebel wrote: One thing I learned from my last active period (2012-2013) is that playing too few games can be disastrous. In principle playing 1000-2000 games in most cases is good enough, (say) 9 of 10 times. But it is the 10th time that is going to hurt, you get good results while if you had played more you would have known the change isn't an improvement at all. Yet you count the change as an improvement and the damage is done, from version to version.
Some days ago it happened to me that three computers, each with something like 2000 games already played, so 6000 games in total, were at +10 Elo for a patch on every one of the three computers. I was tempted to stop the test and accept the patch as good, but I let it continue. When the total reached 20000 games, the patch clearly showed as a regression!
Yep.

And this kind of thing happened more often than I like. In 2012/13 I started to replay matches (usually 2000 x 40/1m) that had given a positive result, and often the gain melted like snow in the sun.

Re: Some musings about search

Posted: Sat Aug 15, 2015 3:09 am
by Laskos
Rebel wrote:
Ferdy wrote: Probably the field is generally contested at typical CCRL 40/4 and 40/40, I try to look good at 40/4.
One thing I learned from my last active period (2012-2013) is that playing too few games can be disastrous. In principle playing 1000-2000 games in most cases is good enough, (say) 9 of 10 times. But it is the 10th time that is going to hurt, you get good results while if you had played more you would have known the change isn't an improvement at all. Yet you count the change as an improvement and the damage is done, from version to version.
That's why the SPRT framework is important. Or, if that is too cumbersome, require 3 standard deviations at the stopping point of your choosing. Not 2, at least 3.
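Both stopping rules can be sketched in a few lines. This is a simplified illustration, not the code of any particular testing framework: it uses a normal approximation to the log-likelihood ratio, and the hypotheses (elo0 = 0, elo1 = 5) and error rates (alpha = beta = 0.05) are assumed example values. All function names are hypothetical.

```python
import math


def expected_score(elo: float) -> float:
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def score_stats(wins: int, draws: int, losses: int):
    """Return (games, mean score, per-game variance) of the results so far."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean * mean
    return n, mean, var


def z_score(wins: int, draws: int, losses: int) -> float:
    """Standard deviations between the observed score and 50%.

    Assumes at least one decisive game, otherwise the variance is zero.
    """
    n, mean, var = score_stats(wins, draws, losses)
    return (mean - 0.5) / math.sqrt(var / n)


def sprt_decision(wins, draws, losses, elo0=0.0, elo1=5.0,
                  alpha=0.05, beta=0.05) -> str:
    """Wald SPRT with a normal approximation to the log-likelihood ratio."""
    n, mean, var = score_stats(wins, draws, losses)
    if var == 0.0:
        return "continue"  # not enough information yet
    s0, s1 = expected_score(elo0), expected_score(elo1)
    llr = n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "H1"  # evidence favours the improvement hypothesis
    if llr <= lower:
        return "H0"  # evidence favours no improvement
    return "continue"


if __name__ == "__main__":
    print(z_score(1700, 2800, 1500))
    print(sprt_decision(1700, 2800, 1500))
```

Note that the 3-sigma rule only keeps its meaning if the stopping point is fixed in advance; checking the z-score repeatedly and stopping at the first favourable value inflates the false-positive rate, which is what the SPRT bounds are designed to handle.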
It's (I think) the reason why in the 80's, 90's and early 00's less progress was made than nowadays, lack of sufficient hardware.
People back then tested either with a ridiculously small number of fairly long games, or on test suites, which are misleading. The necessary hardware tools were available back then too, and the clock had the same 15ms granularity, but the testing was done amateurishly, and little attention was paid to ultra-fast games.

Re: Some musings about search

Posted: Sat Aug 15, 2015 5:35 am
by jdart
I used test suites for tuning for years. It is probably better than guesswork. But it is not a reliable method in general.

That said, I have actually been looking at some test results very recently to at least select some interesting modifications for further testing. But I don't regard the test results as conclusive, just a possible indicator.


--Jon