Some musings about search

Discussion of chess software programming and technical issues.

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Some musings about search

Post by Rebel

After a search change I am (was) used to getting a first impression by manually going through 40-50 positions before starting a self-play match.

Sometimes that first glimpse looked so good that it was reasonable to assume the change was an improvement. Then, after playing 10,000 games at 40/15s, it turned out not to be one, with a result of, say, 48%.
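To put a match percentage like that 48% into rating terms, here is a minimal Python sketch assuming the standard logistic Elo model, expected score E = 1 / (1 + 10^(-D/400)). The function names are hypothetical and not taken from any particular testing tool.

```python
import math


def score_to_elo(score: float) -> float:
    """Elo difference implied by an average score, from the logistic model
    E = 1 / (1 + 10 ** (-D / 400)) solved for D."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_to_score(elo: float) -> float:
    """Expected score for an Elo difference under the same logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


if __name__ == "__main__":
    # A 48% result in a self-play match, as in the example above.
    print(f"48% -> {score_to_elo(0.48):+.1f} Elo")
```

A 48% score works out to roughly -14 Elo, a loss that is easy to miss when eyeballing 40-50 positions.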

Stubbornly, I couldn't believe it (not a bad attitude in CC) and replayed the match at 40/30s and even 40/1m.

I have never seen such a change turn into an improvement after doubling or quadrupling the time control.

Is this a global experience?
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Some musings about search

Post by cdani

In king safety, yes, many times, but my first test time control was around 7 seconds and the second one 25 seconds, always against a gauntlet.

I don't test manually; the impression is just too shallow. If I do look at positions manually, it is to spot weaknesses.

My testing time controls are always like this: 4-7 seconds first, then 25 seconds.
Bloodbane
Posts: 154
Joined: Thu Oct 03, 2013 4:17 pm

Re: Some musings about search

Post by Bloodbane

I've seen some patches which were bad at short TC but good at long TC, but I only accept patches which are good at all time controls I use.
Functional programming combines the flexibility and power of abstract mathematics with the intuitive clarity of abstract mathematics.
https://github.com/mAarnos
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Some musings about search

Post by Ferdy

Rebel wrote: After a search change I am (was) used to getting a first impression by manually going through 40-50 positions before starting a self-play match.

Sometimes that first glimpse looked so good that it was reasonable to assume the change was an improvement. Then, after playing 10,000 games at 40/15s, it turned out not to be one, with a result of, say, 48%.

Stubbornly, I couldn't believe it (not a bad attitude in CC) and replayed the match at 40/30s and even 40/1m.

I have never seen such a change turn into an improvement after doubling or quadrupling the time control.

Is this a global experience?

If a change looks interesting (a feeling that it would kick in at higher depths; after so many failed tests you also learn something from the program), I increase the testing time: instead of testing at short TC first, I go directly to a longer TC, say 120s + 100ms increment. Sometimes it succeeded.
Probably the field is generally contested at typical CCRL 40/4 and 40/40, I try to look good at 40/4.

What do you mean by the following?
manually going through 40-50 positions before starting a self-play match
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Some musings about search

Post by Rebel

Ferdy wrote: Probably the field is generally contested at typical CCRL 40/4 and 40/40, I try to look good at 40/4.
One thing I learned from my last active period (2012-2013) is that playing too few games can be disastrous. In principle, 1000-2000 games is good enough in most cases, say 9 times out of 10. But it is the 10th time that hurts: you get a good result, whereas had you played more games you would have known the change is not an improvement at all. Yet you count the change as an improvement, and the damage carries over from version to version.

It is (I think) the reason why less progress was made in the 80s, 90s and early 00s than nowadays: a lack of sufficient hardware.

So unless you have access to 200-300 processors or so, it is impossible to test at CCRL time controls. I have currently set the limit to 12,000 games.
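To see why 1000-2000 games can mislead and why a limit of around 12,000 games is reasonable, here is a minimal Python sketch of the usual error-bar calculation. It assumes independent games and a normal approximation of the mean score; the helper names, the example W/D/L counts and the 40% draw ratio are all hypothetical, not taken from anyone's actual test setup.

```python
import math

# Slope of the logistic Elo curve at a 50% score: dElo/dscore = 1600 / ln(10).
ELO_PER_SCORE_AT_50 = 1600.0 / math.log(10.0)


def score_to_elo(score: float) -> float:
    """Logistic Elo model: D = -400 * log10(1/score - 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_error(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for a W/D/L result; z=1.96 gives ~95% bounds.
    Assumes games are independent and the mean score is roughly normal."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score, with outcomes 1, 0.5 and 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (score_to_elo(score),
            score_to_elo(score - z * se),
            score_to_elo(score + z * se))


def games_needed(margin_elo: float, draw_ratio: float, z: float = 1.96) -> int:
    """Games needed for a z-sigma half-width of about margin_elo, for two
    roughly equal engines (score near 50%) with the given draw ratio."""
    per_game_sd = math.sqrt((1.0 - draw_ratio) / 4.0)
    return math.ceil((z * per_game_sd * ELO_PER_SCORE_AT_50 / margin_elo) ** 2)


if __name__ == "__main__":
    # Hypothetical 2000-game result that looks like a small gain.
    elo, low, high = elo_with_error(620, 800, 580)
    print(f"{elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
    # Hypothetical 40% draw ratio, +/-5 Elo at 95% confidence.
    print(games_needed(5.0, 0.4), "games")
```

With these made-up numbers the 2000-game result is about +7 Elo, but the 95% interval runs from roughly -5 to +19 Elo, so it cannot rule out a regression. Narrowing the interval to about ±5 Elo takes around 11,000 games at a 40% draw rate.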

What do you mean by the following?
manually going through 40-50 positions before starting a self-play match
Going through a fixed test set first to get an impression of the change you made. It isn't conclusive, but at least it catches obvious bugs.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Some musings about search

Post by cdani

Bloodbane wrote:I've seen some patches which were bad at short TC but good at long TC, but I only accept patches which are good at all time controls I use.
Really curious. I always accept those patches, as they are even better at longer time controls (not by much, of course).
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Some musings about search

Post by cdani

Rebel wrote: One thing I learned from my last active period (2012-2013) is that playing too few games can be disastrous. In principle, 1000-2000 games is good enough in most cases, say 9 times out of 10. But it is the 10th time that hurts: you get a good result, whereas had you played more games you would have known the change is not an improvement at all. Yet you count the change as an improvement, and the damage carries over from version to version.
A few days ago, three computers had played something like 2000 games each (6000 games in total), and the patch was at +10 Elo on every one of them. I was tempted to stop the test and accept the patch, but I let it continue. When the total reached 20000 games, the patch clearly showed up as a regression!
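What happened here is what statisticians call optional stopping: if you keep peeking at a running result, an early lucky streak is likely to cross a threshold at some point even when the final verdict is negative. A minimal Python simulation, assuming a hypothetical patch that is truly -2 Elo, a 40% draw rate, and a look at the running result every 500 games; all parameters and names are illustrative assumptions, not cdani's setup.

```python
import math
import random


def elo_to_score(elo: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def score_to_elo(score: float) -> float:
    score = min(max(score, 1e-9), 1.0 - 1e-9)  # avoid log of 0 or infinity
    return -400.0 * math.log10(1.0 / score - 1.0)


def simulate(true_elo=-2.0, draw_ratio=0.4, max_games=20000,
             check_every=500, threshold=10.0, runs=200, seed=1):
    """Return the fraction of simulated matches that (a) showed >= threshold
    Elo at some checkpoint and (b) still showed it after max_games games."""
    rng = random.Random(seed)
    # Win probability chosen so that wins + draws/2 equals the expected score.
    p_win = elo_to_score(true_elo) - draw_ratio / 2.0
    ever = at_end = 0
    for _ in range(runs):
        points = 0.0
        crossed = False
        for games in range(1, max_games + 1):
            r = rng.random()
            if r < p_win:
                points += 1.0
            elif r < p_win + draw_ratio:
                points += 0.5
            # Peek at the running estimate every check_every games.
            if games % check_every == 0 and score_to_elo(points / games) >= threshold:
                crossed = True
        final_hit = score_to_elo(points / max_games) >= threshold
        ever += crossed or final_hit
        at_end += final_hit
    return ever / runs, at_end / runs


if __name__ == "__main__":
    ever, at_end = simulate()
    print(f"looked >= +10 Elo at some checkpoint: {ever:.0%}")
    print(f"still >= +10 Elo after 20000 games:  {at_end:.0%}")
```

The exact percentages depend on the seed and parameters, but the first number is typically far larger than the second, which is why a fixed stopping rule (or SPRT) matters.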
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Some musings about search

Post by Rebel

cdani wrote:
Rebel wrote: One thing I learned from my last active period (2012-2013) is that playing too few games can be disastrous. In principle, 1000-2000 games is good enough in most cases, say 9 times out of 10. But it is the 10th time that hurts: you get a good result, whereas had you played more games you would have known the change is not an improvement at all. Yet you count the change as an improvement, and the damage carries over from version to version.
A few days ago, three computers had played something like 2000 games each (6000 games in total), and the patch was at +10 Elo on every one of them. I was tempted to stop the test and accept the patch, but I let it continue. When the total reached 20000 games, the patch clearly showed up as a regression!
Yep.

And this kind of thing happened more often than I liked. In 2012/13 I started to replay matches that gave a positive result (usually 2000 games at 40/1m), and often the gain melted away like snow in the sun.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Some musings about search

Post by Laskos

Rebel wrote:
Ferdy wrote: Probably the field is generally contested at typical CCRL 40/4 and 40/40, I try to look good at 40/4.
One thing I learned from my last active period (2012-2013) is that playing too few games can be disastrous. In principle, 1000-2000 games is good enough in most cases, say 9 times out of 10. But it is the 10th time that hurts: you get a good result, whereas had you played more games you would have known the change is not an improvement at all. Yet you count the change as an improvement, and the damage carries over from version to version.
That's why the SPRT framework is important. Or, if that is too cumbersome, require 3 standard deviations at whatever stopping point you choose. Not 2, at least 3.
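A minimal Python sketch of both stopping rules. The SPRT part uses a common normal approximation to a generalized SPRT on the mean score, with the Elo bounds elo0/elo1 converted to scores via the logistic model; the default bounds of 0 and 5 Elo and alpha = beta = 0.05 are illustrative assumptions, not values from this thread, and this is not the exact code of any particular testing framework. All function names are hypothetical.

```python
import math


def elo_to_score(elo: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def _mean_and_var(wins: int, draws: int, losses: int):
    """Mean score and per-game score variance (requires at least one game)."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    return n, s, var


def sprt_llr(wins, draws, losses, elo0, elo1):
    """Log-likelihood ratio of H1 (elo1) vs H0 (elo0), treating the mean
    score as normal with the observed per-game variance."""
    if wins + draws + losses == 0:
        return 0.0
    n, s, var = _mean_and_var(wins, draws, losses)
    if var == 0.0:
        return 0.0
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * s - s0 - s1) / (2.0 * var)


def sprt_decision(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    lower = math.log(beta / (1.0 - alpha))  # accept H0 (no gain) at or below this
    upper = math.log((1.0 - beta) / alpha)  # accept H1 (gain) at or above this
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"


def z_score(wins, draws, losses):
    """How many standard errors the mean score lies above 50%."""
    n, s, var = _mean_and_var(wins, draws, losses)
    if var == 0.0:
        # Every game had the same outcome.
        return 0.0 if s == 0.5 else math.copysign(math.inf, s - 0.5)
    return (s - 0.5) / math.sqrt(var / n)


if __name__ == "__main__":
    print(sprt_decision(3000, 4000, 2000))  # hypothetical clear gain
    print(sprt_decision(2000, 4000, 3000))  # hypothetical clear loss
    print(round(z_score(3000, 4000, 2000), 1))
```

Under the 3-sigma rule a patch would only be accepted once `z_score(...) >= 3` at the pre-chosen stopping point; SPRT instead lets the test stop as soon as the evidence is strong enough in either direction, with error rates fixed in advance.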
It is (I think) the reason why less progress was made in the 80s, 90s and early 00s than nowadays: a lack of sufficient hardware.
People back then tested either with a ridiculously small number of fairly long games or on test suites, which are misleading. The necessary hardware tools were available back then too (the clock had the same 15 ms granularity), but testing was done amateurishly, and little attention was paid to ultra-fast games.
So unless you have access to 200-300 processors or so, it is impossible to test at CCRL time controls. I have currently set the limit to 12,000 games.

What do you mean by the following?
manually going through 40-50 positions before starting a self-play match
Going through a fixed test set first to get an impression of the change you made. It isn't conclusive, but at least it catches obvious bugs.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Some musings about search

Post by jdart

I used test suites for tuning for years. It is probably better than guesswork. But it is not a reliable method in general.

That said, I have actually been looking at some test results very recently to at least select some interesting modifications for further testing. But I don't regard the test results as conclusive, just a possible indicator.


--Jon