This is not quite "urban legend". I ran such a test years ago, and what I saw was an "overstated gain" when using self-play. If I saw +20 in self-play, it was ALWAYS much less than that against other opponents.
Don wrote:
The test I just did was without pondering, but we do in fact test against reference opponents quite a bit, although this was a quick and dirty test.
AlvaroBegue wrote:
I would be careful when using self-tests for anything having to do with time control and pondering. The number of ponder hits is likely much higher in self-tests than when playing against a different opponent, and that will distort the results. Even with pondering off, the amount of useful information left in the transposition tables is probably also higher than it should be.
This also applies to Bob's test, of course.
Do you guys test against some reference opponents as well?
I'm going to go into a bit of a rant here so please forgive me.
For almost 30 years of computer chess I have been getting the warning from people about avoiding self-testing and although I consider it a well-meaning warning nobody has once offered any evidence other than their own superstition. I'm very close to putting it in the category of "myth" or "Conventional wisdom" which by definition is not "exceptional" - it is usually untested and believed based on blind credulity or gut instinct which is notoriously unreliable.
Nevertheless, due to my own superstitions I have, over 30 years, done a lot of mixed tests simply because I know that program versions are not 100 percent transitive. And yet I have never seen intransitivity that cannot be explained by error margins.
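To make "explained by error margins" concrete, here is a minimal sketch of how one might turn a win/draw/loss count into an Elo estimate with an approximate 95% interval. The function names and the normal approximation are assumptions for illustration, not the method used by any of the testers in this thread; rating tools differ in the details.

```python
import math

def elo_diff(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) in Elo for one match result.

    Uses a normal approximation on the per-game score, so it is only
    reasonable for a large number of games and scores not near 0 or 1.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of the per-game result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    # Clamp so the logistic formula stays defined at the interval ends.
    eps = 1e-9
    lo = min(max(score - z * se, eps), 1.0 - eps)
    hi = min(max(score + z * se, eps), 1.0 - eps)
    return elo_diff(score), elo_diff(lo), elo_diff(hi)

def consistent(interval_a, interval_b):
    """True if two (low, high) Elo intervals overlap (a rough check only)."""
    return interval_a[0] <= interval_b[1] and interval_b[0] <= interval_a[1]

if __name__ == "__main__":
    # Made-up counts, purely to show the call.
    est, low, high = elo_with_margin(300, 450, 250)
    print(f"{est:+.1f} Elo, interval [{low:+.1f}, {high:+.1f}]")
```

If the interval from a self-play match and the interval from a gauntlet overlap, the difference between the two numbers is not, by itself, evidence of intransitivity. The overlap check is a crude simplification; a proper comparison would test the difference of the two estimates directly.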
I'm always looking for an edge, so if it were a factor I would drop all self testing faster than a ton of bricks. I have 30 years of considering this issue and running tests.
What I DO see from time to time is that self testing can distort the value of a change, but even then it's not by much. That happens often enough that I consider it a real thing. But that is actually a good thing since the majority of the time I only want to know if a change is beneficial. I'm happy to have a magnifying glass.
I can understand why you bring it up because with pondering it is likely to be more than just superstition or speculation. I would expect a much bigger benefit with self-play for this particular thing.
Time control in general is a messy thing even without pondering. It is important to use your time to the best advantage without knowing how much time you really have, since you do not know how long the game will last. There are several very important principles to be considered: the importance of "front loading" your time (spending a lot more time on early moves), trying hard to finish an iteration that you started, and others.
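The two principles named above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the guess of 40 moves remaining, the 1.5x front-load factor, and the 2x iteration growth estimate are all hypothetical values chosen for the example, not what Komodo, Crafty, or any other engine actually does.

```python
def target_time(remaining_s, increment_s=0.0, move_number=1,
                moves_left_guess=40, front_load=1.5, front_load_moves=20):
    """Seconds to aim for on this move.

    The game length is unknown, so the remaining time is spread over a
    guessed number of moves, then scaled up for early moves.
    """
    base = remaining_s / moves_left_guess + increment_s
    if move_number < front_load_moves:
        # Factor decays linearly from about front_load down to 1.0.
        frac = (front_load_moves - move_number) / front_load_moves
        factor = 1.0 + (front_load - 1.0) * frac
    else:
        factor = 1.0
    # Never plan to spend more than half of what is left on one move.
    return min(base * factor, 0.5 * remaining_s)

def should_start_next_iteration(elapsed_s, target_s, last_iter_s, growth=2.0):
    """Start another iterative-deepening pass only if it will likely finish.

    Assumes the next iteration costs about `growth` times the previous one.
    """
    return elapsed_s + growth * last_iter_s <= target_s

if __name__ == "__main__":
    # Early moves get a bigger share than later ones with the same clock.
    print(target_time(300.0, move_number=1), target_time(300.0, move_number=30))
    print(should_start_next_iteration(2.0, 10.0, 1.0))
```

Reading "finish an iteration that you started" as "do not start one you cannot finish" is one interpretation; an engine could equally allow a modest overrun of the target to complete an iteration already in progress.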
So it's probably natural to imagine that there could be a strong self-test interaction (even without pondering) but I imagine this about almost every change we try. As it turns out, the way we do time control tests has been mostly against foreign programs due to our own superstitions in this regard, which always turn out to be unfounded.
Here is something that I happen to know about Rybka. Remember Rybka 3, which took the world by storm in a big way? ALL their testing was done with "incestuous" self-testing, primarily because they had no worthy opponents. It did not seem to have much of an impact on their progress.
I will say this. I know that you have some background in computer Go, and I also flirted a bit with computer Go, although not to the extent that you did. Computer Go programs may be more prone to the effects of intransitivity because, even combined with Monte Carlo Tree Search, they are fairly pattern intensive, which means they could be more susceptible to other programs that find their weaknesses and exploit them without necessarily being superior in any other sense. But even that is just a theory on my part; I don't know to what extent it is true.
Just to be clear, I am not saying there is no intransitivity in computer chess; I am quite sure there is some. I just don't think it's much of a practical concern 99% of the time, and it's CLEARLY over-hyped and not for any rational reason.
To succeed at computer chess you have to sort out the nonsense from the reality and do it with a fair amount of objectivity, otherwise it's like stepping on the brakes. If you are too "anal retentive" you become a cripple. If you are not careful enough you could be spinning your tires so you have to find the balance. You are a good engineer yourself so you know what I am talking about.
I can actually run this test again, and will do so. But it will take a bit of time as I don't have any cluster testing set up to run crafty vs crafty. But I can do something like Crafty-23.4 vs Crafty-23.5 and then each against the normal gauntlet. Will report back...