obvious/easy move

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: obvious/easy move - final results

Post by bob »

Don wrote:
AlvaroBegue wrote:
bob wrote:OK, this is one of those GOOD exampls of why you should not draw conclusions after even 8500 games. Here is the 30K game final results:

Code: Select all


   2 Crafty-23.6-1        2652    4    4 30000   62%  2557   25%
   3 Crafty-23.6R02-1     2651    4    4 30000   62%  2557   25%
In short, no measurable difference with ponder = off.
That's exactly what I suspected. This type of thing doesn't happen very often, and when it does it just saves a bit of time, so the improvement would be similar to a very minor speed up (say, 1%), which I don't expect to be measurable in ELO.

Thanks for running this!
The ELO gain from doing this in Komodo is surprisingly large, so large we don't need thousands of games to see it clearly. It is on the order of 25 ELO.

It's not done in any way described here.
25 elo suggests a 50% savings in time. Seems REALLY difficult to imagine this producing that kind of time savings, letting you average about 1.5x as much time as without the algorithm.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: obvious/easy move - final results

Post by Don »

bob wrote:
Don wrote:
AlvaroBegue wrote:
bob wrote:OK, this is one of those GOOD exampls of why you should not draw conclusions after even 8500 games. Here is the 30K game final results:

Code: Select all


   2 Crafty-23.6-1        2652    4    4 30000   62%  2557   25%
   3 Crafty-23.6R02-1     2651    4    4 30000   62%  2557   25%
In short, no measurable difference with ponder = off.
That's exactly what I suspected. This type of thing doesn't happen very often, and when it does it just saves a bit of time, so the improvement would be similar to a very minor speed up (say, 1%), which I don't expect to be measurable in ELO.

Thanks for running this!
The ELO gain from doing this in Komodo is surprisingly large, so large we don't need thousands of games to see it clearly. It is on the order of 25 ELO.

It's not done in any way described here.
25 elo suggests a 50% savings in time. Seems REALLY difficult to imagine this producing that kind of time savings, letting you average about 1.5x as much time as without the algorithm.
Well then you lack imagination. A big part of the reason Komodo is so strong is that we have a number of exceptional innovations such that go beyond the usual and ordinary.

The easy move in Komodo is very powerful and saves a lot of time. When it was implemented we had to make our normal time control significantly more aggressive to compensate.

The basic concept is that EVERY move is an easy move (if the program doesn't change it's mind) and it's just a matter of degrees. It can cause a search to return between 5% and 100% of the time it would normally take depending on the degree of easiness. The nature is such that most moves will abort at least a little earlier than normal when the program does not change it's mind during the iteration. This is so much more sensible that a single narrowly defined class of "easy move" which rarely kicks in.

I did say "on the order of" which is a signal that I did not give you a precise number. I think in ponder mode it probably is worth 25 ELO but I did run a test with 12,000 games to estimate this again and it comes out to be 17 ELO when running 12 second games. I turned off the easy move stuff of course in one version in order to run this test.

The funny thing about this and your statement is that when I set out to do a superior "easy move" heuristic my goal was to get an effective 50% saving in time. As a human I spend 80% of my time on 20% of the moves. I had this concept that most of the moves could be searched in half the time without having too much of an impact on the strength. As it turns out that was pretty optimistic and I don't think I obtained 50% equivalent but I did get a lot. Or maybe I did get a 50% effective speedup but it's not free since aborting early entails some risk. I was actually pretty disappointed that it wasn't more because this seems like really "low hanging fruit" an obvious way to get an effective and free non-trivial speedup. Just stop being stupid about spending a huge amount of time on an obvious move! Easier said than done but it CAN be done.

There are two side effects to this algorithm which people have reported on in private mail and even on the forums:

The first side effect is that once in a while someone will report that "Komodo is overly aggressive about time use" and usually they will explain that it happens only on occasion where Komodo will have very little time left and the other program will have several minutes left. That would be because the game did not contain many moves that were very easy. We have to set our time control more aggressive that usual to compensate for the easy move algorithm, otherwise it would finish the games with too much time on average. The point of the heuristic is to spend MORE time on important moves, not to make the computer race through the game and finish with a huge amount of time left on the clock. The normal time control mechanism will gradually distribute this time to other moves but it will not be aggressive enough to make a very big difference.

The second side effect is that people will report that on occasion Komodo will make a blunder and this is partly responsible for it's reputation as being a program that is not very tactical. The truth is that Komodo is very strong tactically, at least much better than it's reputation. But this is where our easy move goes wrong. Nevertheless we have carefully tested this and the concept is that you have to take a risk to get a real gain. The average engineer is stupid about things like this, they try to "fix" every problem by removing beneficial risk. You play a stupid move in some game that happens to be important and it affects you so deeply that you vow that it will never happen again so you weaken the program a little to prevent it. Knee jerk engineering.

Incidentally, I don't know where you get 50% time saving equivalent, that does not match Komodo's reality. Here is a test in progress where I pit a 6 + 0.06 search against a 9 + 0.09 search:

Code: Select all

Rank    ELO     +/-    Games    Score  Player
---- ------- ------ -------- --------  ----------------------------
   1  3096.8   10.1     2434   63.640  9 
   2  3000.0   10.1     2434   36.360  6 

w/l/d: 717 519 1198    49.22 percent draws


      TIME       RATIO    log(r)     NODES    log(r)  ave DEPTH    GAMES   PLAYER
 ---------  ----------  --------  --------  --------  ---------  -------   -
    0.1584       1.000     0.000     0.141     0.000    11.4669     2434   6
    0.2328       1.470     0.385     0.212     0.406    12.2415     2434   9
Notice that we get the better part of a full additional ply with a 50% increase in time to go from less than 11.5 ply to almost 12 1/4 ply.

So the "effective" extra time we get is a lot less than 50% but it is substantial.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: obvious/easy move - final results

Post by AlvaroBegue »

I would be careful when using self-tests for anything having to do with time control and pondering. The number of ponder hits is likely much higher in self-tests than playing against a different opponent, and that will distort the results. Even with pondering off, the amount of useful information left in the transposition tables is probably also higher than it should be.

This also applies to Bob's test, of course.

Do you guys test against some reference opponents as well?
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: obvious/easy move - final results

Post by Don »

AlvaroBegue wrote:I would be careful when using self-tests for anything having to do with time control and pondering. The number of ponder hits is likely much higher in self-tests than playing against a different opponent, and that will distort the results. Even with pondering off, the amount of useful information left in the transposition tables is probably also higher than it should be.

This also applies to Bob's test, of course.

Do you guys test against some reference opponents as well?
The test I just did was without pondering but we do in fact test against reference opponents quit a bit although this was a quick and dirty test.

I'm going to go into a bit of a rant here so please forgive me.

For almost 30 years of computer chess I have been getting the warning from people about avoiding self-testing and although I consider it a well-meaning warning nobody has once offered any evidence other than their own superstition. I'm very close to putting it in the category of "myth" or "Conventional wisdom" which by definition is not "exceptional" - it is usually untested and believed based on blind credulity or gut instinct which is notoriously unreliable.

Nevertheless, due to my own superstitions I have over 30 years done a lot of mixed tests simply because I know that that program version are not 100 percent transitive. And yet I have never seen intransitivity that cannot be explained by error margins.

I'm always looking for an edge, so if it were a factor I would drop all self testing faster than a ton of bricks. I have 30 years of considering this issue and running tests.

What I DO see from time to time is that self testing can distort the value of a change, but even then it's not by much. That happens often enough that I consider it a real thing. But that is actually a good thing since the majority of the time I only want to know if a change is beneficial. I'm happy to have a magnifying glass.

I can understand why you bring it up because with pondering it is likely to be more than just superstition or speculation. I would expect a much bigger benefit with self-play for this particular thing.

Time control in general is a messy thing even without pondering - it is important to utilize your time to the best advantage without actually knowing how much time you actually have since you do not know how long the game will last. There are several very important principles to be considered, the important of "front loading" your time (spend a lot more time on early moves), trying hard to finish an iteration that you started, and others.

So it's probably natural to imagine that there could be a strong self-test interaction (even without pondering) but I imagine this about almost every change we try. As it turns out, the way we do time control tests has been mostly against foreign programs due to our own superstitions in this regard, which always turn out to be unfounded.

Here is something that I happen to know about Rybka. Remember Rybka 3 which took the world by storm in a big way? ALL their testing was done with "incestuous" self-testing, primary because they had no worthy opponents. It did not seem to have much of an impact on their progress.

I will say this. I know that you have some background in computer Go and I also flirted a bit with computer go although not to the extent that you did. Computer go may be more prone to the effects of intransitivity because even combined with Monte Carlo Tree Search they are fairly pattern intensive which means they could be more susceptible to other programs which address their weaknesses and exploit them without necessarily being superior in any other sense. But even that is just a theory on my part, I don't know to what extent that is true.

Just to be clear, I am not saying there is no transitivity in computer chess, I am quite sure there is some. I just don't think it's much of a practical concern 99% of the time and it's CLEARLY over-hyped and not for any rational reason.

To succeed at computer chess you have to sort out the nonsense from the reality and do it with a fair amount of objectivity, otherwise it's like stepping on the brakes. If you are too "anal retentive" you become a cripple. If you are not careful enough you could be spinning your tires so you have to find the balance. You are a good engineer yourself so you know what I am talking about.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
Rebel
Posts: 6995
Joined: Thu Aug 18, 2011 12:04 pm

Re: obvious/easy move - final results

Post by Rebel »

Don wrote: I'm going to go into a bit of a rant here so please forgive me.
:lol:
For almost 30 years of computer chess I have been getting the warning from people about avoiding self-testing and although I consider it a well-meaning warning nobody has once offered any evidence other than their own superstition. I'm very close to putting it in the category of "myth" or "Conventional wisdom" which by definition is not "exceptional" - it is usually untested and believed based on blind credulity or gut instinct which is notoriously unreliable.
Excellent rant.
There is something that I happen to know about Rybka. Remember Rybka 3 which took the world by storm in a big way? ALL their testing was done with "incestuous" self-testing, primary because they had no worthy opponents. It did not seem to have much of an impact on their progress.
By the lack of competitive opponents incest testing more or less was forced.
Just to be clear, I am not saying there is no transitivity in computer chess, I am quite sure there is some. I just don't think it's much of a practical concern 99% of the time and it's CLEARLY over-hyped and not for any rational reason.
Please rant more :wink:
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: obvious/easy move - final results

Post by Don »

Rebel wrote:
Don wrote: I'm going to go into a bit of a rant here so please forgive me.
:lol:
For almost 30 years of computer chess I have been getting the warning from people about avoiding self-testing and although I consider it a well-meaning warning nobody has once offered any evidence other than their own superstition. I'm very close to putting it in the category of "myth" or "Conventional wisdom" which by definition is not "exceptional" - it is usually untested and believed based on blind credulity or gut instinct which is notoriously unreliable.
Excellent rant.
There is something that I happen to know about Rybka. Remember Rybka 3 which took the world by storm in a big way? ALL their testing was done with "incestuous" self-testing, primary because they had no worthy opponents. It did not seem to have much of an impact on their progress.
By the lack of competitive opponents incest testing more or less was forced.
Actually, you can get perfectly good results against any opponent within a couple of hundred ELO and you can also give time odds, so this wasn't forced on them. If it was a real problem instead of an imaginary problem they would have be forced the OTHER way, to only test against foreign opponents despite the obvious inefficiencies.
Just to be clear, I am not saying there is no transitivity in computer chess, I am quite sure there is some. I just don't think it's much of a practical concern 99% of the time and it's CLEARLY over-hyped and not for any rational reason.
Please rant more :wink:
Ok :-)

Seriously, self-testing works beautifully and we supplement it with foreign testing on occasion but in general this is a sanity test. Of course we do want to know how we stand against other programs so I don't consider it too much of a waste when done judiciously. So far it has never surprised us.

The biggest argument in favor of self-testing is that self-testing is far more efficient use of CPU resources. Our bottlneck and probably everyone else's is the testing procedure itself. By testing incorrectly we effectively throw out half of our hardware! Would you trade in your quad for a 2 core machine? I don't think you would willing do such a thing.

Imagine that you want to know if program B is an improvement over program A. If you forbid self-testing you have to run program A against 1 or more foreign opponents, then you have to run program B against those same opponents and compare the results. You have to run twice as many games to do this because you are not just testing your own program, but somebody else's program too. That stinks! You also have 2 sets of error margins to deal with. The rating program A achieved has error and so does program B. Comparing them is indirect. If you want to see which stick is the longest you should hold them side by side to get a more accurate answer. You could also use a tape measure on each separately, but the result will be less reliable, especially if the difference is small.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
JBNielsen
Posts: 267
Joined: Thu Jul 07, 2011 10:31 pm
Location: Denmark

Re: obvious/easy move - final results

Post by JBNielsen »

Don wrote:

The basic concept is that EVERY move is an easy move (if the program doesn't change it's mind) and it's just a matter of degrees. It can cause a search to return between 5% and 100% of the time it would normally take depending on the degree of easiness. The nature is such that most moves will abort at least a little earlier than normal when the program does not change it's mind during the iteration. This is so much more sensible that a single narrowly defined class of "easy move" which rarely kicks in.
Don, I don't know if you have read my earlier post. With my formula I wanted to spend less time on easy moves and more time on normal/hard moves. Also if there are several good moves, and also if it is a minor gap.

Here is my formula again:

Code: Select all

There is always a risk of stopping a search early, but I believe the time is better spent in positions where many moves are candidates for the best move. 

Humans have the same strategy, and I think it is wise. 

I have not made this in Dabbaba yet, but your results makes me want to implement something like this: 

Don't start a new iteration if more than x% of the time for this move is used. 

x = 35 - 8 * z + 2 * y 

y = the number of moves (max 3) that are z centipawns better than the rest. 
z is the biggest gap among the 4 best moves. z is max 3.00. 

x is always 45% if the biggest gap is less than 0.20 or we have a matescore. 

x = 13% if 1 move recaptures a queen (z=3.0) 

x = 15% if 2 moves recaptures a bishop (z=3.0) 

x = 31% if 2 moves recaptures a pawn (z=1.0) 

x = 33% if 1 move is z=0.50 better than the rest 

x = 39% if 3 moves are z=0.25 better than the rest. 

Notice I try to take advantage of small gaps too.... 

This is just a quick shot that must be refined - but the point is, that there may be some elo in a better time disposition. 
A similar calculation should be done to stop earlier in the middle of an iteration.


PS.
Right now I am running a selftest.
After 180 games the version leading with 53% has this rule added:
Don't start a new iteration if more than 30% of the time for the move is used.

Has anyone else tried to add this simple rule?
An easy way to gain 20 elo if a longer test confirms the 53%.

I have not made anything about easy moves yet, but I will soon do that as well as a better ordering at the root.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: obvious/easy move - final results

Post by Don »

I think anything you can do in this regard is low hanging fruit for program improvement.

What do you mean when you say:

y = the number of moves (max 3) that are z centipawns better than the rest.

How do you know that a move is better than the rest without raising beta?



JBNielsen wrote:
Don wrote:

The basic concept is that EVERY move is an easy move (if the program doesn't change it's mind) and it's just a matter of degrees. It can cause a search to return between 5% and 100% of the time it would normally take depending on the degree of easiness. The nature is such that most moves will abort at least a little earlier than normal when the program does not change it's mind during the iteration. This is so much more sensible that a single narrowly defined class of "easy move" which rarely kicks in.
Don, I don't know if you have read my earlier post. With my formula I wanted to spend less time on easy moves and more time on normal/hard moves. Also if there are several good moves, and also if it is a minor gap.

Here is my formula again:

Code: Select all

There is always a risk of stopping a search early, but I believe the time is better spent in positions where many moves are candidates for the best move. 

Humans have the same strategy, and I think it is wise. 

I have not made this in Dabbaba yet, but your results makes me want to implement something like this: 

Don't start a new iteration if more than x% of the time for this move is used. 

x = 35 - 8 * z + 2 * y 

y = the number of moves (max 3) that are z centipawns better than the rest. 
z is the biggest gap among the 4 best moves. z is max 3.00. 

x is always 45% if the biggest gap is less than 0.20 or we have a matescore. 

x = 13% if 1 move recaptures a queen (z=3.0) 

x = 15% if 2 moves recaptures a bishop (z=3.0) 

x = 31% if 2 moves recaptures a pawn (z=1.0) 

x = 33% if 1 move is z=0.50 better than the rest 

x = 39% if 3 moves are z=0.25 better than the rest. 

Notice I try to take advantage of small gaps too.... 

This is just a quick shot that must be refined - but the point is, that there may be some elo in a better time disposition. 
A similar calculation should be done to stop earlier in the middle of an iteration.


PS.
Right now I am running a selftest.
After 180 games the version leading with 53% has this rule added:
Don't start a new iteration if more than 30% of the time for the move is used.

Has anyone else tried to add this simple rule?
An easy way to gain 20 elo if a longer test confirms the 53%.

I have not made anything about easy moves yet, but I will soon do that as well as a better ordering at the root.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: obvious/easy move - final results

Post by bob »

Don wrote:
bob wrote:
Don wrote:
AlvaroBegue wrote:
bob wrote:OK, this is one of those GOOD exampls of why you should not draw conclusions after even 8500 games. Here is the 30K game final results:

Code: Select all


   2 Crafty-23.6-1        2652    4    4 30000   62%  2557   25%
   3 Crafty-23.6R02-1     2651    4    4 30000   62%  2557   25%
In short, no measurable difference with ponder = off.
That's exactly what I suspected. This type of thing doesn't happen very often, and when it does it just saves a bit of time, so the improvement would be similar to a very minor speed up (say, 1%), which I don't expect to be measurable in ELO.

Thanks for running this!
The ELO gain from doing this in Komodo is surprisingly large, so large we don't need thousands of games to see it clearly. It is on the order of 25 ELO.

It's not done in any way described here.
25 elo suggests a 50% savings in time. Seems REALLY difficult to imagine this producing that kind of time savings, letting you average about 1.5x as much time as without the algorithm.
Well then you lack imagination. A big part of the reason Komodo is so strong is that we have a number of exceptional innovations such that go beyond the usual and ordinary.

The easy move in Komodo is very powerful and saves a lot of time. When it was implemented we had to make our normal time control significantly more aggressive to compensate.

The basic concept is that EVERY move is an easy move (if the program doesn't change it's mind) and it's just a matter of degrees. It can cause a search to return between 5% and 100% of the time it would normally take depending on the degree of easiness. The nature is such that most moves will abort at least a little earlier than normal when the program does not change it's mind during the iteration. This is so much more sensible that a single narrowly defined class of "easy move" which rarely kicks in.

I did say "on the order of" which is a signal that I did not give you a precise number. I think in ponder mode it probably is worth 25 ELO but I did run a test with 12,000 games to estimate this again and it comes out to be 17 ELO when running 12 second games. I turned off the easy move stuff of course in one version in order to run this test.

The funny thing about this and your statement is that when I set out to do a superior "easy move" heuristic my goal was to get an effective 50% saving in time. As a human I spend 80% of my time on 20% of the moves. I had this concept that most of the moves could be searched in half the time without having too much of an impact on the strength. As it turns out that was pretty optimistic and I don't think I obtained 50% equivalent but I did get a lot. Or maybe I did get a 50% effective speedup but it's not free since aborting early entails some risk. I was actually pretty disappointed that it wasn't more because this seems like really "low hanging fruit" an obvious way to get an effective and free non-trivial speedup. Just stop being stupid about spending a huge amount of time on an obvious move! Easier said than done but it CAN be done.

There are two side effects to this algorithm which people have reported on in private mail and even on the forums:

The first side effect is that once in a while someone will report that "Komodo is overly aggressive about time use" and usually they will explain that it happens only on occasion where Komodo will have very little time left and the other program will have several minutes left. That would be because the game did not contain many moves that were very easy. We have to set our time control more aggressive that usual to compensate for the easy move algorithm, otherwise it would finish the games with too much time on average. The point of the heuristic is to spend MORE time on important moves, not to make the computer race through the game and finish with a huge amount of time left on the clock. The normal time control mechanism will gradually distribute this time to other moves but it will not be aggressive enough to make a very big difference.

The second side effect is that people will report that on occasion Komodo will make a blunder and this is partly responsible for it's reputation as being a program that is not very tactical. The truth is that Komodo is very strong tactically, at least much better than it's reputation. But this is where our easy move goes wrong. Nevertheless we have carefully tested this and the concept is that you have to take a risk to get a real gain. The average engineer is stupid about things like this, they try to "fix" every problem by removing beneficial risk. You play a stupid move in some game that happens to be important and it affects you so deeply that you vow that it will never happen again so you weaken the program a little to prevent it. Knee jerk engineering.

Incidentally, I don't know where you get 50% time saving equivalent, that does not match Komodo's reality. Here is a test in progress where I pit a 6 + 0.06 search against a 9 + 0.09 search:

Code: Select all

Rank    ELO     +/-    Games    Score  Player
---- ------- ------ -------- --------  ----------------------------
   1  3096.8   10.1     2434   63.640  9 
   2  3000.0   10.1     2434   36.360  6 

w/l/d: 717 519 1198    49.22 percent draws


      TIME       RATIO    log(r)     NODES    log(r)  ave DEPTH    GAMES   PLAYER
 ---------  ----------  --------  --------  --------  ---------  -------   -
    0.1584       1.000     0.000     0.141     0.000    11.4669     2434   6
    0.2328       1.470     0.385     0.212     0.406    12.2415     2434   9
Notice that we get the better part of a full additional ply with a 50% increase in time to go from less than 11.5 ply to almost 12 1/4 ply.

So the "effective" extra time we get is a lot less than 50% but it is substantial.
I get the 50% from the common measurements that show that doubling one's speed is good for maybe +50 elo. Increasing it by 50% would, then, give about 1/2 of that...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: obvious/easy move - final results

Post by bob »

AlvaroBegue wrote:I would be careful when using self-tests for anything having to do with time control and pondering. The number of ponder hits is likely much higher in self-tests than playing against a different opponent, and that will distort the results. Even with pondering off, the amount of useful information left in the transposition tables is probably also higher than it should be.

This also applies to Bob's test, of course.

Do you guys test against some reference opponents as well?
I don't do ANY "self-testing" except for debugging, when I am hunting for some sort of problem.

Otherwise, all of my testing is against a pool of other opponents (stockfish, etc...)