BubbaTough wrote:Given that one area of steady progress in software improvements has been in decreasing the branching factor, it would not be surprising if the longer the game, the more valuable the software changes prove. Perhaps, no matter what hardware difference is claimed, at some length of time, perhaps days or weeks, software changes prove more important than hardware. Conversely, at some short length of time, the speed of hardware is much more important than software changes.
-Sam
That is based on conjecture rather than fact. For example, I have run extensive tests on null-move, and could find no difference in Elo gain/loss for several time controls. And turning it off is not a huge deal. Same thing for LMR. Your assumption is that today's plies are the same as plies 10 years ago, we just get more of them with more and more time because of the reduced branching factor. That is not a given, in my opinion. Which means that the "additional plies" with longer time controls are not necessarily _that_ much better overall.
You can probably find the posts where I ran some tests with null move or LMR or both disabled... But your argument is flawed in a basic way. What you are really saying is that the _faster_ the hardware gets, the better these algorithms perform. And that is not the point most want to accept.
Again, hardware has brought more gain than software. How much is TBD, but I'd bet 2/3 for hardware, 1/3 for software myself. And anyone that says we need a longer time control to test is only making the point more clear.
I do not know if null move or LMR help more at long time control but
I strongly believe that other factors help more at long time control.
1)better order of moves relative to 1998
Eh? My move ordering has not changed at all since 1998. Or even since 1988. I have been using the same ordering for at least 25 years now. And I believe that everyone else is doing about the same thing. Hash move. Sorted captures. killer/history moves. Etc...
2)better evaluation relative to 1998
I agree to a point. But not because our evaluation is that much better by itself, but because we can get away with doing things today that we could _not_ afford in 1998 because the hardware was too slow.
I think that 2 may be more important but it is obvious that 1 helps more at longer time control.
I don't even believe (1) exists... What move ordering are you using today that was not used in 1998? Killers? 1970s, in Slate/Atkin. SEE? 1970s, from Coko to Blitz and everything in between. History moves? 1980s.
Uri
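For readers who have not seen the classical ordering described above (hash move, then sorted captures, then killers, then history), here is a minimal Python sketch. The `Move` record, the `order_moves` function and the tier numbers are hypothetical illustrations, not code from Crafty or any other engine, and real programs differ in details such as where losing captures go.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Move:
    """Hypothetical move record used only for this illustration."""
    name: str
    see: Optional[int] = None  # static exchange score for captures, None for quiet moves


def order_moves(moves, hash_move=None, killers=(), history=None):
    """Return moves sorted as: hash move, captures by SEE, killers, history."""
    history = history or {}

    def key(m):
        if m == hash_move:
            return (0, 0)                       # hash move always first
        if m.see is not None:
            return (1, -m.see)                  # captures, best exchange first
        if m in killers:
            return (2, killers.index(m))        # killers in their stored order
        return (3, -history.get(m.name, 0))     # remaining quiet moves by history count

    return sorted(moves, key=key)


moves = [Move("a3"), Move("Nf3"), Move("Bxc6", see=0),
         Move("Qxd5", see=300), Move("Re1"), Move("h4")]
ordered = order_moves(moves, hash_move=Move("Re1"),
                      killers=(Move("h4"),),
                      history={"Nf3": 50, "a3": 10})
print([m.name for m in ordered])
```

Nothing in this scheme looks at what the opponent is threatening, which is the gap the bishop example in the reply that follows is pointing at.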
1) The move ordering of 10 years ago is not optimal, so common sense tells me that it is an area where improvement is possible.
There are clearly ideas for better move ordering relative to the one that you use.
For example,
suppose Black plays a6, threatening the bishop on b5, and suppose your program has no move in the hash.
Does your program search escapes with the bishop first?
If you search only hash, captures, killers and history, there is no way for you to know to search escapes of the bishop on b5 first.
2) For the evaluation: if your changes are productive today at 1+1 time control, then they were also productive in 1998 at tournament time control.
The fact that you had no time to test it at that time does not mean that it is not better software for old hardware.
Remember that I am interested in 120/40 time control and not in blitz when I talk about software improvement.
I am not talking about correspondence time controls, which are impossible to test, but about the time control that the SSDF used.
bob wrote:Couple of points. First, no way you could get 75K on a P200/mmx. The Pentium Pro 200 would do 75 K, but the basic Pentium, all the way thru the P200/mmx was a different processor architecture, no out of order execution or any of that. I had a p200/mmx laptop that would not break 50K. The Pentium pro 200 was getting around 75K. I ran on it in Jakarta, and in my office, until I got the quad Pentium pro 200 box.
If you take the 400 MHz processor, which I am not sure was around in 1988, at least the Xeon 2 MB L2 that I had, then 150K nodes per second would be the answer. Late in 2008 an i7 was producing 20M nodes per second. If you divide 20M by 150K you get a raw factor of 133x. If you factor in the 3.3/4.0 speedup for a parallel search you get a factor of 110x.
We could settle on 100:1 for simplicity, although that is _still_ a daunting advantage. I find it amusing that so many are convinced that software is most of the improvement, yet we are quibbling over a 10%-20% speed difference in hardware. It seems, to me, that perhaps most really do believe that hardware is the more important contributor to today's ratings in spite of rhetoric to the contrary...
Once we get to the improvement in hardware as settled, then there are going to be enough arguments to make the test hopeless anyway. 1 sec to 100 secs is a _long_ game. At 60 moves as a reasonable average game length, that turns into almost 2 hours per game. Even at 256 games at a time on my cluster, it takes a long time to play enough games to get good answers. And some are not going to accept 1 second vs 100 seconds. And want 1 minute to 100 minutes, which is not going to happen with me...
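The arithmetic in the quoted post can be checked with a few lines of Python. This is only a sketch using the figures quoted above; the variable names are made up, and reading "3.3/4.0" as a 3.3x effective speedup on 4 cores is an assumption about what the post means.

```python
# Figures taken from the quoted post.
NPS_1998 = 150_000        # nodes per second on the 400 MHz Xeon
NPS_2008 = 20_000_000     # nodes per second on an i7, late 2008
PARALLEL_SPEEDUP = 3.3    # assumed: effective speedup obtained from 4 cores
CORES = 4.0

raw_factor = NPS_2008 / NPS_1998
effective_factor = raw_factor * PARALLEL_SPEEDUP / CORES


def game_hours(moves, fast_sec_per_move, slow_sec_per_move):
    """Wall-clock length of one handicap game, both sides using their full time."""
    return moves * (fast_sec_per_move + slow_sec_per_move) / 3600.0


hours = game_hours(60, 1.0, 100.0)

print(f"raw hardware factor:       {raw_factor:.0f}x")
print(f"after parallel efficiency: {effective_factor:.0f}x")
print(f"60-move game at 1s vs 100s per move: {hours:.2f} hours")
```

With these inputs the game length comes out a little under 1.7 hours, which is the "almost 2 hours per game" mentioned above.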
One hundred to one does sound like a nice number. You seem convinced that is too little, others are sure it's too much.
Are you considering 1 vs 100 seconds as an increment? That would be unfair to Fritz. The fairest time would have the time they were tuned for as the geometric mean. While everyone I'm sure tries to tune for longer time controls, in practice I think it must be much shorter, perhaps someplace around 30 seconds per game. I think either 2"/40 vs 200"/40 or 1" + 0.03 vs 100" + 3 would be in the right ballpark, assuming Rybka's time management doesn't completely fail at that level.
I do not care about the time control the programs were tuned for.
Progress of software is judged by results, not by intentions.
If program X was better in 120/40 ponder-on SSDF games and weaker at blitz in 1998, then I consider program X to be better for the purposes of this discussion.
A 120/40 ponder-on game means something equivalent to 4.5 minutes per move ponder-off on a P200MMX, so it clearly means more than 1 second per move on today's hardware.
Uri
You totally miss his point. Programs of a specific period are tuned for the usual time controls of that period, on the hardware of that period. Whether this is an issue with a very long increment or not I don't know. Crafty can work with any increment at all....
I am talking about the SSDF time control (120/40 ponder on) or something equivalent to it.
Programmers clearly cared about winning on the SSDF list, and the world championship was equivalent to an even slower time control if you consider that programs used better hardware in the WCCC.
They did not have enough computer time to test at an equivalent time control, and I agree that part of the software improvement is thanks to better hardware that allows more testing, but the reason for a software improvement does not change the fact that it is a software improvement.
That is based on conjecture rather than fact. For example, I have run extensive tests on null-move, and could find no difference in Elo gain/loss for several time controls. And turning it off is not a huge deal. Same thing for LMR. Your assumption is that today's plies are the same as plies 10 years ago, we just get more of them with more and more time because of the reduced branching factor. That is not a given, in my opinion. Which means that the "additional plies" with longer time controls are not necessarily _that_ much better overall.
...
It is not just based on conjecture, it IS conjecture. Notice key words/phrases such as "not surprising if" and "perhaps". I have not made any assumptions about equal plies... unequal plies were definitely on my mind as I typed (otherwise I would not need my qualifiers). My personal experience has been that older programs make much worse use of extra time on today's hardware than today's programs. MUCH worse. Thus my speculation...
Though it hardly needs to be said: as always, my speculative hypothesis based on flimsy data might be wrong. I don't have a particularly strong belief that it's true myself, it just sounds plausible... perhaps 63% likely.
-Sam
My personal experience is also that older programs gain less from extra time.
It is based on matches at unequal time controls between Movei and other programs.
Movei performed better against weaker programs when the time control was longer, and Rybka 2.3.2a performed better against Movei when the time control was longer, when I chose a 10:1 time handicap under Arena.
Note also that better books are part of software improvement, unless the book is too big to be held in memory by old hardware, but in spite of that
I agree to using the same book because I am interested in engine improvement and not in total software improvement.
Uri
Why use a book? That's a silly way of testing when you imply "I am not interested in book improvement..." I don't test with books of any kind, to get rid of that aspect of bias.
I am also not interested in book improvement, but the subject is hardware vs software.
If we talk about software improvement then the book is included in software,
so I think that
the subject should be hardware vs software (not including books).
1) The move ordering of 10 years ago is not optimal, so common sense tells me that it is an area where improvement is possible.
There are clearly ideas for better move ordering relative to the one that you use.
For example,
suppose Black plays a6, threatening the bishop on b5, and suppose your program has no move in the hash.
Does your program search escapes with the bishop first?
If you search only hash, captures, killers and history, there is no way for you to know to search escapes of the bishop on b5 first.
So? We program for the _general_ case. The time spent detecting the fact that the bishop is under attack, so that you can try to move it (and presumably to a safe square as well) does not sound like "time well spent".
But in any case, does anybody do this today?
2) For the evaluation: if your changes are productive today at 1+1 time control, then they were also productive in 1998 at tournament time control.
You are thinking backward. They might be effective today at tournament time controls, but in 1998 that would be a couple of hours per move so no one tested them in that way.
The fact that you had no time to test it at that time does not mean that it is not better software for old hardware.
Remember that I am interested in 120/40 time control and not in blitz when I talk about software improvement.
As am I.
I am not talking about correspondence time controls, which are impossible to test, but about the time control that the SSDF used.
Uri
However, you can't compare SSDF ratings of today with those of 10 years ago, because in 1998 they were testing at a speed that would be considered "blitz" today.
Some points:
1) Move ordering does not have to be done the same way at all plies.
If calculating something is too expensive when the remaining depth is 1, then it is possible to do it only when the remaining depth is at least 2.
Calculating escapes with the bishop is a special case of calculating escapes with an attacked piece, and it is possible to have a function that detects attacked pieces.
I do not know what Rybka 3 does; this is one of the things that it may or may not do, but there are clearly ideas to try for better move ordering.
2) I agree to something equivalent to 120/40 ponder on with the hardware of 1998, i.e. a P200MMX.
It means more than 1 second per move on today's hardware even if we accept a 200:1 speed difference.
Ponder on is in practice even slightly more than 4.5 minutes per move, because you predict more than half of the moves and there are book moves, so it is equivalent to a ponder-off game at 5 minutes per move, and even if we assume a 200:1 speed difference that is 1.5 seconds per move on a Q6600.
Only if you use something that is more than 300 times faster than the P200MMX for Rybka do we need a time control faster than the suggested 1 second per move.
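The time-control conversion in point 2 is simple enough to write down. The sketch below only restates the numbers from the post (120 minutes for 40 moves, roughly 5 minutes per move once pondering and book moves are counted, and hardware ratios of 200:1 and 300:1); the function name is hypothetical.

```python
def equivalent_seconds_per_move(old_minutes_per_move, hardware_speedup):
    """Time per move on new hardware that matches the old search effort."""
    return old_minutes_per_move * 60.0 / hardware_speedup


nominal = 120 / 40          # SSDF 120/40: nominal minutes per move
ponder_off_equiv = 5.0      # the post's estimate once pondering and book moves are counted

at_200 = equivalent_seconds_per_move(ponder_off_equiv, 200)
at_300 = equivalent_seconds_per_move(ponder_off_equiv, 300)

print(f"nominal: {nominal:.1f} min/move")
print(f"200:1 -> {at_200:.1f} s/move, 300:1 -> {at_300:.1f} s/move")
```

So 1 second per move on the new machine only matches the old SSDF conditions if the hardware ratio is taken to be about 300:1.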
Uri Blass wrote:
1) The move ordering of 10 years ago is not optimal, so common sense tells me that it is an area where improvement is possible.
There are clearly ideas for better move ordering relative to the one that you use.
For example,
suppose Black plays a6, threatening the bishop on b5, and suppose your program has no move in the hash.
Does your program search escapes with the bishop first?
If you search only hash, captures, killers and history, there is no way for you to know to search escapes of the bishop on b5 first.
So? We program for the _general_ case. The time spent detecting the fact that the bishop is under attack, so that you can try to move it (and presumably to a safe square as well) does not sound like "time well spent".
But in any case, does anybody do this today?
I thought many people include the SEE of a threat against a piece in the SEE score (used for capture sorting) of moves that trivially solve that threat (by withdrawing the victim or capturing the attacker). Joker does that (even if it is a non-capture), and it seemed to help. As logic dictates it should.
The point is that the info you need to do it is nearly free. You only have to generate and sort captures after a null-move fail low, so you know the piece that is threatened from the null-move refutation. So there really is no overhead in cases where sorting the move in front would not almost certainly give a large time savings.
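A rough Python sketch of the idea just described: the move that refuted the null move names the attacker and the victim, and any move that withdraws the victim or captures the attacker gets the threat's SEE added to its own sort score. This is not Joker's code; `Move`, `Threat`, `sort_score` and the value 300 for the bishop are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Move:
    """Hypothetical move record: origin, destination, and the move's own SEE."""
    from_sq: str
    to_sq: str
    see: int = 0  # 0 for a quiet move to a safe square


@dataclass(frozen=True)
class Threat:
    """What the null-move refutation would have captured."""
    attacker_sq: str  # square the refuting piece stands on
    victim_sq: str    # square of the piece it would capture
    see: int          # material the opponent wins by executing the threat


def sort_score(move: Move, threat: Optional[Threat]) -> int:
    score = move.see
    if threat is not None and threat.see > 0:
        withdraws_victim = move.from_sq == threat.victim_sq
        captures_attacker = move.to_sq == threat.attacker_sq
        if withdraws_victim or captures_attacker:
            score += threat.see  # credit the move with the loss it avoids
    return score


def order_with_threat(moves, threat):
    # sorted() is stable, so moves with equal scores keep their original order.
    return sorted(moves, key=lambda m: sort_score(m, threat), reverse=True)


# The example from the thread: Black has played a6, attacking the bishop on b5.
threat = Threat(attacker_sq="a6", victim_sq="b5", see=300)
moves = [Move("g1", "f3"), Move("b5", "a4"), Move("d1", "e2")]
ordered = order_with_threat(moves, threat)
print([m.from_sq + m.to_sq for m in ordered])
```

Because the threat is only known after a null-move fail low, the extra scoring runs only in nodes where the information already exists, which is the "nearly free" point made above.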
Crafty's move ordering is really quite primitive and sub-optimal. Only a few months ago you established yourself that sorting the good captures by SEE was 7 Elo points inferior to a sorting scheme (also theoretically suboptimal, b.t.w.) used by others. And 7 Elo points is a lot, it corresponds to a 7% reduction in tree size even if the change was free in terms of nps. After missing such a large effect, I would not be surprised at all if there were many more subtle move-sorting tricks that you missed as well.
May well be lots of things I have not tried. But the "null-move refutation" is not one of them. I tried this a few months back and found no improvement at all in real games. I don't doubt that it might help in tactical positions, but chess is not only about tactics. When I tested this, the Elo dropped very slightly. I decided to do this when Tord mentioned the idea in a thread here some time back.
BTW I don't agree with your "7%" numbers. 70 Elo requires reducing the tree by about 50% total. 7 Elo doesn't sound like "7% tree reduction" to me, although I have not given it a lot of thought.
If 70 Elo requires reducing the tree to 50% of the original size, then it means that
7 Elo requires reducing the tree to 0.5^(1/10) of the original size.
0.5^(1/10) ~= 0.933, so it is close to a 6.7% tree reduction.
The rule of thumb that works reasonably for most engines is
Elo = 100 ln(T) + constant.
As ln 2 = 0.693 this predicts 69.3 Elo gain for a doubling of the search time. For factors very close to 1 the logarithm can be expanded as ln(T) = ln(1+(T-1)) ~ T-1.
In this limit you gain 1 Elo per percent of speedup.
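Both calculations above are easy to check numerically. The sketch below assumes nothing beyond the two statements in the thread (70 Elo for halving the tree, and Elo = 100 ln(T) + constant); the function names are made up for the example.

```python
import math


def elo_gain(time_factor):
    """Elo gained from multiplying search time (or speed) by time_factor,
    using the rule of thumb Elo = 100 ln(T) + constant."""
    return 100.0 * math.log(time_factor)


def tree_fraction_for(elo, elo_per_halving=70.0):
    """Fraction of the original tree size that is worth `elo` points,
    assuming each halving of the tree is worth `elo_per_halving`."""
    return 0.5 ** (elo / elo_per_halving)


print(f"doubling the time:  {elo_gain(2.0):.1f} Elo")
print(f"1% speedup:         {elo_gain(1.01):.2f} Elo")
print(f"tree size for 7 Elo: {tree_fraction_for(7.0):.3f} of original")
```

The two approaches agree closely: both put 7 Elo at a tree reduction of a little under 7%.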
I am talking about the SSDF time control (120/40 ponder on) or something equivalent to it.
Programmers clearly cared about winning on the SSDF list, and the world championship was equivalent to an even slower time control if you consider that programs used better hardware in the WCCC.
They did not have enough computer time to test at an equivalent time control, and I agree that part of the software improvement is thanks to better hardware that allows more testing, but the reason for a software improvement does not change the fact that it is a software improvement.
Uri
You are _still_ missing the point. Programmers of _today_ are tuning their programs based on the hardware speeds we have today. At longer time controls. Programmers of 10 years ago were tuning their programs based on the hardware speeds of 10 years ago. Part of what we do in software has been enabled by having faster hardware. No way we can compare normal time controls of today against 1998 hardware. 100:1 makes the games intractable, even for my cluster...