Hardware vs Software

Uri Blass · Post by **Uri Blass** » Sun Jan 18, 2009 9:36 am

bob wrote:
BubbaTough wrote:Given that one area of steady progress in software improvements has been in decreasing the branching factor, it would not be surprising if the longer the game, the more valuable the software changes prove. Perhaps, no matter what hardware difference is claimed, at some length of time, perhaps days or weeks, software changes prove more important than hardware. Conversely, at some short length of time, the speed of hardware is much more important than software changes.

-Sam
That is based on conjecture rather than fact. For example, I have run extensive tests on null-move, and could find no difference in Elo gain/loss for several time controls. And turning it off is not a huge deal. Same thing for LMR. Your assumption is that today's plies are the same as plies 10 years ago, we just get more of them with more and more time because of the reduced branching factor. That is not a given, in my opinion. Which means that the "additional plies" with longer time controls are not necessarily _that_ much better overall.

You can probably find the posts where I ran some tests with null move or LMR or both disabled... But your argument is flawed in a basic way. What you are really saying is that the _faster_ the hardware gets, the better these algorithms perform. And that is not the point most want to accept.

Again, hardware has brought more gain than software. How much is TBD, but I'd bet 2/3 for hardware, 1/3 for software myself. And anyone that says we need a longer time control to test is only making the point more clear.

I do not know if null move or LMR help more at long time control, but I strongly believe that other factors help more at long time control.

1)better order of moves relative to 1998
2)better evaluation relative to 1998

I think that 2 may be more important but it is obvious that 1 helps more at longer time control.

Uri

hgm · Post by **hgm** » Sun Jan 18, 2009 9:46 am

I am surprised this discussion can linger on for such a long time. It is not that difficult to determine on the one hand the rating of old software on modern hardware, and on the other hand how many Elo points both modern and old software lose if you run them with a time odds factor of 1000 and a small hash table to simulate old hardware.

It is not like you need 1-Elo accuracy to determine that. A couple of dozen games should be enough to get a good fix on the 500-1000 Elo difference we are talking about.
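As a rough illustration of the scale involved, here is a minimal sketch of the standard logistic Elo model, which maps a rating difference to an expected score and back. The function names are made up for this example, and the thread does not say which rating tool or model would actually be used.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score; score must lie strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # The 500-1000 Elo range mentioned in the post.
    for diff in (500, 1000):
        print(f"{diff} Elo -> expected score {expected_score(diff):.4f}")
```

Under this model a 500 Elo gap already corresponds to an expected score of roughly 95%, so match results in that range sit close to the ceiling.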

Uri Blass · Post by **Uri Blass** » Sun Jan 18, 2009 9:48 am

I can add that it may be interesting to test new Crafty against the old Crafty of 1998 at unequal time controls, for example 5+5 for old Crafty against 1+1 for new Crafty, and later repeat the experiment at 20+20 for old Crafty against 4+4 for new Crafty.

My opinion is that new Crafty is going to score better in the 4+4 against 20+20 match.

Uri

Uri Blass · Post by **Uri Blass** » Sun Jan 18, 2009 2:03 pm

BubbaTough wrote:
bob wrote:
BubbaTough wrote:Given that one area of steady progress in software improvements has been in decreasing the branching factor, it would not be surprising if the longer the game, the more valuable the software changes prove. Perhaps, no matter what hardware difference is claimed, at some length of time, perhaps days or weeks, software changes prove more important than hardware. Conversely, at some short length of time, the speed of hardware is much more important than software changes.

-Sam

...
That is based on conjecture rather than fact. For example, I have run extensive tests on null-move, and could find no difference in Elo gain/loss for several time controls. And turning it off is not a huge deal. Same thing for LMR. Your assumption is that today's plies are the same as plies 10 years ago, we just get more of them with more and more time because of the reduced branching factor. That is not a given, in my opinion. Which means that the "additional plies" with longer time controls are not necessarily _that_ much better overall.
...
It is not just based on conjecture, it IS conjecture. Notice key words/phrases such as "Not surprising if" and "Perhaps". I have not made any assumptions about equal plies...unequal plies were definitely on my mind as I typed (otherwise I would not need my qualifiers). My personal experience has been that older programs make much worse use of extra time on today's hardware than today's programs. MUCH worse. Thus my speculation...

Though it hardly needs to be said: as always, my speculative hypothesis based on flimsy data might be wrong. I don't have a particularly strong belief that it's true myself, it just sounds plausible...perhaps 63% likely.

-Sam

My personal experience is also that older programs gain less from extra time.
It is based on matches of movei against other programs at unequal time controls.

With a 10:1 time handicap under Arena, movei performed better against weaker programs when the time control was longer, and rybka2.3.2a performed better against movei when the time control was longer.

Note also that better books are part of software improvement, unless the book is too big to be memorized by old hardware. In spite of that, I agree to using the same book, because I am interested in engine improvement and not in total software improvement.

Uri

diep · Post by **diep** » Sun Jan 18, 2009 6:05 pm

Uri Blass wrote:
bob wrote:
CRoberson wrote:Yes, we did fall into the discussion of what is each part worth.

So, here is what I'd like to see tested.

1) Crafty without LMR and without Null Move.
2) Crafty without LMR and with Null Move.
3) Crafty with LMR and without Null Move
4) Crafty with both.

I think we could leave all other Crafty parameters as is which means
the 4th experiment is your current base code.

Once these 4 experiments are done, I'd like to see the PV verification
pruning tested - using a 0 window on the sibling nodes.
I can run those. I think I might start this in a few minutes. I will run two 32,000 game tests for each of the above 4 scenarios. Note that these will be very fast games, which may well exaggerate the effectiveness of any pruning idea, but at least these tests will establish an estimated upper bound on the improvement each provides. As far as the PV verification, this is not something I do in crafty and am not sure exactly what this is unless it is known by another name...

I have the test running. 8 x 32,000 games will take around 8 hours so I should have the results around 11:00pm CST. I will post them when they are done, unless the cluster slows down a bit due to other users and it takes a little longer than expected...
Fast games can also reduce the effect of pruning, and without testing we do not know whether they are going to reduce or increase it.

Take an extreme case (which of course does not happen): null move is not used at depth=1, so it changes nothing if Crafty cannot get beyond depth 1.

Uri

WRONG,

fast games INCREASE the effect of pruning.
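For reference, the four-way test proposed in the quoted post can be enumerated as in the sketch below. The game counts come from bob's description (two 32,000-game runs per configuration); the dictionary keys are illustrative names only, not actual Crafty options.

```python
from itertools import product

GAMES_PER_RUN = 32_000    # figure from the quoted post
RUNS_PER_CONFIG = 2       # "two 32,000 game tests for each"
ESTIMATED_HOURS = 8       # bob's own estimate for the whole batch

# Same order as the numbered list: neither, null move only, LMR only, both.
configs = [
    {"lmr": lmr, "null_move": null_move}
    for lmr, null_move in product((False, True), repeat=2)
]

total_games = len(configs) * RUNS_PER_CONFIG * GAMES_PER_RUN
games_per_hour = total_games / ESTIMATED_HOURS

if __name__ == "__main__":
    for number, config in enumerate(configs, start=1):
        print(number, config)
    print("total games:", total_games, "games per hour:", games_per_hour)
```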

bob · Post by **bob** » Sun Jan 18, 2009 6:27 pm

BubbaTough wrote:
bob wrote:
BubbaTough wrote:Given that one area of steady progress in software improvements has been in decreasing the branching factor, it would not be surprising if the longer the game, the more valuable the software changes prove. Perhaps, no matter what hardware difference is claimed, at some length of time, perhaps days or weeks, software changes prove more important than hardware. Conversely, at some short length of time, the speed of hardware is much more important than software changes.

-Sam

...
That is based on conjecture rather than fact. For example, I have run extensive tests on null-move, and could find no difference in Elo gain/loss for several time controls. And turning it off is not a huge deal. Same thing for LMR. Your assumption is that today's plies are the same as plies 10 years ago, we just get more of them with more and more time because of the reduced branching factor. That is not a given, in my opinion. Which means that the "additional plies" with longer time controls are not necessarily _that_ much better overall.
...
It is not just based on conjecture, it IS conjecture. Notice key words/phrases such as "Not surprising if" and "Perhaps". I have not made any assumptions about equal plies...unequal plies were definitely on my mind as I typed (otherwise I would not need my qualifiers). My personal experience has been that older programs make much worse use of extra time on today's hardware than today's programs. MUCH worse. Thus my speculation...

Though it hardly needs to be said: as always, my speculative hypothesis based on flimsy data might be wrong. I don't have a particularly strong belief that it's true myself, it just sounds plausible...perhaps 63% likely.

-Sam

I am working on resurrecting an older version of Crafty. First stop will be to run it on the cluster to see how it fares against the same opponents/positions I am using today...

bob · Post by **bob** » Sun Jan 18, 2009 6:29 pm

Uri Blass wrote:
BubbaTough wrote:
bob wrote:
BubbaTough wrote:Given that one area of steady progress in software improvements has been in decreasing the branching factor, it would not be surprising if the longer the game, the more valuable the software changes prove. Perhaps, no matter what hardware difference is claimed, at some length of time, perhaps days or weeks, software changes prove more important than hardware. Conversely, at some short length of time, the speed of hardware is much more important than software changes.

-Sam

...
That is based on conjecture rather than fact. For example, I have run extensive tests on null-move, and could find no difference in Elo gain/loss for several time controls. And turning it off is not a huge deal. Same thing for LMR. Your assumption is that today's plies are the same as plies 10 years ago, we just get more of them with more and more time because of the reduced branching factor. That is not a given, in my opinion. Which means that the "additional plies" with longer time controls are not necessarily _that_ much better overall.
...
It is not just based on conjecture, it IS conjecture. Notice key words/phrases such as "Not surprising if" and "Perhaps". I have not made any assumptions about equal plies...unequal plies were definitely on my mind as I typed (otherwise I would not need my qualifiers). My personal experience has been that older programs make much worse use of extra time on today's hardware than today's programs. MUCH worse. Thus my speculation...

Though it hardly needs to be said: as always, my speculative hypothesis based on flimsy data might be wrong. I don't have a particularly strong belief that it's true myself, it just sounds plausible...perhaps 63% likely.

-Sam
My personal experience is also that older programs gain less from extra time.
It is based on matches of movei against other programs at unequal time controls.

With a 10:1 time handicap under Arena, movei performed better against weaker programs when the time control was longer, and rybka2.3.2a performed better against movei when the time control was longer.

Note also that better books are part of software improvement, unless the book is too big to be memorized by old hardware. In spite of that, I agree to using the same book, because I am interested in engine improvement and not in total software improvement.

Uri

Why use a book? That's a silly way of testing when you imply "I am not interested in book improvement..." I don't test with books of any kind, to get rid of that aspect of bias.

bob · Post by **bob** » Sun Jan 18, 2009 6:33 pm

Uri Blass wrote:
bob wrote:
BubbaTough wrote:Given that one area of steady progress in software improvements has been in decreasing the branching factor, it would not be surprising if the longer the game, the more valuable the software changes prove. Perhaps, no matter what hardware difference is claimed, at some length of time, perhaps days or weeks, software changes prove more important than hardware. Conversely, at some short length of time, the speed of hardware is much more important than software changes.

-Sam
That is based on conjecture rather than fact. For example, I have run extensive tests on null-move, and could find no difference in Elo gain/loss for several time controls. And turning it off is not a huge deal. Same thing for LMR. Your assumption is that today's plies are the same as plies 10 years ago, we just get more of them with more and more time because of the reduced branching factor. That is not a given, in my opinion. Which means that the "additional plies" with longer time controls are not necessarily _that_ much better overall.

You can probably find the posts where I ran some tests with null move or LMR or both disabled... But your argument is flawed in a basic way. What you are really saying is that the _faster_ the hardware gets, the better these algorithms perform. And that is not the point most want to accept.

Again, hardware has brought more gain than software. How much is TBD, but I'd bet 2/3 for hardware, 1/3 for software myself. And anyone that says we need a longer time control to test is only making the point more clear.
I do not know if null move or LMR help more at long time control, but I strongly believe that other factors help more at long time control.

1)better order of moves relative to 1998

Eh? My move ordering has not changed at all since 1998. Or even since 1988. I have been using the same ordering for at least 25 years now. And I believe that everyone else is doing about the same thing. Hash move. Sorted captures. killer/history moves. Etc...

2)better evaluation relative to 1998

I agree to a point. But not because our evaluation is that much better by itself, but because we can get away with doing things today that we could _not_ afford in 1998 because the hardware was too slow.

I think that 2 may be more important but it is obvious that 1 helps more at longer time control.

I don't even believe (1) exists... what move ordering are you using today that was not used in 1998? killers? 1970's in slate/atkin. SEE? 1970s from Coko to Blitz and everything in between. History moves? 1980's.

Uri
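The ordering bob lists (hash move, sorted captures, killers, history) can be sketched roughly as below. This is an illustrative toy, not Crafty's code: the `Move` class, its fields, and `order_moves` are hypothetical names, and details that real engines handle (such as deferring losing captures) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    name: str                 # e.g. "e4d5"; hypothetical representation
    is_capture: bool = False
    see_score: int = 0        # static exchange evaluation estimate


def order_moves(moves, hash_move=None, killers=(), history=None):
    """Sort moves: hash move, captures by SEE, killers, then history score."""
    history = history or {}

    def key(move):
        if move.name == hash_move:
            return (0, 0)
        if move.is_capture:
            return (1, -move.see_score)           # best exchange first
        if move.name in killers:
            return (2, killers.index(move.name))  # keep killer slot order
        return (3, -history.get(move.name, 0))    # highest history first

    return sorted(moves, key=key)


sample = [
    Move("a2a3"),
    Move("e4d5", is_capture=True, see_score=100),
    Move("g1f3"),
    Move("d1h5"),
    Move("c3d5", is_capture=True, see_score=300),
]
ordered = order_moves(sample, hash_move="g1f3", killers=("d1h5",),
                      history={"a2a3": 5})

if __name__ == "__main__":
    print([move.name for move in ordered])
```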

bob · Post by **bob** » Sun Jan 18, 2009 6:36 pm

Dirt wrote:
bob wrote:Couple of points. First, no way you could get 75K on a P200/mmx. The Pentium Pro 200 would do 75 K, but the basic Pentium, all the way thru the P200/mmx was a different processor architecture, no out of order execution or any of that. I had a p200/mmx laptop that would not break 50K. The Pentium pro 200 was getting around 75K. I ran on it in Jakarta, and in my office, until I got the quad Pentium pro 200 box.

If you take the 400mhz processor, which I am not sure was around in 1998, at least the xeon 2mb L2 that I had, then 150K nodes per second would be the answer. Late in 2008 an i7 was producing 20M nodes per second. If you divide 20M by 150K you get a raw factor of 133x. If you factor in the 3.3/4.0 speedup for a parallel search you get a factor of 110x.

We could settle on 100:1 for simplicity, although that is _still_ a daunting advantage. I find it amusing that so many are convinced that software is most of the improvement, yet we are quibbling over a 10%-20% speed difference in hardware. It seems, to me, that perhaps most really do believe that hardware is the more important contributor to today's ratings in spite of rhetoric to the contrary...

Once we get to the improvement in hardware as settled, then there are going to be enough arguments to make the test hopeless anyway. 1 sec to 100 secs is a _long_ game. At 60 moves as a reasonable average game length, that turns into almost 2 hours per game. Even at 256 games at a time on my cluster, it takes a long time to play enough games to get good answers. And some are not going to accept 1 second vs 100 seconds. And want 1 minute to 100 minutes, which is not going to happen with me...
One hundred to one does sound like a nice number. You seem convinced that is too little, others are sure it's too much.

Are you considering 1 vs 100 seconds as an increment? That would be unfair to Fritz. The fairest time would have the time they were tuned for as the geometric mean. While everyone I'm sure tries to tune for longer time controls, in practice I think it must be much shorter, perhaps someplace around 30 seconds per game. I think either 2"/40 vs 200"/40 or 1" + 0.03 vs 100" + 3 would be in the right ballpark, assuming Rybka's time management doesn't completely fail at that level.

I was considering 1 vs 100 sec increment as barely doable, because the games are too long. I can easily do 0.1 vs 10.0, but it would have to be a linux/xboard ready program.
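The arithmetic behind the 100:1 figure and the "almost 2 hours per game" estimate can be reproduced with a short sketch. The node rates, the 3.3/4.0 parallel factor, and the 60-move game length are the figures quoted in the posts above; the assumption that each side uses exactly its per-move time on every move is a simplification made here.

```python
OLD_NPS = 150_000             # 400 MHz Xeon figure from the quoted post
NEW_NPS = 20_000_000          # late-2008 i7 figure from the quoted post
PARALLEL_FACTOR = 3.3 / 4.0   # parallel-search speedup on four cores

raw_factor = NEW_NPS / OLD_NPS
effective_factor = raw_factor * PARALLEL_FACTOR

MOVES_PER_GAME = 60           # average game length used in the post
FAST_SECONDS_PER_MOVE = 1
SLOW_SECONDS_PER_MOVE = 100

# Simplifying assumption: both sides spend their full per-move time.
game_seconds = MOVES_PER_GAME * (FAST_SECONDS_PER_MOVE + SLOW_SECONDS_PER_MOVE)
game_hours = game_seconds / 3600

if __name__ == "__main__":
    print(f"raw factor: {raw_factor:.0f}x")
    print(f"effective factor: {effective_factor:.0f}x")
    print(f"game length: {game_hours:.2f} hours")
```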

bob · Post by **bob** » Sun Jan 18, 2009 6:38 pm

Uri Blass wrote:
Dirt wrote:
bob wrote:Couple of points. First, no way you could get 75K on a P200/mmx. The Pentium Pro 200 would do 75 K, but the basic Pentium, all the way thru the P200/mmx was a different processor architecture, no out of order execution or any of that. I had a p200/mmx laptop that would not break 50K. The Pentium pro 200 was getting around 75K. I ran on it in Jakarta, and in my office, until I got the quad Pentium pro 200 box.

If you take the 400mhz processor, which I am not sure was around in 1998, at least the xeon 2mb L2 that I had, then 150K nodes per second would be the answer. Late in 2008 an i7 was producing 20M nodes per second. If you divide 20M by 150K you get a raw factor of 133x. If you factor in the 3.3/4.0 speedup for a parallel search you get a factor of 110x.

We could settle on 100:1 for simplicity, although that is _still_ a daunting advantage. I find it amusing that so many are convinced that software is most of the improvement, yet we are quibbling over a 10%-20% speed difference in hardware. It seems, to me, that perhaps most really do believe that hardware is the more important contributor to today's ratings in spite of rhetoric to the contrary...

Once we get to the improvement in hardware as settled, then there are going to be enough arguments to make the test hopeless anyway. 1 sec to 100 secs is a _long_ game. At 60 moves as a reasonable average game length, that turns into almost 2 hours per game. Even at 256 games at a time on my cluster, it takes a long time to play enough games to get good answers. And some are not going to accept 1 second vs 100 seconds. And want 1 minute to 100 minutes, which is not going to happen with me...
One hundred to one does sound like a nice number. You seem convinced that is too little, others are sure it's too much.

Are you considering 1 vs 100 seconds as an increment? That would be unfair to Fritz. The fairest time would have the time they were tuned for as the geometric mean. While everyone I'm sure tries to tune for longer time controls, in practice I think it must be much shorter, perhaps someplace around 30 seconds per game. I think either 2"/40 vs 200"/40 or 1" + 0.03 vs 100" + 3 would be in the right ballpark, assuming Rybka's time management doesn't completely fail at that level.
I do not care about the time the programs were tuned for.

Progress of software is judged by results and not by intentions.
If program X was better in SSDF games at 120/40 with ponder on and weaker at blitz in 1998, then I consider program X to be better for this discussion.

120/40 with ponder on means something equivalent to 4.5 minutes with ponder off on a P200MMX, so it clearly means more than 1 second on today's hardware.

Uri

You totally miss his point. Programs of a specific period are tuned for the usual time controls of that period, on the hardware of that period. Whether this is an issue with a very long increment or not I don't know. Crafty can work with any increment at all....
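Dirt's geometric-mean suggestion can be checked numerically. The sketch below takes the two pairs of time controls he proposed and computes the geometric mean of each pair; the 60-move game length is borrowed from bob's earlier estimate and is an assumption here, as are the variable names.

```python
import math


def geometric_mean(a: float, b: float) -> float:
    """Geometric mean of two positive numbers."""
    return math.sqrt(a * b)


MOVES_PER_GAME = 60  # assumed average game length

# Option 1: 2 seconds per 40 moves vs 200 seconds per 40 moves.
mean_per_40_moves = geometric_mean(2, 200)
option1_game_seconds = mean_per_40_moves * MOVES_PER_GAME / 40

# Option 2: 1 s + 0.03 s increment vs 100 s + 3 s increment.
mean_base = geometric_mean(1, 100)
mean_increment = geometric_mean(0.03, 3)
option2_game_seconds = mean_base + mean_increment * MOVES_PER_GAME

if __name__ == "__main__":
    print(f"option 1: about {option1_game_seconds:.0f} s per game")
    print(f"option 2: about {option2_game_seconds:.0f} s per game")
```

Both options land near the "around 30 seconds per game" that Dirt guesses the older programs were effectively tuned for.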
