Hardware vs Software

Discussion of chess software programming and technical issues.

Moderator: Ras

Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: Hardware vs Software

Post by Dirt »

bob wrote:
Dirt wrote:
bob wrote:First let's settle on a 10-year hardware period. The Q6600 is two years old. If you want to use that as a basis, we need to return to early 1997 to choose the older hardware. The Pentium II (Klamath) came out around the middle of 1997, which probably means the best was the Pentium Pro 200. I suspect we are _still_ talking about 200:1

This is not about simple clock frequency improvements; more modern architectures are faster for other reasons such as better speculative execution, more pipelines, register renaming, etc...
Correct me if I'm wrong, but in moving to a time handicap you seem to be ignoring the parallel search inefficiency we were both just explaining to Louis Zulli. Shouldn't that be taken into account?
I don't see why. I used the same parallel search 10 years ago that I use today, the overhead has not changed.
I don't understand that response. I may have lost track of something in the discussion.

It seemed to me that you had said that a certain multi-core computer in 2008 was giving you 200 times the nodes per second that an equivalent single-processor computer was giving you in 1998, and that therefore you could simulate a match between them by using 200:1 time odds. I'm suggesting that the parallel search on the multi-core computer reduces the effective speedup to something less than that; exactly how much, you would know better than I.
Uri Blass
Posts: 10682
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Hardware vs Software

Post by Uri Blass »

bob wrote:
Dirt wrote:
bob wrote:First let's settle on a 10-year hardware period. The Q6600 is two years old. If you want to use that as a basis, we need to return to early 1997 to choose the older hardware. The Pentium II (Klamath) came out around the middle of 1997, which probably means the best was the Pentium Pro 200. I suspect we are _still_ talking about 200:1

This is not about simple clock frequency improvements; more modern architectures are faster for other reasons such as better speculative execution, more pipelines, register renaming, etc...
Correct me if I'm wrong, but in moving to a time handicap you seem to be ignoring the parallel search inefficiency we were both just explaining to Louis Zulli. Shouldn't that be taken into account?
I don't see why. I used the same parallel search 10 years ago that I use today, the overhead has not changed.

The main point both Don and I have _tried_ to make is that given a certain class of hardware, one is willing or able to do things that are not possible on slower hardware. In Cray Blitz, we specifically made use of vectors, and that gave us some features we could use that would be too costly in a normal scalar architecture. So there are three kinds of improvements over the past 10 years.

1. pure hardware

2. pure software

3. hybrid improvements where improved hardware gave us the ability to do things in software we could not do with previous generations of hardware due to speed issues...
Maybe you use the same parallel search today that you used 10 years ago, but I think that others improved their parallel search, so better parallel search counts as a software improvement unless Crafty was the best software at the beginning of 1999.

I also wonder how you can be sure that Crafty's parallel search was efficient on more than 8 cores when you did not even have the possibility of testing on 8 cores in 1999.

The reason I suggested using 8 cores for software of strength equivalent to the Fritz of January 1999 (when you suggested using today's top hardware) is that I thought software of that time could not use more than 8 cores. If you insist on more than 8 cores, then I have no objection, provided you also use software of the same period and not something that I consider equivalent on 1 core (but not on more than 8 cores).

If you think that the old Crafty of January 1999 was the best software of January 1999 on big hardware, because it could use many processors efficiently, then I have no problem with using it in the test, or even with using better hardware for it if it can use it.
Bill Rogers
Posts: 3562
Joined: Thu Mar 09, 2006 3:54 am
Location: San Jose, California

Re: Hardware vs Software

Post by Bill Rogers »

Hi Guys
I am sort of jumping into this posting. I have a bunch of old chess programs that some day soon I want to run against modern ones that have already been rated.
A partial list includes:
ChessMaster 2000
ChessMaster 2100
ChessMaster 3000
MyChess by D.K.
Fritz 1 & 2
Cyrus
Pion v.1.xx
and a few others. These, or should I say some of these, were rated years ago on much slower PCs, so I really want to see how much faster hardware influences their Elos.
Bill
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

BubbaTough wrote:
bob wrote: I will accept that a program today running on 4 cores will see some overhead due to the parallel search. But I don't think it is worth arguing about whether we should scale back the speed because of the overhead. That is simply a software issue as well, as it is theoretically possible to have very little overhead. If the software can't quite use the computing power available, that is a software problem, not a hardware limit.
Hmmm. I think there is a big difference between 50x speedup on 4 processors, and 200x on 1. Blaming software for not overcoming alpha-beta inefficiencies in utilizing multiple processors efficiently seems tangential. The fact is if you take an old program and put it on new hardware, it does not get 200x faster because it also cannot take advantage of the extra processors perfectly.

-Sam
The question is "what has hardware done" and "what has software done"???

On 4 CPUs, the last time this was carefully measured and discussed here, I ran a _ton_ of tests and put the results on my ftp box. Martin F. took the data and computed a raw speedup of 3.4x, even though I usually claim 3.1x in the general case. The hardware of today, in terms of instructions per second, is about 200x better than in 1998. That seems to be the only reasonable assumption one can work with. Earlier it seemed everyone was convinced that the hardware improvement was _far_ smaller than the software improvement. Now we have to quibble about parallel search overhead?

:)

Can't have it both ways. At worst this is a 25% error. More likely it is a 10-12% error. Do you believe that is significant when comparing hardware vs software???
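
A quick back-of-the-envelope check of the sizes being argued about here, using only the speedup figures quoted in the post above (3.4x measured, 3.1x usually claimed, 200x raw hardware); nothing in it is newly measured:

# Back-of-the-envelope check of the parallel-overhead "error" discussed above.
# The inputs are the figures quoted in the thread, not new measurements.
raw_hardware_ratio = 200.0   # quoted instructions-per-second gain, 1998 -> 2008

for label, speedup in [("measured by Martin F.", 3.4), ("usual claim", 3.1)]:
    efficiency = speedup / 4.0                 # fraction of 4 cores actually realized
    effective_ratio = raw_hardware_ratio * efficiency
    loss = 1.0 - efficiency                    # relative loss to parallel overhead
    print(f"{label}: {speedup}x on 4 cores -> about {effective_ratio:.0f}:1 "
          f"({loss:.0%} loss to parallel overhead)")

With the 3.4x figure the 200:1 handicap becomes roughly 170:1 (a 15% correction); with 3.1x it becomes roughly 155:1.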
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

michiguel wrote:
bob wrote:
Don wrote:
Dirt wrote:
Don wrote:
Dirt wrote:
bob wrote:First let's settle on a 10-year hardware period. The Q6600 is two years old. If you want to use that as a basis, we need to return to early 1997 to choose the older hardware. The Pentium II (Klamath) came out around the middle of 1997, which probably means the best was the Pentium Pro 200. I suspect we are _still_ talking about 200:1

This is not about simple clock frequency improvements; more modern architectures are faster for other reasons such as better speculative execution, more pipelines, register renaming, etc...
Correct me if I'm wrong, but in moving to a time handicap you seem to be ignoring the parallel search inefficiency we were both just explaining to Louis Zulli. Shouldn't that be taken into account?
None of this will matter unless it's really a close match - so I would be prepared to simply test single-processor Rybka vs whatever and see what happens. If Rybka loses we have a "beta cut-off" and can stop; otherwise we must test something a little more fair and raise alpha.
If the parallel search overhead means that the ratio should really be, say, 150:1 then I don't think Rybka losing really proves your point. If there should be such a reduction, and how large it should be, is a question I am asking.
So if Rybka loses with say a 32 to 1 handicap you are saying that we should give her even less time to see if she still loses?
This is going around in circles. It is easy to quantify the hardware. I'd suggest taking the best of today, the Intel I7 (core-3), and the best of late 1998. Limit it to a single chip for simplicity, but no limit on how many cores per chip. I believe this is going to be about a 200:1 time handicap to emulate the difference between the 4-core core-3 from Intel and the best of 1998, which was the PII/300 processor.

For comparison, Crafty on a quad-core I7 runs at 20M nodes per second, while on the single-CPU PII/300 it was running at not quite 100K nodes per second. A clean and simple factor of 200x faster hardware over that period (and again, those quoting Moore's law are quoting it incorrectly: it does _not_ say processor speed doubles every 2 years, it says _density_ doubles every 2 years, which is a different thing entirely). Clock speeds have gone steadily upward, but internal processor design has improved even more. Just compare a 2.0GHz Core 2 CPU against a 4.0GHz older processor to see what I mean.

So that fixes the speed differential over the past ten years with high accuracy. Forget the discussions about 50:1 or the stuff about 200:1 being too high. As Bill Clinton would say, "It is what it is." And what it is is 200x.

That is almost 8 doublings, which is in the range of +600 Elo. That is going to be a great "equalizer" in this comparison. 200x is a daunting advantage to overcome. And if someone really thinks software has produced that kind of improvement, we need to test it and put it to rest once and for all...

I will accept that a program today running on 4 cores will see some overhead due to the parallel search. But I don't think it is worth arguing about whether we should scale back the speed because of the overhead. That is simply a software issue as well, as it is theoretically possible to have very little overhead. If the software can't quite use the computing power available, that is a software problem, not a hardware limit.
Then you have to accept that Fritz 5 is 622 Elo points below Rybka on current hardware. That is a bit more than the 600 points you estimate hardware provided in 10 years.

Miguel
I don't accept that at all. That's why I suggested we run a test rather than using ratings that are very old and out of date. How many games has Fritz 5.32 played _recently_ on the rating lists? That makes a huge difference, and its rating might be better now, since it is still going to beat the top programs on occasion, and with them rated so much higher its own rating would likely be dragged up as well.

So let's run the test rather than speculating...


I have some Crafty versions that should be right for that time frame. Crafty 15.0 was the first parallel search version. I suspect something in the 16.x versions, or possibly the 17.x versions, was used at the end of 1998. Crafty ran on a quad Pentium Pro early in 1998 when version 15.0 was done...
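
As a quick check of the "almost 8 doublings ... +600 Elo" estimate quoted above: log2(200) is about 7.6, and the Elo value of one doubling of speed is usually put somewhere in the region of 50-100 points. A few lines of Python, with that per-doubling figure treated purely as an assumption rather than anything measured in this thread:

import math

hardware_ratio = 200.0                  # 1998 -> 2008 speed factor quoted above
doublings = math.log2(hardware_ratio)   # log2(200) ~ 7.64: "almost 8 doublings"

# Assumed Elo gain per doubling of search speed; ~50-100 is the range usually
# cited, and is an assumption here, not something measured in this thread.
for elo_per_doubling in (50, 70, 100):
    print(f"{doublings:.2f} doublings x {elo_per_doubling} Elo/doubling "
          f"= about {doublings * elo_per_doubling:.0f} Elo")

At 70-80 Elo per doubling this lands close to the +600 figure used in the post.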
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

Dirt wrote:
bob wrote:
Dirt wrote:
bob wrote:First let's settle on a 10-year hardware period. The Q6600 is two years old. If you want to use that as a basis, we need to return to early 1997 to choose the older hardware. The Pentium II (Klamath) came out around the middle of 1997, which probably means the best was the Pentium Pro 200. I suspect we are _still_ talking about 200:1

This is not about simple clock frequency improvements; more modern architectures are faster for other reasons such as better speculative execution, more pipelines, register renaming, etc...
Correct me if I'm wrong, but in moving to a time handicap you seem to be ignoring the parallel search inefficiency we were both just explaining to Louis Zulli. Shouldn't that be taken into account?
I don't see why. I used the same parallel search 10 years ago that I use today, the overhead has not changed.
I don't understand that response. I may have lost track of something in the discussion.

It seemed to me that you had said that a certain multi-core computer in 2008 was giving you 200 times the nodes per second that an equivalent single-processor computer was giving you in 1998, and that therefore you could simulate a match between them by using 200:1 time odds. I'm suggesting that the parallel search on the multi-core computer reduces the effective speedup to something less than that; exactly how much, you would know better than I.
But isn't that a _software_ issue??? In any case, the last time this was carefully measured, Crafty was searching 3.4x faster on 4 cores. You can probably find that in the archives somewhere. Martin F. did the analysis of a few dozen test runs over a set of positions I made available... This was back when Vincent claimed that Crafty got zero speedup on 2 CPUs, so I simply ran the same tests he ran, several times in fact, and gave Martin (and everyone else, via ftp) the data (I think it is still on my ftp box in fact).

So worst case, take 3.4/4.0 * 200 if you want to be picky... we can use 170x rather than 200x.

That satisfy everyone? I am amazed that everyone _was_ claiming that hardware was a small part of the overall improvement, that software improvements dwarfed the hardware. And now we are quibbling over 170 vs 200 times faster. Getting a bit worried? :)
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Hardware vs Software

Post by BubbaTough »

bob wrote:
BubbaTough wrote:
bob wrote: I will accept that a program today running on 4 cores will see some overhead due to the parallel search. But I don't think it is worth arguing about whether we should scale back the speed because of the overhead. That is simply a software issue as well, as it is theoretically possible to have very little overhead. If the software can't quite use the computing power available, that is a software problem, not a hardware limit.
Hmmm. I think there is a big difference between 50x speedup on 4 processors, and 200x on 1. Blaming software for not overcoming alpha-beta inefficiencies in utilizing multiple processors efficiently seems tangential. The fact is if you take an old program and put it on new hardware, it does not get 200x faster because it also cannot take advantage of the extra processors perfectly.

-Sam
The question is "what has hardware done" and "what has software done"???

On 4 CPUs, the last time this was carefully measured and discussed here, I ran a _ton_ of tests and put the results on my ftp box. Martin F. took the data and computed a raw speedup of 3.4x, even though I usually claim 3.1x in the general case. The hardware of today, in terms of instructions per second, is about 200x better than in 1998. That seems to be the only reasonable assumption one can work with. Earlier it seemed everyone was convinced that the hardware improvement was _far_ smaller than the software improvement. Now we have to quibble about parallel search overhead?

:)

Can't have it both ways. At worst this is a 25% error. More likely it is a 10-12% error. Do you believe that is significant when comparing hardware vs software???
I don't really have a position, and don't really want anything one way let alone two... and yes, I was just quibbling. Your implication that software may be keeping up with hardware over that period of time is impressive (even though you phrased it the other way around). After all, hardware is doubling capability every X years; the idea that software is also improving chess performance exponentially over that long a period is truly a testament to the incredible improvements that have been made in software. And given how immature chess programs still are, I see no reason for them not to continue this amazing level of achievement. Truly a fun area to work/play in.

-Sam
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

Uri Blass wrote:
bob wrote:
Dirt wrote:
bob wrote:First let's settle on a 10-year hardware period. The Q6600 is two years old. If you want to use that as a basis, we need to return to early 1997 to choose the older hardware. The Pentium II (Klamath) came out around the middle of 1997, which probably means the best was the Pentium Pro 200. I suspect we are _still_ talking about 200:1

This is not about simple clock frequency improvements; more modern architectures are faster for other reasons such as better speculative execution, more pipelines, register renaming, etc...
Correct me if I'm wrong, but in moving to a time handicap you seem to be ignoring the parallel search inefficiency we were both just explaining to Louis Zulli. Shouldn't that be taken into account?
I don't see why. I used the same parallel search 10 years ago that I use today, the overhead has not changed.

The main point both Don and I have _tried_ to make is that given a certain class of hardware, one is willing or able to do things that are not possible on slower hardware. In Cray Blitz, we specifically made use of vectors, and that gave us some features we could use that would be too costly in a normal scalar architecture. So there are three kinds of improvements over the past 10 years.

1. pure hardware

2. pure software

3. hybrid improvements where improved hardware gave us the ability to do things in software we could not do with previous generations of hardware due to speed issues...
Maybe you use the same parallel search today that you used 10 years ago, but I think that others improved their parallel search, so better parallel search counts as a software improvement unless Crafty was the best software at the beginning of 1999.

I also wonder how you can be sure that Crafty's parallel search was efficient on more than 8 cores when you did not even have the possibility of testing on 8 cores in 1999.

The reason I suggested using 8 cores for software of strength equivalent to the Fritz of January 1999 (when you suggested using today's top hardware) is that I thought software of that time could not use more than 8 cores. If you insist on more than 8 cores, then I have no objection, provided you also use software of the same period and not something that I consider equivalent on 1 core (but not on more than 8 cores).

If you think that the old Crafty of January 1999 was the best software of January 1999 on big hardware, because it could use many processors efficiently, then I have no problem with using it in the test, or even with using better hardware for it if it can use it.
I do not believe _any_ commercial program has a better parallel search than what is in Crafty, and what has been in it for 10+ years. There have been changes made, but in 1998 there was no NUMA hardware (AMD started this in the X86 world), so the more recent NUMA-related stuff is completely irrelevant to the 1998 discussion, or even to today's Intel core-3 (I7) processor... In that light, Crafty's parallel search today is almost identical to what it was in 1998.

Again, I have suggested a P2/300 single chip, since that was the best available at the end of 1998; and for today, the latest is the Intel Core-3 (I7). The raw speed difference between the two, using Crafty as a benchmark, is a factor of 200. If you want to measure a set of positions with time-to-solution rather than raw NPS, then today's hardware is around 170-175 times faster. Note this is a single-chip discussion; in actuality, I can put together far larger systems today than I could in 1998. I elected to keep this simple by using a single chip, which is actually a little unfair to the "hardware side".

The best chip of 1998 was a single core. The best of today is a quad-core. In 1998 you could buy a dual-chip P2/300 if you wanted; I had one. Today you can easily buy a quad core-3 box with 16 cores, which is yet _another_ factor of two. So we should maybe go with a factor of 340:1 rather than 170:1. And larger configurations are available from places like Sun, etc. So we could make that 1,000:1 if you want...


You are talking yourself into one hell of a deep hole here. I believe that with 340:1 time odds, we could take an old buggy gnuchess and give Rybka absolute fits...

I have not studied this much, but I started a test earlier today giving Glaurung 1+1 and Crafty 100+100 for the time controls. I completed 450 games before quitting, and the result was one draw, the rest wins. With 450 games, if Crafty were less than 600 Elo better than Glaurung 2, I would have expected about 1 loss out of every 64 games. For 800 I would expect 1 loss for every 256 games. One draw out of 450 games - the scoring equivalent of one loss per 900 games - gives an idea of just what 100:1 does... I'm not sure this experiment will really be that interesting. And considering it should be either 170 or 340 depending on which hardware we consider, it only gets worse.

I have not checked the SSDF to see how G2 (most recent version) compares to Rybka. But it had better be at least 600 points worse or this is not going to be so interesting, IMHO.

But I do think it would be interesting to assess Elo gain for hardware vs for software, just so we would know. I know the programmers want to take credit for most of the gains. But I'll still bet that the engineers are responsible for the biggest gain...
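
For anyone who wants to turn Elo differences into expected results themselves, the standard logistic Elo formula is a one-liner. It gives the expected score (wins plus half of draws) of the weaker side, which is related to, but not quite the same as, the rough loss-per-N-games figures above:

# Standard logistic Elo expectation: expected score (wins plus half of draws)
# for a side that is `diff` Elo points weaker than its opponent.
def expected_score(diff):
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

for diff in (600, 800):
    e = expected_score(diff)
    print(f"{diff} Elo weaker: expected score {e:.4f} "
          f"(roughly 1 point per {1.0 / e:.0f} games)")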
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

BubbaTough wrote:
bob wrote:
BubbaTough wrote:
bob wrote: I will accept that a program today running on 4 cores will see some overhead due to the parallel search. But I don't think it is worth arguing about whether we should scale back the speed because of the overhead. That is simply a software issue as well, as it is theoretically possible to have very little overhead. If the software can't quite use the computing power available, that is a software problem, not a hardware limit.
Hmmm. I think there is a big difference between 50x speedup on 4 processors, and 200x on 1. Blaming software for not overcoming alpha-beta inefficiencies in utilizing multiple processors efficiently seems tangential. The fact is if you take an old program and put it on new hardware, it does not get 200x faster because it also cannot take advantage of the extra processors perfectly.

-Sam
The question is "what has hardware done" and "what has software done"???

On 4 CPUs, the last time this was carefully measured and discussed here, I ran a _ton_ of tests and put the results on my ftp box. Martin F. took the data and computed a raw speedup of 3.4x, even though I usually claim 3.1x in the general case. The hardware of today, in terms of instructions per second, is about 200x better than in 1998. That seems to be the only reasonable assumption one can work with. Earlier it seemed everyone was convinced that the hardware improvement was _far_ smaller than the software improvement. Now we have to quibble about parallel search overhead?

:)

Can't have it both ways. At worst this is a 25% error. More likely it is a 10-12% error. Do you believe that is significant when comparing hardware vs software???
I don't really have a position, and don't really want anything one way let alone two... and yes, I was just quibbling. Your implication that software may be keeping up with hardware over that period of time is impressive (even though you phrased it the other way around). After all, hardware is doubling capability every X years; the idea that software is also improving chess performance exponentially over that long a period is truly a testament to the incredible improvements that have been made in software. And given how immature chess programs still are, I see no reason for them not to continue this amazing level of achievement. Truly a fun area to work/play in.

-Sam
I think you misunderstood what I have been saying. I do not believe that software has come even _close_ to keeping up with hardware, in terms of Elo improvement. If I were guessing, I would suspect a 2/3 - 1/3 split and that is probably optimistic. That is, if you assume that over the past 10 years programs have gained +600 Elo (not my number, someone else made that statement) then I believe that 400 came from hardware, 200 from software.

I have posted bits and pieces suggesting this over the past few months. Some think that ideas like null-move are worth +200. They are not. Or that LMR is worth 200. It is not. They are certainly all worth something, but not nearly what everyone believes. I know programmers don't want to hear that, but as a more impartial observer (even though I am obviously a long-time chess programmer) I am well aware of what hardware has done, having watched it from the late 60s to date...
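
For readers who have not followed the engine literature, the two techniques being priced here fit into a plain alpha-beta search roughly as follows. This is only an illustrative sketch: generate_moves, make_move, and evaluate are toy stand-ins rather than any real engine's code, and the constants (R=2 for the null move, reductions from the fourth move onward for LMR) are typical textbook choices, not anyone's tuned values.

import random

# Toy stand-ins so the sketch runs on its own; a real engine supplies these.
def generate_moves(pos):                 # hypothetical: pseudo-legal moves
    return list(range(8))

def make_move(pos, move):                # hypothetical: child position
    return pos * 8 + move + 1

def evaluate(pos):                       # hypothetical: static eval in centipawns
    return random.Random(pos).randint(-100, 100)

R = 2               # null-move depth reduction
LMR_FULL_MOVES = 3  # moves searched at full depth before reductions kick in
LMR_MIN_DEPTH = 3   # only reduce when at least this much depth remains

def search(pos, depth, alpha, beta, allow_null=True):
    if depth <= 0:
        return evaluate(pos)

    # Null-move pruning: skip our move and search the same position at reduced
    # depth with a zero window around beta; if we still fail high, prune.
    # (A real engine also skips this when in check or in zugzwang-prone endings,
    # and actually flips the side to move.)
    if allow_null and depth > R:
        if -search(pos, depth - 1 - R, -beta, -beta + 1, allow_null=False) >= beta:
            return beta

    best = alpha
    for i, move in enumerate(generate_moves(pos)):
        child = make_move(pos, move)
        if i >= LMR_FULL_MOVES and depth >= LMR_MIN_DEPTH:
            # Late-move reduction: try late moves one ply shallower first.
            score = -search(child, depth - 2, -best - 1, -best)
            if score <= best:
                continue                 # reduced search failed low; skip re-search
        # Full-depth search (or re-search after a promising reduced search).
        score = -search(child, depth - 1, -beta, -best)
        if score >= beta:
            return beta                  # fail-hard beta cutoff
        if score > best:
            best = score
    return best

print(search(pos=1, depth=5, alpha=-10000, beta=10000))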
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

Don wrote:
bob wrote:
Don wrote:
bob wrote:That is why I made my suggestion. I _know_ that I ran on a P2/300 Xeon in late 1998. The box I had was a dual-CPU, but I would agree that one "chip" would be the best test. And I also ran on an I7 at some odd clock speed, 2.9xx GHz, a couple of months ago, with 4 cores and hyperthreading disabled. And I hit around 20M on that box.

Hence my factor of 200:1 for 1998 vs 2008, which seems to be reasonable. And that is a _substantial_ hurdle for a good program vs a bad program to overcome, if the bad program gets the 200:1 odds. So we can learn what the hardware has offered. But we are left with software. I can probably dredge up a 1998-era version of Crafty if I can figure out what was current at the time. I know I have a 1996 version that was run in Jakarta, so I can probably come close. And in Jakarta Crafty finished in the top 4 or 5 at the WMCCC event, so it was very competitive at the time (and not running on a dual-CPU box either; it used a single-CPU Pentium Pro 200). So measuring the software improvement from 1998 to the present could be approximated by taking the Crafty of 1998 vs the Rybka of today. But I can't run Rybka, not having it. I suggested agreeing on how much better Rybka is than Glaurung 2 and then using that, which I do have and can run hundreds of games at a time on the clusters here...
If you can get me a Linux version of that particular Crafty, I can run the test on my 64-bit Linux machine, as I do have 64-bit Rybka.
Any version I can find is Linux-compatible, since that is how it has been developed. It is also winboard-compatible, although going back to 1998 will mean no protocol version 2...

Let me search to see what I can find first, as I have to figure out what versions were current in 1998...
I just remembered that my tester is UCI based - but I could make an adaptor to go from xboard to UCI if one doesn't already exist.
If you can do it, I can find a 1998-era Crafty for sure now; I am hoping I can somehow translate version number to date. Perhaps some of the zip files I have will have the actual dates on them...

I think 170:1 is now a safe number to use, as that factors in the parallel search overhead loss and drops 200:1 down to 170:1...