Crafty tests show that Software has advanced more.

Discussion of chess software programming and technical issues.


bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty tests show that Software has advanced more.

Post by bob »

Don wrote:
mhull wrote:I wonder why you don't bring up the fact that Rebel could have been compiled to run on faster hardware in 1995, but instead you continue with your P90 comparison with modern hardware. It seems like you are applying a double standard to Crafty in this instance.
I just don't understand your point. Rebel was probably compiled in 1995 to be as fast as it could be at the time, so how could Ed have compiled it to be even faster? I'm sure he would have if he could have. Are you claiming that he didn't know what he was doing?

I explained the 64 bit vs 32 bit issue and someone else explained it better than I did.
His point was simple. Today, x86 has 8 additional general-purpose registers (r8-r15), but only if the program is compiled with a 64 bit compiler and run on a 64 bit OS. And suddenly it will run significantly faster because the compiler has to spill far fewer registers to memory during optimization. That is independent of the actual 64 bit data width issue. But if you don't recompile it, you totally ignore a significant hardware feature that has been present for several years now.

Now that wasn't that hard to understand, was it? Take an old program, run it on new hardware, in a crippled mode, and sure it will look like a piece of crap. I've explained this several times, yet you insist on repeating the mistake over and over. Rebel will run faster on today's hardware if you recompile it. Oh, but you can't because you don't have the source? Then you simply picked the wrong benchmark, as I have repeatedly stated. Crafty, on the other hand, _can_ be compiled to use the new registers without changing one line of code. And it helps in two ways. Replacing long long with long eliminates the duplicate instructions needed for AND/OR/XOR/SHIFT stuff. And it gives us 8 more registers with which to better optimize memory accesses.
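To make the "duplicate instructions" point concrete, here is a minimal Python sketch (not Crafty's actual code) that emulates a 64 bit bitboard held as two 32 bit halves, the way a 32 bit build has to store it. The helper names `split`, `join`, `and_halves` and `shl_halves` are hypothetical. A single 64 bit AND becomes two 32 bit ANDs, and a left shift needs a shift of each half plus an extra shift and an OR to carry bits across the boundary.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split(bb):
    """Split a 64 bit bitboard into (hi, lo) 32 bit halves."""
    return (bb >> 32) & MASK32, bb & MASK32


def join(hi, lo):
    """Reassemble (hi, lo) halves into one 64 bit value."""
    return (hi << 32) | lo


def and_halves(a, b):
    # One 64 bit AND becomes two 32 bit ANDs.
    return a[0] & b[0], a[1] & b[1]


def shl_halves(a, n):
    """Left-shift a (hi, lo) pair by n bits, 0 <= n < 32, using 32 bit pieces."""
    hi, lo = a
    if n == 0:
        # Guard mirrors C, where shifting a 32 bit value by 32 is undefined.
        return hi, lo
    # Bits shifted out of the low half must be carried into the high half.
    new_hi = ((hi << n) | (lo >> (32 - n))) & MASK32
    new_lo = (lo << n) & MASK32
    return new_hi, new_lo


bb = 0x0000_0000_8000_0001
print(hex(join(*shl_halves(split(bb), 1))))  # prints 0x100000002
```

On a 64 bit build each of these operations collapses back to a single instruction on one register, which is the gain being described.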

You might as well find an old perl chess engine and use that to make the comparison. It would be no less valid.


By the way, the 100 to 1 figure is not accurate and I already admitted that in one of my posts. For a (more or less) fair comparison, Rebel should be compiled on an i7 with a modern compiler. Then it would show more than 100 to 1, but it would surely still be less than 200 to 1. Unfortunately we don't have the means to do this, so I went with what I could get. Bob has the means with his old Crafty version; unfortunately, that overstates the hardware difference by breaking Bob's own rule that, to be fair, the programs being compared should be optimized equally well for the hardware they are running on.

I think you might understand what I am saying if I cast the issue differently. Suppose you purposely design a program to run especially slow on a 32 bit computer but to run fast on a 64 bit computer? Wouldn't it look especially good going to 64 bits?
However, I didn't do that. I designed Crafty to be as fast as possible on 32 bit platforms. And it is as fast as anything I have seen, on those boxes. So saying it didn't run well on 32 bits is wrong. It does run better on 64 bits for several reasons, but it ran quite well on 32 bits for _many_ years.


So basically ANY 64 bit program is designed to run especially slow on a 32 bit operating system and does not take full advantage of a 32 bit computer. The authors CHOSE that design knowing full well that it was not the best way to write a program that runs on a 32 bit computer. Bob Hyatt knew back in 1995 that Crafty would not run as well on a 32 bit computer but was smart enough to design for the future - knowing that in the short term his program would take a hit.
So you believe that Slate did Chess 4.0, using 64 bit words, knowing it would _never_ be efficient? Even though it was the fastest program of the era in spite of needing two instructions to update a 64 bit bitboard? But wait, that is exactly what happened with Crafty. I felt bitboards were _superior_ not because of coming 64 bit hardware, but because they had some properties not available in normal array-based board representations.

Crafty was never designed with the idea "OK, this will suck until 64 bit cpus are available." It was designed, instead, with the idea "OK, this is going to be very fast on 32 bit hardware and will be at least as good as the non-bitboard programs in terms of speed, hopefully better. And one day it will go even faster when 64 bit hardware comes around."

So stop telling everyone why I designed Crafty as I did and how I designed it, and leave that to the one who actually wrote the thing. Slate never thought he would have 64 bit CDC equipment, and never did. Yet he went that route because he thought it was better. I happen to agree. It is better whether you use 32 bit or 64 bit hardware. It is just better still on 64 bit hardware. But it hardly sucks on 32 bit, which is what you are incorrectly implying.



That's a very reasonable decision, but it makes comparisons on old hardware off by something like 3 to 2.
I have no idea where such a number comes from, and would not use numbers that don't have a reliable origin or source. When you start compounding over many years, and the number you are compounding has a high error margin, you end up with garbage.




Even so, the point you are making about could-haves in 1995 has already been addressed here. And your Rebel test is a tacit admission of the validity of that point, IMO. The point is that it matters not what could have been. It's totally beside the point.
I don't think you really get it.

For instance, I still have a Crafty 9.x executable that I used to run on my Mac Plus (68000 at 8 MHz), and it still runs on my old 68030 Mac. Crafty on that hardware could (in principle) be compared with a modern compile on the latest Intel machine (if we had the source for that version), just like the comparison with the 1995 Intel-targeted Crafty that Bob is doing now. ELO could be estimated for both platforms and the hardware speed delta determined.
Bob did something like that already. I think a reasonably valid way for Bob to recalibrate is to make a 32 bit version of the NEW program and compare the NPS to the old logs. A 32 bit program is not optimal on 64 bit hardware, but a program written to be optimal only on a 64 bit platform is not optimal on 32 bit hardware either, so this test would be closer to the truth.

But does it really matter now? Bob has been proved wrong using his own data so it's a moot point.
For the new members, "hand waving" != "proof" in any venue I am aware of.


So I confess my own failure to understand what more this particular hobby horse you are riding can possibly tell us, except that you are very upset about something beyond my feeble grasp.
I'm not upset at all - I'm only slightly frustrated that I cannot seem to explain a simple concept.
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Post by mhull »

Don wrote:I don't think you really get it.
I get it totally. You think Crafty on IA-32 is crippleware, and so its speedup is larger than that of single-threaded, 32-bit optimized programs.

But none of those 32-bit optimized programs are in active development today, and Rybka uses bitboards too, you know, just like Crafty. So if you want to run Rybka on IA-32, then that's crippleware too.

And those MC68000 programs are also dead. Jeez, I guess we can't have a fair comparison? Oooooo-kay.

So I see your point; I just don't see how it has any relevance to the real issue.
Matthew Hull

Post by bob »

mhull wrote:
Don wrote:I don't think you really get it.
I get it totally. You think Crafty on IA-32 is crippleware, and so its speedup is larger than that of single-threaded, 32-bit optimized programs.

But none of those 32-bit optimized programs are in active development today, and Rybka uses bitboards too, you know, just like Crafty. So if you want to run Rybka on IA-32, then that's crippleware too.

And those MC68000 programs are also dead. Jeez, I guess we can't have a fair comparison? Oooooo-kay.

So I see your point; I just don't see how it has any relevance to the real issue.
I do not believe Don is actually this dense. But he is acting the part. Does _anybody_ believe that these two questions:

(1) what has hardware done to computer chess skill over the years?

(2) what part has software added to that skill increase?

have exactly _one_ answer? Don is certainly implying that he believes this: that we can come up with one number that says what we all got from hardware vs from software.

But what about:

(1) non-parallel programs? They don't get as much from new hardware as parallel programs do.

(2) poor parallel implementations? They don't get as much from new hardware as good implementations do.

(3) poorly written programs? They don't get as much from their programmer as well-written programs get.

So one-size-fits-all is a stupid concept. It is perfectly reasonable to believe that for each chess program around, each gets a different fraction of its total skill from software vs hardware. Clearly Crafty gets a bunch. 1000x is a huge Elo gain. Poorly written programs (poorly written in that they don't take advantage of major parts of current architecture) will get much less. That does _not_ make my hardware gain wrong. Or imaginary. Or exaggerated. Doesn't affect _my_ gains from hardware whatsoever.

These numbers are likely unique for each program. But, for _reasonable_ programs, (those that use major parts of today's processor architecture) I'd suspect that there is significant correlation between the two parts. If you want a distorted number and choose to test a program that was compiled before we had r8-r15, or before we had 64 bit registers at all, then it is going to get a lot less from new hardware than sensible programs get. So why bother with those numbers?

I'm sure I can state with certainty that Crafty has gotten more from hardware than software over the last 15 years, based on hardware being 1000x+ faster, while software has given me +360 Elo.
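As a rough cross-check of this comparison, the sketch below converts a speedup into doublings (log base 2) and then into Elo, assuming a flat 60 Elo per doubling (the figure Don uses later in the thread; Bob cites 60-70). That flat rate is an assumption, and how well it holds is exactly what the thread is arguing about; the helper names are hypothetical.

```python
import math


def doublings(speedup):
    """Number of speed doublings contained in a speedup factor."""
    return math.log2(speedup)


def elo_from_speedup(speedup, elo_per_doubling=60.0):
    """Elo gain if every doubling is worth a constant elo_per_doubling (assumption)."""
    return doublings(speedup) * elo_per_doubling


hardware_low = elo_from_speedup(1000, 60)   # about 598 Elo
hardware_high = elo_from_speedup(1000, 70)  # about 698 Elo
software = 360                              # Bob's measured software gain
print(f"hardware {hardware_low:.0f}-{hardware_high:.0f} Elo vs software {software} Elo")
```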

I am going to try to quantify this 1000x gain, but I am going to try it in steps, by playing 10.18 at 1/4 the normal time against the gauntlet at normal time. That is 2 doublings. I should be able to do that 2 or 3 times (cutting the time by another factor of 4 each step) before the numbers get too small. By the time I get to 6 doublings, a trend should be visible that might make going all the way back to 10-11 doublings unnecessary...
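The step plan above can be tabulated with a short sketch: each cut of the time control by a factor of 4 removes log2(4) = 2 doublings, so three steps reach 6. The function `handicap_schedule` is hypothetical and only does the bookkeeping; it says nothing about the Elo each step will cost.

```python
import math


def handicap_schedule(steps, factor=4):
    """Return (time fraction, doublings removed) for each successive cut by `factor`."""
    schedule = []
    for k in range(1, steps + 1):
        fraction = 1 / factor ** k
        schedule.append((fraction, k * math.log2(factor)))
    return schedule


for fraction, removed in handicap_schedule(3):
    print(f"1/{round(1 / fraction)} of normal time removes {removed:.0f} doublings")

# Doublings needed to cover the full 1500x measured speedup.
print(f"1500x = {math.log2(1500):.2f} doublings")
```

The last line lands between 10 and 11, which matches the "10-11 doublings" mentioned above.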

More later.

Note that these numbers will be, as I have _always_ said "Crafty numbers". But they will be balls-on accurate crafty numbers, not guesses, or estimations, or hyperbole.

Post by mhull »

To get some sense of the ELO delta between Crafty on the oldest hardware we could find versus fairly recent hardware, try running 10.x on old hardware under scrappy and 10.x on new hardware under crafty on FICS or ICC, and let the folks have at them, perhaps with no computers allowed to play them.

Would the resulting average ELO delta be accurate enough to determine ELO attributable to hardware speed up?

Then maybe run newest crafty on the old hardware and measure the software ELO delta between it and 10.x.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Post by Don »

bob wrote:
Don wrote:I didn't really expect that Bob's test would show this as I consider his test rather biased in favor of hardware. Nevertheless, it is still showing that software is a bigger contributor to computer chess advancement over the years than hardware is.

Here are some of his intermediate results:

   Name                Elo    +    - games score oppo. draws
   Crafty-23.4        2703    4    4 30000   66%  2579   22%
   Crafty-23.3        2693    4    4 30000   65%  2579   22%
   Crafty-23.1        2622    4    4 30000   55%  2579   23%
   Glaurung 2.2       2606    3    3 60277   46%  2636   22%
   Toga2              2599    3    3 60275   45%  2636   23%
   Fruit 2.1          2501    3    3 60248   32%  2636   21%
   Glaurung 1.1 SMP   2444    3    3 60267   26%  2636   17%
   Crafty-10.18       2326   19   19  1327   20%  2580   14%
Here is the calculation to show that software is the bigger contributor:

It's well known that each hardware doubling is worth about 60 ELO of rating improvement. (For example Crafty running on a quad is almost exactly 100 ELO stronger than the single processor equivalent program.)

Bob's test shows that Crafty gained 377 ELO with small error margins. Bob agreed that we should add about 300 ELO to represent true Software advancement because Rybka 4 represents the state of the art in 2010 and it's over 300 ELO stronger than Crafty.

So this test estimates that we have gained 377 + 300 = 677 ELO over a 15 year period.

So the question is how much speed do we need in order to gain 677 ELO if a doubling is worth 60 ELO?

677 / 60 = 11.3 doublings. 11.3 doublings is a factor of about 2,500 (2^11.3 = 2521). We need a computer roughly 2,500 times faster to get 677 ELO.
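The calculation above can be reproduced in a few lines under Don's own assumptions (60 Elo per doubling, and +300 Elo for Rybka 4 over current Crafty). The Crafty gain is taken from the table: Crafty-23.4 at 2703 minus Crafty-10.18 at 2326. This is a sketch of the arithmetic only, not a check of those assumptions.

```python
import math

ELO_PER_DOUBLING = 60            # Don's assumed value per speed doubling
crafty_gain = 2703 - 2326        # Crafty-23.4 minus Crafty-10.18 in the table: 377
rybka_gap = 300                  # Don's estimate of Rybka 4 over current Crafty
total_gain = crafty_gain + rybka_gap               # 677

needed_doublings = total_gain / ELO_PER_DOUBLING   # about 11.28
needed_speedup = 2 ** needed_doublings             # about 2,490
measured_elo = math.log2(1500) * ELO_PER_DOUBLING  # what 1500x buys at 60/doubling, about 633

print(f"{needed_doublings:.2f} doublings -> {needed_speedup:,.0f}x needed; "
      f"1500x is worth about {measured_elo:.0f} Elo")
```

Without rounding, 11.28 doublings gives about 2,490x rather than 2,521x; rounding to 11.3 first accounts for the difference. Either way the figure is roughly 2,500x.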

Bob estimated that hardware increased only 1500 times. Therefore more of the improvement has come from software than hardware using his estimates of hardware improvements.

I would like to mention that I believe Bob's numbers are flawed, for several reasons I will briefly outline here, and that in fact the software contribution is even MORE than Bob estimates.

The first reason is that his numbers do not reconcile with a test I did using Rebel. We compared Rebel on old and new hardware. The 1 processor speedup for Rebel is about 100 to 1. Allowing for running on an octal, you could multiply this by 8 to get 800 to 1. For chess, an octal does NOT give you a true 8 to 1 speedup, so 800 to 1 is generous, but Bob is using the nodes per second calculation anyway. This number still disagrees with Bob's number by about 2 to 1.
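A small sketch of the Rebel scaling above: the single-CPU speedup times a parallel factor. With perfect NPS-style scaling the factor is 8; the `efficiency` parameter is a hypothetical knob, and the 0.7 value used here is borrowed from the rule of thumb Bob gives later in the thread (1 + 7 * .7), not from any Rebel measurement.

```python
def estimated_speedup(single_cpu, cpus, efficiency=1.0):
    """Single-CPU speedup times a parallel factor.

    efficiency=1.0 models perfect NPS-style scaling; lower values model
    imperfect parallel search (hypothetical parameter)."""
    parallel = 1 + (cpus - 1) * efficiency
    return single_cpu * parallel


rebel_nps_style = estimated_speedup(100, 8)          # 800
rebel_search_style = estimated_speedup(100, 8, 0.7)  # about 590
print(1500 / rebel_nps_style)  # Bob's 1500x vs 800x: prints 1.875
```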

Another reason Bob's numbers are distorted is that he decided arbitrarily which machines should be compared. It's a question of defining something to remain a constant such as price, form factor, etc. For example we could say that anything you can purchase for less than 1000 bucks, or anything that is called a "workstation" and that you can easily move around. Of all the possible things to remain constant and with much hand waving he decided the constant should be that it must be Intel hardware. Of all the possible things to compare, this is the one that exaggerates the difference the most. In 1995 more powerful machines were available than the P90, so calling the P90 state of the art is a joke. But calling the i7 state of the art is not.

Fine. For state of the art I choose Deep Blue. 1,000,000,000 nodes per second. Pick _any_ 1995 platform you want and let's compare speedup. Or turn it around and I pick a Cray T932, which is more computer than any single-chip PC today in any type of measurement. So we have 0 hardware improvement.

Or we use the machine that _everybody_ was using in 1995, which was Intel/windows, and we use the machine that _everybody_ is using today, which is the i3/i5/i7.

Personally, I have no problem determining which is the test to run. Everybody that has run on a T932, raise your hand. Looking around I see _one_ hand up. Everyone that has run on the big SP cluster with special-purpose chess processors, raise your hand. Again, I see _one_ hand up.

The point is that if you ask _anyone_ here what they were using in 1995, from the SSDF list, to ICC, to WMCCC/WCCC events, the most common answer, by probably 30-1, is going to be an Intel PC. That's the machine class almost everyone today is using as well. So the noise about the Alphas and such is pure nonsense. Because if we include Alphas we have to include every other rarely used box, of which there are many, and they were/are extremely fast. And extremely expensive...
I don't give a hoot about what everyone was running - I wasn't running on a P90 back then, I was running on an Alpha. The issue is about software advancement and hardware advancement. If you want to know how much hardware has advanced since a certain date you have to pick hardware that represent the state of the art. What's so difficult to understand about that?

If you want to compare hardware that doesn't represent the state of the art in 1995 to hardware that does in 2010, then go ahead. Just don't pretend that it's a correct comparison.

If it will make you happy then we can stop saying that we are talking about hardware advancement and say that this is about Intel advances.





In order to measure the hardware difference Bob chose to use 2 different versions of Crafty, both of which are optimized to run on 64 bit systems.

Jeez, Don, can't you at least read and get this right? My speed comparison was with crafty 10.x from 1995. I had numbers for the P5/133, I ran it on my hardware to get the speedup today. Not two different versions. The _exact_ same version. With about two dozen changes to add the xboard protocol changes to make it work on my cluster. Not a single change to the engine itself.
The main problem is the 64 bit vs 32 bit difference. It was just a few hours ago that I found out that you succeeded in recompiling the old version, and I was still going by your earlier statements.

However that is a very MINOR issue compared to the 32 vs 64 bit issue.



Why do you insist on continuing to make such a stupid statement (two different versions.) It is _clearly_ false. I doubt a single person here (perhaps excepting yourself) has somehow misunderstood that specific detail, which has been explained enough for anyone to finally see the light.

To emphasize: version 10.x was run in 1995 on 1995 hardware. I had a few test positions that backed up my 30K recollection. I ran the same few positions using that same version, but used my E5345 (single cpu) machine. It ran at 4M nodes per second. I ran crafty 23.4 on the same positions, same processor. Almost exactly the same NPS. I posted the numbers yesterday.

Again: 10.x on a P5/133 searches 30K nodes per second. On a P5/90, 20K. On an E5345 single CPU, 4M. It will take a little work to get the SMP search to work, because the pthread library has changed since then and the code doesn't compile cleanly. In addition, the lock code (xchg lock) has to be modified to work with 64 bit code, as it uses the wrong register names. Those versions scaled perfectly with NPS, as the current version does. So 30M+ is the expected number. I will verify this once I get the pthread stuff working over the next couple of days.

Should I repeat it one more time? Not two different versions. _same_ version.
No need to repeat - but you need to explain how breaking your own rule is fair all of a sudden. The P133 hardware is 32 bit. The i7 is 64 bit. The chess program is 64 bit. YOU are the one that insists that to be fair the programs being compared should be optimized to run on the hardware they were designed for. The program you are using was NOT optimized for a P133. It was optimized for a future 64 bit machine.


As coincidence would have it, Crafty runs on 64 bit hardware and looks especially good on 64 bit hardware. So he looks at some log files and eventually produces the number 1500 as the value for how much hardware has advanced over the last 15 years and claims he is being generous to do that. The log files show the speed of a 1995 Crafty running on 32 bit hardware. But even back then Crafty was designed to run on a 64 bit machine.
That is a false statement. Crafty was designed to use 64 bit values for the bitboards. It was _designed_ to run on 32 bit hardware, which was what we had in the PC world back then. You only have to look at the move generation stuff (COMPACT_ATTACKS, USE_SPLIT_SHIFTS, etc) that was explicitly designed to work efficiently on 32 bit boxes. Yes it gains some on 64 bit hardware. But in 1995 it was most certainly designed to run well on 32 bit hardware.
You designed it for the future, not the present in 1995. You were all over the forum with that 15 years ago.

Your program is like Stockfish: it was designed specifically for 64 bit architectures, but you did what you could to make it run as well as possible on 32 bit machines. But a bitboard program will never run as well on a 32 bit machine as a mailbox program. For example, which 64 bit programs could come close to Fritz and Nimzo in nodes per second 15 years ago on 32 bit hardware?

Hell, I'd bet you dollars to donuts you had 64 bit stuff in your 1995 code. Hashing, anyone? I've always used 64 bit hashing. Back then I did it as two 32 bit chunks, but it would clearly fit 64 bit hardware better. So was _your_ stuff designed for 64 bits only? Didn't think so.
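The hashing remark can be illustrated with a hypothetical sketch, assuming the usual Zobrist scheme (one random 64 bit key per piece and square, XORed in and out as pieces move). The table, seed and function names are illustrative only. Kept as two 32 bit chunks, every XOR has to be done twice, which is the same doubling of work as for bitboards.

```python
import random

MASK32 = 0xFFFFFFFF
rng = random.Random(1995)  # fixed seed so the hypothetical key table is reproducible

# Hypothetical Zobrist table: one random 64 bit key per (piece type, square).
ZOBRIST = [[rng.getrandbits(64) for _ in range(64)] for _ in range(12)]


def move_key64(key, piece, from_sq, to_sq):
    """64 bit build: one XOR per table entry touched by the move."""
    return key ^ ZOBRIST[piece][from_sq] ^ ZOBRIST[piece][to_sq]


def move_key32(hi, lo, piece, from_sq, to_sq):
    """32 bit build: the same key kept as two 32 bit chunks, so each XOR happens twice."""
    for z in (ZOBRIST[piece][from_sq], ZOBRIST[piece][to_sq]):
        hi ^= (z >> 32) & MASK32
        lo ^= z & MASK32
    return hi, lo
```

Both versions produce the same key; only the instruction count differs.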

I had 32 bit "mailbox" style program and 64 bit programs. I've done it all, been there done that as they say.

But the 64 bit programs have always run like a dog on 32 bit hardware. You can do some things to improve that situation but you can never quite get the full speed of a true 32 bit program on a 32 bit platform.


Moore's law is a much perverted, misquoted and reformulated statement of how quickly transistor density changes over the years. Moore originally said (in 1965) that density doubles every year, and then in 1975 modified his own "law" to every 2 years. It has often been loosely translated as "performance doubles every 18 months." This was actually a reformulation based on an observation by an Intel colleague of Moore's. In fact, performance on average does NOT double every 18 months; it takes longer. (I have NEVER seen a doubling in performance when I upgrade, even every 2 or 3 years, although sometimes it's close.)

So Bob's estimate is not in harmony with this (admittedly crude) rule of thumb, which is nevertheless widely accepted. Over 15 years, even if you assume a full doubling every 18 months, you would get a 1024x improvement. I think almost everyone thinks 18 months is on the very generous side.
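The rule-of-thumb arithmetic above, as a small sketch: 15 years at one doubling per 18 months is exactly 10 doublings, or 1024x; at 24 months it is 7.5 doublings, or about 181x. The last line inverts the question and asks what doubling period a 1500x gain over the same 15 years would imply. The helper `growth` is hypothetical.

```python
import math


def growth(years, months_per_doubling):
    """Speedup after `years` if performance doubles every `months_per_doubling` months."""
    return 2 ** (years * 12 / months_per_doubling)


print(growth(15, 18))   # 1024.0
print(growth(15, 24))   # about 181

# Doubling period implied by a 1500x speedup over the same 15 years.
implied_months = 15 * 12 / math.log2(1500)   # about 17.1 months
```

Note that the 1500x figure is for a 6-core i7 against a single P5/90, so part of it comes from core count rather than from the per-core performance the 18-month rule is usually applied to.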
Keep saying that to yourself enough, and perhaps you will believe it. But I have no "estimate". I have an absolute measured value. Taking the P5/90 on one end, and a 6-core i7 on the other, the speed increase for Crafty is 1500x. I did not claim that was the speed gain for any other program. I don't care about any other program. I did not 'estimate' a thing, I simply took out my "ruler" and measured both as accurately as possible. What you are talking about is something that might have been typed by a roomful of monkeys: it is valid words and somewhat valid grammatical constructions, but the meaning is missing.

So get off the "estimation" and "fabrication" and "exaggeration" bandwagon and offer something useful and logical. I've explained my numbers. Feel free to shoot either the 22K or the 30M numbers down. We can certainly get a 3rd party to verify the 10.x on current hardware. We've already had confirmation by someone running Crafty and seeing 22K nps on, I think, a P5/100 MHz, which is right in line with 20K at 90 MHz and 30K at 133 MHz.
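The "right in line" claim can be checked with a sketch, assuming Crafty 10.x speed on P5-class chips scales linearly with clock frequency, anchored on the 20K nps at 90 MHz figure. The linear model and the function name are assumptions for illustration.

```python
NPS_PER_MHZ = 20_000 / 90  # anchored on the 20K nps at 90 MHz figure


def predicted_nps(mhz):
    """Crafty 10.x NPS on a P5, assuming it scales linearly with clock speed."""
    return NPS_PER_MHZ * mhz


for mhz, reported in [(100, 22_000), (133, 30_000)]:
    predicted = predicted_nps(mhz)
    error = (reported - predicted) / predicted
    print(f"P5/{mhz}: predicted {predicted:,.0f}, reported {reported:,}, off by {error:+.1%}")
```

Both reported figures fall within about 2% of the linear prediction.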

So shoot at what you think is wrong, but don't try to restate what I have done, I have been precise in what I have measured. And it is nothing at all related to what you are claiming I have measured.
It's not your numbers that are off, it's your methodology, for the reasons I have stated. You picked a very specific thing to measure, and no doubt measured it accurately, but my contention is that you just measured the wrong thing.

I am also not yet ready to grant Rybka another +300 Elo on software improvements.
I was expecting that you would try to back out of this sooner or later.

Look at the numbers on all the rating lists: Crafty is more than 300 ELO weaker than Rybka 4. I'm interested to hear how you will find that invalid too.

It may well be that Crafty has a serious flaw somewhere.
That doesn't affect the +300 figure as you have never been within 300 of Rybka 3 or Rybka 4.

However it could affect the relative difference in the modern vs the old Crafty if the old one is broken.

Nevertheless, you have been proved wrong anyway. Even if you are off by 100 ELO the point has already been made that software and hardware are roughly equal in their contributions to the success of computer chess to any reasonable degree of measurement.

By the way, your 1500x figure should be taken as a figure that is too high.
Here is what you said:

Taking the P5/90 on one end, and a 6-core i7 on the other, the speed increase for Crafty is 1500x.
This is a nodes per second increase in speed based on using a modern 6 core machine and comparing it to a single P90. Like I say, this is probably an ACCURATE figure for estimating the nodes per second increase but it's not an accurate figure for measuring how much ELO you should gain which is the relevant point.

Nevertheless, that actually doesn't change the number that much but it does some.

For Crafty, it appears that the 2 doublings due to more cores (going from 1 to 4) are worth about 50 ELO per doubling. Of course, with additional cores each doubling is worth even less.
Ideally we would run Rybka through the same test I am doing. But that's not an option for me since there is no source available.

Post by Don »

bob wrote:
Dirt wrote:
Don wrote:I think people overestimate how much hardware has to improve to get a small improvement in ELO strength. If you can find 2 ELO per month, you are just about staying even with hardware advancement.
Good point. I realized that software gains were probably more important than hardware speed-ups a couple of years ago when Uri Blass pointed out that the SSDF results implied it. Kudos to the programmers.
Now, if only that math were correct, everyone would be happy. But 2 Elo per month is nonsense.
Crafty improved 377 in 15 years by your own metric. That works out to almost exactly 2 ELO per month. This is SOFTWARE improvement, not combined. As you have already shown us, hardware has improved even LESS, but I am trying to be generous by saying that hardware and software are equal contributors. This gives 4 ELO per month combined.

So even if it turns out software is slightly more or hardware is slightly more, the contribution for each is close to 2 ELO per month - give or take half an ELO or so.

It's true that your program has not improved as much as others, as you have pointed out. But even so we are probably still within half an ELO per month in either direction. Perhaps Fritz has improved 2.1 per month and Crafty only 2.0 per month, or something like that.


Typically we have seen 60-70 Elo for every doubling of speed for many years. Thompson and then Berliner ran these kinds of tests several times. If you figure 2x speed every 2 years, which is actually slow (for some of us, anyway, more in a minute) that is about 30-35 Elo per year purely on speed. Or 3 Elo per month.
Well then you had better reconcile your numbers. I have already used your data to show that hardware and software are roughly equal; if anything, the numbers show that software is the dominant improvement.
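To put the two per-month figures side by side, here is a small sketch under the thread's own assumptions: the 377 Elo software gain from the table (Crafty-10.18 to Crafty-23.4) spread over 15 years, against Bob's rule of thumb of 60-70 Elo per speed doubling with one doubling every 24 months. Neither input is re-measured here; the helper `per_month` is hypothetical.

```python
MONTHS = 15 * 12  # the 15 year window discussed in the thread


def per_month(elo, months=MONTHS):
    """Average Elo gained per month over the window."""
    return elo / months


software = per_month(377)        # Crafty-10.18 to 23.4 from the table: about 2.1
# Bob's hardware rule of thumb: 60-70 Elo per doubling, one doubling every 24 months.
hardware_low = 60 / 24           # 2.5
hardware_high = 70 / 24          # about 2.9
print(f"software {software:.2f}, hardware {hardware_low:.2f}-{hardware_high:.2f} Elo/month")
```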

Or, if you take the speedup numbers I have actually measured using 10.18, you get 1500x. But let me back off of that a bit. 200x is pure single-CPU speed gain, going from 20K in 1995 to 4M on one CPU in 2006 hardware (more on an i7, but I don't have any single-CPU numbers, so let's stop at 4M). 20K to 4M is 200x faster. Using 8 CPUs, Crafty's speedup is roughly 1 + 7 * .7 = 5.9 or a bit better. Call that 6x. So 200x times 6 = 1200x effective speedup, which doesn't give credit for all cores and factors in parallel search efficiency. 1200x is 10+ doublings. That will definitely be more than 360 Elo for me. In a few days I can say exactly how much more once I figure out a way to test at 20K on modern hardware, and do it correctly...
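The effective-speedup arithmetic above, as a sketch. The parallel factor follows the stated rule of thumb (the first CPU counts fully, each additional CPU adds about 0.7); the raw 8-CPU NPS line assumes perfect NPS scaling, which is where the "30M+" expectation mentioned earlier comes from. The function name is hypothetical.

```python
import math


def parallel_speedup(cpus, efficiency=0.7):
    """Rule of thumb: the first CPU counts fully, each extra CPU adds `efficiency`."""
    return 1 + (cpus - 1) * efficiency


single_cpu = 4_000_000 / 20_000        # 20K nps (P5/90) to 4M nps (E5345, one CPU): 200x
smp = parallel_speedup(8)              # about 5.9
effective = single_cpu * smp           # about 1180x (rounding smp to 6 gives 1200x)
print(f"{effective:.0f}x effective = {math.log2(effective):.1f} doublings")

# Raw NPS on 8 CPUs, assuming NPS scales perfectly with CPU count.
nps_8cpu = 4_000_000 * 8               # 32M
```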
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Post by Michael Sherwin »

Don wrote:
Michael Sherwin wrote:Another valid test to give data on this issue is to play Komodo against a time handicapped R4 to see what handicap would make them even.
Rybka is much stronger than Komodo. The two issues are software and hardware so this test would have to somehow relate the two, right?

I think people overestimate how much hardware has to improve to get a small improvement in ELO strength. If you can find 2 ELO per month, you are just about staying even with hardware advancement. You can look back 10 years and notice that computers are enormously faster than they used to be and attribute all of that to why they play stronger, but the 2 ELO per month is apparently just as much of the reason as Bob's experiments indicate.
The fact that R4 is stronger in software than Komodo (or Crafty or ...) is the point. Now take the difference in elo and theoretically compute how much hardware improvement is needed to equal R4. Simulate that hardware difference by handicapping R4. See on what side of equal the results fall on. That should give some data as to whether hardware improvements or software improvements are more valuable.
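The proposed experiment can be sketched as follows, assuming every speed doubling is worth a fixed number of Elo (60 here, as elsewhere in the thread). The thread gives no Komodo-Rybka gap, so the 300 Elo used below is illustrative only (it is the Crafty-Rybka figure). `time_handicap` and `expected_score` are hypothetical helpers; the latter is the standard logistic Elo expectation.

```python
def time_handicap(elo_gap, elo_per_doubling=60.0):
    """Factor to divide the stronger engine's time by so that, if each doubling
    is worth elo_per_doubling (an assumption), the expected gap vanishes."""
    return 2 ** (elo_gap / elo_per_doubling)


def expected_score(elo_diff):
    """Standard logistic Elo expectation for the side that is elo_diff stronger."""
    return 1 / (1 + 10 ** (-elo_diff / 400))


gap = 300  # illustrative only; not a measured Komodo-Rybka gap
print(time_handicap(gap))   # 32.0: give the stronger engine 1/32 of the time
```

If the handicapped engine still scores clearly above `expected_score(0)`, which is 0.5, its software edge is worth more than the assumed hardware equivalent; a score below 0.5 points the other way.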

Post by mhull »

Don wrote:The main problem is the 64 bit vs 32 bit difference.
If you believe that, then why are you testing 32-bit Rebel on 64-bit modern hardware? That renders Rebel crippleware, because it's not optimized for 64-bit. So it's unfair by your definition.

No matter what program you choose, the same hardware boundary becomes an issue according to your argument.

Why can't you just measure speedup, regardless of hardware, in terms of 10x, 20x, 100x, etc. and ELO at those speedups for the same version? Isn't this what it all really boils down to?

Or do you just like to argue for its own sake? ;)

Post by Don »

mhull wrote:
Don wrote:The main problem is the 64 bit vs 32 bit difference.
If you believe that, then why are you testing 32-bit Rebel on 64-bit modern hardware? That renders Rebel crippleware, because it's not optimized for 64-bit. So it's unfair by your definition.
A 32 bit program is not crippled on a 64 bit machine. Run a 32 bit program on a 32 bit machine and then time it on a 64 bit machine and you will see it runs just as well.

Then do the same experiment with a 64 bit program and your eyes will be opened.

There is this argument that 32 bit is not the right way to write a program that runs on a 64 bit machine. But I don't think anyone has actually proved that. It's difficult to prove because it's a whole different way of writing a program so you cannot just compare 2 programs.

The primary argument in favor of 64 bit is Rybka; the best program happens to be 64 bit. But I have no doubt whatsoever that had Rybka chosen the 32 bit way, it would still be the strongest program.

My personal belief? I think 64 bit is probably a slight advantage on 64 bit hardware, but 64 bit programs are mostly a fad inspired by the fact that Rybka is 64 bit. There is no proof either way. If something came out much stronger than Rybka and it was written as a 32 bit program, you would almost certainly see a bunch of new 32 bit programs.

Most of the stuff in computer chess is inspired by a combination of fad and what works. When it's not clear, authors go with what Fruit or some other program does.

No matter what program you choose, the same hardware boundary becomes an issue according to your argument.
I agree. The Rebel comparison is not fair and the Crafty comparison is not fair either.

Why can't you just measure speedup, regardless of hardware, in terms of 10x, 20x, 100x, etc. and ELO at those speedups for the same version? Isn't this what it all really boils down to?

Or do you just like to argue for its own sake? ;)
We can measure speedup on any individual program and get an accurate number FOR THAT PROGRAM, but it is no good for measuring how much the hardware has improved in general.

As you can already see, the Rebel speedup comes out different from the Crafty speedup.

Unfortunately, we have been struggling with this: Rebel isn't representative because it was not recompiled for modern hardware, and Crafty is not representative because it does not represent the best way to write a 32 bit program. Bob's rule here is that you should use the best representative program for the hardware.

This point is so clear I don't see how you are not getting it. What we SHOULD be arguing is whether Crafty is representative or not. Bob is now trying to make a claim that it is. Even though I disagree on this, at least it's an appropriate question.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty tests show that Software has advanced more.

Post by bob »

Don wrote:
bob wrote:
Dirt wrote:
Don wrote:I think people overestimate how much hardware has to improve to get a small improvement in ELO strength. If you can find 2 ELO per month, you are just about staying even with hardware advancement.
Good point. I realized that software gains were probably more important than hardware speed-ups a couple of years ago when Uri Blass pointed out that the SSDF results implied it. Kudos to the programmers.
Now, if only that math were correct, everyone would be happy. But 2 Elo per month is nonsense.
Crafty improved 277 in 15 years by your own metric.
Where are you getting 277???

I suppose I can keep re-posting. Here are the results I posted previously:

    Name                  Elo    +    - games score oppo. draws
    Crafty-23.4-2        2749    3    3 30000   66%  2626   22%  
    Crafty-23.4-1        2746    3    3 30000   66%  2626   22%  
    Crafty-10.18-1       2388    4    4 30000   22%  2626   14%  
    Crafty-10.18-2       2387    4    4 30000   22%  2626   14%  
If you take 23.4 as 2748, the rounded average of the two values above, and take 10.18 as 2388, again the rounded average, then 2748 - 2388 = 277 in your math system? In mine it is 360. And even using your 277, 277 / 15 is about 18.5 Elo per year, not the 24 per year you need to get your 2 Elo per month.
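For readers who want to check the arithmetic in the paragraph above, here is a small Python sketch. It uses only the four ratings from the table; treating "15 years" as exactly 180 months and rounding halves upward are assumptions of the example, and the helper names are hypothetical.

```python
import math

# Ratings from the table above: two runs per Crafty version.
crafty_23_4 = [2749, 2746]
crafty_10_18 = [2388, 2387]
YEARS = 15  # 1995 to present, as used in the thread


def round_half_up(x):
    """Round .5 upward, matching the 'rounded average' used in the post."""
    return math.floor(x + 0.5)


def mean(values):
    return sum(values) / len(values)


def elo_per_month(total_elo, years=YEARS):
    """Spread a total Elo gain evenly over the given number of years."""
    return total_elo / (years * 12)


new = round_half_up(mean(crafty_23_4))   # 2747.5 rounds to 2748
old = round_half_up(mean(crafty_10_18))  # 2387.5 rounds to 2388
gain = new - old

for total in (gain, 277):
    print(f"{total} Elo: {total / YEARS:.1f} per year, "
          f"{elo_per_month(total):.2f} per month")
```

Run as-is, it prints 24.0 per year (2.00 per month) for the 360-point gain and 18.5 per year (1.54 per month) for 277.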

That works out to almost exactly 2 ELO per month. This is SOFTWARE improvement alone, not combined. As you have already shown us, hardware has improved even LESS, but I am trying to be generous by saying that hardware and software are equal contributors. This gives 4 ELO per month combined.
Again, you accuse me of "showing something" I did not show. I _did_ clearly show that Crafty today is roughly 1200x faster than in 1995, factoring in the SMP loss to get a realistic speed improvement for SMP search. 1200x is 10+ doublings. Do you now say that a single doubling is worth less than 36 Elo? Because that's what you have to believe to say that my 1200x speedup is worth less than the 360 Elo provided by software alone.

Personally, I don't believe that for a minute. And I am getting ready to make a couple of runs, halving the time Crafty has for each run, to see what 2 "halvings" do to the Elo. For your statement to hold water, you'd better be hoping for something like -25 Elo for running 1/2 as fast. You aren't going to get it, I don't think...
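The 36 Elo figure comes from spreading the 360-point software gain over the roughly ten doublings in a 1200x speedup. A minimal sketch, assuming (as the thread does) that every doubling of speed is worth the same number of Elo; `hardware_gain` is a hypothetical helper, not taken from any engine.

```python
import math

speedup = 1200        # effective hardware speedup claimed in the post
software_gain = 360   # Crafty 10.18 -> 23.4, from the table above

doublings = math.log2(speedup)          # a little over 10
break_even = software_gain / doublings  # Elo per doubling where hardware == software


def hardware_gain(elo_per_doubling, factor=speedup):
    """Elo attributed to a speed factor, assuming a constant Elo per doubling."""
    return elo_per_doubling * math.log2(factor)


print(f"{doublings:.2f} doublings; break-even is {break_even:.1f} Elo per doubling")
# Compare a -25-per-halving result with the usual 50-70 Elo per doubling.
for elo in (25, 36, 50, 70):
    print(elo, round(hardware_gain(elo)))
```

Because log2(1200) is slightly above 10, the exact break-even is a bit under 36 Elo per doubling; with exactly 10 doublings it is 36.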



So even if it turns out software is slightly more or hardware is slightly more, the contribution for each is close to 2 ELO per month - give or take half an ELO or so.
First, software = 360 over 15 years, or 24 Elo per year (2 Elo per month).

It's true that your program has not improved as much as others, as you have pointed out. But even so, we are probably still within half an ELO in either direction. Perhaps Fritz has improved 2.1 per month and Crafty only 2.0 per month or something like that.


Typically we have seen 60-70 Elo for every doubling of speed for many years. Thompson and then Berliner ran these kinds of tests several times. If you figure 2x speed every 2 years, which is actually slow (for some of us, anyway; more on that in a minute), that is about 30-35 Elo per year purely on speed, or roughly 3 Elo per month.
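The per-month figure follows from dividing the Elo per doubling by the doubling period. A minimal sketch, assuming a constant Elo per doubling and a steady 2-year doubling period (both are the rules of thumb quoted above, not measurements); the function names are hypothetical.

```python
def speed_elo_per_year(elo_per_doubling, years_per_doubling=2.0):
    """Elo gained per year purely from speed at a steady doubling rate."""
    return elo_per_doubling / years_per_doubling


def speed_elo_per_month(elo_per_doubling, years_per_doubling=2.0):
    return speed_elo_per_year(elo_per_doubling, years_per_doubling) / 12


for elo in (60, 70):
    print(f"{elo} Elo/doubling: {speed_elo_per_year(elo):.0f} per year, "
          f"{speed_elo_per_month(elo):.2f} per month")
```

This gives 30-35 Elo per year, or 2.50-2.92 Elo per month; a faster doubling period (smaller `years_per_doubling`) raises both numbers.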
Well then, you had better reconcile your numbers. I have already used your data to show that hardware and software are roughly equal - the numbers show, if anything, that software is the dominant improvement.
Interesting, since you are using data _I_ have not yet provided. I have only provided an actual hardware speedup factor of 1200x from 1995 to the present. I have not tried to convert that to Elo, other than one anecdotal attempt to say that 10 doublings is at least +500 Elo, and closer to +700 if you take the usual +70 per doubling.



Or, if you take the speedup numbers I have actually measured using 10.18, you get 1500x. But let me back off of that a bit. The measured pure single-CPU speed gain is 250x; to be conservative, take 20K nodes per second in 1995 versus 4M on one CPU of 2006 hardware (more on an i7, but I don't have any single-CPU numbers for it, so let's stop at 4M). 20K to 4M is 200x faster. Using 8 CPUs, Crafty's speedup is roughly 1 + 7 * 0.7 = 5.9, or a bit better. Call that 6x. So 200x times 6 = 1200x effective speedup, which doesn't give full credit for all cores and factors in parallel search efficiency. 1200x is 10+ doublings. That will definitely be more than 360 Elo for me. In a few days I can say exactly how much more, once I figure out a way to test at 20K on modern hardware, and do it correctly...
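The effective-speedup estimate above can be reproduced step by step. This is a sketch under the post's own assumptions: 20K and 4M nodes per second, 8 CPUs with each CPU beyond the first contributing about 70%, and 50 to 70 Elo per doubling. The name `smp_speedup` is hypothetical, and the linear SMP formula is only the rule of thumb quoted in the post.

```python
import math


def smp_speedup(cpus, efficiency=0.7):
    """Rule of thumb from the post: the first CPU counts fully,
    each additional CPU adds about `efficiency` of one CPU."""
    return 1 + (cpus - 1) * efficiency


nps_1995 = 20_000        # nodes per second on 1995 hardware
nps_2006 = 4_000_000     # nodes per second on one CPU of 2006 hardware

single_cpu = nps_2006 / nps_1995            # 200x
parallel = smp_speedup(8)                   # about 5.9x
effective = single_cpu * parallel           # about 1180x
effective_rounded = single_cpu * round(parallel)  # 1200x, rounding 5.9 up to 6
doublings = math.log2(effective)

print(f"effective speedup {effective:.0f}x ({effective_rounded:.0f}x rounded), "
      f"{doublings:.2f} doublings")
for elo_per_doubling in (50, 70):
    print(elo_per_doubling, round(doublings * elo_per_doubling))
```

Even with the unrounded 5.9x SMP factor, the hardware side comes out above 500 Elo at 50 Elo per doubling, well over the 360 attributed to software.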