What rule am I breaking? You want to run a program compiled for 1995 hardware (Rebel) and run that _same_ binary on 2010 hardware, and use that number? That _really_ shows what hardware has done? It completely negates the last 8+ years of hardware advances, from 64 bits (I think the first Opteron was released in 2003) to multiple cores, to improved CPU design with 8 additional registers. You are completely removing all of that from the equation, and then claiming that _your_ number is representative. That's nonsense.
Don wrote:
I don't give a hoot about what everyone was running - I wasn't running on a P90 back then, I was running on an Alpha. The issue is about software advancement and hardware advancement. If you want to know how much hardware has advanced since a certain date you have to pick hardware that represent the state of the art. What's so difficult to understand about that?
bob wrote:
Don wrote:
I didn't really expect that Bob's test would show this as I consider his test rather biased in favor of hardware. Nevertheless, it is still showing that software is a bigger contributor to computer chess advancement over the years than hardware is.
Here are some of his intermediate results:
Here is the calculation to show that software is the bigger contributor:
Crafty-23.4        2703    4    4  30000  66%  2579  22%
Crafty-23.3        2693    4    4  30000  65%  2579  22%
Crafty-23.1        2622    4    4  30000  55%  2579  23%
Glaurung 2.2       2606    3    3  60277  46%  2636  22%
Toga2              2599    3    3  60275  45%  2636  23%
Fruit 2.1          2501    3    3  60248  32%  2636  21%
Glaurung 1.1 SMP   2444    3    3  60267  26%  2636  17%
Crafty-10.18       2326   19   19   1327  20%  2580  14%
It's well known that each hardware doubling is worth about 60 ELO of rating improvement. (For example Crafty running on a quad is almost exactly 100 ELO stronger than the single processor equivalent program.)
Bob's test shows that Crafty gained 377 ELO with small error margins. Bob agreed that we should add about 300 ELO to represent true Software advancement because Rybka 4 represents the state of the art in 2010 and it's over 300 ELO stronger than Crafty.
So this test estimates that we have gained 377 + 300 = 677 ELO over a 15 year period.
So the question is how much speed do we need in order to gain 677 ELO if a doubling is worth 60 ELO?
677 / 60 = 11.3 doublings. 11.3 doublings is a factor of 2521. We need a computer well over 2,500 times faster to get 677 ELO.
Bob estimated that hardware increased only 1500 times. Therefore more of the improvement has come from software than hardware using his estimates of hardware improvements.
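The arithmetic above can be checked with a short Python sketch. The 377 + 300 Elo total and the 60 Elo per doubling are the figures quoted in this post; the function name is only illustrative.

```python
def speedup_needed(elo_gain, elo_per_doubling):
    """Return (doublings, speed factor) needed to get elo_gain from speed alone."""
    doublings = elo_gain / elo_per_doubling
    return doublings, 2 ** doublings

# Figures from the post: 377 Elo measured for Crafty plus 300 Elo for Rybka 4.
doublings, factor = speedup_needed(377 + 300, 60)
print(f"{doublings:.2f} doublings, speed factor about {factor:.0f}")
```

Note that the unrounded figure is about 11.28 doublings, which gives a factor a little under 2,500; rounding to 11.3 doublings first gives the 2,521 quoted above. Either way the required speedup is well above the 1500x estimate.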
I would like to mention that I believe Bob's numbers are flawed for several reasons I will briefly outline here and in fact the software is even MORE than Bob estimates.
The first reason is that his numbers do not reconcile with a test I did using Rebel. We compared Rebel on old and new hardware. The 1 processor speedup for Rebel is about 100 to 1. Allowing for running on an octal, you could multiply this by 8 to get 800 to 1. For chess, an octal does NOT give you a true 800 to 1 speedup but Bob is using the nodes per second calculation anyway. This number still disagrees with Bob's number by about 2 to 1.
Another reason Bob's numbers are distorted is that he decided arbitrarily which machines should be compared. It's a question of defining something to remain a constant such as price, form factor, etc. For example we could compare anything you can purchase for less than 1000 bucks, or anything that is called a "workstation" and that you can easily move around. Of all the possible things to remain constant and with much hand waving he decided the constant should be that it must be Intel hardware. Of all the possible things to compare, this is the one that exaggerates the difference the most. In 1995 more powerful machines were available than the P90, so calling the P90 state of the art is a joke. But calling the i7 state of the art is not.
Fine. For state of the art I choose Deep Blue. 1,000,000,000 nodes per second. Pick _any_ 1995 platform you want and let's compare speedup. Or turn it around and I pick a Cray T932 which is more computer than any single chip PC today in any type of measurement. So we have 0 hardware improvement.
Or we use the machine that _everybody_ was using in 1995, which was Intel/Windows, and we use the machine that _everybody_ is using today, which is the i3/i5/i7.
Personally, I have no problem determining which is the test to run. Everybody that has run on a T932, raise your hand. Looking around I see _one_ hand up. Everyone that has run on the big SP cluster with special-purpose chess processors, raise your hand. Again, I see _one_ hand up.
The point is that if you ask _anyone_ here what they were using in 1995, from the SSDF list, to ICC, to WMCCC/WCCC events, the most common answer, by probably 30-1 is going to be an Intel PC. That's the machine class almost everyone today is using as well. So the noise about the alphas and such is pure nonsense. Because if we include alphas we have to include every other rarely used box, of which there are many, and they were/are extremely fast. And extremely expensive...
If you want to compare hardware that doesn't represent the state of the art in 1995 to hardware that does in 2010, then go ahead. Just don't pretend that it's a correct comparison.
If it will make you happy then we can stop saying that we are talking about hardware advancement and say that this is about Intel advances.
The main problem is the 64 bit vs 32 bit difference. It was just a few hours ago that I found out that you succeeded in recompiling the old version and I was still going by your earlier statements.
In order to measure the hardware difference Bob chose to use 2 different versions of Crafty, both of which are optimized to run on 64 bit systems.
Jeez, Don, can't you at least read and get this right? My speed comparison was with crafty 10.x from 1995. I had numbers for the P5/133, I ran it on my hardware to get the speedup today. Not two different versions. The _exact_ same version. With about two dozen changes to add the xboard protocol changes to make it work on my cluster. Not a single change to the engine itself.
However that is a very MINOR issue compared to the 32 vs 64 bit issue.
No need to repeat - but you need to explain how breaking your own rule is fair all of a sudden. The P133 hardware is 32 bit. The i7 is 64 bit. The chess program is 64 bit. YOU are the one that insists that to be fair the programs being compared should be optimized to run on the hardware they were designed for. The program you are using was NOT optimized for a P133. It was optimized for a future 64 bit machine.
Why do you insist on continuing to make such a stupid statement (two different versions.) It is _clearly_ false. I doubt a single person here (perhaps excepting yourself) has somehow misunderstood that specific detail, which has been explained enough for anyone to finally see the light.
To emphasize: version 10.x was run in 1995 on 1995 hardware. I had a few test positions that backed up my 30K recollection. I ran the same few positions using that same version, but used my E5345 (single cpu) machine. It ran at 4M nodes per second. I ran crafty 23.4 on the same positions, same processor. Almost exactly the same NPS. I posted the numbers yesterday.
Again: 10.x on P5/133 searches 30K. On a P5/90, 20K. On an E5345 single CPU, 4M. It will take a little work to get the smp search to work, because the pthread library changed from way back and it doesn't compile cleanly. In addition, the lock stuff (xchg lock) has to be modified to work with 64 bit stuff as it uses the wrong register names. Those versions scaled perfectly with NPS, as the current version does. So 30M+ is the expected number. I will verify this once I get the pthread stuff working over the next couple of days.
Should I repeat it one more time? Not two different versions. _same_ version.
You don't like my using a program from 1995, that was designed to run on 1995 hardware and be fast, and now because it happens to be able to use the last 8 years of hardware improvements, that is a no-no. It is breaking some mythical rule that you attribute to me but I do not recall ever making such.
You also apparently would not use _any_ 64 bit program from today to do this analysis, you are stuck on using a very old, out-of-date, architecturally inefficient program (Rebel) without even trying to get Ed to re-compile it to at least use part of the new hardware.
And you say _I_ am biasing things in my favor. My test is about as good as it gets. Pick a program from 1995 that is still being worked on today. Might be 2-3 of them perhaps. Can you get the source from their 1995 version to compile today for modern hardware? I can, for mine.
I've chosen the most accurate test I can think of. And in every post I have _always_ made the note that this is "Crafty's hardware speedup" or "Crafty's programming improvements" from 1995 to 2010. There is no single magic number that fits all. There is a very precise number that fits me.
I am currently running some 1/2 speed and then 1/4 speed runs to see what 2 doublings does to 10.18. I'm expecting something in the range of 70. Just for fun, here are the current results after 3600 games:
Crafty-10.18-1 2417 4 4 30000 22% 2655 14%
Crafty-10.18-2 2416 4 4 30000 22% 2655 14%
Crafty-10.18 2340 12 12 3613 15% 2655 11%
If you subtract, it looks like -75 Elo if I run 10.18 at 1/2 the normal speed, while leaving all the opponents running at normal speed. Given that, if it holds up, Crafty got +360 from software and 750+ from hardware (using 10 doublings, although it is actually more than 10 by some fraction, something like 10.25). So you might want to re-think what you say my data has proven with respect to which has given the greatest gain. This seems to support the common 2x faster = +70 Elo we have been measuring for years. More once these finish and I get the 1/4 speed results.
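As a rough check of the subtraction and the resulting split, here is a small Python sketch. The ratings are copied from the three result lines above, and the 10 doublings and +360 software figure come from this post. It assumes the 3,613-game line is the half-speed run still in progress; the variable names are only illustrative.

```python
# Ratings copied from the partial results above.
full_speed = (2417 + 2416) / 2   # the two completed 30000-game runs
half_speed = 2340                # assumed: the in-progress half-speed run

loss_per_halving = full_speed - half_speed

# Figures quoted in the post: about 75 Elo per doubling over 10 doublings,
# and +360 Elo attributed to software.
elo_per_doubling = 75
hardware_elo = elo_per_doubling * 10
software_elo = 360
hardware_share = hardware_elo / (hardware_elo + software_elo)

print(f"Elo lost by halving speed: {loss_per_halving}")
print(f"Hardware: {hardware_elo} Elo, share of total: {hardware_share:.0%}")
```

The partial run carries an error margin of about 12 Elo, so the per-doubling value is still provisional until the runs complete.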
Nope, sorry. I designed it in 1995 because I wanted to give bitboards a try. After having talked to Slate for years, I set out to see what they could do. I spent a ton of time optimizing things for 32 bit hardware. Yes, I knew 64 bit hardware would one day become the norm. But I did _not_ try to write a program that would be crappy until that happened. Reasonable 64 bit stuff arrived with the AMD Opteron in the 2003 time frame somewhere. You _really_ think I wrote something that was not effective, and used it for 8 years, just waiting? Wonder why Slate did 64 bit stuff on a 60 bit CPU and had similar performance issues? Stupidity???
You designed it for the future, not the present in 1995. You were all over the forum with that 15 years ago.
That is a false statement. Crafty was designed to use 64 bit values for the bitboards. It was _designed_ to run on 32 bit hardware, which was what we had in the PC world back then. You only have to look at the move generation stuff (COMPACT_ATTACKS, USE_SPLIT_SHIFTS, etc) that was explicitly designed to work efficiently on 32 bit boxes. Yes it gains some on 64 bit hardware. But in 1995 it was most certainly designed to run well on 32 bit hardware.
As coincidence would have it, Crafty runs on 64 bit hardware and looks especially good on 64 bit hardware. So he looks at some log files and eventually produces the number 1500 as the value for how much hardware has advanced over the last 15 years and claims he is being generous to do that. The log files show the speed of a 1995 Crafty running on 32 bit hardware. But even back then Crafty was designed to run on a 64 bit machine.
Absolutely false. And easily proven. Name your benchmark and let's go. Move generation speed? The most common requirement in a chess program is to generate captures only, for the q-search that represents 90% of the nodes until we reach simple endgames. Tell me how you efficiently generate just captures, compared to how I do it in bitboards. I generate all pawn moves in one gulp. Tell me how you do that in a mailbox more efficiently. I'm not going to go thru this ridiculous argument. Bitboards are no worse than mailbox on 32 bit hardware. That's pure urban legend. Crafty has been searching as fast as any program around, from 1995 to date. This kind of misinformation doesn't fly and certainly can't be substantiated.
Your program is like Stockfish, it was designed specifically for 64 bit architectures but you did what you could to make it run as well as possible on 32 bit machines. But a bitboard program will never run as well on a 32 bit machine as a mailbox program.
Crafty certainly did. And Crafty was C and not assembly like Frans used for years.
For example which 64 bit programs could come close to Fritz and Nimzo in nodes per second 15 years ago on 32 bit hardware?
We are measuring hardware speed improvement from 1995 to present. We are measuring software improvement from 1995 to present. What should I measure? Lines of code? Number of conditional jumps? Number of months with "r" in them? Why would I measure anything except what we are talking about?
Maybe _your_ 64 bit programs "ran like a dog" on 32 bit hardware. Mine did not. Bruce was about as good as it gets when it comes to speed, and we had discussions and measured things all the time. Bitboards were not particularly superior (until you factor in the generate only captures issue, or some easy-to-do-using-bitboard evaluation tricks that turn a mailbox loop into a single AND instruction (is this pawn passed?) and such). But they most certainly were _not_ inferior, otherwise Slate would have been just as handicapped, but we know he wasn't. Duchess was a bitboard program, on a 32 bit IBM mainframe. We weren't all idiots back in the 70's and 80's.
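For readers unfamiliar with the "is this pawn passed?" trick, here is a minimal Python sketch of the general technique. It is not Crafty's code: the square numbering (a1 = 0, h8 = 63), the table name and the function names are all assumptions made for illustration. A precomputed mask per square replaces the mailbox loop over the squares in front of the pawn with one AND.

```python
FILE_A = 0x0101010101010101
MASK64 = 0xFFFFFFFFFFFFFFFF

def white_passed_mask(sq):
    """Squares ahead of a white pawn on sq, on its own and adjacent files."""
    file, rank = sq % 8, sq // 8
    files = FILE_A << file
    if file > 0:
        files |= FILE_A << (file - 1)
    if file < 7:
        files |= FILE_A << (file + 1)
    # Keep only the ranks strictly in front of the pawn.
    ahead = (MASK64 << (8 * (rank + 1))) & MASK64
    return files & ahead

# Built once at startup; the per-pawn test is then a single AND.
PASSED_MASK = [white_passed_mask(sq) for sq in range(64)]

def is_passed_white(sq, black_pawns):
    """True if no black pawn can block or capture the white pawn on its way up."""
    return (PASSED_MASK[sq] & black_pawns) == 0

E4, D5, A7 = 28, 35, 48
print(is_passed_white(E4, 1 << D5))  # black pawn on d5 stops the e4 pawn
print(is_passed_white(E4, 1 << A7))  # black pawn on a7 is irrelevant
```

On 32 bit hardware the same AND would be done as two 32 bit halves, which is the kind of cost being argued about in this thread.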
Hell, I'd bet you dollars to donuts you had 64 bit stuff in your 1995 code. Hashing, anyone? I've always used 64 bit hashing. Back then I did it as two 32 bit chunks, but it would clearly fit 64 bit hardware better. So was _your_ stuff designed for 64 bits only? Didn't think so.
I had 32 bit "mailbox" style programs and 64 bit programs. I've done it all, been there done that as they say.
But the 64 bit programs have always run like a dog on 32 bit hardware. You can do some things to improve that situation but you can never quite get the full speed of a true 32 bit program on a 32 bit platform.
It's not your numbers that are off, it's your methodology for the reasons I have stated. You picked a very specific thing to measure, and no doubt measured it accurately, but my contention is that you just measured the wrong thing.
Keep saying that to yourself enough, and perhaps you will believe it. But I have no "estimate". I have an absolute measured value. Taking the P5/90 on one end, and a 6-core i7 on the other, the speed increase for Crafty is 1500x. I did not claim that was the speed gain for any other program. I don't care about any other program. I did not 'estimate' a thing, I simply took out my "ruler" and measured both as accurately as possible. What you are talking about is something that might have been typed by a roomful of monkeys, because it is valid words, and somewhat valid grammatical constructions, but the meaning is missing.
Moore's law is a much perverted, misquoted and reformulated statement of how quickly transistor density changes over the years. I think Moore said that density doubles every 18 months and then way back in 1975 modified his own "law" to every 2 years. It has often been loosely translated as performance doubling every 18 months. This was actually a reformulation based on observation by an Intel colleague of Moore's. In fact, performance on average does NOT double every 18 months, it takes longer. (I have NEVER seen a doubling in performance when I upgrade even every 2 or 3 years although sometimes it's close.)
So Bob's estimate is not in harmony with this (admittedly crude) rule of thumb that nevertheless is widely accepted. Over 15 years even if you assume a full doubling every 18 months you would get a 1024x improvement. I think almost everyone thinks 18 months is on the very generous side.
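The rule-of-thumb figure can be reproduced with a few lines of Python. The 15-year span and the 18-month and 2-year doubling periods are the ones mentioned in this post; the function name is only illustrative.

```python
def moore_factor(years, months_per_doubling):
    """Speed factor after `years` if performance doubles every given number of months."""
    doublings = years * 12 / months_per_doubling
    return 2 ** doublings

for months in (18, 24):
    print(f"doubling every {months} months over 15 years: {moore_factor(15, months):.0f}x")
```

With the 18-month period this gives exactly 10 doublings, the 1024x mentioned above; with a 2-year period it is 7.5 doublings, or roughly 180x.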
So get off the "estimation" and "fabrication" and "exaggeration" bandwagon and offer something useful and logical. I've explained my numbers. Feel free to shoot either the 22K or the 30M numbers down. We can certainly get a 3rd party to verify the 10.x on current hardware. We've already had confirmation by someone running crafty and seeing 22K nps on I think a P5/100mhz. Which is right in line with 20K at 90mhz and 30K at 133mhz.
So shoot at what you think is wrong, but don't try to restate what I have done, I have been precise in what I have measured. And it is nothing at all related to what you are claiming I have measured.
I would try to back out of _anything_ that has no scientific basis. We have no idea about Rybka's background. We know it came from Fruit. But we can't trace it back to 1995. It might be that Rybka would get more from the hardware than I did, had it been started in 1995, and perhaps it might have gotten more (or less) from software improvements. Who knows? Who can measure this. I can at least accurately state what I have gotten, once I get my tests run. And notice that I am running tests, not just posting contradictory argument after contradictory argument...
I was expecting that you would try to back out of this sooner or later.
I am also not yet ready to grant Rybka another +300 Elo on software improvements.
Simple concept. Read carefully. Then re-read before responding. What if Crafty has some significant design flaw that I don't know about? And otherwise could be better (or worse) than Rybka. Do you _know_ where the +300 for Rybka comes from? Bugs in Crafty? Improvements in Rybka? Better use of hardware in Rybka? If you don't know, what does that 300 point gap mean, other than "Rybka is 300 points better, but we don't know whether it is better software, better use of hardware, advances in rybka, bugs in Crafty, etc."
Look at the numbers on all the ratings list, Crafty is more than 300 ELO weaker than Rybka 4. I'm interested to hear about how you will also find that invalid.
I'm not prepared to just assume facts not in evidence.
Jeez, put brain in gear before putting fingers in motion. A major bug in Crafty doesn't affect that +300 at all? A major bug could be the _majority_ of that +300 for all anybody knows. Only way to deal with Rybka would be to use 1995 Rybka and 2010 Rybka just as I am doing. Unfortunately, there is no 1995 Rybka. And you would be complaining bitterly anyway because Rybka is a bitboard program and that is grossly unfair to run it on 1995 hardware, according to you.
That doesn't affect the +300 figure as you have never been within 300 of Rybka 3 or Rybka 4.
It may well be that Crafty has a serious flaw somewhere.
Or if the new one is broken. How can anyone say which?
However it could affect the relative difference in the modern vs the old Crafty if the old one is broken.
I have not been proved wrong at all. At present, it appears that from 1995 to 2010, 2/3 of the Elo came from hardware, 1/3 from software. I've posted the numbers to support this. I will supply more accurate numbers later after the tests complete.
Nevertheless, you have been proved wrong anyway. Even if you are off by 100 ELO the point has already been made that software and hardware are roughly equal in their contributions to the success of computer chess to any reasonable degree of measurement.
If you'd read everything before posting, you will find that I scaled the 1500 back to 1200. The 1500 number is 250x per core for 6 cores. I scaled that by the fairly accurate speedup = 1 + (ncpus -1) * .7. For 6 cores, that turns into 4.5x. 4.5x times 250x = what? About 1200x? Certainly over 1000x?
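The scaling described here is easy to reproduce. The sketch below uses the formula and figures from this post (250x per core, 6 cores, 0.7 per additional CPU); the function name is only illustrative.

```python
def smp_speedup(ncpus, efficiency=0.7):
    """Effective search speedup from the rule of thumb: 1 + (ncpus - 1) * 0.7."""
    return 1 + (ncpus - 1) * efficiency

per_core_gain = 250                      # single-core NPS gain quoted in the post
raw_nps_gain = per_core_gain * 6         # the 1500x raw nodes-per-second figure
effective_gain = per_core_gain * smp_speedup(6)

print(f"raw NPS gain: {raw_nps_gain}x, effective gain: {effective_gain:.0f}x")
```

Taken literally the formula gives 4.5 x 250 = 1125x, which is still over 1000x and somewhat below the rounded "about 1200x".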
By the way, your 1500x figure should be taken as a figure that is too high.
Here is what you said:
This is a nodes per second increase in speed based on using a modern 6 core machine and comparing it to a single P90. Like I say, this is probably an ACCURATE figure for estimating the nodes per second increase but it's not an accurate figure for measuring how much ELO you should gain which is the relevant point.
Taking the P5/90 on one end, and a 6-core i7 on the other, the speed increase for Crafty is 1500x.
Nevertheless, that actually doesn't change the number that much but it does some.
For Crafty, it appears that 2 doublings due to more cores (going from 1 to 4) is worth 50 ELO per doubling. Of course with additional cores it's worth even less.
Ideal would be to run rybka thru the same test I am doing. But that's not an option for me since there is no source available.