Crafty tests show that software has advanced more.
Posted: Sun Sep 12, 2010 3:10 am
I didn't really expect Bob's test to show this, as I consider it rather biased in favor of hardware. Nevertheless, it still shows that software has contributed more than hardware to the advancement of computer chess over the years.
Here are some of his intermediate results:
```
Name               Elo   +   -  games  score  oppo.  draws
Crafty-23.4       2703   4   4  30000    66%   2579    22%
Crafty-23.3       2693   4   4  30000    65%   2579    22%
Crafty-23.1       2622   4   4  30000    55%   2579    23%
Glaurung 2.2      2606   3   3  60277    46%   2636    22%
Toga2             2599   3   3  60275    45%   2636    23%
Fruit 2.1         2501   3   3  60248    32%   2636    21%
Glaurung 1.1 SMP  2444   3   3  60267    26%   2636    17%
Crafty-10.18      2326  19  19   1327    20%   2580    14%
```
Here is the calculation to show that software is the bigger contributor:
It's well known that each hardware doubling is worth about 60 Elo of rating improvement. (For example, Crafty running on a quad is almost exactly 100 Elo stronger than the single-processor version of the same program.)
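A minimal sketch of this rule of thumb, assuming the Elo gain grows linearly with the number of speed doublings (the 60 Elo figure is from the post; the function names are illustrative). Inverting the rule shows what effective speedup the observed 100 Elo quad gain corresponds to.

```python
import math

ELO_PER_DOUBLING = 60.0  # rule of thumb quoted in the post (an assumption, not a law)


def elo_from_speedup(speedup: float, per_doubling: float = ELO_PER_DOUBLING) -> float:
    """Elo gained from a speed factor, assuming a fixed Elo value per doubling."""
    return per_doubling * math.log2(speedup)


def speedup_from_elo(elo: float, per_doubling: float = ELO_PER_DOUBLING) -> float:
    """Inverse of elo_from_speedup: the speed factor that would be worth `elo`."""
    return 2.0 ** (elo / per_doubling)


# A perfect 4x speedup (two doublings) would be worth 120 Elo under this rule.
print(f"ideal quad: {elo_from_speedup(4):.0f} Elo")
# The observed ~100 Elo quad gain corresponds to an effective speedup of about 3.2x.
print(f"100 Elo ~ {speedup_from_elo(100):.2f}x effective speedup")
```

The gap between 120 and 100 is consistent with parallel search not scaling perfectly across four processors.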
Bob's test shows that Crafty gained 377 Elo, with small error margins. Bob agreed that we should add about 300 Elo to represent the true software advancement, because Rybka 4 represents the state of the art in 2010 and is over 300 Elo stronger than Crafty.
So this test estimates that software has gained 377 + 300 = 677 Elo over a 15-year period.
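The 377 figure is simply the gap between the newest and oldest Crafty in the rating table. A small sketch that reads the rows (copied from the table) and reproduces the arithmetic; the 300 Elo Rybka gap is the post's estimate, not something the table measures, and the helper name is hypothetical.

```python
# Rows copied from the rating table: name, Elo, +, -, games, score, oppo., draws.
TABLE = """\
Crafty-23.4 2703 4 4 30000 66% 2579 22%
Crafty-23.3 2693 4 4 30000 65% 2579 22%
Crafty-23.1 2622 4 4 30000 55% 2579 23%
Glaurung 2.2 2606 3 3 60277 46% 2636 22%
Toga2 2599 3 3 60275 45% 2636 23%
Fruit 2.1 2501 3 3 60248 32% 2636 21%
Glaurung 1.1 SMP 2444 3 3 60267 26% 2636 17%
Crafty-10.18 2326 19 19 1327 20% 2580 14%
"""

RYBKA_GAP = 300  # the post's estimate of Rybka 4's lead over Crafty (not measured here)


def parse_ratings(text: str) -> dict[str, int]:
    """Map engine name -> Elo. Names may contain spaces, so split the 7 numeric columns off the right."""
    ratings: dict[str, int] = {}
    for line in text.strip().splitlines():
        name, elo, *_rest = line.rsplit(maxsplit=7)
        ratings[name] = int(elo)
    return ratings


ratings = parse_ratings(TABLE)
crafty_gain = ratings["Crafty-23.4"] - ratings["Crafty-10.18"]
software_gain = crafty_gain + RYBKA_GAP
print(f"Crafty gain: {crafty_gain} Elo; with Rybka gap: {software_gain} Elo")
```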
So the question is: how much speed do we need in order to gain 677 Elo if a doubling is worth 60 Elo?
677 / 60 = 11.3 doublings (rounded), and 11.3 doublings is a factor of 2^11.3, or about 2521. We need a computer about 2,500 times faster to gain 677 Elo.
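The same conversion in code, under the post's assumption of 60 Elo per doubling. It also shows the effect of rounding: the unrounded 11.28 doublings give roughly 2,490x, while 2^11.3 gives about 2,521x, so "about 2,500 times" is the fair summary either way.

```python
ELO_PER_DOUBLING = 60.0   # rule of thumb from the post
SOFTWARE_GAIN = 677.0     # 377 (Crafty) + 300 (the post's Rybka estimate)

doublings = SOFTWARE_GAIN / ELO_PER_DOUBLING    # about 11.28
speedup_exact = 2.0 ** doublings                # about 2,490x
speedup_rounded = 2.0 ** round(doublings, 1)    # 2^11.3, about 2,521x

print(f"{doublings:.2f} doublings -> {speedup_exact:,.0f}x exact, "
      f"{speedup_rounded:,.0f}x using 11.3")
```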
Bob estimated that hardware improved only 1500-fold. Therefore, using his own estimate of the hardware improvement, more of the gain has come from software than from hardware.
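Running the rule of thumb in the other direction makes the comparison explicit: under the same 60-Elo-per-doubling assumption, Bob's 1500x is worth about 633 Elo, somewhat less than the 677 Elo attributed to software. A sketch, not a model of real engine scaling.

```python
import math

ELO_PER_DOUBLING = 60.0
SOFTWARE_GAIN = 677.0        # Elo, from the post's 377 + 300
HARDWARE_SPEEDUP = 1500.0    # Bob's estimate of the 1995 -> 2010 hardware improvement

hardware_doublings = math.log2(HARDWARE_SPEEDUP)        # about 10.55
hardware_gain = ELO_PER_DOUBLING * hardware_doublings   # about 633 Elo

print(f"hardware: {hardware_doublings:.2f} doublings = {hardware_gain:.0f} Elo")
print(f"software: {SOFTWARE_GAIN:.0f} Elo")
print("software larger" if SOFTWARE_GAIN > hardware_gain else "hardware larger")
```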
I would like to mention that I believe Bob's numbers are flawed for several reasons, which I will briefly outline here, and that the software contribution is in fact even larger than Bob estimates.
The first reason is that his numbers do not reconcile with a test I did using Rebel. We compared Rebel on old and new hardware, and the single-processor speedup for Rebel is about 100 to 1. Allowing for running on an octal (8-processor) machine, you could multiply this by 8 to get 800 to 1. For chess, an octal does NOT give you a true 8-fold speedup, so 800 to 1 overstates the real gain, but Bob is using the nodes-per-second calculation anyway. Even so, this number disagrees with Bob's by about 2 to 1.
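The Rebel cross-check in numbers. This sketch assumes, as Bob's nodes-per-second comparison does, that 8 processors count as a full 8x; the Elo figure applies the same 60-Elo-per-doubling rule to the gap between the two estimates.

```python
import math

ELO_PER_DOUBLING = 60.0
REBEL_SINGLE_CORE = 100.0   # single-processor speedup measured for Rebel (from the post)
CORES = 8                   # octal machine; assumes NPS scales linearly with cores
BOB_ESTIMATE = 1500.0

rebel_nps_speedup = REBEL_SINGLE_CORE * CORES      # 800x
ratio = BOB_ESTIMATE / rebel_nps_speedup           # 1.875, i.e. "about 2 to 1"
elo_gap = ELO_PER_DOUBLING * math.log2(ratio)      # about 54 Elo of extra hardware credit

print(f"Rebel-based: {rebel_nps_speedup:.0f}x, Bob: {BOB_ESTIMATE:.0f}x, ratio {ratio:.2f}")
print(f"difference worth about {elo_gap:.0f} Elo")
```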
Another reason Bob's numbers are distorted is that he decided arbitrarily which machines should be compared. It's a question of choosing something to hold constant, such as price or form factor. For example, we could compare anything you can purchase for less than 1000 bucks, or anything that is called a "workstation" and can easily be moved around. Of all the possible things to hold constant, and with much hand waving, he decided the constant should be that it must be Intel hardware. Of all the possible choices, this is the one that exaggerates the difference the most. In 1995 more powerful machines than the P90 were available, so calling the P90 state of the art is a joke; calling the i7 state of the art is not.
In order to measure the hardware difference, Bob chose to use 2 different versions of Crafty, both of which are optimized to run on 64-bit systems. As it happens, Crafty looks especially good on 64-bit hardware. So he looks at some log files and eventually produces the number 1500 as the value for how much hardware has advanced over the last 15 years, and claims he is being generous in doing so. The log files show the speed of a 1995 Crafty running on 32-bit hardware, but even back then Crafty was designed to run on a 64-bit machine, so comparing those 32-bit runs with modern 64-bit runs inflates the apparent hardware gain.
Moore's law is a much perverted, misquoted and reformulated statement of how quickly transistor density changes over the years. Moore originally (in 1965) said that density doubles every year, and in 1975 he revised his own "law" to every 2 years. It has often been loosely translated as "performance doubles every 18 months", but that was actually a reformulation based on an observation by an Intel colleague of Moore's. In fact, performance on average does NOT double every 18 months; it takes longer. (I have NEVER seen a doubling in performance when I upgrade, even every 2 or 3 years, although sometimes it's close.)
So Bob's estimate is not in harmony with this (admittedly crude) rule of thumb, which is nevertheless widely accepted. Over 15 years, even if you assume a full doubling every 18 months, you would get a 1024-fold improvement, which is less than Bob's 1500. I think almost everyone considers 18 months to be on the very generous side.
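The rule-of-thumb comparison as a few lines of arithmetic, assuming smooth exponential growth (the function name is illustrative). It also inverts the question: over the same 15 years, Bob's 1500x would require a doubling about every 17 months, slightly faster than even the generous 18-month figure.

```python
import math

YEARS = 15
BOB_ESTIMATE = 1500.0  # Bob's hardware speedup estimate, 1995 -> 2010


def growth(years: float, doubling_months: float) -> float:
    """Total improvement factor if performance doubles every `doubling_months` months."""
    return 2.0 ** (years * 12 / doubling_months)


for months in (18, 24):
    factor = growth(YEARS, months)
    print(f"doubling every {months} months: {factor:,.0f}x over {YEARS} years")

# Doubling period implied by Bob's 1500x over the same 15 years.
implied_months = YEARS * 12 / math.log2(BOB_ESTIMATE)
print(f"1500x in {YEARS} years implies a doubling every {implied_months:.1f} months")
```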