(1) take a 1995 program, use its speed on 1995 hardware, and then run it on today's hardware and measure the performance improvement. This is flawed, because the old program was optimized for the old hardware. There are tons of new instructions and new mechanisms (register renaming, OOE, multi-level cache, etc.) that the old program will likely use inefficiently, since they didn't exist back then. This "compresses" the speed difference significantly.
(2) take a 2010 program and run it on 1995 hardware. Same problem. We do things today (magic move generation is one example) that depend on today's hardware. Running this on 1995 hardware is a performance issue, since 64-bit multiplies have to be done in pieces. Ditto for our hash probing, which is aware of 64-byte cache line sizes that did not exist back in 1995. This will also "compress" the speed difference artificially.
(3) take a 1995 program and run it on the 1995 hardware it was optimized for. Then take a successor of that program from today and run it on the best hardware of today. Whether this is Crafty, Fritz, or whatever is not that important. The main thing is to take a program that was optimized for 1995 hardware in 1995, and compare it to the same program optimized for 2010 hardware today. This gives a real comparison.
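To make the point in (2) about 64-bit multiplies concrete, here is a minimal sketch of what "done in pieces" means for the multiply at the heart of a magic index computation. The function names and the placeholder constants are made up for illustration (they are not real magic numbers and this is not Crafty's code), and Python integers are masked to mimic 64-bit registers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul64_native(a, b):
    """Low 64 bits of a*b, as a single 64-bit multiply would produce."""
    return (a * b) & MASK64


def mul64_in_pieces(a, b):
    """Same low 64 bits, built only from 32-bit halves.

    This mirrors what a 32-bit machine has to do: three partial
    products plus shifts and adds instead of one instruction.
    """
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    low = a_lo * b_lo  # full 32x32 -> 64 bit product
    # Only the low 32 bits of the cross terms survive the shift by 32;
    # a_hi * b_hi lands entirely above bit 63 and is dropped.
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK32
    return (low + (cross << 32)) & MASK64


def magic_index(occupancy, mask, magic, shift):
    """Magic-style table index: (relevant occupancy * magic) >> shift."""
    return mul64_in_pieces(occupancy & mask, magic) >> shift


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT real magics.
    occ = 0x0000001008000400
    mask = 0x000000FFFFFFFF00
    magic = 0x0123456789ABCDEF
    print(magic_index(occ, mask, magic, 52))
```

Every sliding-piece lookup pays for those extra partial products on a 32-bit machine, which is exactly the kind of cost a 2010 program never had to design around.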
Of course, there is the question of "what do you compare?" Depths are not equal thanks to today's reductions and pruning, so comparing time to depth is no good. Perhaps time-to-solution for a reasonable set of positions. But this is going to take some effort, since something that took a few minutes in 1995 might turn into a second today. One could compare NPS, which for the same program is pretty constant. But I think the idea of a few tactical positions that have a very concrete solution offers the best comparison. It factors in SMP without over-counting it, since SMP loss (extra nodes searched) will figure into the time-to-solution.
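As a rough sketch of how such a time-to-solution comparison could be boiled down to one number, assuming we have the solve time for each position on both setups: take the old/new ratio per position and average the ratios geometrically, so that one position with a huge ratio does not swamp the rest. The geometric mean is my assumption here, not an established convention, and the function name and timings below are invented for illustration.

```python
import math


def geometric_mean_speedup(old_seconds, new_seconds):
    """Geometric mean of per-position old/new time-to-solution ratios."""
    if len(old_seconds) != len(new_seconds) or not old_seconds:
        raise ValueError("need matching, non-empty lists of times")
    ratios = [old / new for old, new in zip(old_seconds, new_seconds)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT measured results.
    old_times = [120.0, 300.0, 45.0]  # 1995 program on 1995 hardware
    new_times = [1.2, 2.0, 0.9]       # successor program on current hardware
    print(geometric_mean_speedup(old_times, new_times))
```

Positions that solve in well under a second on the new machine would need finer timing (or harder positions), or the ratios become mostly noise.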
I think this latter idea is the way to compare; however, I am not yet having a lot of luck with getting a 32-bit 1995 version of Crafty to run on my 64-bit cluster. The old tricky rotated bitboard stuff with compact-attacks and such does not like 64-bit registers at all. Still struggling with this. Which means option 3 is probably the best one. The only problem would be to find some good positions (say the Nolot positions) that were run against Crafty (or whatever program is used) back in 1995, since finding 1995 hardware is not exactly an easy task today...
And as I write this I am still not sure exactly how to measure performance. Do we believe our searches are more efficient today, in terms of finding something in fewer nodes? If so, I suppose we could factor in both time to solution and nodes required, to try to factor out search improvements.
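One way to do that factoring, as a sketch: since time = nodes / NPS, the overall speedup on a position splits exactly into a node-count factor (search improvements, with any SMP overhead nodes counted against it) times an NPS factor (raw speed). The function name and the numbers are hypothetical, and the NPS factor still mixes hardware gains with code tuning for that hardware, so this only separates "fewer nodes" from "faster nodes".

```python
def decompose_speedup(old_nodes, old_seconds, new_nodes, new_seconds):
    """Split a time-to-solution speedup into search and speed factors.

    Returns (total, search_factor, speed_factor), where
    total == search_factor * speed_factor up to rounding.
    """
    total = old_seconds / new_seconds
    search_factor = old_nodes / new_nodes  # > 1 means fewer nodes needed now
    old_nps = old_nodes / old_seconds
    new_nps = new_nodes / new_seconds
    speed_factor = new_nps / old_nps       # raw nodes-per-second gain
    return total, search_factor, speed_factor


if __name__ == "__main__":
    # Hypothetical figures for one position, NOT measured results.
    total, search, speed = decompose_speedup(
        old_nodes=9_000_000, old_seconds=180.0,
        new_nodes=3_000_000, new_seconds=1.5,
    )
    print(total, search, speed)
```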
What a confusing issue.
