Don wrote:
bob wrote:
I do not quite see the point for all the tangents. It would seem to me we have a pretty good idea of what/how to test.
Says you.
Rebel is only 100 times faster on modern hardware. You add a zero to this and consider it a reasonable test. You are rounding every possible variable in YOUR direction and calling it "a pretty good idea of what to test."
Bullshit. I have not rounded _anything_ in my favor. 1995 Crafty searched 30K nodes per second on a P5/133 and 20K on a P5/90. That same 1995 Crafty searches 30M on my current box, and has another 10% coming once I get PGO going.
So how can you say that rounding 30M / 22K down to 1000x is "rounding every possible variable in MY direction?"
If you want to be snotty, let's use the exact number, which is 1500x based on the above actual data, not on guessing or extrapolation. Once I make sure everything works correctly, I will be more than happy to make this source available, although it is only going to run on 64 bits. One can always download 10.18 from my ftp box to get the original 32-bit-only version. The only problem is that the original version will not work with current winboard/xboard, as the protocol has changed dramatically since 1995 when Tim Mann and I were whacking it everywhere.
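For anyone who wants to check the arithmetic for themselves, here is a trivial sketch using only the NPS figures quoted in this thread (the numbers are the ones stated above, nothing measured beyond that):

[code]
#include <stdio.h>

int main(void) {
    /* NPS figures quoted in this thread */
    double nps_1995_p5_90  = 20000.0;     /* 1995 Crafty on a P5/90        */
    double nps_1995_p5_133 = 30000.0;     /* 1995 Crafty on a P5/133       */
    double nps_2010_box    = 30000000.0;  /* same 1995 Crafty, current box */

    printf("speedup vs P5/90:  %.0fx\n", nps_2010_box / nps_1995_p5_90);   /* ~1500x */
    printf("speedup vs P5/133: %.0fx\n", nps_2010_box / nps_1995_p5_133);  /* ~1000x */
    return 0;
}
[/code]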
Your first rounding error is that you compare Crafty on 32 bit to Crafty on 64 bit. Crafty is not optimal on 32 bit hardware so you are breaking your own rule.
Don, give it a rest. We started the discussion concerning two issues:
(1) what part of today's strength came from hardware advances between 1995 and 2010?
(2) what part of today's strength came from programming advances between 1995 and 2010?
For hardware, we had the 32 bit P5/90 in 1995. For software, we had Crafty, which was highly competitive with anyone in 1995. And Crafty ran on 32 bit hardware, just as surely as Slate used 60 bit hardware for his 64 bit chess engines in the '70s. Bitboards are not optimal on 32 bit hardware. But they are certainly the equal of non-bitboard programs on that hardware. So what?
I have simply taken 1995 Crafty on 1995 hardware, then run that same 1995 Crafty on 2006 hardware, and compared the search speeds. If you use the actual numbers, I see 1500x. I've rounded that _down_ in your favor to 1000x. That's not a guesstimated number, it is not a contrived number. It just compares 1995 HW to 2006 HW for Crafty.
The 1500x or 1000x is a real number. You apparently want to run every program you can find and use the worst ratio. I simply want to run my program because I have the numbers from 1995 and today (now).
Software improvements for my program are being computed as I post this. If we can trust some big rating list, we can then figure out how much more Rybka has produced via software, and that gives us the software improvement.
I'm not trying to make up any numbers. I started off thinking new crafty might be 2x faster, as I mentioned. And was prepared to accept whatever number popped out as the "crafty hardware performance improvement" number. The old version is doing better than I expected. It is as fast in NPS, and appears to be within 350 Elo in terms of rating.
Nothing is rounded. Nothing is estimated. Nothing is cherry-picked.
Your second "rounding" error is to compare the best possible hardware that a high-end hobbyist might own today to another platform that was considered INFERIOR in its day compared to other (not just Alpha) workstations that were available.
Sorry, but your math sucks. The best hardware a high-end hobbyist might have today is at least a 24-core box. I have at least 3 students with those; I could have them send you an email for confirmation. We have limited this to a decent single-chip i7, which is _not_ the high end of the platforms; the duals and quads (chips, not cores) are the higher end, and there are platforms beyond that.
P5/90 was not "inferior". A quad alpha would toast it. A normal alpha running a 32 bit program would not toast the thing at all. How many 64 bit chess engines do you think there were in 1995? I can count 'em on one hand, and two were mine (Cray Blitz and Crafty). No micro-bitboarders in 1995 besides yours truly. So the standard program was 32 bits. Mine needed 64. And ran pretty fast anyway.
Ask around and identify some "hobbyists" that had alphas in 1995. I doubt you can find a single computer chess person, other than those at a university or lab, who could even put hands on an alpha.
You justify this by claiming that nobody cares about anything other than Intel hardware which is probably true, but has nothing to do with MY contention of hardware advancement. Your test won't prove anything I said is wrong - but of course if you redefine what people say in order to suit your own purposes, then you can "pretend" you are proving them wrong in politician fashion.
Then you define the rule. But you can _not_ ignore deep thought and deep blue if you are going to venture out beyond Intel. That ruins the discussion immediately.
I am proceeding. I am currently measuring old and new software on current hardware. I am going to figure out how to slow the hardware down and measure old and new on old hardware. I'll report the numbers. If you aren't happy with 'em, feel free to compute whatever you want. This seems to be about the fairest way I can think of.

The 1000x is not even important in my tests, because I _know_ how fast Crafty searched in 1995, and I am pretty sure I can make it search at that _same_ speed in 2010 to see what one of those 1995 P5/90's would do today. Then I will know _exactly_ what hardware and software offer for Crafty, and can extrapolate with pretty good accuracy to get to Rybka's level for the true software-only improvement. At present, it seems that we might be at +600 for software. May end up that hardware is about the same. Don't know yet.

But I am going to find out without getting bogged down in alphas, rs6000's, sparcs and ultra-sparcs, MIPS, Crays, Fujitsus, Hitachis, deep blues, belles, and you-name-it other hardware platforms. Nobody cares. Everybody has been using PCs since 1995, except for 2-3-4 of us, and since 1994 I have used absolutely nothing but the x86/amd64 processors for chess tournaments. I used to bring the biggest hammer of all, but even I moved to the PC exclusively.
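For the curious, one way to "slow the hardware down" is simply to throttle the old binary to its 1995 node rate. A minimal sketch, assuming a target of roughly 22K NPS (the function name and the hook into the search loop are illustrative, not actual Crafty code):

[code]
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Illustrative only: cap the search at roughly target_nps by sleeping
   whenever we are ahead of the 1995 pace.  Call this periodically from
   the search loop, e.g. every few thousand nodes. */
static void throttle_to_nps(uint64_t nodes_searched,
                            double   elapsed_seconds,
                            double   target_nps)      /* e.g. 22000.0 for a P5/90 */
{
    double expected_seconds = (double)nodes_searched / target_nps;
    double ahead = expected_seconds - elapsed_seconds;
    if (ahead > 0.0) {
        struct timespec ts;
        ts.tv_sec  = (time_t)ahead;
        ts.tv_nsec = (long)((ahead - (double)ts.tv_sec) * 1e9);
        nanosleep(&ts, NULL);   /* stall until real time catches up to 1995 time */
    }
}
[/code]

Whether you throttle by sleeping or simply cut the node budget per move, the effect is the same: the old binary sees about the same number of nodes per move it would have seen on a P5/90.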
Your next rounding error is to run time-odds games based on the nodes-per-second difference, when everyone knows that 6 cores is not better than a machine that is 6 times faster.
So what? I am not using 6 cores in any of these tests. So I fail to see your point. I used the 6-core observation purely as a method to compute hardware speedup since 1995. That 1000x number will not influence my test results at all, I am simply going to make old crafty search at p5/90 speed and see how much weaker it gets. That's not so hard to understand, is it?
Your 1000 to 1 test is just not a reasonable test. You can FIX the last "rounding error" by calculating the time difference based on just 1 core instead of the quad. Then run the single processor OLD program against an MP version of the new program.
Why would I do that? The old program had a parallel search. I can't use that today? We had parallel machines in 1995. We had parallel 386 boxes in 1986. So I guess I miss your point. I claim that _raw hardware_ is 1500x faster in running Crafty. I doubt you can refute that since anyone can compute the numbers. But I am not using that in my testing anywhere. I just wanted a number. That's a major Elo boost, however. But I am not going to turn 1500x into an estimated Elo gain. I'm going to test and produce an _actual_ Elo difference between old at 22K and old at 4M (which is just using 1 core). I may (later) try old with 8 cores, but right now I am giving you every possible edge I can. One core. Not even the fastest current processor. And yet you are complaining about things that have no bearing on the results whatsoever.
But just saying that it does 1000X more nodes per second, therefore it should get 1000 to 1 time odds, is asinine. If you like doing that, then let's run a match between Komodo and Crafty. You can run Crafty on 4 cores as long as you let Komodo run on a single-core machine that is 4x faster than each of your cores. We will both be doing the same number of (hardware adjusted) nodes per second, so it must be fair, right?
You can make up whatever stuff you want. You've not seen _me_ talk about 1000:1 time odds. You have seen me talk about 22K in 1995 and that is the speed I am going to eventually make old crafty run at on my new hardware, to see what slowing it from 4M to 22K is going to do. I may well then come back and run old up to 32M using 8 cores to get a _better_ estimate of hardware improving the rating. But I have not done that yet.
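For what it's worth, the one thing both sides of this exchange agree on, that N cores are not worth a machine N times faster, is trivial to write down. The efficiency factor below is a made-up placeholder, not a measured Crafty or Komodo number:

[code]
#include <stdio.h>

int main(void) {
    /* N cores are not worth an N-times-faster machine.  The efficiency
       figure is assumed purely for illustration, not measured. */
    double cores = 6.0;
    double smp_efficiency = 0.7;   /* assumed fraction of ideal speedup per extra core */
    double effective_speedup = 1.0 + (cores - 1.0) * smp_efficiency;
    printf("6 cores ~ %.1fx one core, not 6.0x\n", effective_speedup);
    return 0;
}
[/code]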
But that would not cure the other rounding errors.
I have a 1995 version of Crafty that seems to be running correctly on my cluster, and it seems to be running well over 1000x faster than it did on a 1995 P90. I am getting close to being able to announce just how strong (or weak) that 1995 program is on today's hardware. We can subtract that from Crafty's rating, add whatever fudge factor is needed to raise the standard to Rybka, and voilà, we have the software gain from 1995.
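To spell out the bookkeeping explicitly: every Elo figure below is a placeholder until the cluster runs finish; nothing here is a measured result.

[code]
#include <stdio.h>

int main(void) {
    /* Placeholder Elo values -- the real ones come from the cluster runs,
       not from this sketch. */
    double elo_old_crafty_new_hw = 2700.0; /* 1995 Crafty on today's hardware (assumed) */
    double elo_new_crafty        = 2900.0; /* current Crafty (assumed)                  */
    double elo_rybka             = 3200.0; /* top program today (assumed)               */

    double crafty_software_gain = elo_new_crafty - elo_old_crafty_new_hw;
    double fudge_to_rybka       = elo_rybka - elo_new_crafty;
    double total_software_gain  = crafty_software_gain + fudge_to_rybka;

    printf("software-only gain since 1995: about %.0f Elo\n", total_software_gain);
    return 0;
}
[/code]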
Good for you. Run a fair test and I might be interested.
If you want to salvage this, allow for the test to be verifiable. It should be possible for anyone who is willing to independently verify your results to run their own tests under their own conditions and do any experiment they need in order to be satisfied. If you don't do that, then this is meaningless. If you do make it possible, then I am still interested in seeing this test done under fair conditions.
I'm not quite sure what you are implying, but chew on this. Of the two of us, which one makes _all_ of their source code publicly available so that any claims they make can be verified easily?
Think about it.
You live in secrecy, and then accuse me of being dishonest in my test results? I always make everything available. You should try it. It means unexpected errors don't slip through the cracks, either.
Make all the source code available, including the original source code and your fixed source code. If possible, make it so that we can run both programs under winboard protocol, and provide a way for us to verify that it's the correct source code, perhaps an old web site where the sources of old Crafty versions are posted.
This is NOT an accusation of dishonesty or anything like that, but it's just common sense.
Sorry, but get real. It _is_ an accusation. But it is a point I always address anyway. I'm not about to claim my old program will work with winboard/xboard. It might or might not. I simply made it work with my referee, which doesn't need the ping/pong/done=0/done=1 crap. But I will make the source available with the proviso that (a) it is 64 bit only, and (b) it is minimally compliant with new xboard/winboard stuff...
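For reference, the "minimally compliant" part really is only a handful of lines. A rough sketch of the protover/ping/pong handshake, not the actual referee or old Crafty code, looks something like this:

[code]
#include <stdio.h>
#include <string.h>

/* Rough sketch of the xboard protocol-v2 handshake pieces mentioned
   above.  A real engine does this inside its main command loop. */
int main(void) {
    char line[256];

    while (fgets(line, sizeof line, stdin)) {
        int n;
        if (strncmp(line, "protover", 8) == 0) {
            /* announce features; done=1 means "my feature list is complete" */
            printf("feature ping=1 setboard=1 done=1\n");
            fflush(stdout);
        } else if (sscanf(line, "ping %d", &n) == 1) {
            /* the GUI uses ping/pong to know all prior commands were processed */
            printf("pong %d\n", n);
            fflush(stdout);
        } else if (strncmp(line, "quit", 4) == 0) {
            break;
        }
        /* ... everything else (moves, time controls, etc.) goes here ... */
    }
    return 0;
}
[/code]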
Anyone can run a test and make a mistake that gives unintended results. I have done it myself, where I almost came to the wrong conclusion because one program was crashing and racking up losses, for instance. It's very easy to overlook some testing issue that you didn't think of, and that's part of the wisdom of independent verification.