bob wrote: Old on new hardware will be the most interesting as I would not be surprised at all to see the old program beat R3 with R3 on old hardware and old program on new hardware...
That's what I expect to see - but there is a big question mark about whether the old program can actually utilize the new hardware. Your test would help sort that out and I think it is a good test to do.
I expect Rybka to win at a 120/40 time control even if we replace the old programs with Crafty 21.5, which has a similar level to the top programs of 1999 (assuming that you give Rybka the best hardware of 1999, which is a quad, and give Crafty 21.5 the best octal of today; I assume that programs of 1999 could not efficiently use anything better than an octal).
Note that I suggest using contempt=0 for Rybka (the default contempt=15 is better against a significantly weaker opponent, but when the goal is to win a match it is better to use contempt=0, based on the results that I have read).
Uri
I have to point out that this is not a fair test the way you present it for several reasons. I knew from the beginning that it was stacked very much in favor of Rybka but I think you are trying to stack it more. I noticed you are now talking about 1999, not 1998 but let's say January of 1999 to be more precise. Also, even though quads may have been available, nobody was testing on quads back then. Everybody today it seems has a quad in their home (at least serious chess enthusiasts) and the ratings lists are testing with quads. Were quads being tested on the ratings lists in 1998?
I would agree to using the best of what the testing agencies were using at the time. I think that limits today's hardware to a quad (even though we can go much higher) and the hardware of 10 years ago to a single-processor machine, even though you could go higher.
Of course you would like to ramp up the hardware on both ends as much as possible because you realize that modern programs were designed for better hardware (which is part of MY argument.) So this would not contribute to making your point and you should not want this. Are you looking for the truth, or just trying to construct a match that you can win? If you really want to know the truth here you cannot keep pushing for every possible advantage. This would become like one of those battle of the sexes tennis matches where the female is given all kinds of advantages because we all know this isn't really about whether men or women play better.
Rybka is very strong - we should both be interested in a fair test, not trying to nitpick every possible advantage to stack the odds in our favor - that wouldn't prove anything. If the test is too obviously unfair it just makes the results meaningless and people will debate it. (I'm sure they will anyway, but let's not make it too obvious that we secretly want to have more reasons to worship and adore Rybka.)
Of course it's clear that we cannot possibly construct a perfectly fair test but don't we need to be reasonable?
This is partly why I want to just do a prelim odds match - just to see if there is anything to talk about.
I'd go for that test in a heartbeat because Crafty of 1999 could use a 16-way box just fine and those are easy to find today. Even 8-way.
In 1998 I ran on a Pentium II Xeon at 300 MHz. That's probably a good estimate of what was generally available in late 1998, which is about ten years ago since we just barely started 2009 and should probably consider 2008 as "the year". I ran on an i7 late in 2008 for comparison.
To keep this simple, I would suggest a single-chip machine: a 300 MHz P2 for 1998, and a single-chip quad-core i7 at nearly 3 GHz for today's hardware. Yes you can do a dual i7 but you could also do a dual P2, and you could do quads in both as well, so a single chip test would be representative...
That is probably about a factor of 200x in computing. Crafty on the P2/300 was under 100K nodes per second. On the quad i7 it is hitting around 20M.
So we could use some platform of today and play 200:1 time odds and see what happens. I believe I know what will happen but it will be interesting to try. I'd be more than happy to try this but do not have any commercial programs to test against and only run Linux on everything I have...
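The 200:1 figure is simply the ratio of the two node rates quoted above, and a time-odds match applies that ratio to the clock. A minimal sketch in Python, assuming the rounded figures from this post (100K and 20M nodes per second) and reading the 120/40 control mentioned earlier in the thread as 120 minutes for 40 moves; the function names are made up for illustration:

```python
def speed_ratio(new_nps: float, old_nps: float) -> float:
    """Ratio of search speeds, in nodes per second, between two machines."""
    return new_nps / old_nps


def handicapped_seconds(base_seconds: float, ratio: float) -> float:
    """Clock time for the side simulating the slower machine."""
    return base_seconds / ratio


# Rounded figures from the post; "under 100K" means the true ratio
# is at least this large.
ratio = speed_ratio(20_000_000, 100_000)

# Assumption: 120/40 means 120 minutes for the first 40 moves.
base = 120 * 60

print(f"speed ratio: {ratio:.0f}:1")
print(f"handicapped clock: {handicapped_seconds(base, ratio):.0f} s per 40 moves")
```

Time odds only approximate slower hardware: they ignore memory size and hash table effects, which come up later in the thread.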
Here's a test I can run trivially:
First, let's choose Glaurung 2 rather than Rybka. You can decide how much better Rybka is in terms of Elo and we will always use Glaurung + Delta-x, where Delta-x is the Elo difference between R3 and G2.
Now I can run any sort of handicap match you want, at any time control, and we don't need tens of thousands of games which will help.
So for 1998 hardware we run either program at a 200:1 time handicap. For 2008 hardware we run that program straight up with no handicap.
Interested???
I can take current Crafty and Glaurung back to 1998 hardware. I can run them on current hardware (simulated via time). Only thing I can't do, which makes the experiment less interesting, is I can't take a 1998 program and its 2008 counterpart to test which would show what happened over that span with software, where the time handicap would show what happened over that span with hardware. But we could discover some interesting information about hardware improvements anyway.
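The Glaurung + Delta-x idea can be sketched with the standard logistic Elo model: convert the match score into an Elo difference, then shift it by the agreed Delta-x. This is only an illustration, assuming that model is acceptable here; `delta_x` is a placeholder to be agreed on, and the match score below is invented for the example, not a result.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a match score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def rybka_equivalent_diff(glaurung_diff: float, delta_x: float) -> float:
    """Shift a measured Glaurung result by the agreed R3 - G2 difference."""
    return glaurung_diff + delta_x


# Hypothetical input: handicapped Glaurung scores 40% in the match.
measured = elo_diff_from_score(0.40)
delta_x = 0.0  # placeholder: the Elo gap between R3 and G2, to be agreed
print(f"measured: {measured:+.1f} Elo")
print(f"Rybka-equivalent: {rybka_equivalent_diff(measured, delta_x):+.1f} Elo")
```

Adding a fixed Delta-x assumes the Rybka/Glaurung gap stays the same under a heavy time handicap, which is itself open to debate.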
I was not talking about the current Crafty, which is stronger than the best software of 1999, but about Crafty 21.5.
I expected something like a 50:1 speed advantage.
As for using 16 processors, I do not believe that the private commercial programs that played in the WCCC could do it.
A quad from the beginning of 1999 against an octal of today is less than 200:1.
Under the conditions that you give, with the best Crafty, I expect Crafty to win, because I see that Crafty has improved since version 21.5 and 200:1 is certainly better than 50:1.
Uri Blass wrote:
P200 was also not the top hardware of 1999.
If you are interested in the top hardware of 1999 against the top hardware of today, then I guess that Rybka 3 can also use more than one processor, so I expect even better results for Rybka 3.
Uri
I'm basically looking for a 32 to 1 advantage based on Moore's law, because we are talking about 10 years and a doubling every 2 years. So even if we run all the testing on a single processor this is the value I would expect to be fair.
The raw CPU speed hasn't fully kept up with Moore's law, but we must also consider the huge increase in memory and multi-processors. And this discussion did not start off being about the last 10 years but much further back. So however this test is constructed it should give the 10 year old top program a 32 to 1 (or equivalent) advantage, otherwise we are arguing about something else.
Even though the memory will be crippled, I don't think it will be much of a disadvantage for Rybka. It would not NEED the same amount of memory running 32 times slower.
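The 32 to 1 figure is 2 raised to the number of doublings in 10 years. A small Python sketch of that arithmetic, which also inverts it to show the doubling period that the 200:1 Crafty measurement quoted elsewhere in this thread would imply; the function names are illustrative only:

```python
import math


def moore_factor(years: float, doubling_years: float = 2.0) -> float:
    """Speedup after `years` if performance doubles every `doubling_years`."""
    return 2.0 ** (years / doubling_years)


def implied_doubling_years(years: float, factor: float) -> float:
    """Doubling period that would produce `factor` over `years`."""
    return years / math.log2(factor)


print(f"10 years, doubling every 2 years: {moore_factor(10):.0f}:1")
print(f"200:1 over 10 years implies doubling every "
      f"{implied_doubling_years(10, 200):.2f} years")
```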
I think this test is going to be pretty unfair anyway because there are intangibles that we cannot easily take into consideration. Each program was optimized for the hardware it was designed to run on with the appropriate compiler optimizations, etc. Some benchmarking might help to resolve this however. Maybe it cancels out for each program.
Anyway, I would like to just gather the numbers for now and we can argue about what they mean later - sort it out then.
I believe it is better than that. Moore's law deals with chip density, not necessarily chip speed. In my old records, I found that in November 1998 I ran on a P2/300 box that was loaned to me for a month or so. Crafty was getting under 100K nodes per second on that box. Most recent testing on a 2.9?? ghz I7 was around 20M nodes per second using 4 cores, no hyperthreading. A factor of 200:1 roughly, which is wildly significant.
I believe it's better than that too but I am giving as many concessions as possible (and they want more.) I think they don't really want a fair match because that is obviously un-winnable for them.
I think a doubling every 2 years was a conservative (rounded up) estimate and that it is really much better than this.
That is why I made my suggestion. I _know_ that I ran on a P2/300 Xeon in late 1998. The box I had was a dual CPU, but I would agree that one "chip" would be the best test. And I also ran on an i7 at some odd clock speed, 2.9xx GHz, a couple of months ago, with 4 cores and hyperthreading disabled. And I hit around 20M on that box.
Hence my factor of 200:1 for 1998 vs 2008, which seems to be reasonable. And that is a _substantial_ hurdle for a good program vs a bad program to overcome, if the bad program gets the 200:1 odds. So we can learn what the hardware has offered. But we are left with software. I can probably dredge up a 1998-era version of Crafty if I can figure out what was current at the time. I know I have a 1996 version that was run in Jakarta so I can probably come close. And in Jakarta Crafty finished in the top 4 or 5 at the WMCCC event so it was very competitive at the time (and not running on a dual-CPU box either, it used a single-CPU Pentium Pro 200). So measuring the software improvement from 1998 to present could be approximated by taking Crafty of 1998 vs Rybka of today. But I can't run Rybka, not having it. I suggested agreeing on how much better Rybka is than Glaurung 2 and then using that, which I do have and can run hundreds of games at a time on the clusters here...
If you can get me a Linux version of that particular Crafty, I can run the test on my 64-bit Linux machine, as I do have Rybka 64-bit.
I am clearly interested in a fair test.
Note that I suggested giving Crafty an octal and Rybka a quad.
I have no objection to giving both SSDF hardware, and the advantage from a single-processor P200 to a Q6600 (4 processors) is not close to 200:1 based on my knowledge (I guess that it may be 50:1 or at most 100:1).
An odds match may be interesting.
I feel sure that even the 32-bit version of Rybka can win a match with a 32:1 time handicap (assuming a time control that is not very fast), because the older 32-bit Rybka 2.3.2a (weaker than Rybka 3) could win a match of 100 games against Movei (stronger than Fritz 5.32) with a 100:7 time handicap.
I am not sure about the results with a significantly bigger time handicap, and the 200:1 that bob suggests may be too big a handicap.
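One way to compare the handicaps being discussed is to count speed doublings, since each doubling of search time is commonly treated as worth a roughly similar amount of playing strength. The sketch below only does that conversion for the ratios named in this thread; it does not estimate Elo, and the per-doubling assumption is a rule of thumb rather than a measured result.

```python
import math


def doublings(ratio: float) -> float:
    """Number of speed doublings represented by a time-odds ratio."""
    return math.log2(ratio)


# Ratios taken from the posts in this thread.
handicaps = {
    "100:7 (Rybka 2.3.2a vs Movei)": 100 / 7,
    "32:1 (Moore's law estimate)": 32,
    "50:1 (Uri's estimate)": 50,
    "200:1 (bob's estimate)": 200,
}

for label, ratio in handicaps.items():
    print(f"{label}: {doublings(ratio):.2f} doublings")
```

On this scale the 100:7 match covers a little under four doublings, while 200:1 needs more than seven, which is why a result at 100:7 says little about 200:1.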
I agree that a match against a version of Crafty from 1998 with a 200:1 time handicap may be interesting.
Your numbers are wrong. A quad in 1999 would hit maybe 350K at best, and that is running Crafty, which could easily use 16 back then. Today we can get a quad-chip, quad-core box that can hit 80M with Crafty. That is _still_ 200:1...
Any version I can find is Linux-compatible since that is how it has been developed. It is also WinBoard-compatible, although going back to 1998 will mean no protocol version 2...
Let me search to see what I can find first, as I have to figure out what versions were current in 1998...
I just remembered that my tester is UCI based - but I could make an adaptor to go from xboard to UCI if one doesn't already exist.