I was simply responding to Charles' request to test with/without null-move and LMR, since they are two well-known software ideas, although null-move is probably older than he suspects. I am trying to find a 1996 version of Crafty to run on the cluster at today's speeds to see how it does... I can then run it at 1996's speeds to get the Elo from purely hardware gains...

michiguel wrote:
The same can be said about hardware, they just put more GHz on them.

bob wrote:
No idea what you are talking about. This was an absolutely perfect attempt to quantify the effect of null-move and LMR. Null-move existed in 1995. Null-move existed in 1990. So that is _barely_ an enhancement from the last 20 years. Beal's original paper was somewhere in the 1988 range.

michiguel wrote:
You may be right, but this is not a valid comparison! You should compare the improvement between Crafty model 1996 vs. Crafty model 2008, not the contribution of two single techniques as implemented in 2008.

bob wrote:
Here are the results. Crafty-22.9X1 is normal Crafty. 22.9X2 is normal except null-move completely commented out. 22.9X3 is normal except that LMR has been completely disabled. 22.9X4 is normal but with both LMR and null-move removed. The -1 or -2 just means run #1 or run #2, to give a feel for what kind of variation there is between runs.
Normal is roughly 2637 in this test. Removing null-move or LMR drops this by approximately 40 Elo. Removing both drops the rating by around 120 Elo.

Code: Select all
Rank Name              Elo    +    -  games  score  oppo.  draws
   1 Glaurung 2.1      2695   4    4  62256    65%   2585    20%
   2 Toga2             2695   4    3  62256    64%   2585    20%
   3 Crafty-22.9X1-1   2638   4    4  31128    51%   2629    21%
   4 Crafty-22.9X1-2   2636   5    5  31128    51%   2629    21%
   5 Fruit 2.1         2597   3    3  62256    52%   2585    22%
   6 Crafty-22.9X2-2   2596   4    4  31128    45%   2629    21%
   7 Crafty-22.9X3-2   2596   4    4  31128    46%   2629    20%
   8 Crafty-22.9X2-1   2594   4    5  31128    45%   2629    21%
   9 Crafty-22.9X3-1   2591   4    5  31128    45%   2629    20%
  10 Glaurung 1.1 SMP  2530   3    4  62256    43%   2585    19%
  11 Crafty-22.9X4-1   2517   5    5  31128    35%   2629    19%
  12 Crafty-22.9X4-2   2514   5    5  31128    35%   2629    18%
Null-move and LMR are the two biggest search enhancements of the past 15 years, and they added +120 Elo. I could always try normal Crafty, but take the NPS from about 1M on this hardware (I only test with 1 CPU here) and back it down to about 75K, which is what I was getting in 1996 on a Pentium Pro 200, roughly a factor of 15x, and see how that impacts performance. Although I should probably factor in the single-core Pentium Pro vs. a quad-core Xeon for today, which runs around 10M NPS, as that is a more representative example of what hardware speeds have done since 1996. So 75K to 10M is a factor of roughly 130x. But something tells me that a factor of 130x is _way_ more than 120 Elo...
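For readers who want to see concretely what is being switched off in the X2/X3/X4 versions, here is a minimal sketch of how null-move pruning and LMR typically sit inside a negamax search. This is generic illustration code, not Crafty's; the engine types (Position, Move, MoveList) and the helper routines declared below are assumed to exist, and the constants (R = 3, "reduce after the first three moves", minimum depth 3) are illustrative guesses.

Code: Select all

/* Generic sketch of null-move pruning and LMR in a negamax search. */
#define INF 32000

extern int  quiesce(Position *pos, int alpha, int beta);
extern int  in_check(const Position *pos);
extern int  has_non_pawn_material(const Position *pos);
extern int  is_quiet(Move m);
extern void generate_moves(Position *pos, MoveList *list);
extern void make_move(Position *pos, Move m);
extern void unmake_move(Position *pos, Move m);
extern void make_null_move(Position *pos);
extern void unmake_null_move(Position *pos);

int search(Position *pos, int alpha, int beta, int depth, int can_null) {
    if (depth <= 0)
        return quiesce(pos, alpha, beta);

    /* Null-move pruning: give the opponent a free move; if a reduced-depth
       search still fails high, assume this node is good enough to cut. */
    if (can_null && depth >= 3 && !in_check(pos) && has_non_pawn_material(pos)) {
        make_null_move(pos);
        int v = -search(pos, -beta, -beta + 1, depth - 1 - 3, 0);
        unmake_null_move(pos);
        if (v >= beta)
            return beta;
    }

    MoveList list;
    generate_moves(pos, &list);

    int best = -INF;
    for (int i = 0; i < list.count; i++) {
        make_move(pos, list.moves[i]);
        int v;
        if (i >= 3 && depth >= 3 && is_quiet(list.moves[i]) && !in_check(pos)) {
            /* LMR: late, quiet, non-checking moves get a reduced-depth,
               null-window look first ... */
            v = -search(pos, -alpha - 1, -alpha, depth - 2, 1);
            /* ... and only a surprise (fail high) earns a full re-search. */
            if (v > alpha)
                v = -search(pos, -beta, -alpha, depth - 1, 1);
        } else {
            v = -search(pos, -beta, -alpha, depth - 1, 1);
        }
        unmake_move(pos, list.moves[i]);

        if (v > best)    best = v;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break;   /* beta cutoff */
    }
    return best;
}

The X2 test corresponds to deleting the null-move block, X3 to always taking the full-depth branch in the move loop, and X4 to doing both.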
The discussion was not about 1988 vs. 2008 as a 20-year span; it was about specific techniques developed during that time frame and how much they actually improve a program, vs. the speed differential between machines over the last 20 years. I shortened it to 1995 because Crafty was available then, and I have a detailed record of what has been done over the last 13 years...
I can probably produce that result. Let me see what the oldest version I have is, and see if I can figure out when it was current. I certainly know when the Jakarta version was done, since we know when the WCCC in Jakarta was played. But that would be an invalid test as well, because software "improvements" in the present context mean "new ideas developed since XXX". Many of Crafty's improvements have been the result of tuning, but based on old ideas. I do not call those "software improvements" because they are not general enhancements for all programs (such as null-move or LMR) but are specific to one program's previously existing code...
What is the Elo difference between Crafty 1996 and Crafty 2008 running on equal hardware? This is not optimal either, but it is closer to something more meaningful.
Tuning individual components of software is software, as the title of the thread says. Null move, LMR etc. do not work in a vacuum.
I see your point, but the comparison is not fair. You are not including bitboards, all the nice eval things that you can do with them, etc., etc.
Yes, bitboards are old, but efficient implementations are not. We are not even taking into account the synergy between software and hardware.
Crafty98+Hardware98 ---> Elo A
Crafty08+Hardware08 ---> Elo B
Hardware + Software improvement = X = Elo B - Elo A
Crafty98+Hardware08 ---> Elo C
Hardware improvement = Y = Elo C - Elo A
Software improvement = X - Y
This is a better approximation to all the components combined, at least from your perspective. That is still crude, because it does not take into account synergy, but at least we can start talking about it.
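To make the bookkeeping concrete, here is a trivial sketch of the decomposition with invented placeholder ratings (the three Elo numbers are purely hypothetical, not measurements):

Code: Select all

#include <stdio.h>

/* All ratings below are placeholders, not measured results. */
int main(void) {
    int elo_A = 2450;   /* Crafty98 + Hardware98 (hypothetical) */
    int elo_B = 2800;   /* Crafty08 + Hardware08 (hypothetical) */
    int elo_C = 2650;   /* Crafty98 + Hardware08 (hypothetical) */

    int x = elo_B - elo_A;   /* combined hardware + software gain */
    int y = elo_C - elo_A;   /* hardware gain alone               */

    printf("total = %d, hardware = %d, software = %d\n", x, y, x - y);
    return 0;
}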
As a general statement, a fairer comparison would be "TopProgram98" vs. "TopProgram08"; maybe the SSDF has this already in some form.
Miguel
Other suggestions???
BTW how many are surprised that removing both is only a 120 Elo loss???
Hardware vs Software
Re: Hardware vs Software - test results
Re: Hardware vs Software
I agree that it doesn't fit the timeline we've been discussing, but

bob wrote:
I could do that but why? PVS existed in 1978 and was used at the time in Blitz (suggested by Murray Campbell). Ken Thompson used PVS in the 1980 Belle chess hardware. So that is 30 years old... I could tell you about the first program to ever use that in an ACM event, quite by accident, if you are interested...

CRoberson wrote:
Here is what I meant by PV verification:
Code: Select all

for all moves {
   if first move
      v = -S(-beta, -alpha)
   else {
      v = -S(-alpha-1, -alpha)
      if (v > alpha) && (v < beta)
         v = -S(-beta, -alpha)
   }
}

Change that to:

Code: Select all

for all moves {
   v = -S(-beta, -alpha)
}
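Expanded into slightly more complete (but still generic) C, the first loop above, i.e. PVS, might look like the sketch below. This is not Crafty's Search(); the S() of the pseudocode is written as pvs(), and the move-generation and make/unmake helpers are assumed to exist.

Code: Select all

/* Generic fleshed-out PVS move loop, for illustration only. */
int pvs(Position *pos, int alpha, int beta, int depth) {
    if (depth <= 0)
        return quiesce(pos, alpha, beta);

    MoveList list;
    generate_moves(pos, &list);

    for (int i = 0; i < list.count; i++) {
        make_move(pos, list.moves[i]);
        int v;
        if (i == 0) {
            /* First move: full window; this establishes the PV score. */
            v = -pvs(pos, -beta, -alpha, depth - 1);
        } else {
            /* Later moves: try to prove they are no better than alpha
               with a cheap null-window search ... */
            v = -pvs(pos, -alpha - 1, -alpha, depth - 1);
            /* ... and re-search with the full window only on a fail-high. */
            if (v > alpha && v < beta)
                v = -pvs(pos, -beta, -alpha, depth - 1);
        }
        unmake_move(pos, list.moves[i]);

        if (v > alpha) alpha = v;
        if (alpha >= beta) break;   /* cutoff */
    }
    return alpha;
}

The "change that to" version simply replaces the whole if/else with the full-window call for every move, which is why ripping PVS out of a real engine touches every place the search calls itself.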
I'd like to know the answer to that one. Also, I agree that a 1992-96
version of Crafty test could be very enlightening. IIRC, Crafty came out
in 1995?
Something I did last summer, I stripped out all the eval except
raw material value. Based on my tests (much fewer than yours),
the new version of Telepath dropped 500 rating points.
Re: Hardware vs Software
That test is not so easy to run. There are several places where Search() is recursively called, which means a significant number of changes would be needed to get rid of the PVS code. However, I can run the evaluation test and will queue that up right now behind two other tests I have running. More on this one tomorrow morning...

CRoberson wrote:
I agree that it doesn't fit the timeline we've been discussing, but

bob wrote:
I could do that but why? PVS existed in 1978 and was used at the time in Blitz (suggested by Murray Campbell). Ken Thompson used PVS in the 1980 Belle chess hardware. So that is 30 years old... I could tell you about the first program to ever use that in an ACM event, quite by accident, if you are interested...

CRoberson wrote:
Here is what I meant by PV verification:
Code: Select all

for all moves {
   if first move
      v = -S(-beta, -alpha)
   else {
      v = -S(-alpha-1, -alpha)
      if (v > alpha) && (v < beta)
         v = -S(-beta, -alpha)
   }
}

Change that to:

Code: Select all

for all moves {
   v = -S(-beta, -alpha)
}
I'd like to know the answer to that one. Also, I agree that a 1992-96
version of Crafty test could be very enlightening. IIRC, Crafty came out
in 1995?
Something I did last summer, I stripped out all the eval except
raw material value. Based on my tests (much fewer than yours),
the new version of Telepath dropped 500 rating points.
Re: Hardware vs Software - test results
If you find any old versions, could you also upload them to your ftp box if you don't mind?
Thanks
Re: Hardware vs Software
I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.
Code: Select all
Crafty-22.9R01 2650 5 5 31128 51% 2644 21%
Crafty-22.9R02 2261 5 6 31128 9% 2644 7%
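For reference, a material-only Evaluate() of the kind described is essentially the following. This is a generic sketch, not Crafty's actual function; the Position layout, the piece values, and the side-to-move sign convention are all assumptions made for illustration.

Code: Select all

/* Material-only evaluation sketch. */
enum { WHITE, BLACK };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

typedef struct {
    int piece_count[2][6];   /* [color][piece type] */
    int side_to_move;        /* WHITE or BLACK      */
} Position;

static const int piece_value[6] = { 100, 300, 300, 500, 900, 0 };

int Evaluate(const Position *pos) {
    int score = 0;
    for (int pt = PAWN; pt <= QUEEN; pt++)
        score += piece_value[pt] *
                 (pos->piece_count[WHITE][pt] - pos->piece_count[BLACK][pt]);
    /* Return the score from the side to move's point of view. */
    return pos->side_to_move == WHITE ? score : -score;
}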
Re: Hardware vs Software
I am surprised because I expected a bigger difference.

bob wrote:
I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.
Code: Select all
Crafty-22.9R01 2650 5 5 31128 51% 2644 21%
Crafty-22.9R02 2261 5 6 31128 9% 2644 7%
I could expect something like this from a piece-square-table evaluation (no pawn structure, mobility, or king safety), but I believe that the difference between material-only evaluation and the normal evaluation is something like 1000 Elo, not on the order of 400 Elo.
It may be interesting to see some games that Crafty (material only) could win.
Uri
Re: Hardware vs Software
It is amazing how much you can recoup from that by not much more than a little bit of piece-square evaluation, as I learned when building micro-Max. A weak attraction to the center for the light pieces and King, discouragement of King moves in the middle game (which you could implement as a PST initialized in the root, where the King gets a bonus for being where it is in the root), and a hefty bonus for Pawns on the 6th or 7th rank already leads to quite reasonable play. Reasonable Pawn structure is maintained by penalizing Pawn moves when a Pawn two files to the left or right is missing. (This is basically the only non-PST-like term.)

CRoberson wrote:
Something I did last summer, I stripped out all the eval except raw material value. Based on my tests (much fewer than yours), the new version of Telepath dropped 500 rating points.
Tuning the piece values is quite critical in such a material-dominated eval: it is very important that N > 3P, B > N, and B+N > R+P. Having any of these reversed, or even equal, loses a lot of games.
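A rough sketch of that kind of root-initialized piece-square-table evaluation, in generic C: this is not micro-Max's actual code, every constant here is a guess, and the pawn-structure penalty mentioned above is omitted because it is not a PST term.

Code: Select all

#include <stdlib.h>

enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

int pst[6][64];   /* piece-square bonuses, from White's point of view */

/* Fill the tables once per search: a weak pull toward the center for the
   minor pieces (and King), a hefty bonus for far-advanced pawns, and a
   bonus for the square the King occupied in the root position so that
   King moves in the middlegame look unattractive.  All numbers are
   illustrative guesses. */
void init_pst(int root_king_sq) {
    for (int sq = 0; sq < 64; sq++) {
        int file = sq & 7, rank = sq >> 3;           /* rank 0 = White's 1st */
        int centrality = 3 - (abs(2 * file - 7) + abs(2 * rank - 7)) / 4;

        pst[PAWN][sq]   = (rank >= 5) ? 20 * (rank - 4) : 0;  /* 6th/7th rank */
        pst[KNIGHT][sq] = 4 * centrality;
        pst[BISHOP][sq] = 4 * centrality;
        pst[ROOK][sq]   = 0;
        pst[QUEEN][sq]  = 0;
        pst[KING][sq]   = centrality + (sq == root_king_sq ? 10 : 0);
    }
}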
Re: Hardware vs Software
Interesting testing... I hope you continue to run tests to determine the approximate value of different chess ideas. Something like this is really worth doing and publishing.

bob wrote:
I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.
Code: Select all
Crafty-22.9R01 2650 5 5 31128 51% 2644 21%
Crafty-22.9R02 2261 5 6 31128 9% 2644 7%
Could you please try Don Beal's random mobility estimator thing? Ever since you mentioned it I've been really curious as to how well that would really work, compared to just material and/or some other simple & quick mobility scoring method.
Anyway, what were the time conditions for this and the hardware used?
And do you think these results would carry over to slower games?
Carey
Re: Hardware vs Software
CRoberson wrote:
While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).
I believe software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990's, the EBF was 4. Now, it is 2.
Dropping the EBF from 4 to 2 is huge. Let's look at a 20-ply search.
The speedup of EBF=2 vs EBF=4 is:
4^20/2^20 = 2^20 = 1,048,576
So, that is over a 1,000,000x speedup. Has hardware produced that much
since 1992?
Also, I believe eval improvements have caused an improvement in
rating scores.
An example of non-hardware improvement is on the SSDF rating list.
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.
During the last 12 years or so, hardware improvement was about 400 Elo points and software improvement 600-700 points (Rybka). I bet R. Hyatt is right in his statement if we are talking about Crafty.
Kai
Re: Hardware vs Software
A more harmonious (and hard to find) tuning combination of commonly known elements is no more a software improvement than a more pleasing (and hard to find) combination of dials on a Moog is a synthesizer improvement.

Laskos wrote:
CRoberson wrote:
While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).
I believe software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990's, the EBF was 4. Now, it is 2.
Dropping the EBF from 4 to 2 is huge. Let's look at a 20-ply search.
The speedup of EBF=2 vs EBF=4 is:
4^20/2^20 = 2^20 = 1,048,576
So, that is over a 1,000,000x speedup. Has hardware produced that much
since 1992?
Also, I believe eval improvements have caused an improvement in
rating scores.
An example of non-hardware improvement is on the SSDF rating list.
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.
During the last 12 years or so, hardware improvement was about 400 Elo points and software improvement 600-700 points (Rybka). I bet R. Hyatt is right in his statement if we are talking about Crafty.
Kai
Matthew Hull